24 days of Hackage, 2015: day 12: json-autotype: inferring types from data
Dec 12, 2015 · 6 minute read · CommentsHaskellHackageaesonjson-autotypeJSONinferencetype providersFsharp
Table of contents for the whole series
A table of contents is at the top of the article for day 1.
Day 12
Today we revisit a problem from day 4, in which we grabbed JSON off the Web in order to extract information about it. I mentioned then that we were using an untyped representation for JSON. Ideally, if it’s not too onerous, we want to use a typed representation for our data, to avoid common errors such as trying to access a nonexistent field. The Aeson library allows us to write our own data types and convert them from and to JSON. If we were in charge of the data and data format, then this would be the way to go.
But what if, for whatever reason, we do not already have a typed data model, but are consuming, for example, JSON from a source that has not given us a spec of the data types, maybe with JSON Schema? Then we have to reverse-engineer the data types, possibly from an informal text spec of the format (which is a real pain).
Or we could take a lazy way out, and infer plausible types from a
corpus of representative data. This is what the useful
json-autotype
package does (documentation on the GitHub repo page).
Video (update of 2016-01-08)
It just came to my attention that there is a video on json-autotype
that may be of interest:
Generating a module of types from JSON data
I saved off a sample JSON document,
pittsburgh-code-and-supply-events.json
from a query of the Meetup
events API.
The easiest way to use json-autotype
is to use its command line
tool, installing it globally first.
$ stack install json-autotype
For our quick example, I just manually generated Haskell source code (ideally we should integrate this process into a fully automated build):
$ json-autotype pittsburgh-code-and-supply-events.json -o
MeetupEventsJSON.hs
$ mkdir generated-src
$ mv MeetupEventsJSon.hs generated-src/
Note that although for this example, we only ran the inference on a single document, running it on more documents may give more precise types.
A quick look at the generated code
Here is a sample of what was generated (reformatted and edited for clarity):
{-# LANGUAGE TypeOperators #-}
import Data.Aeson.AutoType.Alternative ((:|:))
data TopLevel = TopLevel {
topLevelResults :: [ResultsElt],
topLevelMeta :: Meta
} deriving (Show,Eq,Generic)
data ResultsElt = ResultsElt {
resultsEltStatus :: Text,
resultsEltGroup :: Group,
resultsEltTime :: Int,
resultsEltWaitlistCount :: Int,
resultsEltVenue :: (Maybe (Venue:|:[(Maybe Value)])),
resultsEltCreated :: Int,
resultsEltUtcOffset :: Int,
resultsEltEventUrl :: Text,
resultsEltYesRsvpCount :: Int,
resultsEltHeadcount :: Int,
resultsEltFee :: (Maybe (Fee:|:[(Maybe Value)])),
resultsEltVisibility :: Text,
resultsEltMaybeRsvpCount :: Int,
resultsEltName :: Text,
resultsEltId :: Text,
resultsEltRsvpLimit :: (Maybe (Int:|:[(Maybe Value)])),
resultsEltUpdated :: Int,
resultsEltDuration :: (Maybe (Int:|:[(Maybe Value)])),
resultsEltDescription :: (Maybe (Text:|:[(Maybe Value)]))
} deriving (Show,Eq,Generic)
instance FromJSON ResultsElt where
parseJSON (Object v) = ResultsElt
<$> v .: "status"
<*> v .: "group"
<*> v .: "time"
<*> v .: "waitlist_count"
<*> v .:?? "venue"
<*> v .: "created"
<*> v .: "utc_offset"
<*> v .: "event_url"
<*> v .: "yes_rsvp_count"
<*> v .: "headcount"
<*> v .:?? "fee"
<*> v .: "visibility"
<*> v .: "maybe_rsvp_count"
<*> v .: "name"
<*> v .: "id"
<*> v .:?? "rsvp_limit"
<*> v .: "updated"
<*> v .:?? "duration"
<*> v .:?? "description"
parseJSON _ = mzero
data Venue = Venue {
venueRepinned :: Bool,
venueState :: Text,
venueCountry :: Text,
venueZip :: (Maybe (Text:|:[(Maybe Value)])),
venueLat :: Int,
venueName :: Text,
venueCity :: Text,
venueId :: Int,
venueLon :: Int,
venueAddress1 :: Text
} deriving (Show,Eq,Generic)
Isn’t this much better than inferring and writing the boilerplate manually? Now we can parse raw JSON directly into our set of types.
The funny :|:
operator is an Either
-like type constructor used to
deal with the fact that when a field is missing from some data
examples, we cannot infer whether it could potentially be a complex
object that we just don’t know about from our corpus of examples.
Using the generated types
Imports:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators #-}
module JSONAutoTypeExample where
-- Reuse what we did earlier.
import WreqExample (EventInfo(..), GroupId, getMeetupEventsJSONBytes)
-- Module automatically generated using json-autotype.
import qualified MeetupEventsJSON as Meetup
import qualified Data.Aeson as Aeson
import Control.Arrow ((>>>))
import Data.Aeson.AutoType.Alternative ((:|:)(AltLeft, AltRight))
import qualified Data.Text as Text
Our replacement for the old getMeetupEventInfos
:
getMeetupEventInfos :: GroupId -> IO (Either String [EventInfo])
getMeetupEventInfos groupId =
getMeetupEventsJSONBytes groupId
>>= (Aeson.eitherDecode
>>> fmap extractEventInfos
>>> return
)
Note already the differences from before. We are now using
Aeson.eitherDecode
to turn JSON bytes into a full-fledged
Meetup.TopLevel
typed data object, and therefore can detect up front
whether the JSON we got was valid (in the sense of, conforms to our
prior inferred types).
Extracting events:
extractEventInfos :: Meetup.TopLevel -> [EventInfo]
extractEventInfos =
Meetup.topLevelResults
>>> map extractEventInfo
This is also different from before, in that we are not just using
hardcoded string keys to hopefully find what we want in the JSON
result. We have a Meetup.TopLevel
object and just use ordinary typed
record access to dive into the data.
Extracting information from a single event:
extractEventInfo :: Meetup.ResultsElt -> EventInfo
extractEventInfo event =
EventInfo { eventName = Meetup.resultsEltName event
, venueName = extractVenueName (Meetup.resultsEltVenue event)
}
Again, we just use the typed record fields from Meetup.ResultsElt
,
instead of string keys into an object.
Finally, bottoming out as we try to get a venue name from the venue object of an event:
-- | Trickier because json-autotype apparently found events without a venue.
extractVenueName :: Maybe (Meetup.Venue :|: [Maybe Aeson.Value]) -> Text.Text
extractVenueName Nothing = ""
extractVenueName (Just (AltLeft venue)) = Meetup.venueName venue
extractVenueName (Just (AltRight jsonValues)) =
Text.pack ("(unexpected JSON venue: " ++ show jsonValues ++ ")")
Here we recognize that maybe there is no venue, or the venue
JSON
field is not actually a Venue
. In practice, recognizing this fact
might result in changing our EventInfo
type so that it doesn’t
assume a Text
as its venueName
, but for here, for now, we just
stay in the stringly-typed world and try to make to with return a
useful string in all cases.
But you see how having typed JSON can drive rethinking of our original
assumptions leading to our definition of EventInfo
and maybe improve
our design. The cool thing about json-autotype
is that you can make
use it without having to write the JSON types by hand.
Type providers
The F# language includes support for type providers, which are ways to get types from somewhere without having to write them yourself. This is a really cool feature I’d like to see put into more typed language ecosystems. For example, there is a library for type providers for Idris.
I’m not aware of how much work there has been on a type provider
ecosystem for Haskell, but I imagine that you could use Template
Haskell, for example, to automate some of what we did here with
json-autotype
, as a crude form of type provider.
Conclusion
json-autotype
is a nice library that helps in making sense of
JSON that comes your way but without an associated set of types
already written for you. It infers types and writes out a Haskell
module for you.
All the code
All my code for my article series are at this GitHub repo.