24 days of Hackage, 2015: day 4: wreq: Web client programming; with notes on lens and operator syntax

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 4


In the late 1990s, I eagerly bought the book “Web Client Programming with Perl” and used the LWP library to scrape the Web in automated fashion. I continued doing that into the 2000s. I am happy that nowadays I can use Haskell for this kind of programming, and just as succinctly.

Today’s topic is wreq, Bryan O’Sullivan’s high-level library for doing Web client programming designed specifically for usability.

wreq makes use of the aeson ecosystem for JSON and the lens ecosystem, including lens-aeson, so you may want to check out Ollie’s 2012 Days of Hackage posts on aeson and lens.

Since wreq already has an extensive tutorial and reference documentation, I’m not going to repeat its explanations. Instead, I’m going to give an example of its use that should be simple enough to understand from context, then discuss the issue of using operator syntax in Haskell.

The task

I’m a member of many groups on Meetup. It’s often useful for me to get information using the official Meetup API rather than clicking around on a Web site or in a mobile app. Why do by hand what I can do much more efficiently and correctly with code?

Here’s a very simplified example of something I might want to do with Meetup. I’ve been active in the Pittsburgh Code and Supply community, which has a Meetup site with a packed calendar of events (it’s on hiatus now in December for the holidays, but is otherwise very active). Maybe I want to find out what upcoming events there are, and search for events of interest according to some criteria. For our toy example here, let’s say I want to find the next ten upcoming events, get their names and venue names, and make sure there’s at least one event that already has both a name and a venue name set up (sometimes an event is proposed before a venue has been found).

A test

Yesterday, on day 3 of this series, I mentioned that I like using HSpec, so let’s use HSpec.

{-# LANGUAGE OverloadedStrings #-}

import WreqExample (GroupId, eventName, venueName, getMeetupEventInfos)
import Test.Hspec ( Spec, hspec, describe, it
                  , shouldSatisfy, shouldNotSatisfy
                  )
import qualified Data.Text as Text

We are using Text, the packed Unicode string type from the text package, because that’s what wreq uses. OverloadedStrings is a convenient GHC extension that lets string literals in code be used as Text values (or as values of any other IsString instance) rather than only as String. Ollie discusses this extension in his 2014 Days of GHC Extensions.
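As a small illustration of what the extension buys us, here is a sketch in which the same kind of literal is used once at type Text and once at type String. The names cityText, cityString, and shout are hypothetical; without OverloadedStrings, the Text binding would not type-check.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- With OverloadedStrings, a string literal means fromString "...", so the
-- type annotation picks the IsString instance: here, Text.
cityText :: Text
cityText = "Pittsburgh"

-- The very same kind of literal still works where a String is expected.
cityString :: String
cityString = "Pittsburgh"

-- A hypothetical helper that accepts only Text; literal arguments just work.
shout :: Text -> Text
shout t = Text.toUpper t <> "!"
```

In GHCi, shout "code and supply" gives "CODE AND SUPPLY!" with no explicit Text.pack anywhere.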

Also, since I’m working in test-driven development style, I wrote this test first, before writing the WreqExample module, so the import list names only what the test needs.

spec :: Spec
spec =
  describe "wreq" $ do
    it "there are named, located Pittsburgh Code and Supply events coming up" $ do
      -- Warning! This is a stateful test going out to the Web.
      events <- getMeetupEventInfos pittsburghCodeAndSupplyId
      events `shouldNotSatisfy` null
      events `shouldSatisfy` any
        (\event -> (not . Text.null . eventName) event
                   && (not . Text.null . venueName) event)

pittsburghCodeAndSupplyId :: GroupId
pittsburghCodeAndSupplyId = "13452572"

Module signatures

If Haskell had module signatures, as Standard ML and OCaml do, I would first write an explicit signature and then implement a module conforming to it. Haskell doesn’t, so the best we can do is work in a “duck typing” manner at the module level: we rely implicitly on compilation failing when the test imports a nonconforming module implementation, rather than on checking against an explicit signature that needs no implementation at all.

Here are the types we need (in a pseudo-syntax as though Haskell had module signatures):

type GroupId    -- abstract

type EventInfo  -- abstract

-- abstract type accessors
eventName :: EventInfo -> Text
venueName :: EventInfo -> Text

getMeetupEventInfos :: GroupId -> IO [EventInfo]

Implementation

Imports

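{-# LANGUAGE OverloadedStrings #-}

module WreqExample
  ( GroupId, EventInfo, eventName, venueName, getMeetupEventInfos
  ) where

import Network.Wreq (Options, defaults, param, getWith, asValue, responseBody)
import Data.Text (Text)
import Data.Aeson (Value)
import Control.Lens (view, set, toListOf)
import Data.Aeson.Lens (key, _Array, _String)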

Types

-- | Information that we care about from a Meetup event.
data EventInfo =
  EventInfo { eventName :: Text
            , venueName :: Text
            }
  deriving (Show)

-- | A valid Meetup group ID.
type GroupId = Text

The Web client part

Since we’re making only one request and doing no error handling (we just let wreq throw exceptions), the Web client part is very brief. The Meetup API can return its information as JSON.

meetupEventsUrl :: String
meetupEventsUrl = "https://api.meetup.com/2/events"

We perform a GET with query parameters. wreq uses lens as its domain-specific language for creating options for GET, so let’s create a wreq Options value by setting the parameters one after another, using a builder pattern that starts from the wreq defaults:

eventsOptions :: GroupId
              -> Options
eventsOptions groupId =
  set (param "page") ["10"] (
    set (param "order") ["time"] (
      set (param "status") ["upcoming"] (
        set (param "group_id") [groupId] (
          set (param "format") ["json"] defaults))))

We begin by going out to the Web to get back a Response whose body is a lazy ByteString:

getMeetupEventInfos :: GroupId -> IO [EventInfo]
getMeetupEventInfos groupId = do
  response <- getWith (eventsOptions groupId) meetupEventsUrl

The JSON part

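Then we parse the response into an untyped JSON value, an aeson Value. Strictly speaking, asValue gives back a new Response that keeps the headers but whose body is now a Value instead of a lazy ByteString (throwing an exception if the body cannot be parsed), which is why the lens code below starts from responseBody: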

  jsonResponse <- asValue response

More precisely, Value is unityped: a single sum type with one constructor per kind of JSON value, defined in aeson as follows:

type Object = HashMap Text Value

type Array = Vector Value

data Value = Object !Object
           | Array !Array
           | String !Text
           | Number !Scientific
           | Bool !Bool
           | Null

The lens part

It was annoying to figure out from the official Meetup API site what fields I needed from the response and what their types were supposed to be. In practice, I just saved off the JSON from a representative query and looked at some events to see what I wanted. I was told where to find the automatically generated documentation of all the API methods, but it was not ideal. A later Day of Hackage will discuss what I did about this problem.

We extract the list of events, using a traversal to get the whole list, which is encoded as a JSON array in the results field of the top-level JSON object:

  let events = toListOf (responseBody
                         . key "results"
                         . _Array . traverse
                        ) jsonResponse

Here we use toListOf from lens: given a traversal and a structure (here, the whole Response Value), it collects every target of the traversal into a list, in this case one aeson Value per event.
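The same traversal can be tried offline, without wreq or the network, on a hand-built Value. This is a sketch assuming the aeson, lens, and lens-aeson packages; sampleBody is a made-up stand-in for a Meetup response body, not real API output. Since the sample is a bare Value rather than a Response, there is no responseBody step at the front of the path.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens (toListOf, view)
import Data.Aeson (Value, object, (.=))
import Data.Aeson.Lens (key, _Array, _String)
import Data.Text (Text)

-- A hypothetical body shaped like {"results": [event, event]}.
sampleBody :: Value
sampleBody = object
  [ "results" .=
      [ object [ "name" .= ("Haskell night" :: Text)
               , "venue" .= object ["name" .= ("Library" :: Text)] ]
      , object [ "name" .= ("Venue TBD" :: Text) ]
      ]
  ]

-- Every element of the "results" array, in order.
sampleEvents :: [Value]
sampleEvents = toListOf (key "results" . _Array . traverse) sampleBody

-- The name of each event, using the same kind of path as jsonToEventInfo.
sampleNames :: [Text]
sampleNames = map (view (key "name" . _String)) sampleEvents
```

Testing the extraction on canned JSON like this is one way to exercise the lens code without depending on the live API.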

Finally, we want only each event’s name and its venue’s name (the venue’s name is actually a field of a nested venue object), so we map a conversion function over the list of events:

  return (map jsonToEventInfo events)

We again use lens, at the level of an individual event object, to extract what we want from it:

-- | Extract our typed data model from an untyped JSON object.
jsonToEventInfo :: Value -> EventInfo
jsonToEventInfo json =
  EventInfo { eventName = view (key "name" . _String) json
            , venueName = view (key "venue"
                                . key "name" . _String) json
            }

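Here we use the view function from lens to apply an optic to the JSON value and pull a field out of it. Strictly speaking, key "name" . _String is a traversal rather than a lens, because the field might be missing or might not be a string; view accepts it here only because Text is a Monoid, and when nothing matches it returns the empty Text.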

And we’re done! We’ve written a script that looks pretty much like what you’d write in Perl or Python. It will also “fail” in similar ways, because we’re basically not using any types at all; even the final result just has strings, which may or may not be empty, whatever that’s supposed to mean. For example, if you try to find a field by a string key that doesn’t exist, the particular code here will just silently give back an empty string. Can we do better? Yes, there are various ways to do better. Stay tuned for a later Day of Hackage.
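That failure mode is easy to observe offline. Here is a sketch assuming the aeson, lens, and lens-aeson packages, with a made-up event that has no venue yet. view quietly yields the empty Text, exactly as it does for a misspelled key; preview from lens, shown only for contrast, reports the absence as Nothing instead.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens (preview, view)
import Data.Aeson (Value, object, (.=))
import Data.Aeson.Lens (key, _String)
import Data.Text (Text)

-- A hypothetical event that has been proposed but has no venue yet.
eventWithoutVenue :: Value
eventWithoutVenue = object ["name" .= ("Lightning talks" :: Text)]

-- The approach used above: a missing path silently collapses to "".
venueViaView :: Value -> Text
venueViaView = view (key "venue" . key "name" . _String)

-- For contrast: preview keeps the "not found" case visible as Nothing.
venueViaPreview :: Value -> Maybe Text
venueViaPreview = preview (key "venue" . key "name" . _String)
```

Note that the test predicate earlier (not . Text.null) cannot tell a genuinely empty venue name apart from a missing one.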

Lens operator syntax

If you’ve already used wreq or lens, you may have noticed something strange above: I didn’t use any lens operator syntax. This was deliberate. Although the wreq tutorial gives a little bit of background on lens, the reality is that when some friends who were not experienced lensers or Haskellers asked me how I do Web client programming in Haskell, and I pointed to wreq as being pretty cool, they got immediately stuck on the lens stuff. Looking back at the tutorial, I do see that it jumps straight into operator soup. This is unfortunate. You can immediately use libraries like wreq without having the lens operators memorized already. You have to understand some facts (such as the use of the function composition operator to compose lenses) and have an idea of how the types work out, but one thing you don’t need is the funny operators. I think it’s best to understand how to do things without operators before starting to use them as a convenient shortcut.

For example, an idiomatic way to set the options object, as presented in the “whirlwind tour” section of the wreq tutorial, is:

import Control.Lens ((&), (.~))

eventsOptions :: GroupId
              -> Options
eventsOptions groupId = defaults
  & param "format" .~ ["json"]
  & param "group_id" .~ [groupId]
  & param "status" .~ ["upcoming"]
  & param "order" .~ ["time"]
  & param "page" .~ ["10"]

I don’t like the idea of newcomers to this library just copying and pasting stuff without understanding what it does, or getting the impression that these operators are somehow built into the Haskell language or required for using the library. People really do get these impressions.

I happen to like the reverse function application operator & a lot, so I feel OK about using it, although it is not as suggestive as the pipe |> that many other languages (such as F#, OCaml, Elm, and Elixir) use for the same purpose.
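The & operator is nothing lens-specific: Data.Function in base also exports it, and it simply applies a function with the argument written first. Here is a small sketch with made-up string-processing steps (pipeline and nested are hypothetical names) showing that an & chain is just nested application read left to right.

```haskell
import Data.Char (toUpper)
import Data.Function ((&))

-- (&) is flipped function application: x & f == f x.
-- It is infixl 1, so a chain reads left to right, like a pipe.
pipeline :: String -> String
pipeline s = s
  & words
  & map (map toUpper)
  & unwords

-- The same computation written with ordinary nested application.
nested :: String -> String
nested s = unwords (map (map toUpper) (words s))
```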

But .~ is, I think, not very suggestive to newcomers to lens. Is set lens newValue object really so much worse to write or read than object & lens .~ newValue?
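To see that .~ is nothing more than set written infix, here is a self-contained sketch using only base. It defines a tiny van Laarhoven lens by hand, the simple type-preserving form of the representation the lens library uses; view, set, and (.~) here are my own minimal definitions for illustration, not imports from lens, and the Person and Address types are hypothetical. It also shows lenses composing with ordinary function composition (.).

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Function ((&))
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A van Laarhoven lens focusing on an `a` inside an `s`.
type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Reading uses the Const functor to smuggle the focus out.
view :: Lens s a -> s -> a
view l s = getConst (l Const s)

-- Writing uses Identity and ignores the old focus.
set :: Lens s a -> a -> s -> s
set l a s = runIdentity (l (\_ -> Identity a) s)

-- (.~) is literally set in infix form (same fixity as in lens).
infixr 4 .~
(.~) :: Lens s a -> a -> s -> s
(.~) = set

data Address = Address { city :: String } deriving (Eq, Show)
data Person = Person { name :: String, address :: Address } deriving (Eq, Show)

nameL :: Lens Person String
nameL f p = fmap (\n -> p { name = n }) (f (name p))

addressL :: Lens Person Address
addressL f p = fmap (\a -> p { address = a }) (f (address p))

cityL :: Lens Address String
cityL f a = fmap (\c -> a { city = c }) (f (city a))
```

Because & is infixl 1 and .~ is infixr 4, person & nameL .~ "Bob" parses as (nameL .~ "Bob") person, which is set nameL "Bob" person.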

(Update of 2015-12-12) Thinking compositionally

One thing that is unfortunately lost if you use pipeline application operators such as & is the compositionality that underlies the power of lenses. So here is a refactoring of eventsOptions that shows how best to think of what we are doing, which is creating a “builder” and then applying it:

eventsOptionsRefactored :: GroupId -> Options
eventsOptionsRefactored groupId = builder defaults
  where builder = eventsOptionsBuilder groupId

-- | Recall: this type is sugar for GroupId -> (Options -> Options)
eventsOptionsBuilder :: GroupId -> Options -> Options
eventsOptionsBuilder groupId =
  set (param "page") ["10"]
  . set (param "order") ["time"]
  . set (param "status") ["upcoming"]
  . set (param "group_id") [groupId]
  . set (param "format") ["json"]

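Note the separation of concerns here: instead of thinking of building an Options object as a pipeline that applies one update after another to defaults, we think of first composing the individual updates into a single builder of type Options -> Options, and only then applying that builder to defaults.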

Partial application in functional programming is used here to implement the builder pattern: eventsOptionsBuilder takes one argument, and returns an Options transformer of type Options -> Options.
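The builder idea does not depend on wreq at all. Here is a sketch using a toy Options type and defaults value (both hypothetical stand-ins, not wreq’s) plus a hypothetical setParam that plays the role of set (param ...); eventsOptionsBuilder takes a plain String group ID here. It shows partial application yielding an Options -> Options transformer, and that in a composed chain the rightmost step runs first.

```haskell
-- A hypothetical stand-in for wreq's Options: query parameters in the
-- order they were set.
newtype Options = Options [(String, [String])] deriving (Eq, Show)

defaults :: Options
defaults = Options []

-- Replace (or add) one parameter, appending it at the end.
setParam :: String -> [String] -> Options -> Options
setParam k vs (Options ps) = Options (filter ((/= k) . fst) ps ++ [(k, vs)])

-- Partial application: given a group ID, return an Options transformer.
eventsOptionsBuilder :: String -> Options -> Options
eventsOptionsBuilder groupId =
  setParam "page" ["10"]
  . setParam "order" ["time"]
  . setParam "status" ["upcoming"]
  . setParam "group_id" [groupId]
  . setParam "format" ["json"]

-- Only here is the builder finally applied to a starting value.
eventsOptions :: String -> Options
eventsOptions groupId = eventsOptionsBuilder groupId defaults
```

Since a builder is just a function, it can itself be composed further, e.g. setParam "page" ["20"] . eventsOptionsBuilder "1" overrides the page size before anything is applied to defaults.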

Code golf?

To illustrate both the upsides and the downsides of using operators (in this case mostly downsides, I think), here is a code golf version of the entire code:

{-# LANGUAGE OverloadedStrings #-}

import Network.Wreq (Options, defaults, param, getWith, asValue, responseBody)
import Data.Text (Text)
import Control.Lens ((&), (.~), (^.), (^..))
import Data.Aeson.Lens (key, _Array, _String)
import Control.Arrow ((>>>), (&&&))

meetupEventsUrl :: String
meetupEventsUrl = "https://api.meetup.com/2/events"

-- | A valid Meetup group ID.
type GroupId = Text

-- | For searching for events in a Meetup group.
eventsOptions :: GroupId
              -> Options
eventsOptions groupId = defaults
  & param "format" .~ ["json"]
  & param "group_id" .~ [groupId]
  & param "status" .~ ["upcoming"]
  & param "order" .~ ["time"]
  & param "page" .~ ["10"]

-- | Code golf version. Don't do this?
getMeetupNameAndVenues :: GroupId -> IO [(Text, Text)]
getMeetupNameAndVenues groupId =
  getWith (eventsOptions groupId) meetupEventsUrl
  >>= asValue
  >>= ((^.. responseBody
        . key "results"
        . _Array . traverse)
       >>> map ((^. key "name" . _String)
                 &&& (^. key "venue"
                      . key "name" . _String)
                 )
       >>> return
      )

In a way, this looks cool because the piping left to right reads well and naturally, if you know all the operators and are happy with operator sectioning syntax and point-free combinators. But when I showed this to friends who are not so fluent in Haskell, they didn’t like this. Also, note that I made concessions in order to arrange this pipeline. I lost the comments, the intermediate named sub-computations (very useful for finer-grained testing), and even my custom result type (resorting to just tupling). I feel something has been lost by writing in this style even though part of me secretly likes it.
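For readers who haven’t met the two Control.Arrow operators used in the golf version: specialised to plain functions, >>> is left-to-right composition, and &&& feeds one input to two functions and pairs the results. Here is a small sketch with made-up functions (shoutWords and summarize are hypothetical).

```haskell
import Control.Arrow ((&&&), (>>>))
import Data.Char (toUpper)

-- On plain functions, (f >>> g) x == g (f x): composition in reading order.
shoutWords :: String -> [String]
shoutWords = words >>> map (map toUpper)

-- On plain functions, (f &&& g) x == (f x, g x): one input, two results.
-- The golf version uses this shape to build (event name, venue name) pairs.
summarize :: [String] -> (String, Int)
summarize = unwords &&& length
```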

An interview with Bryan O’Sullivan

Recently (September 2015), The Haskell Cast interviewed Bryan O’Sullivan. I highly recommend listening to the whole thing. He had stories to tell about how he got into Haskell, how he ended up writing all these libraries, and how he goes about designing them and what his goals are when implementing them. Note that aeson and text, which everyone uses, are his creations. Thank you, Bryan, for all you’ve done for the Haskell community!

Lens resources

Gabriel Gonzalez wrote a lens tutorial that is useful. Thank you, Gabriel, for writing tutorials not only on your own libraries, but for others as well!

Conclusion

For day 4, I presented a tiny example of use of wreq with aeson and lens to perform a simple task of getting information from the Web, and tried to make wreq more accessible by not requiring use of lens operators up front.

All the code

All the code for my article series is in this GitHub repo.
