# 24 days of Hackage, 2015: day 14: Earley: a promising newer parser library for Haskell

## Table of contents for the whole series

A table of contents is at the top of the article for day 1.

## Day 14

On
day 10,
I showed how to use S-expressions to avoid having to write a custom
parser. But writing parsers isn’t too bad in Haskell, or is it? The
popular `parsec`

library
has many problems,
because it requires hand-hacked backtracking that causes weird error
messages and difficulty in reasoning about your grammar. There’s an
improved fork of `parsec`

called
`megaparsec`

, but
it’s still the same kind of technology. How about something completely
different.

The recent `Earley`

is
intriguing and I’ve begun using it for new projects where I don’t need
the monadic power of something like `parsec`

but are OK with an
applicative API instead and don’t need the performance of something
like `attoparsec`

. Apart from good error messages, it allows handles
online parsing and ambiguity.

Today I’ll give two small examples of using `Earley`

.

## Installation

Since Stackage LTS is behind right now, and `Earley`

keeps moving,
I decided to use the latest version of `Earley`

by modifying our
`stack.yaml`

:

```
- Earley-0.10.1.0
```

### (Update of 2016-01-06)

Stackage LTS 4 has caught up, no no more need for this modification.

## Parsing into day 10’s AST

Let’s go back to the symbolic differentiation problem on day 10, and create a math-like infix syntax to parse.

### Tests

Here are some HSpec/QuickCheck tests to illustrate what we want when
parsing a string into an `Exp`

.

#### Imports

`Text.Earley`

is the main module of the `Earley`

package; `Report`

is
used for return a report on the progress of the parse.

```
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE LambdaCase #-}
module SymbolicDifferentiation.EarleySpec where
import SymbolicDifferentiation.AlphaSyntax (Exp(N, V, Plus, Times))
import qualified SymbolicDifferentiation.Earley as Earley
import Text.Earley (Report(..))
import Test.Hspec (Spec, hspec, describe, it, shouldBe, shouldSatisfy)
import Test.Hspec.QuickCheck (prop)
import Test.QuickCheck (NonNegative(..))
import Data.String.Here (i)
```

#### QuickCheck tests

Some QuickCheck tests that show that some sample expressions such as
`x*a + y*b * (z+c)`

parse into the expected ASTs:

```
spec :: Spec
spec =
describe "Custom syntax for expression parsed by Earley" $ do
-- For simplicity, don't support negative numeric literals now.
prop "x + a" $ \(NonNegative (a :: Int)) ->
fst (Earley.parses [i|x + ${a}|]) `shouldBe`
[Plus (V "x") (N a)]
prop "x*a + y*b * (z+c)" $
\(NonNegative (a :: Int))
(NonNegative (b :: Int))
(NonNegative (c :: Int)) ->
fst (Earley.parses [i|x*${a} + y*${b} * (z+${c})|]) `shouldBe`
[Plus (Times (V "x") (N a))
(Times (Times (V "y") (N b))
(Plus (V "z") (N c)))]
```

#### Expected parse errors

Finally, one example of how to check for expected parse errors. The error tokens are user-defined and attached to grammar productions, as we will see.

```
it "x + y * + 5" $
Earley.parses "x + y * + 5" `shouldSatisfy`
\case
([], Report { position = 8
, expected = ["number", "identifier", "("]
, unconsumed = "+ 5"
}) -> True
_ -> False
```

### Implementation

The implementation involves `Applicative`

idioms that will be familiar
to you if you have used `parsec`

.

#### Imports

```
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE RecursiveDo #-}
{-# LANGUAGE FlexibleContexts #-}
module SymbolicDifferentiation.Earley where
import SymbolicDifferentiation.AlphaSyntax (Exp(N, V, Plus, Times))
import qualified Text.Earley as E
import Text.Earley ((<?>))
import Control.Applicative (many, some, (<|>))
import qualified Data.Char as Char
import Control.Monad.ST (ST)
import Data.ListLike (ListLike)
-- | What to report for something expected.
type Expected = String
```

The `<?>`

operator is used to attach an expectation (which we have
decided to specify as a string, with type synonym `Expected`

) to a
production.

#### Drivers

What we want for our particular problem is a parser that takes a string as input and expects to fully parse it. We construct it from a more generic parser that comes from processing our grammar.

```
-- | Return a list of all possible `Exp` parses, and also a status report
-- regardless of how many successes.
parses :: String -> ([Exp], E.Report Expected String)
parses = E.fullParses expParser
-- | Parser created from the grammar.
expParser :: ListLike input Char =>
ST state (input -> ST state (E.Result state Expected input Exp))
expParser = E.parser grammar
```

#### Grammar

Our grammar is straightforward. `Earley`

uses a monad to maintain its
internal state, and we use the `RecursiveDo`

GHC extension (covered in
a
2014 Day of GHC Extensions)
in order to be able to refer to a rule within the grammar
recursively. Note that left recursion in the grammar is just fine for
`Earley`

.

`Prod`

is the type constructor for a production, and you build up
productions using combinators such as `satisfy`

and `symbol`

.

```
-- | Basically taken from <https://github.com/ollef/Earley/blob/master/examples/Expr2.hs Earley example expression parser>
grammar :: forall r. E.Grammar r (E.Prod r Expected Char Exp)
grammar = mdo
whitespace <- E.rule $
many $ E.satisfy Char.isSpace
let token :: E.Prod r Expected Char a -> E.Prod r Expected Char a
token p = whitespace *> p
sym x = token $ E.symbol x <?> [x]
ident = token $ (:) <$> E.satisfy Char.isAlpha
<*> many (E.satisfy Char.isAlphaNum)
<?> "identifier"
num = token $ some (E.satisfy Char.isDigit) <?> "number"
-- For now, just handle unsigned numeric literals.
atom <- E.rule $
(N . read) <$> num
<|> V <$> ident
<|> sym '(' *> term <* sym ')'
factor <- E.rule $
Times <$> factor <* sym '*' <*> atom
<|> atom
term <- E.rule $
Plus <$> term <* sym '+' <*> factor
<|> factor
return $ term <* whitespace
```

For more examples of grammars, see the
examples directory in the `Earley`

GitHub repo.

### (Update of 2015-12-17)

The inverse of parsing is pretty-printing, covered on day 17.

## For fun: solving the “number word” problem

The ability to handle ambiguity and return all possible parses is a useful one in many situations. Here I show a solution to the “number word” problem. In the past, I have managed ambiguity using Happy’s GLR support, but I don’t like writing parsers using Happy.

The “number word” problem:

```
Given a positive integer, return all the ways that the integer can be
represented by letters using the mapping 1 -> A, 2 -> B, ..., 26 ->
Z. For instance, the number 1234 can be represented by the words ABCD,
AWD and LCD.
```

This is a toy version of an actually serious problem, that of segmentation in natural language.

### Test

The test reflects the problem statement:

```
module EarleyExampleSpec where
import EarleyExample (grammar, NumberWord, Expected)
import qualified Text.Earley as E
import qualified Data.List.NonEmpty as NonEmpty
import Test.Hspec (Spec, hspec, describe, it, shouldMatchList)
spec :: Spec
spec =
describe "EarleyExample" $ do
it "returns all possible parses of number words" $ do
let (result, _) = parseNumberWord 1234
map NonEmpty.toList result `shouldMatchList`
["ABCD", "AWD", "LCD"]
parseNumberWord :: Integer -> ([NumberWord], E.Report Expected String)
parseNumberWord = E.fullParses (E.parser grammar) . show
```

Note that I am using a `NonEmpty`

list of `Char`

because an empty
string is not a valid solution to the “number word” problem. (I
covered `NonEmpty`

on day 7.)

### Solution

The solution is just to write a grammar that tries to pick off valid
consecutive digits to make a letter. We create a production for each
possible letter that we care about, using `numberLetterFor`

, combine
those productions with alternation using `asum`

to get a composite
production `numberLetter`

, then use that for `numberWord`

which is the
grammar.

```
{-# LANGUAGE RecursiveDo #-}
module EarleyExample where
import qualified Text.Earley as E
import Text.Earley ((<?>))
import Control.Applicative ((<|>))
import qualified Data.Foldable as Foldable
import qualified Data.Char as Char
import qualified Data.List.NonEmpty as NonEmpty
import Data.List.NonEmpty (NonEmpty((:|)))
-- | Result wanted.
type NumberWord = NonEmpty NumberLetter
-- | 'A' to 'Z'.
type NumberLetter = Char
-- | What to report for something expected.
type Expected = String
grammar :: E.Grammar r (E.Prod r Expected Char NumberWord)
grammar = mdo
numberWord <- E.rule $
NonEmpty.cons <$> numberLetter <*> numberWord
<|> (:| []) <$> numberLetter
return numberWord
numberLetter :: E.Prod r Expected Char NumberLetter
numberLetter = (Foldable.asum . map numberLetterFor) ['A'..'Z'] <?> "number"
-- | Return a production for a given letter.
--
-- 1 is 'A', 2 is 'B', .. 26 is 'Z'.
numberLetterFor :: NumberLetter -> E.Prod r Expected Char NumberLetter
numberLetterFor c = c <$ E.word (show (toNumber c)) <?> [c]
-- | 'A' is 1, ... 'Z' is 26
toNumber :: NumberLetter -> Int
toNumber c = (Char.ord c - Char.ord 'A') + 1
```

## Conclusion

I only recently discovered the `Earley`

parser library and started
using it. I’m pretty excited by its friendliness.

## All the code

All my code for my article series are at this GitHub repo.

comments powered by Disqus