24 days of Hackage, 2015: day 14: Earley: a promising newer parser library for Haskell

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 14

(Reddit discussion)

On day 10, I showed how to use S-expressions to avoid having to write a custom parser. But writing parsers isn’t too bad in Haskell, or is it? The popular parsec library has many problems, because it requires hand-hacked backtracking that causes weird error messages and difficulty in reasoning about your grammar. There’s an improved fork of parsec called megaparsec, but it’s still the same kind of technology. How about something completely different.

The recent Earley is intriguing and I’ve begun using it for new projects where I don’t need the monadic power of something like parsec but are OK with an applicative API instead and don’t need the performance of something like attoparsec. Apart from good error messages, it allows handles online parsing and ambiguity.

Today I’ll give two small examples of using Earley.

Read On →

24 days of Hackage, 2015: day 13: hint: runtime eval for Haskell

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 13

(Reddit discussion)

One hallmark of a “dynamic language” such as Lisp and JavaScript is the ability to evaluate code at runtime inside a running process. Since runtime loading of classes is a fundamental feature of Java, is Java a “dynamic language” then? I think the terms “static language” and “dynamic language” are not very useful terms, and a comparison of language and compiler and development environments should focus on specific features of the user experience and where the boundaries lie in the semantics. One tricky thing is that a lot of what is interesting is actually implementation-dependent.

For example, the Haskell standard says nothing about runtime eval, so there is some sense in which Haskell considered as a strictly defined “language” has no support for it. But if we consider the GHC implementation and its ecosystem, which is dominant today despite the existence of other Haskell implementations, there’s a lot of tooling that is “dynamic”, in the sense of being able to access GHC APIs in one of many different ways.

Edward Yang recently wrote an interesting blog post “The convergence of compilers, build systems and package managers” on a subset of the general issue of what can access what for the sake of tooling. It didn’t touch on runtime evaluation, which is an entire topic in itself.

For today, I decided to mention that you can already do runtime eval of GHC Haskell code using the package hint and offer the thought that maybe we might want something less ad hoc than third-party packages like this.

Read On →

24 days of Hackage, 2015: day 12: json-autotype: inferring types from data

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 12

(Reddit discussion)

Today we revisit a problem from day 4, in which we grabbed JSON off the Web in order to extract information about it. I mentioned then that we were using an untyped representation for JSON. Ideally, if it’s not too onerous, we want to use a typed representation for our data, to avoid common errors such as trying to access a nonexistent field. The Aeson library allows us to write our own data types and convert them from and to JSON. If we were in charge of the data and data format, then this would be the way to go.

But what if, for whatever reason, we do not already have a typed data model, but are consuming, for example, JSON from a source that has not given us a spec of the data types, maybe with JSON Schema? Then we have to reverse-engineer the data types, possibly from an informal text spec of the format (which is a real pain).

Or we could take a lazy way out, and infer plausible types from a corpus of representative data. This is what the useful json-autotype package does (documentation on the GitHub repo page).

Read On →

24 days of Hackage, 2015: day 11: monad-loops: avoiding writing recursive functions by refactoring

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 11

(Reddit discussion)

I often dislike writing recursive functions. Recursion is a form of goto and therefore recursion is, conceptually, the assembly language low-level infrastructure of functional programming, a fundamental building block but not necessarily something I want to see every day.

An analogy: imperative programming in the 1960s was altered significantly when brand new control structures such as for and while loops changed the face of imperative programming by removing the boilerplate of using the low-level goto everywhere; for and while are the higher-order control structures of imperative programming.

In functional programming, recursion boilerplate is avoided through writing and using higher-order functions such as map,filter, foldr, and traverse. The cool thing is that in functional programming, these are ordinary user-space functions, whereas in imperative programming, the control structures are built into the language and it’s typically not so easy to invent your own.

Why would you want to define control structures yourself? I’ll show examples and a useful little library monad-loops that you can use in order to avoid reinventing the wheel.

Read On →

24 days of Hackage, 2015: day 10: s-cargot: using S-expression syntax

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 10

(Reddit discussion)

There are times when I’m jealous of the Lisp world.

One of those times is when defining some domain-specific language, because in the Lisp world, the natural thing to do is to represent it using S-expressions as the concrete syntax, and not fuss with defining yet another special syntax, along with writing a parser for that syntax into the abstract syntax as well as a pretty-printer from the abstract syntax back to the concrete syntax. Maybe in the long run users might want a special syntax that is not just S-expressions, but for quick initial prototyping, at least, it seems worthwhile to not commit to any special syntax and just use S-expressions. Although there is nothing magical about S-expressions (or XML or JSON or any other generic representation of a tree data structure), they are particularly concise and flexible.

S-expressions are so useful that in my first job as a software engineer in the 1990s, we actually had a C++ S-expression library for input and output of a format that amounted to a domain-specific language (this was before XML was invented) that was processed by many tools (including a validator that I wrote in Standard ML).

A library on Hackage for working with S-expressions in Haskell is s-cargot. There have been many others, but most of them have gone sadly unmaintained, whereas this one is new and comes with bells and whistles.

Today I’ll give an example of how to use this library, in the context of a problem domain in which having a concrete syntax is important.

Read On →

24 days of Hackage, 2015: day 9: Template Haskell goodies: here, interpolate, file-embed

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 9

(Reddit discussion)

A stray negative remark I made on day 8 regarding Haskell and its unergonomic support for multi-line string literals and interpolation led to good comments that I was being misleading because there actually exist good solutions. I already use one of them, but had not brought them into the picture because I didn’t want to be distracting in that post by bringing in other libraries, especially since they are implemented in Template Haskell, the GHC extension that is “macros for Haskell”, enabling compile-time metaprogramming (see the 2014 Day of Hackage article about Template Haskell. On day 2 I already introduced a package that uses Template Haskell, so it looks like I’ll be continuing to do that today.

These packages require only that you turn on QuasiQuotes in modules where you use them, so our example source code will have the header

{-# LANGUAGE QuasiQuotes #-}
Read On →

24 days of Hackage, 2015: day 8: multiset; I wish this were in the standard containers package

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 8

(Reddit discussion)

I don’t remember when it was, but one day I got sick and tired of reimplementing in Haskell the same boilerplate logic when keeping track of many objects classified under the same key. This is a data structure known as a multiset, also known as a bag, in contrast with a regular set, which only keeps track of one unique object per key.

I was surprised that by the lack of a multiset module in the standard containers package, although not completely surprised because you can implement a multiset on top of a set, so in some sense a multiset is “superfluous”. Still, I was used to having a multiset available without any work (however trivial), because of long using a multiset in C++, which I’d used back in the 1990s from the original implementation of the Standard Template Library, and also Python’s collections library has a Counter class which serves a similar purpose.

Today I’ll briefly show some code using multiset and talk about why something like this maybe should be in the standard library.

Read On →

24 days of Hackage, 2015: day 7: semigroups; NonEmpty list and a case study of types and tests

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 7

(Reddit discussion)

How often has the following runtime error happened to you, whether in Haskell or in some other language?

*** Exception: Prelude.head: empty list

Basically, code blew up that assumed a list was nonempty but it wasn’t.

In fact, posted on Reddit recently was a question about code that failed with

** Exception: Prelude.foldl1: empty list

which is the same problem.

Today, I present a case study: refactoring that code to take advantage of an important data type that is going to be part of the Haskell standard library, the NonEmpty list type. This is part of Edward Kmett’s semigroups package that is going into the standard library.

Read On →

24 days of Hackage, 2015: day 6: finding utilities with Hoogle and Hayoo: MissingH, extra

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 6

(Reddit discussion)

It will never be the case that everything everyone will find useful will already be in the “standard library” for any language ecosystem. However, one of the coolest features of the Haskell ecosystem (which wows all non-Haskellers when I show them), is the ability to search for useful functions by type signature, using Hoogle or Hayoo. You can use other criteria also, such as names; this can be useful if you have a guess at what some useful function might be named.

There seem to be two philosophies as far as using other people’s utility libraries (or even making one’s own to share between different projects):

  • reuse is great, let’s do it
  • every dependency is a potential liability, so it’s better to reinvent, or copy and paste, rather than use something of uncertain quality or maintainability

I tend to prefer reuse, but there have been times when I have copied (and even modified) only just what I need, because I don’t want the rest of what is inside a sprawling library that depends transitively on a whole lot of stuff I don’t need. I think this is a granularity issue. Many people have proposed the idea that since we have a Web now, in theory the concept of “library” should go obsolete in favor of micro-libraries, so to speak, maybe sometimes even to the level of single standalone functions, and maybe even having a unique identifier, but this topic is outside the scope of this article. (For just one idea, check out Gabriel Gonzalez’s “The internet of code”.)

The situation is also complicated by the fact that often, so much can be reinvented with only a couple of lines of Haskell code, so why even bother looking for someone’s implementation of it?

But let’s assume for this article that you are interested in finding and using utility libraries. I show how to find some example functions and reach two utility libraries that I use, very cleverly and informatively named MissingH and extra.

Read On →

24 days of Hackage, 2015: day 5: should-not-typecheck: making Haskell sort of dynamically typed with deferred type errors

Table of contents for the whole series

A table of contents is at the top of the article for day 1.

Day 5

(Reddit discussion)

Have you ever been frustrated when using a statically typed language because there’s a type error somewhere in your code base but you want to run your program anyway, either because you don’t care about that remote type error that has nothing to do with what you’re working on, or because you want to step through your code and debug what the type error really is? I certainly have.

Also, have you ever wanted to write a unit test to verify that your typed code disallows code you want to disallow, but you are frustrated because how do you write code in a typed language that says, “This code (that you won’t typecheck) won’t typecheck” and passes the typechecker and runs?

Welcome to the land of GHC’s “deferred type errors”, a feature that has been part of GHC since version 7.6.1 in 2013. Since this was not covered in Ollie’s 2014 series “24 Days of GHC Extensions”, I decided to bring it up here, and in the context of a cute package, should-not-typecheck that hooks up with HSpec to make assertions that something won’t typecheck.

Read On →