Understanding Clojure transducers through types

Yesterday, Rich Hickey published a blog post, “Transducers are Coming”, which attracted a lot of attention.

I have a confession to make, which I have made before: I find it very difficult to understand ideas or code not presented with types. So I decided that the only way I could possibly understand what “transducers” are would be to actually implement them in a typed language. I ended up doing so and am sharing my findings here.

Vague types in the original blog post

Rich informally gave some type signatures in his blog post:

;;reducing function signature
whatever, input -> whatever

;;transducer signature
(whatever, input -> whatever) -> (whatever, input -> whatever)

This was, unfortunately, not very helpful. It is hard to make sense of this pseudo-notation for types. What is quantified over what? And what is bound to what? I’ll explain later what I mean by these questions.

First discussion thread I saw

There was much tweeting online about transducers after Rich Hickey’s initial announcement; the tweets did not help me, except for links posted to discussion elsewhere.

One of them was on Hacker News. I browsed through it but found it mostly not useful. The problem was that although a lot of interesting Haskell code was thrown around, it tended to be related to transducers but not an exact translation of the concept. I already had my own intuitions about transducers being related to well-known types such as foldables, iteratees, lenses, etc. That “ordinary function composition” was involved immediately suggested the connections, because function composition is huge in these existing Haskell libraries.

But what I wanted was to understand transducers as they are, before even thinking about generalizations and comparisons.

What are the types?

Rich Hickey informally offered some types (which he said were “a la Haskell”) to try to help out:

    ;;reducing fn
    x->a->x

    ;;transducer fn
    (x->a->x)->(x->b->x)

OK, by using type variables a, b, and x, that indicates what is bound to what. The blog post should have used this notation rather than

;;transducer signature
(whatever, input -> whatever) -> (whatever, input -> whatever)

Sample Clojure code

He also posted some sample Clojure code:

Second discussion thread I saw

Then today, I saw a discussion thread on Reddit, titled “Clojure’s Transducers are Perverse Lenses”.

Actual runnable Haskell code

Rich finally posted some actual type-checked, runnable Haskell code!

-- Transducers in Haskell

mapping :: (a -> b) -> (r -> b -> r) -> (r -> a -> r)
-- Original was (b -> a) -> (r -> a -> r) -> (r -> b -> r)
-- but Michael O'Keefe in comment pointed out this is misleading
mapping f xf r a = xf r (f a)

filtering :: (a -> Bool) -> (r -> a -> r) -> (r -> a -> r)
filtering p xf r a = if p a then xf r a else r

flatmapping :: (a -> [b]) -> (r -> b -> r) -> (r -> a -> r)
flatmapping f xf r a = foldl xf r (f a)

-- for exposition only, yes, conj is gross for lazy lists
-- in Clojure conj and left folds dominate
conj xs x = xs ++ [x]
xlist xf = foldl (xf conj) []

-- build any old list function with its transducer, all the same way
xmap :: (a -> b) -> [a] -> [b]
xmap f = xlist $ mapping f

xfilter :: (a -> Bool) -> [a] -> [a]
xfilter p = xlist $ filtering p

xflatmap :: (a -> [b]) -> [a] -> [b]
xflatmap f = xlist $ flatmapping f

-- again, not interesting for lists, but the same transform
-- can be put to use wherever there's a step fn

xform :: (r -> Integer -> r) -> (r -> Integer -> r)
xform = mapping (+ 1) . filtering even . flatmapping (\x -> [0 .. x])


print $ xlist xform [1..5]
-- [0,1,2,0,1,2,3,4,0,1,2,3,4,5,6]

After this post, I knew it would not take me long to figure out transducers.

Refactoring his Haskell code

Two things to notice about the original code:

Type synonyms and higher-rank types

Defining lots and lots of types (whether synonyms or newtypes is standard practice when programming in a modern typed language. OK, so I defined a type synonym

-- Left reduce
type Reducer a r = r -> a -> r

But what about transducer? This is trickier.

An invalid attempt at a type would be

-- Illegal!
type Transducer a b = Reducer a r -> Reducer b r

because the type variable r is not bound in the type definition. And it would be incorrect to just randomly add r on the left hand side as an extra parameter to the Transducer type, because in fact it is critical that a transducer does not care about the underlying reducer’s return type r. How do we write the desired type?

It turns out you need higher-rank types. Rank-1 types are not sufficient; we need a rank-2 type to quantify r, to say that a transducer from a to b is a transformation that takes a reducer to a specific r and returns another reducer to the same r.

-- Here's where the rank-2 type is needed
type Transducer a b = forall r . Reducer a r -> Reducer b r

Now we can see more clearly some completely generic ways to create a transducer:

mapping :: (a -> b) -> Transducer b a
mapping f xf r a = xf r (f a)

filtering :: (a -> Bool) -> Transducer a a
filtering p xf r a = if p a then xf r a else r

A bit of history

Higher-rank types are a powerful technique for expressing “hiding” of unnecessary details about types going on somewhere. My first recollection of the real world use of rank-2 types is from 1994 (the year I started using Haskell, although I did not actually use it in my work as a software engineer until 1995), when I was excited to read a paper by John Launchbury and Simon Peyton Jones that solved, using a rank-2 type, a specific, important, practical problem, “Lazy Functional State Threads”; twenty years later, their ST monad is still part of the standard library!

Introducing type classes

Clojure uses protocols as an abstraction mechanism, and the “magic” of transducers uses protocols. In Haskell, type classes are the major abstraction mechanism (this is true of Scala also).

So I abstracted away from the hardcoded list-oriented functions and values in Rich Hickey’s Haskell code:

-- Left fold
class Foldable f where
  fold :: Transducer a (f a)

class Conjable f where
  empty :: f a
  conj :: Reducer a (f a)

Note our reliance on transducing and reducing from one type a to another, f a.

Foldable constraint

Unlike mapping and filtering, flatmapping is not completely generic, because it depends on something being Foldable (implementing a fold):

flatmapping :: Foldable f => (a -> f b) -> Transducer b a
flatmapping f xf r a = fold xf r (f a)

Conjable constraint

Finally, here’s the originally list-specific code that now depends only on Foldable and Conjable:

-- I changed Rich Hickey's code to be more general than just list
-- but accept anything Conjable
xlist :: (Foldable f, Conjable f) => Transducer b a -> f a -> f b
xlist xf = fold (xf conj) empty

-- build any old Foldable function with its transducer, all the same way
xmap :: (Foldable f, Conjable f) => (a -> b) -> f a -> f b
xmap f = xlist $ mapping f

xfilter :: (Foldable f, Conjable f) => (a -> Bool) -> f a -> f a
xfilter p = xlist $ filtering p

xflatmap :: (Foldable f, Conjable f) => (a -> f b) -> f a -> f b
xflatmap f = xlist $ flatmapping f

List-specific stuff

Here is the list-specific code:

-- Stuff specialized to lists.
-- To use another type, just make it a Foldable and Conjable.
instance Foldable [] where
  fold = foldl

-- for exposition only, yes, conj is gross for lazy lists
-- in Clojure conj and left folds dominate
instance Conjable [] where
  empty = []
  conj xs x = xs ++ [x]

-- Note: the type does not say anything about Foldable or Conjable,
-- even though the implementation just happens to use a list!
xform :: Transducer Integer Integer
xform = mapping (+ 1) . filtering even . flatmapping (\x -> [0 .. x])

-- Again, this can munge anything Foldable and Conjable, not just a list.
munge :: (Foldable f, Conjable f) => f Integer -> f Integer
munge = xlist xform

Notice some very important properties of this code:

Example:

-- munge a list
-- [0,1,2,0,1,2,3,4,0,1,2,3,4,5,6]
example1 :: [Integer]
example1 = munge [1..5]

Implementing another type to illustrate the genericity of transducers

To illustrate Rich Hickey’s main point, I implemented instances of Foldable and Conjable for a standard Haskell Vector library as an alternate “collection-like” type.

-- For example using Vector instead of list
import qualified Data.Vector as V

-- Implement Foldable, Conjable type classes for Vector
instance Foldable V.Vector where
  fold = V.foldl

instance Conjable V.Vector where
  empty = V.empty
  conj = V.snoc

And we can run munge directly on a vector instead of a list, without making any changes:

-- return a vector rather than a list; note the fact that munge actually
-- internally uses a list
example2 :: V.Vector Integer
example2 = munge $ V.enumFromN 1 5

This is code reuse at its best.

Note that there is nothing that ties transducers to any concrete “collection” type. We could write instances of Foldable and Conjable for some kind of “channel” abstraction, for example, and instantaneously be able to munge data coming from it and to another. In fact, this is already what is done in the real world, where Haskell and Scala are used in production at places like Facebook and Twitter to efficiently handle large amounts of data.

My code repository

My complete code is available on GitHub.

Conclusion

It was pretty exciting to see the announcement of the transducers library for Clojure, because it represents a level of abstraction that I think has not been expressed much in the world of dynamically typed languages, although the techniques are two decades old in the Haskell community in a statically typed setting. And I hope that I was able to convey the sheer elegance of Haskell as a way to express interesting types with practical ramifications for abstraction and pluggability.

comments powered by Disqus