24 days of Hackage, 2015: day 19: ghc-core-html, list-fusion-probe; checking GHC's fusion rewrite rules for erasing intermediate data from existence
Dec 19, 2015 · 10 minute read · CommentsHaskellHackagefusionGHCghc-core-htmllist-fusion-probe
Table of contents for the whole series
A table of contents is at the top of the article for day 1.
Day 19
The single coolest feature of using Haskell, for me, has to be fusion. The GHC compiler performs this remarkable optimization that can erase entire intermediate data structures from existence.
What does that mean, and how can we know it happened? Today I’ll show
how ghc-core-html
and list-fusion-probe
can sort of help in
determining what the compiler actually did to your intermediate data
structures.
I’ll give examples with lists as the intermediate data, but also mention vectors because yesterday, day 18, I briefly mentioned fusion in the context of vectors.
Imports for code
First, let’s get some imports out of the way before showing some HSpec and QuickCheck tests:
{-# LANGUAGE ScopedTypeVariables #-}
module ListFusionProbeSpec where
import Data.List.Fusion.Probe (fuseThis)
import Test.Hspec ( Spec, hspec, describe, it
, shouldBe, shouldSatisfy
, shouldThrow, errorCall
)
import Test.Hspec.QuickCheck (prop)
import Test.QuickCheck.Function (Fun(..), apply)
import Control.Exception (evaluate)
import Control.Arrow ((>>>))
import Data.Function ((&))
What is intermediate data in a pipeline?
When programming in a compositional way, often we create pipelines of data flow, in which data from one stage gets transformed into data for the next stage, and so on until the final output. The problem is that a naive implementation of a pipeline will result in construction of some data structure that exists only for the purpose of being consumed by the next stage.
Here’s an HSpec test illustrating a pipeline in which we name each intermediate list:
spec :: Spec
spec =
describe "list-fusion-probe" $ do
it "runs a chain of maps, filters" $
let list1 = ["Hello", "my", "world!"]
-- ["Hello", "world!"]
list2 = filter ((> 2) . length) list1
-- [5, 6]
list3 = map length list2
list4 = map (*3) list3
in list4 `shouldBe` [15, 18]
In a typical implementation of a typical programming language, code looking like this will result in allocating and creating four lists (or arrays, or whatever collection type is idiomatic and desired), one after another, and traversing three lists using a filter and two maps. To do something cleverer and avoid creating intermediate data, one uses a special “stream” collection type, or more radically, “transducers”. These techniques enable collapsing the computation of the final data into a single traversal and no allocation of intermediate data.
The Haskell ecosystem contains many libraries for performing these
kinds of optimizations, such as
foldl
, but they are
outside the scope of this article. Instead, for simple examples such
as the one above, GHC already automatically applies fusion, through
special
rewrite rules
that are included in the standard libraries.
A note on pipeline syntax and composition
We can write the pipeline in many different ways. Here’s an OO-style
way using &
(reverse function application) and >>>
(left-to-right composition):
it "runs a chain of maps, filters (OO-style)" $
let list4 = ["Hello", "my", "world!"] &
(filter (length >>> (> 2))
>>> map length
>>> map (*3)
)
in list4 `shouldBe` [15, 18]
Here is the same thing but with the compositionality factored out. This is my preferred way of writing pipelines as first-class values:
it "runs a chain of maps, filters, written compositionally with >>>" $
let pipeline = filter (length >>> (> 2))
>>> map length
>>> map (*3)
in pipeline ["Hello", "my", "world!"] `shouldBe` [15, 18]
Using traditional right-left composition .
:
it "runs a chain of maps, filters, written compositionally with ." $
let pipeline = map (*3)
. map length
. filter ((> 2) . length)
in pipeline ["Hello", "my", "world!"] `shouldBe` [15, 18]
What does rewriting mean?
The thing that rewriting does that the other approaches do not: the rewriting is a compiler preprocessing pass that rewrites your source code in order to optimize it. We’re not going into exactly how it works. There are rewrite rules that match source code constructs and replace them with semantically equivalent constructs, and basically fusion happens when during the course of repeatedly rewriting, certain constructs cancel themselves out, and “poof” goes your intermediate data.
If you look at the source code for
map
you will see scary-looking comments and compiler directives that look
like
map :: (a -> b) -> [a] -> [b]
{-# NOINLINE [1] map #-} -- We want the RULE to fire first.
-- It's recursive, so won't inline anyway,
-- but saying so is more explicit
map _ [] = []
map f (x:xs) = f x : map f xs
-- Note eta expanded
mapFB :: (elt -> lst -> lst) -> (a -> elt) -> a -> lst -> lst
{-# INLINE [0] mapFB #-}
mapFB c f = \x ys -> c (f x) ys
-- The rules for map work like this.
--
-- Up to (but not including) phase 1, we use the "map" rule to
-- rewrite all saturated applications of map with its build/fold
-- form, hoping for fusion to happen.
-- In phase 1 and 0, we switch off that rule, inline build, and
-- switch on the "mapList" rule, which rewrites the foldr/mapFB
-- thing back into plain map.
--
-- It's important that these two rules aren't both active at once
-- (along with build's unfolding) else we'd get an infinite loop
-- in the rules. Hence the activation control below.
--
-- The "mapFB" rule optimises compositions of map.
--
-- This same pattern is followed by many other functions:
-- e.g. append, filter, iterate, repeat, etc.
{-# RULES
"map" [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs)
"mapList" [1] forall f. foldr (mapFB (:) f) [] = map f
"mapFB" forall c f g. mapFB (mapFB c f) g = mapFB c (f.g)
#-}
Some resources on fusion
Here are some articles illustrating more of the low-level workings of fusion:
But how do we know whether fusion worked?
The thing that has always bothered me is that there has not been an easy way to know whether fusion actually worked or tell the compiler, “I want the result of this expression fused, so report an error if you can’t fuse it for me because I care about maximum efficiency here”. For example, here is an open bug report about something that someone expected to be fused.
When writing high-abstraction code and expecting the compiler to do important optimizations for me, I consider it important to be notified if they are not happening as expected.
So it was interesting when I came across the list-fusion-probe
library that does its own rewrite trickery to detect whether a list
has fused or not. The way you use it with an expression of list type
is you just wrap it with fuseThis
, and the library will rewrite the
call such that if fusion is happening, it just returns the list, but
if not, it generates code that at runtime throws an exception. This is
extremely primitive (I would never want to ship code that threw an
exception “cannot fuse”), but is a useful proof of concept for
testing.
So here are a few tests illustrating how insertion of fuseThis
into
pipelines verifies that fusion does happen (because there is no run
time exception):
it "fuses a chain of maps, filters" $
let list1 = ["Hello", "my", "world!"]
-- ["Hello", "world!"]
list2 = fuseThis $ filter ((> 2) . length) list1
-- [5, 6]
list3 = fuseThis $ map length list2
list4 = map (*3) list3
in list4 `shouldBe` [15, 18]
it "fuses a chain of maps, filters" $
let list4 = ["Hello", "my", "world!"]
pipeline = filter (length >>> (> 2)) >>> fuseThis
>>> map length >>> fuseThis
>>> map (*3)
in pipeline list4 `shouldBe` [15, 18]
it "Prelude foldl fuses" $
let list = fuseThis [0..1001] :: [Int]
in foldl (+) 0 list `shouldBe` 501501
We’ve verified here that list2
and list3
never exist as
materialized data.
Note, for example, that the foldl
example’s fusion amounts to the
list [0..1001]
never being created; instead, the generated machine
code is basically a loop with index from 0
to 1001
adding to an
accumulator.
Here’s a fusion failure:
it "handwritten myFoldl fails to fuse" $
let list = fuseThis [0..1001] :: [Int]
in evaluate (myFoldl (+) 0 list) `shouldThrow`
errorCall "fuseThis: List did not fuse"
where we provided myFoldl
written in such a way that it does not
match the fusion rewriting rules:
-- | This example taken straight from `list-fusion-probe` tests directory.
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl f = go
where go a [] = a
go a (x:xs) = go (f a x) xs
Finally, for fun, let’s write a QuickCheck test that tests arbitrary predicates and transformation functions in a filter and map pipeline:
prop "Prelude foldl fuses the result of a filter, map pipeline" $
\(list :: [Int]) (predicate :: Fun Int Bool) (f :: Fun Int Int) ->
let pipeline = fuseThis . filter (apply predicate)
. fuseThis . map (apply f)
in
-- Just to force evaluation.
foldl (+) 0 (pipeline list) `shouldSatisfy` (<= maxBound)
Why not report a fusion failure at compile time instead?
There’s a user interface issue when it comes to optimizations such as fusion. Ideally, I’d like to be able to declare desired fusion and have a failure be reported as a compile-time error.
However, the situation is tricky, because then the code can become
brittle in the face of different optimization settings. For example,
if you run the tests above within GHCi, you will find that GHCi never
reports fusion failure. This is because it’s just not doing fusion at
all: it’s ignoring rewrite rules, both in the standard library and in
list-fusion-probe
.
Maybe the solution is to have conditional directives. I do think that there has to be an option to treat a known performance bug as a compile-time error. It is not practical to live in fear and defensiveness when writing performance-oriented code and deciding, in the absence of clear compiler feedback, to simply manually write low-level ugly code because of lack of confidence that the elegant code will not perform properly.
What do you think? What kind of feedback would you like to get from compilation about optimizations you expect to be performed?
Checking generated code in yesterday’s vector
example
Yesterday, I mentioned that fusion for vectors happens, but I wasn’t
confident about how much actually happens. I found a useful tool
ghc-core-html
that outputs in somewhat prettified fashion the GHC
Core code generated within GHC.
To use it, install it globally:
$ stack install ghc-core-html
I ran it on a sample modified source file to get HTML output:
$ stack exec ghc-core-html src/VectorFusionExample.hs > VectorFusionExample.html
The sample file:
-- | Extracted from VectorExample for minimal GHC Core output file.
module VectorFusionExample (makeCountsPurely) where
import qualified Data.Word as Word
import qualified Data.Vector.Unboxed as V
-- | Assume 8-bit unsigned integer.
type Value = Word.Word8
-- | Number of occurrences of an 8-bit unsigned value.
-- We assume no overflow beyond 'Int' range.
type CountOfValue = Int
-- | NOT the real thing!! With fake constants 123, 789
-- to help in reading the GHC Core.
makeCountsPurely :: V.Vector Value -> V.Vector CountOfValue
makeCountsPurely =
V.unsafeAccumulate (+) (V.replicate numPossibleValues 123)
. V.map (\v -> (fromIntegral v, 789))
-- | 256, in our case.
numPossibleValues :: Int
numPossibleValues = fromIntegral (maxBound :: Value) + 1
I’m not going to show the output, but just note that looking at it, I
saw two calls to newByteArray#
. Since one of the created arrays is
the final array, but there is another allocation, we didn’t manage to
get down to creating just a single array without allocating any other,
unlike the implementation yesterday using an internal MVector
.
This is good to know, but I shouldn’t have to generate low-level code and inspect it in order to obtain high-level information about what is fused where.
For more information
A tutorial on understanding the GHC compilation process is here. Here’s another.
A video by Simon Peyton Jones on Core.
Read anything and everything by Johan Tibell on Haskell performance.
Conclusion
Today I mentioned the importance of fusion and my frustration at not
being able to get the kind of detailed information I would like about
whether GHC performed fusion where I want it. However,
list-fusion-probe
helps with writing tests to detect list fusion
failure, and ghc-core-html
is a useful general-purpose tools for
examining generated code for any purpose.
All the code
All my code for my article series are at this GitHub repo.