24 days of Hackage, 2015: day 21: hood, GHood, Hoed: observation oriented debugging in Haskell
Dec 21, 2015 · 7 minute read · Tags: Haskell, Hackage, hood, GHood, Hoed, debugging
Table of contents for the whole series
A table of contents is at the top of the article for day 1.
Day 21
How do you debug your Haskell code?
I have to confess up front that I don’t have a good answer to the question of how I debug in Haskell.
I’m not really the right person to talk about debuggers, because the
last time I used an official debugging tool was when I was developing
in C and C++ and used tools such as gdb
and higher-level interfaces
to that, and since then, my debugging process for most languages has
involved looking at stack traces and logs, insertion of “print”
statements, writing finer-grained tests, and refactoring code to find
out the root cause of a problem. I do not use official debugger
applications any more (with breakpoints, stepping, etc.). But should
I?
The question becomes even more complicated when working in Haskell, because I think it is honest to say that Haskell does not have a great story for debugging.
I’ve taken a look into a family of debugging tools for Haskell, including hood, GHood, and Hoed, which are all based on the same concept: manual annotation of source code in order to generate traces that can later be analyzed in interesting ways.
Hood
Let’s take a look at hood first, standing for “Haskell Object Observation Debugger”. It’s all about observation through a type class Observable a and an instrumenting function observe:
observe :: Observable a => String -> a -> a
Secret side effects!
Warning! Despite its type signature, this function performs effects underneath using unsafePerformIO, so you have to be careful how you write code that uses observe, in order to get the traces you want.
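To make the warning concrete, here is a minimal sketch of how an observe-like function can hide a side effect behind a pure-looking type. This is not hood's actual implementation: observeSketch, eventLog, and recordedEvents are names I made up for illustration, and unlike hood the sketch requires Show and renders the whole value rather than recording only the parts that were demanded.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- | Global, mutable list of (label, rendered value) pairs, newest first.
-- NOINLINE keeps GHC from inlining the definition and creating several refs.
eventLog :: IORef [(String, String)]
eventLog = unsafePerformIO (newIORef [])
{-# NOINLINE eventLog #-}

-- | Hypothetical stand-in for 'observe' (an assumption, not hood's code).
-- The type looks pure, but forcing the result appends to 'eventLog'.
observeSketch :: Show a => String -> a -> a
observeSketch label value = unsafePerformIO $ do
  atomicModifyIORef' eventLog (\events -> ((label, show value) : events, ()))
  pure value
{-# NOINLINE observeSketch #-}

-- | Read back everything recorded so far, oldest first.
recordedEvents :: IO [(String, String)]
recordedEvents = reverse <$> readIORef eventLog
```

Because of laziness, nothing is logged until the result of observeSketch is actually demanded, which is exactly the kind of subtlety to keep in mind when deciding where to place real observe calls.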
Example of instrumenting a pipeline
Let’s instrument the word counting pipeline from day 8. We import Debug.Hood.Observe, and copy and paste the original code with modifications to insert calls to observe.
{-# LANGUAGE OverloadedStrings #-}

module HoodExample where

-- | hood
import Debug.Hood.Observe (Observable(..), observe, observeBase, printO)

import qualified Data.MultiSet as MultiSet
import qualified Data.Text.Lazy as LazyText
import qualified Data.Text.Lazy.Builder as LazyBuilder
import Data.Text.Lazy.Builder.Int (decimal)
import Data.Ord (Down(..))
import qualified Data.Char as Char
import qualified Data.List as List
import Control.Arrow ((>>>))
import Data.Monoid ((<>))

-- | Break up text into "words" separated by spaces, while treating
-- non-letters as spaces, count them, output a report line for each
-- word and its count in descending order of count but ascending order
-- of word.
wordCount :: LazyText.Text -> LazyText.Text
wordCount = observe "(1) wordCount"
  >>> LazyText.map replaceNonLetterWithSpace >>> observe "(2) map replaceNonLetterWithSpace"
  >>> LazyText.words >>> observe "(3) LazyText.words"
  >>> MultiSet.fromList >>> observe "(4) MultiSet.fromList"
  >>> MultiSet.toOccurList >>> observe "(5) MultiSet.toOccurList"
  >>> List.sortOn (snd >>> Down) >>> observe "(6) List.sortOn (snd >>> Down)"
  >>> map summarizeWordCount >>> observe "(7) map summarizeWordCount"
  >>> mconcat >>> observe "(8) mconcat"
  >>> LazyBuilder.toLazyText >>> observe "(9) LazyBuilder.toLazyText"

replaceNonLetterWithSpace :: Char -> Char
replaceNonLetterWithSpace c
  | Char.isLetter c = c
  | otherwise = ' '

summarizeWordCount :: (LazyText.Text, MultiSet.Occur) -> LazyBuilder.Builder
summarizeWordCount (word, count) =
  LazyBuilder.fromLazyText word <> " " <> decimal count <> "\n"
(I apologize for the mindless boilerplate in the String description of each observation point: one can imagine writing a Template Haskell wrapper, I guess, if desired.)
Here I’ve chosen just to instrument the natural data “stages” in the pipeline. If we wanted to, we could instrument at any other level as well, e.g. the result of calling summarizeWordCount (with (summarizeWordCount >>> observe "summarize word count")), or at many other more powerful levels, such as instrumenting a function itself, not just the result of a function call, e.g., (observe "summarizeWordCount function" summarizeWordCount), which results in collecting all the calls to the function.
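As a small illustration of observing a function rather than a result, here is a sketch that relies on hood's Observable instances for functions and for Int. The names square and sumOfSquares are hypothetical helpers of mine, not part of the word counting example, and I am deliberately not reproducing the trace output here.

```haskell
import Debug.Hood.Observe (observe, printO)

-- | Hypothetical helper, used only for this illustration.
square :: Int -> Int
square x = x * x

-- | The function itself is wrapped with 'observe', so each call made
-- through the wrapper is collected under the single label "square".
sumOfSquares :: [Int] -> Int
sumOfSquares xs = sum (map (observe "square" square) xs)

-- | Run the computation and print the collected trace.
exampleFunctionRun :: IO ()
exampleFunctionRun = printO (sumOfSquares [1, 2, 3])
```

Running exampleFunctionRun in GHCi should print the result followed by a trace for "square" listing the argument and result of each call, instead of one entry per pipeline stage.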
Implementing the Observable type class
Annoyingly, because everything’s based on the Observable type class, we ignore GHC warnings and create orphan instances for various types (the alternative, adding a newtype wrapper for everything, is rather onerous in this situation):
-- | Some orphan instances of 'Observable'.
instance Observable LazyText.Text where
  observer = observeBase

instance (Observable a, Show a) => Observable (MultiSet.MultiSet a) where
  observer = observeBase

instance Observable LazyBuilder.Builder where
  observer = observeBase
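For comparison, here is roughly what the newtype alternative could look like for just one of these types. ObservedText and observeText are hypothetical names of mine, and this is only a sketch of the idea: you would need one such wrapper and helper per foreign type, which is why I found it onerous here.

```haskell
import Debug.Hood.Observe (Observable(..), observe, observeBase)
import qualified Data.Text.Lazy as LazyText

-- | Hypothetical wrapper: the instance below is attached to our own type,
-- so it is not an orphan.
newtype ObservedText = ObservedText { unObservedText :: LazyText.Text }

-- | Show the wrapped text directly so traces do not mention the wrapper.
instance Show ObservedText where
  showsPrec d (ObservedText t) = showsPrec d t

instance Observable ObservedText where
  observer = observeBase

-- | Wrap, observe, and unwrap, so a pipeline stage keeps its original type.
observeText :: String -> LazyText.Text -> LazyText.Text
observeText label = unObservedText . observe label . ObservedText
```

A stage such as observe "(1) wordCount" would then become observeText "(1) wordCount", with a similar helper needed for the MultiSet and Builder stages.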
Sample output
For simplicity, let’s use printO :: Show a => a -> IO () to perform a run that outputs a trace. More generally, there’s runO that executes an arbitrary IO action.
exampleRun :: IO ()
exampleRun = printO $
  wordCount "I have all-too-many words; words I don't like much!"
The output (I just run this in GHCi):
-- (1) wordCount
"I have all-too-many words; words I don't like much!"
-- (2) map replaceNonLetterWithSpace
"I have all too many words words I don t like much "
-- (3) LazyText.words
"I" : "have" : "all" : "too" : "many" : "words" : "words" : "I" :
"don" : "t" : "like" : "much" : []
-- (4) MultiSet.fromList
fromOccurList [("I",2),("all",1),("don",1),("have",1),("like",1),("many",1),("much",1),("t",1),("too",1),("words",2)]
-- (5) MultiSet.toOccurList
("I", 2) : ("all", 1) : ("don", 1) : ("have", 1) : ("like", 1) :
("many", 1) : ("much", 1) : ("t", 1) : ("too", 1) : ("words", 2) : []
-- (6) List.sortOn (snd >>> Down)
("I", 2) : ("words", 2) : ("all", 1) : ("don", 1) : ("have", 1) :
("like", 1) : ("many", 1) : ("much", 1) : ("t", 1) : ("too", 1) : []
-- (7) map summarizeWordCount
"I 2\n" : "words 2\n" : "all 1\n" : "don 1\n" : "have 1\n" :
"like 1\n" : "many 1\n" : "much 1\n" : "t 1\n" : "too 1\n" : []
-- (8) mconcat
"I 2\nwords 2\nall 1\ndon 1\nhave 1\nlike 1\nmany 1\nmuch 1\nt 1\ntoo 1\n"
-- (9) LazyBuilder.toLazyText
"I 2\nwords 2\nall 1\ndon 1\nhave 1\nlike 1\nmany 1\nmuch 1\nt 1\ntoo 1\n"
This can definitely be a useful way to debug pipelines, or just to generate traces for use in teaching. For example, here we might see immediately why we didn’t get the answer we wanted (classifying “don’t” as a word): at the third stage we got “don”, which means that something went wrong in an earlier stage. There happens to be only one candidate here, the replacement of non-letter characters with spaces.
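Once the trace has pointed at the character-replacement stage, one possible fix (my assumption; there are other reasonable tokenization choices) is to keep the ASCII apostrophe along with letters. The sketch below uses plain String rather than lazy Text so that it only needs base, and keepLettersAndApostrophes and splitWords are hypothetical names.

```haskell
import qualified Data.Char as Char

-- | Hypothetical variant of 'replaceNonLetterWithSpace' that also keeps
-- the ASCII apostrophe, so contractions stay in one piece.
keepLettersAndApostrophes :: Char -> Char
keepLettersAndApostrophes c
  | Char.isLetter c || c == '\'' = c
  | otherwise = ' '

-- | The first two pipeline stages, on String instead of lazy Text.
splitWords :: String -> [String]
splitWords = words . map keepLettersAndApostrophes
```

With this change, “don't” survives as a single word in the sample sentence. Note that the rule is crude: a word wrapped in single quotes would keep its quotes too.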
There’s a lot more you can do with hood, but this gives a taste.
GHood
Next, we briefly move to GHood, which has the same Observable interface as hood, except we import Debug.Observe instead of Debug.Hood.Observe. So I won’t show the example code because it is literally just copy and paste and changing the import. Module copy/paste issues like this (which also arose on day 15 on IOSpec) really make me wish Haskell had parameterized modules like ML. (And the Observable orphan instance issue as well, where ML-style modules are the natural way to plug in different ways to observe the same data type as desired.)
GHood is a Java-based graphical back end to hood. When you install GHood, it comes with a Java JAR file. Java is executed to read a log file ObserveEvents.log that is generated from running hood-instrumented code. You can animate, step back and forth, and the traces are shown in a tree structure, including showing evaluation of thunks. It’s an interesting proof of concept from 2001 but is kind of primitive. I decided not to try to include a screenshot of a sample run because the Java Swing app’s output scrolls way to the right in its window and is not really customizable.
Hoed
Hoed has an API based on hood but is a much more modern, sophisticated, active project. It enables “algorithmic debugging” by providing an interactive Web app that uses a tree structure to repeatedly prompt you for whether results were correct, in order to narrow down and identify the source of the error based on your feedback.
Furthermore, it comes with built-in support for hooking up to QuickCheck for property-based debugging. Check out the documentation and screenshots.
Hoed launches a local Web server you can access with a browser at port 10000. The GitHub repo contains a lot of examples.
Unfortunately, I ran out of time to get it working the way I wanted to show here, in the context of using the interactive debugging system, so I’ll just have to say that Hoed is very interesting and I plan to get more into it when I have time. I will update this post later.
Conclusion
I haven’t used debugging tools much in recent years, but this could change as I find ways to make use of new tools that are easy to use. hood-based systems seem useful as one way to collect information during execution without radically restructuring code.
All the code
All my code for my article series is at this GitHub repo.