---
title: "Haskell Monads for Java Developers"
date: 2012-11-04 12:00:00
comments: true
categories: ['Programming']
publish: false
---
# Haskell Monads For Java Developers
Last updated: 2010-04-15
## Introduction
I mostly program in c/c++/java/perl/ruby. Just for fun, I decided to learn a little Haskell programming, and found it
interesting and challenging but not mind-bending - except for Monads. For some reason, the way Monads are described
in all the standard documentation just made my head spin.
After much pondering, I think I've got it figured out - and it isn't so hard after all. Below is an explanation aimed
at object-oriented programmers in general (with examples in Java).
## A Monad is a Design Pattern
A monad is simply a "design pattern". However the way the design pattern is done is clever, and
gives a number of useful features.
There is nothing actually built-in to Haskell's compiler or interpreter for monads; they can
be written using normal Haskell syntax. However Haskell has added some syntactic sugar (the
"do" keyword) to make writing code that interacts with something that obeys this design pattern
easier.
## What is a Design Pattern?
Hopefully you already know about design patterns; they were one of the big topics of the 90s, and now provide
one of the standard ways to describe code.
Examples of design patterns are:
* Singleton
* Factory
* Decorator
* Proxy
See for further information.
## What is the Monad Design Pattern
This design pattern enables:
* enforced "labelling" of all code that interacts with a particular monad (esp. for IO)
* modular code, ie data structures with private members
* implementing new "flow control" patterns
* support for easily cloning the state of an object
Although "monad" is simply a design pattern, there is in fact a class in Haskell's
standard library called Monad, and all the standard classes that implement the monad pattern
*do* subclass the Monad class. If you write your own code that uses the monad pattern, it is not
absolutely necessary to subclass Monad, but it is recommended - for readability if nothing else.
People expect code that obeys the monad pattern to subclass Monad. In addition, there are some
helper functions in Haskell that work with subclasses of Monad, and the convenient "do" syntax
also works only for types that subclass Monad.
While some design patterns (eg model-view-controller) can be useful in widely different languages,
some other design patterns are useful only in languages that are similar. The monad design pattern
is used in many functional languages, but is not applicable to object-oriented languages as they
usually already have built-in mechanisms for private data, and use state *mutation* rather than
cloning. The "control flow" feature is something that could be useful in Java, .Net, etc. And
in fact the "strategy" or "template" design patterns which are often used in object-oriented
languages have a similar purpose.
Normally in Haskell, data structures are all public, and any function passed a data-structure can
access any part of it. In addition Haskell is extremely strictly typed, so that structure is
exposed in the type signature of any function that accesses it. The key to understanding monads
is that this pattern uses a trick to deliberately restrict access to a datastructure by requiring
that operations on a monad return a monad.
Here's an analogy: you are president of a country, and are very rich, but want to reassure voters
that your political decisions will not be influenced by personal profit. So you invest your
money through a "blind trust". At your first investment, you deposit $100k, and get back a sealed
envelope. A month later you take the envelope back and deposit another $100k; you get back another
sealed envelope.
envelope into another room and shortly returns with another envelope that he gives to you. After many
years of this, you want to take a holiday so you return with the envelope, and ask him to sell all
your food-related investments; again he disappears into a back room, and returns with a new envelope
and $4300. The information about all the investments *is* inside the envelope, but you can't see it
because of the simple rule that any operation requires that the results are returned in a sealed
envelope. Any algorithms and data hidden inside the envelope
The monad pattern works the same; you can wrap an original value up as a monad (the original investment).
And you can combine two monads. But the output is always a monad. A monad can also *choose* to return
selected pieces of data via non-monad operations (the value of investments sold). But the actual
internal representation is hidden by the simple trick of always returning the result in a monad (envelope)
form.
## Monads As Java
~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~
## Monads Are Still Functional
A monad is a value in Haskell like any other primitive value or data-structure. It can be passed as a
parameter to a function, and is handled lazily - ie the "<-" operator is applied to the monad (triggering
execution of the procedural code) only if the returned value is actually needed.
Therefore given
~~~~~~~
a = some-monad
f a b = b+1
~~~~~~~
the monad instance is passed to function f, but as the parameter is never used, the monad is never invoked;
if the monad contains statements that print to the screen, then no output is generated. However if the monad
is ever invoked, then all the internal statements within the monad are guaranteed to be executed in the
originally-specified order.
## Monads Support Modular Code
A monad can be used as "an environment in which to run user-provided code blocks":
* The environment can contain hidden read-only variables (eg state) which are not accessable to
code not being executed inside the environment.
* When multiple user-provided code blocks are given, the enviroment can sequence the calls to
those code blocks (flow control).
A very rough Java analogue might be the ClassLoader; it controls what static vars user code sees,
etc. Or perhaps java.lang.Thread, which executes user-provided code in the form of a Runnable object.
Imagine something that combines the features of these two, then put it on steroids.
## Monads Support Labelling of Code
Because data can be hidden from users of a monad, but exposed to code running "inside" the monad
means that "one-way" monads can be created that can process data but never expose it to the
outside world.
This is used for input and output operations in that non-functional code can be passed data, but
the data can never be extracted outside the monad. IO uses this to ensure that any function which
does non-functional IO is clearly "tainted" in the compile-time type system; any code that access
an IO function must mark itself as an IO function or it won't compile.
The fact that data can be hidden allows:
* simplified handling of "state" data structures. All datastructures in functional programs are
immutable, and "changes" are handled by creating new copies with the necessary changes. But passing
this data around as parameters gets tiring quickly; instead the data can be attached as hidden
data to an "enclosing monad". It is still copied rather than modified-in-place, but the copying
is taken care of by the monad environment.
## Monads Can Define Flow Control
In Java, c, c++ etc we are used to having a hard-wired set of control structures: if/then/else, for,
while/do, try/catch etc. But these aren't always sufficient.
It would often be nice to be able to build a list of statements to execute, and then pass that list
to some "executor module" that figures out which ones to execute, based on whatever rules it likes.
The simplest rule would be: execute each one in order, and return the output of the last function.
Another interesting rule would be: execute each one in order but if an exception is thrown then
execute a special "handler" function and return its output.
This kind of thing *is* possible to do in c++, Java via anonymous subclasses of java.lang.Runnable, etc
It is even possible in C by using function-pointers. Executing functions in this way can be thought of
as some variant of the "Strategy" pattern. But it is so inconvenient that it is rarely worth-while.
The monad design pattern has an API which makes implementing new flow-control patterns possible.
And because Haskell supports lambda-expressions, this kind of thing is fairly elegant to use. In fact,
Haskell leaves out all the basic control-structures from the language, and leaves it up to Monads in
the standard library to provide these. Different Monad types provide different flow-control functionality.
Monads are in most cases just normal Haskell code which are so widely used that they are part of the
standard library. You can define your own Monads if you wish, in order to apply clever new ways of
sequencing calls to a list of functions.
One of the most useful "execution strategies" is simply executing a list of expressions in the order that
the programmer defined them, and returning the result of the last function in the list. Well, actually in
functional languages it turns out that this is NOT something that is so commonly needed. In general, it is
better to just write your logic and let the Haskell compiler figure out what is needed when. But there is
one case where order is critical: when interacting with users or the operating-system. Users like seeing
their prompts come out in order, and operating systems like calls to open a file to come before calls to
write to the file, etc. Therefore all such interactions are actually done by creating an IO monad instance
(ie an object that implements the "IO execution strategy"), adding expressions to its list of
expressions-to-execute, then asking the IO object to execute the expression list.
Actually, a monad does not have a "list" of expressions to execute; instead it has a maximum of two.
The case where a monad has just one expression is simple; when evaluated it returns the result of
evaluating that expression. Often, the expression is just a literal data-value. In the case where
there are two, the second expression is expected to return a monad instance (which contains either
one or two expressions). This allows monad expression to "chain", which is effectively a list. In
practice, the subsequent expression usually returns the same type of monad. So when we talk about
a monad that just "executes the list in order", we actually mean that monads of that type always
execute their first child, then their second (if there is one) which returns a monad. That returned
monad is then executed, and so the sequential processing continues.
Because a monad can wrap a simple value, it can also be thought of as a "container" or "proxy" for
some value. This behaviour is useful as a "marker" for datavalues; the IO monad uses this for
example, to make sure that the wrapped data is never actually extractable except by another IO monad.
For some reason I haven't quite figured out, many haskell standard functions that use monads define the
result of the function as a monad-wrapper around some type rather than just the plain result. Maybe this
is so that these functions can be chained together without too many extra "return" calls needed to wrap
them up again? Examples are the "sequence" function, whose result is `m [a]` rather than `[a]`.
A monad instance (containing a list of expressions to execute) can be passed around just like any other
data-value, and lazy-evaluation also applies; it won't execute its internal list until something tries to
retrieve the output value of the object.
Because a monad instance returns a value when executed (might be the return value of the last expression
on its internal list, or something else depending on the control-flow that the monad implements), a monad
can also be thought of as a "wrapper" or "container" for a value. This can be useful in itself, as the
monad can control which parts (if any) of the data-fields in the monad can be retrieved. The IO monad
uses this to hide *all* the contents, while the State monad uses this to hide the state but expose the
result of computation on that state.
The Haskell language defines an external API that *all* monads must implement:
* one or more constructors
* a "return" operator which "wraps" the specified value up as a monad with just one child. This is rather
like a factory-method for the enclosing monad type.
* a "bind" operator which takes an existing monad, and an expression, and returns a new monad with the
expression added to its internal list.
When writing a class method, a function name is first looked up as a method on the enclosing type. Therefore
when a class defines ">>=" and "return", any references to those methods resolve to the ones in the enclosing
class. ??? And when doing a bind, this lookup also applies, so in lambda-expressions, >>= and return are
the ones from the class doing the binding. Interesting scope-resolution rules!!!
Note that when we said above "add expressions to the monad's internal list", that would involve *mutating*
a value, which is not functional-programming style. Instead, the correct functional way is to create a new
object with the new (extended) list. Yes, it would probably also be possible to build a monad by passing
the completed list straight-up at the start, but that is less flexible.
Various monad subclasses then can add more methods to access extra features of that particular monad. For
example:
* the *type class* Monad adds ">>" and "fail".
* the type-class Functor adds fmap.
Usually, the expressions in the list are not independent; instead, an expression uses the results of
preceding expressions. Therefore each expression in the list is expected to take at least one parameter
(the result of the preceding expression). And for some reason that isn't entirely clear to me, the
expression is required to return a *monad* rather than just the raw result value. Well, actually there
is a reason: the execution strategy really just applies to pairs of expressions, not lists. It is
possible to change strategies as the list of expressions executes. So each expression effectively
returns a value AND a strategy for executing the following expression. While extremely flexible, this
isn't often used so the convenient "do" syntax doesn't support this; it automatically wraps the result
of each expression in the same monad type.
Remember that haskell is lazy, ie values are only computed when needed. So when any monad is asked for its value,
it really needs to first say "is my predecessor computed? If not, compute it. Now run my own computation using
that input value, and return the result to the caller". So a monad really needs to point to the *last* expression
in a sequence, with each expression holding a pointer to its predecessor. This explains why the >>= syntax looks
the way it does.
Note that invoking the "return" method on a monad does NOT necessarily prevent the following expressions from
being executed; it just calls a method on the enclosing monad. In fact, strangely, "return" is effectively
a *factory method* for the monad which is quite a different meaning from c/java/etc. All that the standard monad
implementations do is wrap the argument up in a new monad instance, then keep on going. Monads generally
give back to the caller whatever value is the result of the last expression in their list, regardless
of any "return" calls that may have occurred on the way. In fact, quite often "return" is used as the first
expression in the expression-list:
~~~~~~~
foo :: x -> SomeMonad y
foo a = (return a) >>= op1 >> op2
~~~~~~~
The >>= operator is expecting a monad instance on the left-hand-side, not a literal data-value, so return is
needed to wrap it.
When using the do-syntax, an expression with "<-" automatically gets an appropriate return call wrapped around
it, so most expressions do not need to call return explicitly. However the very last expression in the list
does need it.
The way that the base Monad class is defined, instances can be *created* (via "return"), but there is no
guarantee that data can ever be extracted again. This allows the creation of "one-way" monads, where no
data can ever escape. The IO monad uses this; the only values returned from an IO monad are other IO
monads. This isolates the functional parts of haskell from non-functional code, rather like "tainting" in
perl. Of course specific monads can *choose* to allow data to escape if they wish, and those who obey
the functional principles (eg the monads Maybe and List) do provide for this.
In order for the "do" syntax to work correctly, the monad functions "return" and ">>=" need to obey certain
restrictions. These do limit the kinds of flow-control that can be implemented. And the fact that each monad
only really has a first and second child, rather than access to the whole expression list, also limits the
kinds of flow-control that can be implemented. But in practice, what is supported is enough.
Other useful flow-control behaviours:
* parsing: repeat the first expression as long as it succeeds. Only when it fails, execute the next expr.
## The Maybe datatype
The Maybe monad just takes advantage of the "flow control" part of the monad design pattern.
A Maybe monad:
* Has two constructors Nothing and "Just x".
* Nothing means "unknown or uncomputable". Just is a wrapper for a real value.
* The bind operator:
+ takes a Maybe instance which wraps an x
+ takes a function which takes an x returns a Maybe.
+ returns a Maybe which is a "compound" of the two.
* The return operator wraps the value as a Just(x), ie makes the result be of type Maybe.
* The "execution strategy" of the Maybe monad is to execute the first expression, and if the result is Nothing
then just return Nothing immediately (skipping all the others). If the result is something else
(ie is "Just x") then execute the next expression, passing x as the first argument.
Note that it "unwraps" the internal value before passing it to the next expression in the chain. In other words, the
bind operator takes an expression of type "x -> Maybe y", not "Maybe x -> Maybe y". It can unwrap the input arg
because it never *calls* the next expr in the chain if the value is Nothing. But it requires the called expression
to return a Maybe, and not a raw value, because the called code might want to return the answer "Nothing".
In Java, the value "null" is often used to emulate the "unknown" value. For example, when
doing a lookup in a Map, if the key matches nothing then null is returned. But sometimes
null is also a valid answer, in which case some kind of hack is used to return the "no
match" result. Maybe is a more elegant answer for this kind of problem. The Maybe monad
also has this nice "short circuit" behaviour.
~~~~~~~
class Result {public boolean ok; public String val;}
String calculateSomething(int arg1) {
Result r;
r = step1(arg1);
if (!r.ok) return null;
r = step2(r.val);
if (!r.ok) return null;
r = step3(r.val);
if (!r.ok) return null;
return r.val;
}
~~~~~~~
Data can be extracted out of a Maybe monad via pattern-matching:
~~~~~~~
showmaybe Nothing = "is unknown"
showmaybe (Just x) = "is " ++ show x
~~~~~~~
## The List datatype
The List monad just takes advantage of the "flow control" part of the monad design pattern.
The List constructor is a special-casetype represents the result of computations that might return 0..n values.
The return method takes an element, and returns a list containing that element (ie a list of size 1). I guess you
can think of this as wrapping the single data element in a List monad, like Maybe does.
The execution strategy for this monad is to apply its function argument to each value in the list in turn.
The result *of each call* is another list, though possibly of a different type. The returned list may be of
any length. The resulting lists are then concatenated together (flattened).
Note that although the resulting list must be in the same order as the original, ie the result of applying
the function to element 0 of the input list must come before the result of applying the function to element 1
of the input list, the computations are not *dependent*. Therefore, the computation on each element of the
list can potentially be executed in parallel, ie for a list of 5 elements, and 5 CPUs, the result can be
computed fairly quickly.
When chaining multiple expressions in a list monad, things get interesting. It is like the effect of
~~~~~~~
results = new List();
l1 = getList1();
foreach e1 in l1
l2 = getList2(e1);
foreach e2 in l2
l3 = getList3(e2);
result.append(l3);
~~~~~~~
An example is walking directories on a filesystem.
/*
Note also that the result should be computed *lazily*, ie when asked for the first element of the result
list, the function should only be applied to the input list while the result list is empty. Once a result
has been returned, computation should suspend and the "result so far" returned to the caller.
The List monad therefore needs to choose quite how to implement its processing internally, depending on
whether running on a small or large computer, on a small or large dataset, etc. This is one of the nice
things about a monad: as long as the external API promised by the ">>=" operator for list is obeyed,
the internal execution strategy can be tweaked either at compile-time or even at runtime.
*/
## The State Datatype
In Maybe and List, the "return" statement doesn't do much, because they are both effectively "functional" from
the outside.
In any computation that needs access to state, you encapsulate that state in some object with lots of fields,
and pass that as an input parameter to functions that need it.
But some of those functions may affect the state. It is possible to model this in pure functional programming,
simply by having every function that modifies the state return an updated copy of that state, and pass the
*updated* value into the next function. However this can get rather ugly.
In fact, this is exactly what the State monad does. However it hides the state data-structure from most of
the code, making it a field of the monad itself so the individual expressions can mostly ignore it. When
the expressions do want to access the state, they can call the "get" and "put" functions. The get function
is obviously still functional. The "put" function is also functional; it creates a new copy of the state
with the specified change in it. And the "return" function creates a new monad which has a copy of the
current enclosing monad's state, so that when the "returned" monad is later invoked, it can continue
where it last left off.
The ">>=" binding operator can be thought of as "propagating state" at the same time as executing the
two expressions. Any state changes made by the first expression are implicitly available to the second
bound expression in the monad pair, because a *copy* of the state has been passed as part of the
execution of the ">>=" method. This is one important reason why each expression must return a *monad*,
and not just a plain data-value; the monad provides places to attach hidden data.
The result is that there is just one thing for the calling code to handle: the fact that the "stateful"
object is now of type "State a" rather than just "a". Otherwise, the details of the state are now
irrelevant to the caller.
Note that although this datatype is called "State", it is still fully functional, and does not involve
modifying data "in place" at any time. It simply automates the process of creating a slightly-altered
copy of the earlier state when needed, and setting it as the "default state" variable used for all
subsequent function calls.
Parser implementations for various datafiles (whether languages, config files, etc) are implemented
like the State monad, so that the caller doesn't need to care about the internal state variables.
## The IO datatype
Datatype IO is of class Monad.
Its implementation of ">>=" returns a compound monad that, when evaluated, executes its child
monad members in the original order. So the IO monad uses the "control flow" feature of the
monad pattern.
And just as importantly, its "return" operation creates an IO instance which gives absolutely no way
for code outside an IO monad to access the wrapped data, ie the IO monad is a "one way" dataflow, where
data can be passed in from the functional world, but never passed back. This ensures that the calling code is
never tainted by data which is "indeterminate", ie which could be different on each call. Simply by looking at the
type-signature of a function, you can tell if it is purely functional or not - a function which is pure can *never*
have any argument of type IO. And a function that does not have an argument of type IO can never invoke an IO function.
Note that this is not the case with all monads; some are "pure" in the sense that there output is always just a transformation of
their input (even though the monad may not obey other functional rules). In these monads, it is allowed to extract
data from the monad.
All of the standard-library input and output functions are defined as methods of the IO monad. Therefore they
can only be invoked from code that is executed via the IO monad (ie the code to be invoked must be written as
an expression that is then itself wrapped in an IO monad). And because of the above "one-way" data flow, the
"tainted, non-functional" result of an IO call can never be mixed with pure functional code without causing
a compile error.
When interacting with the user, input and output order is critical. Therefore all input and output should be performed from within a
monad. In other words, if user-interaction X must occur before interaction Y, then both X and Y should be inside the same monad to
ensure that their ordering is correct. The same constraint occurs for accessing the file-system, creating GUI windows etc.
In addition, data that is retrieved from the system (whether user or operating-system) is "unrepeatable", and therefore
leads to violations of the functional rules which state that the same function given the same input should always
return the same result. The IO monad "taints" all results from IO calls, so that code which is "pure" and code which
is not pure are clearly separated and don't accidentally mix.
Note that the name "IO" for this monad is perhaps a little misleading. There is nothing specifically about input or output;
it is simply a monad class that executes its nested statements in order, and keeps the results from escaping. However when
performing user (or filesystem) interaction, the "ordering" feature is absolutely necessary to get sane results, and the
second is needed to keep the sanity of the programmer. So all interaction ends up wrapped in an IO monad.
Q: is there any type of computation where ordering is critical, which is not input/output? Probably not; computation
either affects ram or something else, and the something else is then IO.
The IO datatype can be thought of as a kind of "proxy" or "wrapper" or "container" that acts similar to the following Java code:
~~~~~~~~~~~~~
class IO {
static IO return_(String literalVal) {
return new IOValue(literalVal);
}
static IO bind(IO first, expr-returning-String) {
return new IO(first, newProxy(expr));
}
IO apply(args) {
val x = first.apply(args); // returns an IO
val y = second.apply(args, unwrap(x)); // second *must* return an IO
return y;
}
}
~~~~~~~~~~~~~
Therefore the statements inside the monad are executed *in their originally-specified order*.
Within an IO method, calls can be made to normal haskell functions; as soon as the function starts evaluating we are back in the normal
haskell world where evaluation order is undefined.
To build one of these monads (proxies), the "binding" operators >> or >>= are used. These operators can be used to wrap a single
expression. They can also be used to "append" a normal expression to some other monad, returning a new monad that will run the
first monad followed by the new expression. In fact, if a sequence of N expressions should run as a single monad, then N "binding"
operations are needed, each time appending one expression to the monad representing the preceding instructions. Another way to
view this is that a monad containing N expressions is in fact internally a tree or chain of monads; a "simple" monad contains one
expression while a "compound" monad contains two child monads that it runs in order.
The `>>=` operator's type is: `>>= :: m a -> (a -> m b) -> m b`
All this initially scary description means is that given
* a Monad that returns a value of type a each time it is invoked (ie a procedure returning type A, wrapped as a Monad)
* another statement that takes a param of type a and returns a value of type b [1].
A "compound" Monad is created which holds references to both of the above. What it does with them is then up to the
definition of the ">>=" operator. In the case of the IO monad, >>= runs the two expressions in order, waiting for the
first to finish then passing its return value as one of the params to the second expression.
[1]: actually, the return value must be of type monad-returning-b, which is done by writing "return b" rather than just "b".
See below for an explanation of the return statement.
Building the compound monad effectively "binds" a monad and a statement together, so >>= is called the "binding" operator.
The form ">>" is used when a statement does not need the result of the previous statement - usually because the previous statement
returns unit (void), eg a function to print to the screen.
Using the binding operator multiple times is rather clumsy; therefore there is a convenient syntactic sugar called a "do expression" which
has exactly the same effect.
A "do expression" returns a value, like any haskell function. The value it returns must be a Monad (proxy), eg "IO String" or similar.
Therefore the last statement executed in a do-block must return a monad object, not a plain value. This can be done via the
return function:
~~~~~~~
-- normal function returning a string
getstr :: String
getstr = "hello"
-- normal function returning a monad-proxy wrapping a string.
getstrio :: IO String
getstrio = do return "hello"
~~~~~~~