Catamorphisms
Up to this point I created various articles about fold
, in my Series I
also created a category named Fold (Catamorphisms) but up till now I didn't explained
how this articles related to each other, or what Catamorphisms mean. In this article
I want to talk about the remaining parts.
Table of Content
The List
Catamorphisms is a generalization that emerged from the list datastructure. The list datastructure, how it is found in functional programming, is usually build as a single linked list. Or to be more precise, it is build as a recursive data type expressed as a Discriminated Union. That is the reason why Algebraic DataTypes is the very first entry.
Catamorphisms is the idea that we also implement fold
and foldBack
functions for
other discriminated unions besides list. Because of this it is important to first
understand how to define datatypes, especially recursive discriminated unions.
To get a better understanding of the concept, this time we implement our own list type.
1: 2: 3: 

I also create additionally constructor functions for each case:
1: 2: 

Cons
this dates back to Lisp. For example
in Racket (a Lisp dialect) you can
build a list in such way.
1:


with the helper functions we defined in F# it almost looks the same.
1:


As soon we have any kind of discriminated union, working with such a type follows
a straight pattern. Usually we create a function that matches on our type, and
we must provide code for every case we have. In our list case that means
we must match on the Empty
case and on the Cons(h,t)
case and do something
with every case.
But the Cons
case is special, because it is recursive. So how do we work
with it? We just write a recursive function that recurs! Once you notice
this pattern, writing any kind of function for a recursive discriminated union
becomes easy. First, let's define some example data that we will use from now on:
1: 2: 3: 

And our first example function listLength'
1: 2: 3: 4: 5: 6: 7: 8: 

listLength'
returns the amount of elements in our list. We just need to handle
both cases to achieve that. If we have an Empty
case, the length is obvious,
then we have zero elements. If we have Cons
then we have one element plus
the amount of elements of the remaining list. So we call listLength' t
to
get it.
1: 2: 3: 4: 5: 6: 

function
is a shortcut. Instead of defining the last argument
and Pattern Match on it. We can directly use function
that does the same.
This way we can omit the argument, and the match
line.
listSum'
is just a simple sum function that adds a list of int
together.
Probably at this time you start to see that listLength'
and listSum'
are
very similar. But let's create some more examples.
1: 2: 3: 4: 5: 6: 7: 

If you are not used to recursion then this looks a little bit more complicated, but it
is still the same. We expect that map
runs a function on every element. So what do
we do we do with an empty list? We just return empty. Otherwise we have a single
element h
and another list t
. In that case we just call (f h)
to transform
our h
element, and how do we transform the remaining list t
? With listMap'
,
we only need to cons
the result of both function calls.
In the next function we want to append an element to a list. Just think for a
moment for yourself how you achieve that. The answer: In the Cons
case we do nothing, as
this is not the end of the list. Instead we transform an Empty
with our element
appended.
1: 2: 3: 4: 5: 6: 

Okay, at this point we have enough examples. When we look at our examples, how do we work with a discriminated union in general? All examples have one recurring pattern that we do all over again.
 We must pattern match on every case of the discriminated union.
 In a nonrecursive case we just do whatever needs to be done.

In a recursive case like
Cons
we have two datafields.h
andt
. We just work withh
however we need, exactly like a nonrecursive case. For the recursive datumt
we just call our function again and recurs.
This is a general pattern how we can work with any discriminated union and provide any kind of transformation for it. But there can be two problems with this approach:
 The function feels repetitive, or more important, always rewriting the whole recursion logic feels not like Don'tRepeatYourself.
 None of the functions we have, are tailrecursive.
Let's address those problems separately.
Introducing Cata
We already identified our Pattern, so what we usually do is to create a cata
function
that abstract those repetition. To describe the repetition in one sentence: A cata
function abstracts the recursion over a datastructure.
We handle the recursion inside of cata
. cata
then expects a function to handle
every case. New functions can then be created out of cata
.
Our first version of cata
could look like this.
1: 2: 3: 

Before we look closer in how it works, let's see how we can create a new length function
defined with cata
instead.
1: 2: 3: 4: 5: 

listCata'
just expects two functions. The first functions handles the Empty
case,
and the second functions handles the Cons
case. But which arguments do we pass those function
exactly?
We pass the data that is attached to every case to the provided function. As the Empty
case
don't contain any data, we just call fEmpty ()
and pass it the unit value.
The Cons
case contains two datums. It contains the head and the tail element. But we
do not pass the tail element directly. Just think about it for a minute. The purpose of
the cata
function is to abstract the recursion, so the function passed to cata
don't need to handle the recursion. If we would pass t
directly to fCons
then
fCons
again would need to handle the recursion. Instead of passing t
, we pass the
result of the recursive call.
If this transformation looks strange. Actually we have written this kind of code multiple
times already. Let's look again at the first listLength'
and lets think how we could
transform listLength'
to the more abstract listCata'
. When we look at the Cons
line
It looked like this:
1:


At first we can treat +
just as a function. Instead of writing it infix between two arguments
we also can write it prefix before its arguments. Then it looks like a normal function call.
1:


But in our listCata'
function we don't want to calculate (+)
, a hardcoded function, we
want to execute the function the user provided, so we write:
1:


Additionally, we don't want to pass 1
. 1
was the replacement for h
for the length function.
In the abstracted version we just pass h
to fCons
and fCons
decide what to do
with h
. And the last thing, our function is named listCata'
so we need to recurs on
listCata'
not listLength'
. So we end with:
1:


Now let's improve listCata'
step by step. In the list example this isn't so obvious,
but as we see later when we have other discriminated unions with much more cases and a lot
more recursive cases, then calling a cata
function can become annoying. We can fix that by
creating a partial applied function before we match
.
1: 2: 3: 4: 5: 

In the case of a list this isn't a big improvement, we also cannot use the function
keyword anymore.
But usually it is a good idea and it makes the code a little bit cleaner, especially when we create
a cata
function for more complicated discriminated unions.
A second improvement. Actually functions that take unit
as a value are bad! A pure function that
expects unit
always only can return the exact same value when it is called. F# is not a
purefunctional language, so theoretically fEmpty
could do some kind of sideeffects and always
return something different. But, I don't encourage such things. A cata
function is really
the idea to transform a datastructure, there shouldn't be sideeffects in it. So instead
of a function, we just expect a direct value that should be used for the Empty
case. On top,
I name the return type 'State
. This is just a generictype, but by having a more descriptive
name as 'a
, 'b
and so on, it can help in understanding the function signatures. Now our
third listCata'''
version looks like this:
1: 2: 3: 4: 5: 

When we look at the typesignature of our function the typesignature should look familiar!
1:


It is nearly the same as foldBack
! The only difference is that the arguments are in another
order. Let's compare it with the signature of List.foldBack
:
1:


It just expects the fCons
function first, here named folder
, then the list to operator
on, and finally the value for the empty case, here just named state
. So let's also
do this kind of reorder, and finally we end up with our final listCata
function.
1: 2: 3: 4: 

Usually, i wouldn't do such a reorder for a cata
function. We also lost the ability
to use partial application with the recurs
function. But because our list type is anyway
so small, the reorder doesn't hurt much. Here, it is more an example to show more clearly
the relation that cata
always has the same behaviour as foldBack
.
foldBack
not like
fold
! I will later go more deeply into this topic and show how fold
and foldBack
differ, and why that difference is important.
Our goal why we created a cata
function was that we have an abstraction, instead of writing
functions that do the recursion all over by themselves, we now can use listCata
that abstract
this kind of thing for us. Now, we also should use listCata
and rewrite all our functions
we created so far by using our abstraction. Here are the final list functions rewritten with
listCata
instead.
1: 2: 3: 4: 

And some examples to see that they work like expected:
1: 2: 3: 4: 5: 6: 7: 8: 9: 

Let's summarize what we have done so far:
 We usually start with a recursively defined discriminated union.
 When we work with such a type, we need to write recursive functions.

Instead of writing functions with recursion directly, we create a
cata
function that abstracts the recursion for us.  The
cata
function expects a function for every case.  Cases without data can be simple values instead of functions.
 We just pass all data associated with that case to the correct function.

We don't pass a datum that is recursive to the functions. Those datum must be
first passed to
cata
itself.  The behaviour of
cata
is the same asfoldBack
. cata
is not tailrecursive.
Tail Recursion with FoldBack
At this point we should ask ourselves if we really need tailrecursion. The answer is not always yes. We really should think of the usecases we have, and what kind of datastructure we defined. And in the most cases, the answer is No.
It is important to understand that we only run into problems with recursion when we have a datastructure with a linear depth. In our list example, this is the case. What does it mean exactly? It means that it is pretty normal to have very deep recursion, usually a case where every additional element increases the depth by one.
For a single linked list this is the case. A list with 10,000 elements will also
create a stack depth of 10,000. So it is important to create tailrecursive
functions. But does that mean all the work on cata
was wasted?
Absolutely not. We just take cata
as the starting point and we just try to make
cata
tailrecursive. The tail recursive version then is what we call foldBack
.
And this leads to the next articles in my series. Converting functions into tailrecursive functions is a task on its own that needs proper explanation.
One technique that is most often used is the idea of an accumulator. Instead of doing a calculation once a function finished, we do the calculations immediately and pass the result to the next function call. I explain this conept in more detail in: From mutable loops to immutability
Another idea is to use a continuation function, I already provide two articles explaining
the ideas behind this technique. In Continuations and foldBack
I explain in deep how a tailrecursive foldBack
works. And in my article
CPS fold  fold with early exit I explain
the idea of a continuation function a second time with fold
.
But it doesn't mean both ideas are interchangeable. When we directly want to create
a tailrecursive foldBack
function then we need to use the continuationfunction
approach. We cannot create a tailrecursive foldBack
with an accumulator approach.
As I already have three articles on those topics I don't go into much further
detail, so I just provide a quick explanation. We just start with the cata
function.
In cata
we see something like this:
1:


So, the second argument to fCons
is a recursive call. Let's shortly rethink what it does.
It calls the function and it will return some kind of data, this data is then passed to
the fCons
function as the second argument. With the continuation approach we just assume
we already have that data. So we just replace every recursive call with some variable
we don't have yet.
1:


Sure, that wouldn't compile now, because we didn't define racc
anywhere, so we wrap it
inside a function, that functions then becomes racc
somewhere in the future.
1:


But that isn't anything, we still need to somehow traverse our list, and call that function.
In short, we change our previously defined cata
all in all to something like that.
Let's see the cata
and foldBack
directly to each other.
1: 2: 3: 4: 

1: 2: 3: 4: 5: 6: 

The implementation of our list functions also didn't change at all. Instead of
listCata
they just use listFoldBack
.
1: 2: 3: 4: 

Binary Trees
Up so far we explored the concept of the cata
function only with the list and when we make that
function tailrecursive we call it foldBack
. But as said before, Catamorphisms are a generalization.
That means, the concept of writing a cata
function and creating tailrecursive version out of it
should also be done for other discriminated unions, not just for a list.
For the next example we will look at a binary tree. A binary tree is quite interesting because it is very similar to a list. But let's see that in more detail:
1: 2: 3: 

Once again I also introduce some helper functions to create the cases.
1: 2: 

And a simple tree that contains the numbers 1 to 7 ordered.
1: 2: 3: 4: 

So, why is a Tree similar to a list? Because the definition is nearly the same. If you look closer
you see that a tree has two cases exactly like a list has. Leaf
marks the end exactly like
Empty
did for the list. Instead of Cons
with two datums, we have Node
with three datums.
The only difference between a list and a binary tree is that every element in a list only has one
child, while a binary tree has two child's. Those child's are often named Left and Right.
That's also the reason why I named the variables l
and r
in the functions.
Cata for Tree
So, let's start by creating a cata
function for our tree.
1: 2: 3: 4: 

We started by just turning every case into a function. This time we name them fLeaf
and fNode
, after the cases. The pattern is the same. The Leaf
case has no data, so
we just call fLeaf
with the unit value ()
. We should remember that we can eliminate
the function in a next version.
The difference in the Node
case to the previous Cons
case in the list is, that we have
three datums. So fNode
will also receive three arguments. But the second and third
argument is a tree again. So before we pass those, we need to recursively call treeCata'
on those trees again.
The two recursive calls looks quite long, so we first create a partial applied recurs
function. Now we have:
1: 2: 3: 4: 5: 

Next, we eliminate the function for the leaf case.
1: 2: 3: 4: 5: 

As a tree also only has two cases, and one of them is not a function, we already see that
we already have a signature like foldBack
. So let's reorder the function arguments.
Previously I said we loose the ability for the partial applied recurs
function. But
we still can write a recurs
function. We only need to know which argument changes.
In the code above you see that we call (recurs l)
and (recurs r)
. So we only want
to pass the next tree it should work on. So we create a recurs
function that only
expects the remaining tree. All in one, we now end with:
1: 2: 3: 4: 5: 

Let's create a length
, sum
and a map
function with our cata
function.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 

Fold vs. FoldBack
Before we talk about how to turn cata
into a tailrecursive function we should talk
about the difference between fold
and foldBack
. I explained that if we turn cata
into a tailrecursive function we get back foldBack
. If I mention cata
or foldBack
i use the terms interchangeable. The fact that one is tailrecursive and the other not,
is not important right now, it is more important how they behave.
But it opens up an important question. When we forget for a moment the mechanical
implementation to create the cata
function. How do we know how to implement fold
and
foldBack
and how do we know how they should behave? Or what is anyway the exact
behaviour of fold
and foldBack
?
If the question is unclear, let's look again at a list and lets see how fold
and foldBack
behaves.
We can visualize a singlelinked list like boxes, and every box points to the next element
in the list. Until the last element points to the end Empty
. In the visualization above
represented as /
.
The functions fold
and foldBack
are also often named foldLeft
and foldRight
in
other languages. They are named like this, because they describe how a list will be
traversed. fold
(or foldLeft
) traverses a list from lefttoright, while foldBack
(or foldRight
) traverses the list from righttoleft.
So when we use fold
the function we provide fold
first sees 1
, then 2
, then 3
and so on. While when using foldBack
we first encounter 5
, then 4
, then 3
and so
on. This is easy to understand.
But how do they anyway translate to something like a binary tree? Our tree
that we used
so far looks like this:
When we think of fold
as lefttoright and foldBack
as righttoleft, how do we translate
that to a tree? The problem we have, there doesn't exists only one way to traverse a tree.
There are many way to traverse a tree, and even then the question
is which traversal we identify as left or right.
Up so far I made it easy, as I just said that cata
is foldBack
without further describing
the idea behind it why that is so. So let's relook at fold
and foldBack
for the list
and let's see if we can describe the operation slightly different.
Previously we already noticed that a list and a binary tree are very similar. We can think of
a list that contains oneelement and a one recursive argument, or onechild. A binary
tree on the other hand is oneelement and twochild's. When we visualize a tree we usually
show the deeper (recursive) layers underneath an element. In the above visualization we
have 4
and underneath it 2
and 6
. But we also can think of a list in such a way.
We have 1
and we have the recursive child 2
. Instead of thinking of traversing a list
from lefttoright or righttoleft, we look at the folderfunction, and we describe
what the folderfunction sees. So when we sum all elements in a list like this:
1:


What does the folderfunction (fun acc x > acc + x)
sees exactly? Let's say fold
is at the element 1
, which values do we have?
We have 0
and 1
. Or more precisely, we get the accumulator sofar, and oneelement
of our datastructure. What do we see when fold
is at element 2
? We get 1
and 2
.
Once again we get the accumulator so far, and the current element.
Generally speaking, with fold
we get the current element and an accumulator that
is the combination of all the things we already have seen. When fold
hits 3
,
then we get 3
and 3
. The first 3
the accumulator was computed by the already seen
elements 1
and 2
(1 + 2).
We also can think of fold
as looping. Because in looping we usually start with some
mutable initial value, and when we loop over a datastructure we combine the current element
with some outer element.
1: 2: 3: 

But when we look at foldBack
it behaves differently. When we write:
1:


We get the same result, because the order of +
operation doesn't matter. But the behaviour
is different. When our folderfunction hits the element 1
, which data, do we get?
We get 1
and 9
. 1
is the current element. But what is 9
? 9
is the result of the combination
of the child we have at this point. In foldBack
we don't get an accumulation of the things
we already have seen, we get the combination of the things we didn't have seen so far!
With foldBack
we get the current element, and the combination of all its child elements. When
we hit 2
for example, then we just see 2
and 7
. Because the child of 2
is 3 + 4
. But
we didn't see 1
, because 1
is on top of 2
.
All in one we can say that foldBack
is a structurepreserving function. We not get
the current element and an accumulation so far, we get the current element including one
value for each child. And with a tree this distinction becomes more clear. When we look again
at our tree.
Which arguments does the folderfunctions sees when we are at the top element 4
? We see
4
the current element, 6
for the left child (1 + 2 + 3) and 18
(5 + 6 + 7) for the
rightchild. In foldBack
we always get the exact same amount of arguments a case has.
The definition of Node
was Node of 'a * Tree<'a> * Tree<'a>
so we also get three
arguments in the folder function. But instead of two trees, we already get the result
of them. That means, while fold
is like iteration/looping, foldBack
is like recursion.
Consider how we would write a recursive sum
function without cata
, the Node
case
would look something like that.
1:


As (sum l)
is a function call it just starts calculating a value, but when it returns
it contains the sum of the left child. So once (sum l)
and (sum r)
completes we have
a line like x + y + z
. We just add three values together. foldBack
is exactly that
behaviour. foldBack
always works like recursion. That is why cata
is always
like foldBack
. foldBack
only ensures that we have tailrecursion.
So how does fold
for a tree look like? In fact the folderfunction only sees two
arguments not three. Why is that so? Because fold
only sees the things it already
have seen.
The Node
case only contains a single nonrecursive datum. That means the fold
function only
sees an accumulator so far and all the current nonrecursive elements. For our specified binary
tree that are only two arguments.
But how does fold
traverse a tree? The answer is, it doesn't matter. The purpose
of fold
is not to provide a specific order. The purpose of fold
is just to visit
every element. fold
is ideal for things that behave like Monoids. fold
is in general a good choice if the operation you have doesn't depend on the structure
itself only on the elements itself.
You also can compare fold
with foreach
in C#. With foreach
in C#, you just iterate
through a datastructure. You also can iterate through a dictionary, and you get the
key and value of every element, but you don't get any information of the structure
of the Dictionary itself. When you loop over a dictionary with foreach you just expect
to somehow get all the values, but you don't expect a particular order.
But if you need the additional information of the structure and somehow work with the
full tree, then you must use foldBack
. Because of that, foldBack
is more powerful than
fold
as you always can use foldBack
instead of fold
. But the reverse is not true.
FoldBack for Tree
I will implement foldBack
with the Continuation approach, but I don't go into
much detail how the implementation works exactly, you can read more of those
details here:
First, we look again at cata
.
1: 2: 3: 4: 5: 

Instead of a recursive treeCata
we will create an inner loop
function that
is used for recursion.
1: 2: 3: 4: 5: 6: 

As we now have an inner recursive loop we also need to explicitly
start the recursion with loop tree
. In the next step we expand the
Node
case and remove the nested loop
calls and put each on its own line.
1: 2: 3: 4: 5: 6: 7: 8: 9: 

Finally, we add cont
(the continuation) to the loop
function and
rename the function to treeFoldBack
.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 

Before, we had code like:
1:


it was recursive and it meant: Recurse on loop l
. Somewhere in the future
(after many more recursive calls) it will return a result that we save in lacc
.
Then we executed:
1:


It was recursive again and after many more recursive calls we got the result
and saved it in racc
. But all of this is not tailrecursive. The new
treeFoldBack
function really only has one function call.
1:


The idea of continuations is like this: Please execute loop l callback
. When
you finished calculating the result, please call the callback
function
and pass it the result. Callback or Continuation really means the same.
But in this case we call loop
that is in tail position. So we end up with a
tailrecursive function.
FoldBack examples
Instead let's focus on things we can do with foldBack
but not with fold
. As foldBack
preserves the structure, we can actually very easily convert a Tree into a string representation.
We just convert a Leaf
node into the String "Leaf"
, and a Node
will be converted with
Node(%d, %s, %s)
into a string. Because we get the string results instead of the recursive
values, this kind of task is pretty easy.
1: 2: 

treeCata
and foldBack
both return the same string:
"Node(4, Node(2, Node(1, Leaf, Leaf), Node(3, Leaf, Leaf)), Node(6, Node(5, Leaf, Leaf), Node(7, Leaf, Leaf)))"
I used the tree
variable so far as a binary search tree. That means it is ordered. The left child's are
smaller, the right child's are bigger then the current element. We also can create an ordered list from
our tree. A Leaf node must be converted into an empty list. Otherwise we just need to concat the left,
current and the right node.
1:


or any other order we like:
1: 2: 3: 

Let's turn the Tree into Lisp code. In Lisp a tree is just represented as a list with tree elements. The first element is the current node, the second and third element represent the left and right node, and are just lists themselves. The Leaf node is represented as the empty list.
1: 2: 

Let's test it in Racket.
1: 2: 3: 4: 

and in the REPL:
1: 2: 3: 4: 

Nice, this is correct. For the last example let's look again at our example tree:
Let's say we want to create a path to a specific element. For example when we search for 5
, we want
the steps to find 5
. When we start at 4
we first must go right, and then Left. So we want "Right Left"
as a result. When we search for 1
we get "Left Left" and so on.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 

In the solution I just transform every node into a tuple that contains two informations. A boolean that
contains the information if the child contains the searched element. And when it is true
the
node above prepend either "Left"
or "Right"
to the list. As an example, when we search for
5
we get the following transformations:
Fold for Tree
At last we want to look at fold
. As we learned so far, we don't need to implement a particular
tree traversal order. For fold
it is only important that we visit every node and we treat an
accumulator through the calculation. For implementing fold
we should just pick the easiest
or fastest way we can come up with.
Up so far, including the other blog posts, I showed two ways how to achieve tailrecursion. Either way through an accumulator or through a continuation function. But both ideas don't work with our tree. The problem is that we don't just have a simple calculation that we can forward as an accumulator, we always must traverse two child's for every node.
The solution to fix that is that we manage the stack ourselves. This is the typical solution how languages without proper tailcalloptimization handles recursion. But in the F# case we don't need to switch completely to looping. We just make the stack part as an additional value on the recursive inner loop function. We also could say, we use two accumulators. One for the value we computed so far, and another that keeps track of task we still need to do later.
So here is the idea. At first we identify what we actually need to do in the Node
case.
And they are three things we need to do:
 Process the current element
 Recurs on the left child
 Recurs on the right child
We should pick a order of the operation so that only one task remains open. And this task is pushed onto a stack that can be later processed. One way to achieve that is.
 We process the current element
 We put the right child onto the stack
 We loop on the left child
Let's give it a first try:
1: 2: 3: 4: 5: 6: 

Let's test it:
1:


Okay, that's not quite right, but I wrote it in this way so we can discuss what happens. This makes it
easier to understand the full solution. At first, if you find the short Node
case hard to understand,
you can expand it. The line:
1:


is the same as:
1: 2: 3: 

But all in one, here are the things that are happening:
 We enter the tree.
 We process element
4
, by printing it(folder acc x)
 The right child of
4
is added to the stackr :: stack
 We loop on the leftchild
 We process element
2
, by printing it  The right child of
2
is added to the stack  We loop on the leftchild
 We process element
1
, by printing it  The right child of
1
is added to the stack (a Leaf)  We loop on the leftchild
 We hit a Leaf, and we return
acc
.
So what are we missing? Sure, we forgot to process the rightchild's. We put them all
onto the stack, but we never look at them. So what we need to do is to extend the Leaf
case.
Instead of immediately returning acc
, we first need to check if there are pending
trees in the stack
. If yes, we need to loop on those. Only if the stack
is empty
we can return acc
. So our final treeFold
looks like this:
1: 2: 3: 4: 5: 6: 7: 8: 9: 

Now we get all numbers printed.
1:


And if you noticed, this kind of treetraversal is a preorder tree traversal. But it is only preorder
traversal by accident. I just thought of an easy way to traverse so we need to put as few things as possible
onto the stack
variable. Someone should not expect a specific tree traversal order for fold
.
Some Benchmarking
I think some simple benchmarks are quite good. At first, we always talk about tailrecursion and I provided ways to achieve it, but never looked at performance. And there are two things to say here.
First, tailrecursion is not about performance. Yes, sometimes it also improves performance, but the main point of tailrecursion is that we don't end up with stack overflows. We first care for correctness, only then comes speed. If you think otherwise, then answer me the following question: What exactly do we get out of a function that is theoretically fast, but practically we cannot execute it because it crashes with a stack overflow? So the main point of tailrecursion is Correctness, that it is sometimes faster is more an additional benefit.
Second, tailrecursion with a continuation approach how i used it with foldBack
is usually
slower than pure recursion!
All of those are important. At first, you shouldn't implement tailrecursion just because someone
told you it is better or faster. It not only can be slower, it also can be harder to understand
and to maintain. So you really should ask yourself if you really need tailrecursion. And
for a binary tree this question is already legit. If you create a binary tree that always balance
itself, then you will less likely run into any kind of problems with a non tailrecursive cata
function.
With a stack depth of 1 you can handle two values (empty and one value), with a stack depth
of 2 you can handle 4 values. 3 is already 8 values. So the amount of values doubles by just
increasing the depth by one. With a stack depth of 32, what is just peanuts, you already
can handle 4.294.967.296 values. Before you run into problems with the stack depth you run
into completely other problems! Even if you just save 32bit integers just the integers alone
already need 16 GiB of memory, and that does not include the Node(x,l,r)
objects that also
consumes memory. So you should ask yourself if you really need a fold
or foldback
function
that are harder to develop and probably even can be slower!
But let's see some benchmarks. First I create some helper functions for the creation of some trees.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 

So let's create some small trees with 10K nodes.
1: 2: 

Those trees are not balanced, but they can still be handled by cata
on my machine. But just benchmarking
one call is still too fast, so I create a bench
function that calls some code a specific amount of time.
1: 2: 3: 4: 5: 6: 

So, when we sum up every 10.000 nodes with treeCata
and do that 10.000 times, which timings do we
get for treeCata
and foldBack
? I run every bench
line twice.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 

So overall, the foldBack
result are disastrous. At first, creating a lot of continuation
functions creates a lot of garbage, all those closure functions needs to be managed on
the heap. As a result the garbage collector runs quite often. 1600 gen0 and 1600 gen1 cleanups!
Overall foldBack
is tailrecursive, but takes the double of time to finish compared
to the cata
functions. On top, the cata
functions trigger not a single garbage collection
cleanup, that is probably also the reason why they are faster.
But we also should consider treeFold
. Actually just summing up the nodes is also a task
that can be done by treeFold
, so how fast is treeFold
?
1: 2: 3: 4: 5: 6: 7: 

Those results are quite interesting. They are faster as treeCata
and treeFoldback
. I
expected that it is faster as foldBack
, because just handling a stack of trees should
be way more efficient as handling a lot of closure functions. But even the fact that
they still trigger quite a lot of garbage cleanups, it is still nearly twice as fast
as the cata
function! By the way, we can even get the amount of GCs down. Actually
there is no point in using an immutable stack. We also can use an mutable stack
in fold
. This makes the implementation a little bit harder again.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 

Let's see how it compares to treeFold
1: 2: 3: 4: 5: 6: 7: 

Also, this is not what I expected, it becomes slower to the speed of the cata
function. And
in the smallL
case it still triggers a lot of GC cleanups. On a tree only with right nodes
we don't have any cleanups because we don't save anything in the stack. It is still quite interesting
to see that the timing with and without GC cleanups are the same. So it seems the GC cleanups are
so fast that they overall don't matter at all for the overall timing. At least on my machine.
So, how are the timings for really big but balanced trees? A balanced tree with a depth of 25 contains 33.554.431 entries. How are the timings here?
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 

All in one, the cata
function overall is usually the easiest to implement, in its performance it is
also quite good, at least better as a naive implementation with continuation functions. Because
most memory get handled by the stack, it also don't causes garbage collection.
An implementation with a mutable stack also can be efficient in terms of garbage collection, but at least on my machine it is still slower compared to the pure recursive version.
As an overall result you shouldn't abandon the cata
function, just because you fear that
non tailrecursive functions are automatically slower. Usually they are very easy to implement
and the speed is quite good. Instead you should consider if you expect problems with the stack depth.
When you create balanced trees this is quite uncommon that you run into problems with the stack depth.
But with a type like a list that has linear recursion, where every element increases the stack depth by one, you should consider more time in writing tailrecursive functions.
Markdown
Up so far I only talked about lists and binary trees, but both types basically contain everything you need to know. Any other type is basically just repetition of what was said so far. As a last example I want to show the small Markdown example that I created in the AlgebraicData Types article. It is not a full description of the Markdown definition, but is good enough as an example.
1: 2: 3: 4: 5: 6: 

Our Markdown definition contains 4 nonrecursive cases and one recursive case. We just did what we did
so far. We create a cata
function that expects a function for every case. As NewLine
contains no
data, we can just expect a plain value.
The recursive element is quite different from what we have seen so far. Instead of a single recursive
element we have a list of recursive elements. But that shouldn't be much of a difference. We just
call the recurs
function for every element in the list with List.map
. Overall we end up
with the following cata
function.
1: 2: 3: 4: 5: 6: 7: 8: 

Do we need a fold
or foldBack
? Well, I don't know you, but I don't think we ever see a markdown
document that has some ten thousand of nested blocks so recursion becomes a problem. Probably even
a nesting more than 5 is already rare. So overall I think writing fold
and foldBack
is probably
just a waste of time. So let's once again write a function that turns a markdown document into
HTML.
1: 2: 3: 4: 5: 

Probably there isn't much to say here. The escape
function correctly escapes HTML characters.
As I always need to wrap string into tags I just created a wrap
function that I can pass a
tag and a string that does that. But I need a version that escapes the string, and one that
doesn't. The last one is important for a recursive case. Because we don't want to escape the HTML
tags itself. The arguments of the wrap
and wrapEscape
function are chosen in a way so I can
use currying, so I don't need to create a lot of lambda expressions.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 

Summary
We have seen how to create a cata
function, and we learned that foldBack
is just cata
written tailrecursive. For the fold
implementation of the tree, i choosed another way to
create a tailrecursive function that manages the stack directly.
In benchmarking we also saw that the last way is also quite better in terms of speed and garbage collection compared to a continuation approach. With a mutable stack we can even further eliminate garbage collection cleanup.
But overall we have seen that cata
is very fast and it doesn't mean that tailrecursion
is automatically better or faster.
Further Reading
module List
from Microsoft.FSharp.Collections

type List<'a> =
 Empty
 Cons of head: 'a * tail: List<'a>
Full name: Main.List<_>
Full name: Main.empty
Full name: Main.cons
Full name: catamorphisms.xs
Full name: Main.l1
Full name: Main.l2
Full name: Main.l3
Full name: Main.listLength'
val list : List<'a>

type 'T list = List<'T>
Full name: Microsoft.FSharp.Collections.list<_>
Full name: Main.listSum'
Full name: Main.listMap'
from Microsoft.FSharp.Core
Full name: Microsoft.FSharp.Core.String.length
Full name: Main.listSnoc'
Full name: Main.listCata'
Full name: Main.listLength''
Full name: Main.listCata''
val list : List<'b>

type 'T list = List<'T>
Full name: Microsoft.FSharp.Collections.list<_>
Full name: Main.listCata'''
Full name: Microsoft.FSharp.Collections.list<_>
module List
from Microsoft.FSharp.Collections

type List<'T> =
 ( [] )
 ( :: ) of Head: 'T * Tail: 'T list
interface IEnumerable
interface IEnumerable<'T>
member GetSlice : startIndex:int option * endIndex:int option > 'T list
member Head : 'T
member IsEmpty : bool
member Item : index:int > 'T with get
member Length : int
member Tail : 'T list
static member Cons : head:'T * tail:'T list > 'T list
static member Empty : 'T list
Full name: Microsoft.FSharp.Collections.List<_>
Full name: Main.listCata
Full name: Main.listLength
Full name: Main.listSum
val list : List<int>

type 'T list = List<'T>
Full name: Microsoft.FSharp.Collections.list<_>
Full name: Main.listMap
Full name: Main.listSnoc
Full name: Main.listFoldBack
Full name: Microsoft.FSharp.Core.Operators.id
 Leaf
 Node of 'a * Tree<'a> * Tree<'a>
Full name: Main.Tree<_>
Full name: Main.node
Full name: Main.endNode
Full name: Main.tree
Full name: Main.treeCata'
Full name: Main.treeCata''
Full name: Main.treeCata'''
Full name: Main.treeCata
Full name: Main.treeLength
Full name: Main.treeSum
Full name: Main.treeMap
Full name: Microsoft.FSharp.Collections.List.fold
Full name: Main.acc
Full name: Microsoft.FSharp.Collections.List.foldBack
Full name: Main.treeFoldBack
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.sprintf
Full name: Main.ordered
Full name: Main.reversed
Full name: Main.preOrder
Full name: Main.postOrder
Full name: Main.path
Full name: Microsoft.FSharp.Core.String.concat
Full name: Main.treeFold'
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printf
Full name: Main.treeFold
Full name: Main.createTree
Full name: Main.createLeftTree
Full name: Main.createRightTree
Full name: Main.createBalanced
Full name: Main.smallL
Full name: Main.smallR
Full name: Main.bench
type Stopwatch =
new : unit > Stopwatch
member Elapsed : TimeSpan
member ElapsedMilliseconds : int64
member ElapsedTicks : int64
member IsRunning : bool
member Reset : unit > unit
member Restart : unit > unit
member Start : unit > unit
member Stop : unit > unit
static val Frequency : int64
...
Full name: System.Diagnostics.Stopwatch

Stopwatch() : unit
Full name: Microsoft.FSharp.Core.Operators.ignore
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
Full name: Main.treeFoldStack
type Stack<'T> =
new : unit > Stack<'T> + 2 overloads
member Clear : unit > unit
member Contains : item:'T > bool
member CopyTo : array:'T[] * arrayIndex:int > unit
member Count : int
member GetEnumerator : unit > Enumerator<'T>
member Peek : unit > 'T
member Pop : unit > 'T
member Push : item:'T > unit
member ToArray : unit > 'T[]
...
nested type Enumerator
Full name: System.Collections.Generic.Stack<_>

System.Collections.Generic.Stack() : unit
System.Collections.Generic.Stack(capacity: int) : unit
System.Collections.Generic.Stack(collection: System.Collections.Generic.IEnumerable<'T>) : unit
Full name: Main.balanced25
Full name: Main.sum
 NewLine
 Literal of string
 Bold of string
 InlineCode of string
 Block of Markdown list
Full name: Main.Markdown
union case Markdown.Literal: string > Markdown

type LiteralAttribute =
inherit Attribute
new : unit > LiteralAttribute
Full name: Microsoft.FSharp.Core.LiteralAttribute

new : unit > LiteralAttribute
val string : value:'T > string
Full name: Microsoft.FSharp.Core.Operators.string

type string = System.String
Full name: Microsoft.FSharp.Core.string
Full name: Main.markCata
Full name: Microsoft.FSharp.Collections.List.map
Full name: Main.produceHtml
type HttpUtility =
new : unit > HttpUtility
static member HtmlAttributeEncode : s:string > string + 1 overload
static member HtmlDecode : s:string > string + 1 overload
static member HtmlEncode : s:string > string + 2 overloads
static member JavaScriptStringEncode : value:string > string + 1 overload
static member ParseQueryString : query:string > NameValueCollection + 1 overload
static member UrlDecode : str:string > string + 3 overloads
static member UrlDecodeToBytes : str:string > byte[] + 3 overloads
static member UrlEncode : str:string > string + 3 overloads
static member UrlEncodeToBytes : str:string > byte[] + 3 overloads
...
Full name: System.Web.HttpUtility

System.Web.HttpUtility() : unit
System.Web.HttpUtility.HtmlEncode(s: string) : string
System.Web.HttpUtility.HtmlEncode(s: string, output: System.IO.TextWriter) : unit
Full name: Main.document