30 Nov 2020

Recursion, Iteration, and the Hidden Stack

Repeated computation has two common surface forms. In an imperative loop, a counter or other state changes until a condition fails. In recursion, a function calls itself on a smaller or later version of the problem, and the runtime remembers what remains to be done.

Imperative languages expose loops directly through constructs such as for and while. In pure functional programming languages like Haskell, however, there is no primitive imperative for/while syntax. That makes Haskell a useful place to ask the question directly: if the loop disappears from the language, what replaces it?

At the level of expressive power, recursion and loops compute the same class of functions when the loop language includes unbounded while-style iteration. The practical story is narrower: converting recursion to iteration often means making the hidden call stack explicit. In the general non-tail case, the space cost moves rather than vanishes.

Loops Without Loops

That design follows naturally from purity. An imperative loop is meaningful because something changes between iterations: a counter is incremented, a condition is re-evaluated against changing state. A loop whose body changed nothing would either do nothing or spin forever.

Haskell, however, is referentially transparent: a binding x = e names a value, not a mutable cell, and evaluating the same expression always yields the same result. There is no assignment statement to advance a counter, and indeed there are no statements at all in the imperative sense. Even effectful IO code denotes values and actions composed from expressions rather than a mutable sequence of commands. Stripped of assignment and mutable loop state, the classical imperative loop no longer fits directly.

What replaces it is recursion, elevated from a technique to the control-flow primitive. Pure functional programming pushes iteration toward self-reference and function composition rather than mutation.¹ This is not a limitation on expressive power: recursion, and the higher-order functions built on it, can express loop-shaped computations.

How Haskell Expresses Iteration

Explicit recursion

The most direct translation of a loop is a function that calls itself. Summing a list is a standard first example, defined by structural recursion over the list’s two constructors:

mySum :: [Int] -> Int
mySum []     = 0                    -- base case
mySum (x:xs) = x + mySum xs         -- recursive case

main :: IO ()
main = print (mySum [1, 2, 3])      -- 6

The list [1,2,3] is sugar for 1 : (2 : (3 : [])), so mySum walks the spine, deferring each addition until the recursive call returns. There is no counter and no mutation; for a finite, fully defined list, the recursive case moves to a structurally smaller argument until it reaches [].

Higher-order functions

Writing out the recursion for every traversal is repetitive. The idiomatic alternative is a small set of higher-order functions that capture common recursion patterns once and for all. The most fundamental family is the fold (a catamorphism²). The right fold, foldr, exposes the structure most directly:

myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr f z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

-- many "loops" are just instances of this fold
sum'     = myFoldr (+) 0
product' = myFoldr (*) 1
length'  = myFoldr (\_ n -> n + 1) 0
map' f   = myFoldr (\x acc -> f x : acc) []

main :: IO ()
main = do
  print (sum'     [1, 2, 3]    :: Int)    -- 6
  print (product' [1, 2, 3, 4] :: Int)    -- 24
  print (length'  [1, 2, 3]    :: Int)    -- 3
  print (map' (+ 1) [1, 2, 3]  :: [Int])  -- [2,3,4]

A wide range of list functions can be written as instances of foldr;³ with the fold available, the explicit recursion is rarely needed. The companion foldl associates to the left and corresponds more closely to the accumulator pattern of an imperative loop. The choice between foldr, lazy foldl, and strict foldl' looks small here, but it becomes important once stack usage and laziness enter the picture.

Monadic iteration

Folds express pure iteration that produces a value. To repeat an effect (printing, mutating a reference, querying the world), Haskell uses monadic combinators from Control.Monad. These read almost like imperative loops while remaining ordinary functions:

import Control.Monad (forM_, when, replicateM_, forever)

main :: IO ()
main = do
  forM_ [1 .. 5] $ \i ->                -- "for i in 1..5"
    when (odd i) $                      -- "if i is odd"
      putStrLn ("odd: " ++ show i)
  replicateM_ 3 (putStrLn "tick")       -- run an action 3 times
  -- forever (putStrLn "infinite")      -- the functional infinite loop

Folds and monadic combinators are themselves defined by recursion, so Haskell’s loop-like idioms are built from recursive definitions. That settles the easy direction of the equivalence: ordinary loops can be expressed as recursion without losing computational power. The harder direction is the converse, translating recursion back into loops, where the translation is possible but not always free.

Can Every Recursion Become a Loop?

Theoretically, recursion can be turned into iteration; the general program-level recipe is a while loop driving an explicit stack.

Two qualifications matter. The first is which loop: in the LOOP/WHILE-program models, bounded for loops are strictly weaker than unbounded while loops. The second is at what cost: tail recursion can compile directly to a jump, while the general non-tail conversion relocates the stack rather than eliminating it. Each qualification reflects a different reading of the question, one about which functions can be computed and the other about how a given program is transformed.

The computability view: which functions can be expressed

If the question is whether the class of functions computable by recursion is the same as the class computable by loops, the answer comes from the foundations of computability theory.

The formal equivalence results are the starting point: the λ-calculus (general recursion), Turing machines, and Kleene’s μ-recursive functions define the same class of functions. The Church-Turing thesis is the broader claim that this class captures what can be computed by an effective procedure. So general recursion and a general loop language are equi-powerful: anything one can compute, so can the other.

But the word “loop” hides a fork:

Bounded for loops are loops whose iteration count is fixed before the loop begins. In the formal LOOP-program model, they correspond exactly to the primitive recursive functions: a program built only from assignment, sequencing, and for n times loops computes a function if and only if that function is primitive recursive.⁴
Unbounded while loops in the corresponding WHILE-program sense correspond to the μ-recursive (partial recursive) functions, i.e. the full class of Turing-computable functions.

The gap between the two classes is strict. The Ackermann function,

ack :: Integer -> Integer -> Integer
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

main :: IO ()
main = print (ack 2 3)   -- 9

is total and computable but not primitive recursive: its diagonal ack n n eventually dominates every primitive recursive function, so it cannot be expressed with bounded formal for loops alone. Its doubly nested recursion requires either a while loop or general recursion.

So the precise statement is:

formal bounded for = primitive recursive $\subsetneq$ μ-recursive = formal while = general recursion

In that formal sense, every function computable by general recursion is computable by an unbounded while program. It cannot in general be computed using only bounded formal for loops. Ackermann is no counterexample to the broader claim: give the while loop an explicit stack and it too becomes iterative.

The mechanical view: transforming a given program

The computability argument works at the level of models, not source programs: it shows that an equivalent loop program exists without giving a convenient rewrite of the program at hand. There is, however, a mechanical procedure that converts ordinary recursive functions into iterative ones, and it makes the cost of the conversion explicit.

Recursive evaluation can be modeled as a loop over a call stack of pending frames. The same can be done by hand for a non-tail recursion such as summing a binary tree, which makes two recursive calls and has work to do after each returns:

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def tree_sum(node):                         # recursive
    if node is None:
        return 0
    return node.val + tree_sum(node.left) + tree_sum(node.right)

def tree_sum_iter(root):                    # iterative, explicit stack
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        if node is not None:
            total += node.val
            stack.append(node.left)
            stack.append(node.right)
    return total

This example is especially simple because addition is commutative and associative: the answer can be accumulated in any traversal order. More complicated recursive functions have order-sensitive work waiting after each recursive call; their explicit stack must store more than the next input: a small frame recording what remains to be done.

For example, this recursive formatter depends on left-to-right order and on the value and parentheses interleaved with the two recursive results:

def tree_expr(node):
    if node is None:
        return "."
    return "(" + tree_expr(node.left) + str(node.val) + tree_expr(node.right) + ")"

An iterative version cannot just push nodes and append values as it sees them. It needs frames such as “visit this node,” “emit this string,” and “resume after the left subtree.” That explicit state is the call stack in another form.

This generalizes. The formal bridge is a two-step transformation:

Continuation-passing style (CPS). Rewrite the function so that “what to do with the result” is passed explicitly as a continuation argument. After this transformation, calls in the resulting program are in tail position; the continuation represents the work that used to happen after a call returned.
Defunctionalization. Replace the higher-order continuations with a first-order data structure that records the pending work. That data structure is an explicit stack, and the now-tail-recursive driver becomes a plain while loop.⁵

For a simple non-tail recursion such as n * fact(n - 1) for n >= 0, the pending continuation is “multiply the result by n.” Defunctionalized, that continuation becomes an ordinary stack frame:

def fact_iter(n):
    frames = []
    while n != 0:
        frames.append(("mul", n))   # remember: multiply by n after the call returns
        n -= 1

    result = 1
    while frames:
        tag, value = frames.pop()
        if tag == "mul":            # one branch per continuation shape
            result *= value
    return result

fact_iter(5)                        # 120

The composition CPS → defunctionalize turns direct-style recursion into a loop over an explicit stack, a constructive proof that recursion does not compute anything iteration cannot. This fits the structured-program theorem’s broader result that sequencing, selection, and iteration suffice for computable control flow.⁶

The catch is that this general transformation does not eliminate the space cost. The continuations, or equivalently the explicit-stack frames, hold the information the call stack held. Iteration with mutable state and a heap-allocated stack matches recursion; it does not automatically improve on it. The important case where the stack cost can disappear is tail recursion.

The Practical Catch: Stack Overflow

A non-tail recursive call cannot finish until the call it makes finishes, so its stack frame must persist until the entire sub-computation finishes. Recursion of depth $d$ therefore demands $O(d)$ live stack frames. Since the call stack is a bounded resource, sufficiently deep recursion aborts the program. In Python the limit is explicit and conservative:

import sys
print(sys.getrecursionlimit())   # 1000 by default

def depth(n):
    return 0 if n == 0 else 1 + depth(n - 1)

depth(100_000)            # RecursionError: maximum recursion depth exceeded

Haskell is not immune, and laziness gives it a subtler failure mode. A lazy left fold such as foldl (+) 0 [1 .. 10^8] does not combine as it goes; it builds a tower of deferred thunks that is collapsed only when the result is demanded. Collapsing that tower can recurse deeply enough to overflow the stack.

The strict foldl' forces each step and runs in constant space, which is why the choice among the folds matters. foldr suits lazy, possibly infinite, or short-circuiting traversals; foldl' suits strict accumulation; plain foldl is rarely the right choice. The Haskell Wiki covers the details.

Managing Recursion’s Costs

Several established techniques reduce or remove the stack cost of recursion, ranging from the narrowly applicable to the fully general. Two strategies underlie them: rewriting the recursion to run in constant stack space, or relocating its pending work from the call stack to the heap, where the bound is available memory rather than a fixed stack limit.

Tail-call optimization

A call is in tail position if it is the last action of its caller: nothing remains to be done with its result. Such a call need not allocate a new frame; it can reuse the caller’s frame, reducing the recursion to a jump. With tail-call optimization (TCO), tail recursion runs in $O(1)$ stack space and is, operationally, a loop. A properly implemented tail call is no more expensive than a goto. That fact undercut the once-common view that procedure calls are inherently costly.⁷

However, language support is uneven. Scheme goes furthest and requires proper tail calls in its language standard; GHC and the ML family perform the optimization; the JVM historically lacks general proper tail calls, though some JVM languages optimize restricted self-tail-recursive functions. Python omits TCO by design, a choice made to keep stack traces intact for debugging. In CPython, then, tail recursion offers no space relief, so the remaining techniques are often necessary rather than merely convenient.

Accumulators

The prerequisite for TCO is that the recursion actually be in tail form, and a non-tail recursion can often be reshaped into one by carrying an accumulator. The naive and accumulator versions of factorial, for non-negative inputs, make the contrast concrete:

factNaive :: Integer -> Integer
factNaive 0 = 1
factNaive n = n * factNaive (n - 1)         -- the (*) waits for the call: NOT tail recursive

factorial :: Integer -> Integer
factorial = go 1
  where
    go acc 0 = acc
    go acc n = go (acc * n) (n - 1)         -- nothing waits: tail recursive

main :: IO ()
main = print (factNaive 5, factorial 5)     -- (120,120)

In go, the recursive call is the entire right-hand side, so a TCO-capable compiler executes it as a constant-stack loop. Laziness adds one wrinkle: acc * n is itself a deferred thunk, and an unoptimized build piles up the same tower that sinks lazy foldl. GHC’s strictness analysis normally forces the accumulator, and a bang pattern (go !acc n = ...) makes the strictness explicit rather than inferred. Carrying a strict accumulator is the standard way to turn linear recursion into iteration.

Continuation-passing style

Accumulators handle many linear recursions. For recursions that are not naturally tail-shaped, CPS is the general technique: it threads an explicit continuation describing the rest of the computation, so calls in the transformed program become tail calls.

def fact_cps(n, k=lambda x: x):
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda acc: k(n * acc))

CPS makes the recursive control flow explicit, but it does not save space by itself. The pending multiplications now live in nested continuation closures on the heap instead of on the call stack; CPS relocates the stack rather than removing it. In CPython the rewrite does not even escape the recursion limit: heap-allocated or not, the calls that build and unwind the continuations still nest, so fact_cps(100_000) fails with the same RecursionError as before. A trampoline removes that last source of nested calls.

Trampolining

In a language without TCO, such as Python, a tail-recursive function still grows the stack. A trampoline breaks the chain: instead of calling itself, the function returns a thunk describing the next step, and a small driver loop repeatedly invokes those thunks. Control returns to the loop between steps, so recursive calls no longer accumulate on the call stack.

from functools import partial

def factorial(n, acc=1):
    if n == 0:
        return acc
    return partial(factorial, n - 1, acc * n)   # return the next step, don't call it

def trampoline(start, *args):
    result = start(*args)
    while callable(result):                     # the actual loop
        result = result()
    return result

trampoline(factorial, 100_000)                  # no RecursionError

The trampoline is, in effect, a hand-rolled TCO: it converts unbounded tail recursion into a genuine while loop in user space. Production versions usually return tagged Done and Call values instead of testing callable, so ordinary callable results cannot be mistaken for the next step.

Explicit stacks and strictness

When the direct recursive form is non-tail (tree and graph traversals, for instance), the usual remedy is the explicit stack from the tree-sum example. The bounded call stack is replaced by a heap-allocated one under program control, which can grow much larger. In Haskell, the stack that overflows is more often a tower of thunks than a chain of calls, and the remedy for that failure mode is strictness: foldl', bang patterns, and seq keep deferred work from piling up in the first place.

Memoization

Some recursive programs are expensive because they are redundant, not because they are deep. The naive fib n = fib (n-1) + fib (n-2) is only linearly deep, yet exponential in the number of calls, because it recomputes subproblems. The fix there is memoization or dynamic programming (in Python, often as little as a functools.lru_cache decorator), which trades repeated computation for a table of cached results. That is a different transformation from recursion to iteration, but it is often the optimization actually needed.

The Boundary Case: Corecursion

The discussion so far assumes recursion that bottoms out. In a lazy language, some recursion does not. Corecursion uses productive, non-terminating definitions that generate infinite structures on demand:

ones :: [Int]
ones = 1 : ones         -- an infinite list

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)

main :: IO ()
main = print (take 5 ones, take 10 fibs)
-- ([1,1,1,1,1],[0,1,1,2,3,5,8,13,21,34])

fibs is not meant to “return” a completed finite list, yet take 10 fibs is well-defined because laziness forces only a finite prefix. Corecursion stretches the loop analogy to its limit: not a loop that finishes, but one that runs as long as its output is demanded. The imperative while (true) loop thus has two counterparts in a language with no loops: forever for effects, and corecursion for data.

What Is Gained or Lost

Recursion and iteration therefore meet at several levels. As models of computation, general recursion and unbounded loops are equivalent. As program transformations, direct-style recursive code can be made iterative by making the hidden call stack explicit.

As engineering choices, though, they differ in readability, language support, evaluation strategy, and where pending work is stored. The practical question is not whether recursion can become iteration, but what the conversion costs and what it buys.

References

Paul Hudak, John Hughes, Simon Peyton Jones, and Philip Wadler. 2007. A history of Haskell: Being lazy with class. In Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, 2007. 12-1-12-55. ↩︎
Erik Meijer, Maarten Fokkinga, and Ross Paterson. 1991. Functional programming with bananas, lenses, envelopes and barbed wire. In Conference on Functional Programming Languages and Computer Architecture, 1991. 124–144. ↩︎
Graham Hutton. 1999. A tutorial on the universality and expressiveness of fold. Journal of Functional Programming 9, 4 (1999), 355–372. ↩︎
Albert R. Meyer and Dennis M. Ritchie. 1967. The complexity of loop programs. In Proceedings of the 1967 22nd National Conference, 1967. 465–469. ↩︎
John C. Reynolds. 1972. Definitional interpreters for higher-order programming languages. In Proceedings of the ACM Annual Conference, Volume 2, 1972. 717–740. ↩︎
Corrado Böhm and Giuseppe Jacopini. 1966. Flow diagrams, Turing machines and languages with only two formation rules. Communications of the ACM 9, 5 (1966), 366–371. ↩︎
Guy Lewis Steele Jr. 1977. Debunking the “expensive procedure call” myth or, procedure call implementations considered harmful or, LAMBDA: The Ultimate GOTO. In Proceedings of the 1977 Annual Conference, 1977. 153–162. ↩︎

Boyang Yue