```
if p then x else y
```

An alternative way of viewing this is as a ternary operator:

```
x <| p |> y
```

This operator is known as the *conditional choice*; I’m told it was introduced by Tony Hoare, so perhaps naming it the Hoare conditional would make more sense (it appears in Hoare’s work on Communicating Sequential Processes and Unifying Theories of Programming, but I haven’t found a reference to its original introduction).

Rendering conditionals as ternary operators makes it clear that a number of nice properties hold:

Idempotency

`x <| p |> x == x`

Left-Identity

`x <| True |> y == x`

Right-Identity

`x <| False |> y == y`

Left-Distributivity

`x <| p |> (y <| q |> z) == (x <| p |> y) <| q |> (x <| p |> z)`

Right-Distributivity

`(x <| p |> y) <| q |> z == (x <| q |> z) <| p |> (y <| q |> z)`

Symmetry

`x <| p |> y == y <| not p |> x`

Conjunction-Associativity

`(x <| p |> y) <| q |> z == x <| p && q |> (y <| q |> z)`

Disjunction-Associativity

`x <| p |> (y <| q |> z) == (x <| p |> y) <| p || q |> z`

Conjunction-Collapse

`x <| p |> (y <| p && q |> z) == x <| p |> z`

Disjunction-Collapse

`(x <| p || q |> y) <| q |> z == x <| q |> z`

Abiding (Interchange)

`x # y <| p |> v # w == (x <| p |> v) # (y <| p |> w)`

These laws are easily proved by considering the cases where `p` and `q` are `True` and `False`.

In Haskell, implementing this operator is quite simple. First we’ll define the right bracket, which takes a predicate and a value `x`, and returns `Nothing` if the predicate is `True`, and `Just x` when it is `False`:

```
> (|>) :: Bool -> a -> Maybe a
> True |> _ = Nothing
> False |> y = Just y
```

The left bracket is equivalent to `fromMaybe`: it consumes the result of applying the right bracket (which evaluates the predicate). If the result was `Nothing`, then we use the value `x`; otherwise we have `Just y`, and return `y` as the result.

```
> (<|) :: a -> Maybe a -> a
> x <| Nothing = x
> _ <| Just y = y
```

Finally, we give the operators the lowest fixity, and make them right associative:

```
> infixr 0 <|
> infixr 0 |>
```

Defining the operator this way makes the ternary operator right associative, so that:

```
x <| p |> y <| q |> z == x <| p |> (y <| q |> z)
```

Right associativity here is useful so that reading from left to right, the result is the expression to the left of the first predicate that is true.
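As a quick illustration (an example of my own, not from the original post), chaining the operator picks out the value before the first true predicate; the definitions are repeated here so the snippet stands alone:

```
-- The operators as defined above, repeated for self-containment.
(|>) :: Bool -> a -> Maybe a
True  |> _ = Nothing
False |> y = Just y

(<|) :: a -> Maybe a -> a
x <| Nothing = x
_ <| Just y  = y

infixr 0 <|
infixr 0 |>

-- Reading left to right: the result is the value to the left of the
-- first predicate that holds, or the final value if none do.
sign :: Int -> String
sign n = "negative" <| n < 0 |> "zero" <| n == 0 |> "positive"
```

For instance, `sign (-3)` gives `"negative"`, while `sign 0` gives `"zero"`.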

A recent blog post by Jeremy Gibbons, *Morality and Temptation*, left as an exercise the task of finding two definitions of the arrow with the signature \(μF → νF\), and showing that they are equivalent. This post shows my solution to this problem, and while I’m here, I too have a confession to make: when dealing with the ins and outs of initial algebras and final coalgebras I sometimes get the Hokey Cokey playing in my mind. For those of you who don’t know this children’s song, it goes something like this:

You put your left foot in,

Your left foot out,

In, out, in, out,

You shake it all about.

You do the Hokey Cokey,

And you turn around,

That’s what it’s all about!

Now that’s out of the way, let’s focus on the problem at hand. I’ll start by summarising some of the basics of what we’re tackling.

Given a functor \(F : \C → \C\), an F-Algebra is a pair \((X, f)\), where \(X ∈ \C\), and \(f ∈ \C(F X, X)\). F-Algebras are objects in the category written \(Alg_F\), where an arrow in this category is given by an \(h ∈ \C\) such that the following property holds between two objects \((X, f)\), and \((Y, g)\): \[ h . f = g . F h \] The construction of a \(fold\) is given by looking at the initial object in this category, written as \((μF, in)\). Initial objects have a unique arrow to every object in the category; in this category the unique arrow to an object \((Y, g)\) is given by \(\fold{g}\), and it makes the following commute (you’ll have to excuse the ASCII diagrams, I have yet to find a good way of rendering commutative diagrams for the web):

```
           F(|g|)
   F μF ─────────> F Y
    │               │
 in │               │ g
    │               │
    V               V
    μF ──────────> Y
           (|g|)
```

Following the arrows of this diagram gives us most of the ingredients needed to state the Universal Property of a \(fold\), which is as follows: \[ h . in = g . F h ⇔ \fold{g} = h \] In other words, \(\fold{g}\) is the unique solution that satisfies this commuting diagram.

Given a functor \(F : \C → \C\), an F-coalgebra is the categorical dual of an F-algebra and inhabits the \(Coalg_F\) category. Here, objects are pairs \((X, f)\), where \(X ∈ \C\), and \(f ∈ \C(X, F X)\).

Unsurprisingly, we are interested in the final object, written \((νF, out)\), of this category, where the unique arrow from every object \((X, f)\) to the final object is written \(\unfold{f}\), and the following commutes:

```
           F[(f)]
   F X ──────────> F νF
    ^                ^
    │                │
  f │                │ out
    │                │
    X ───────────> νF
           [(f)]
```

The universal property also dualises, and in this case we have: \[ h = \unfold{f} ⇔ F h . f = out . h \] This property simply states that \(\unfold{f}\) is the unique solution satisfying the commuting diagram.

Our goal is to find two definitions of an arrow of type \(μF → νF\), and to show that these definitions are equivalent. Looking at the definition of an initial F-algebra, it’s pretty obvious that we can produce such an arrow by taking the fold of an arrow with type \(F(νF) → νF\). Using the inverse of \(out\), which we’ll write as \(out°\), is a fairly obvious choice and produces the following diagram:

```
          F(|out°|)
   F μF ─────────> F νF
    │               │
 in │               │ out°
    │               │
    V               V
    μF ──────────> νF
          (|out°|)
```

By dualising, the alternative definition of this arrow becomes quite apparent, where the unfold of \(in°\) is used instead:

```
          F[(in°)]
   F μF ─────────> F νF
    ^                ^
    │                │
in° │                │ out
    │                │
    μF ──────────> νF
          [(in°)]
```

So, we now have two definitions for the arrow in question, \(\fold{out°}\), and \(\unfold{in°}\). All that is left is to show that these two definitions are equivalent.
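These two arrows can be sketched in Haskell, fixing the functor \(F\) as a Haskell `Functor`. The names `Mu`, `Nu`, `inOp` and so on are my own choices, not standard library definitions; here the constructor `Nu` plays the role of \(out°\):

```
{-# LANGUAGE DeriveFunctor #-}

-- Initial algebra:  in  : F (mu F) -> mu F
newtype Mu f = In (f (Mu f))

-- Final coalgebra:  out : nu F -> F (nu F)
newtype Nu f = Nu { out :: f (Nu f) }

fold :: Functor f => (f a -> a) -> Mu f -> a
fold g (In x) = g (fmap (fold g) x)

unfold :: Functor f => (a -> f a) -> a -> Nu f
unfold f x = Nu (fmap (unfold f) (f x))

-- in°, the inverse of in
inOp :: Mu f -> f (Mu f)
inOp (In x) = x

-- The two definitions of mu F -> nu F
muToNu1, muToNu2 :: Functor f => Mu f -> Nu f
muToNu1 = fold Nu       -- (|out°|)
muToNu2 = unfold inOp   -- [(in°)]

-- A small instantiation with a list-shaped functor, to observe both:
data ListF a r = NilF | ConsF a r deriving Functor

fromList :: [a] -> Mu (ListF a)
fromList = foldr (\a m -> In (ConsF a m)) (In NilF)

toList :: Nu (ListF a) -> [a]
toList n = case out n of
  NilF       -> []
  ConsF a n' -> a : toList n'
```

On finite values the two conversions agree, as the equivalence below predicts.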

The universal properties of folds and unfolds are a useful way of encoding lots of information about their structure in one place. Consequently, these properties are very useful tools for proving properties of folds and unfolds.

By plugging our two arrows into the equations that describe the universal properties of folds and unfolds, we get the following: \[
\unfold{in°} . in = out° . F \unfold{in°}
⇔
\fold{out°} = \unfold{in°}
⇔
F \fold{out°} . in° = out . \fold{out°}
\] I think this equation is marvellous, I don’t know if it has a name, but for the purpose of this post I’ll call it the *Hokey Cokey Law*. The left hand side comes from instantiating the universal property for folds with an unfold, and the right hand side comes from instantiating the universal property for unfolds with a fold.

We still haven’t proved that \(\fold{out°} = \unfold{in°}\), but proving either the left hand side or the right hand side of the equation is all that is required. Let’s start with what is known: the universal property of \(fold\) applied to \(\fold{out°}\):

\(\fold{out°} . in = out° . F \fold{out°}\)

\(\hint{⇒}{Leibniz, twice}\)

\(out . \fold{out°} . in . in° = out . out° . F \fold{out°} . in°\)

\(\hint{⇒}{Identity, twice}\)

\(out . \fold{out°} = F \fold{out°} . in°\)

\(\hint{⇔}{Hokey Cokey}\)

\(\fold{out°} = \unfold{in°}\)

So there you have it, the Hokey Cokey in all its glory. Of course, we could take this proof *and turn around*, since everything dualises: we could equally have picked the universal property of \(unfold\) applied to \(\unfold{in°}\), and would have arrived at the left hand side of our equivalence. That’s what it’s all about!

Woah! The Hokey Cokey!

Woah! The Hokey Cokey!

Woah! The Hokey Cokey!

Knees bent, arms stretched, ra ra ra!

The definition of injectivity is usually given in the following terms, where a function is injective when it is *left cancellative*:

\(h\) is injective iff \(\forall f, g \cdot h . f = h . g \Rightarrow f = g\)

Surjectivity can be described in similar terms, where a function is surjective when it is *right cancellative*:

\(h\) is surjective iff \(\forall f, g \cdot f . h = g . h \Rightarrow f = g\)

While these two definitions show the similarity between the injective and surjective properties, this definition of surjectivity isn’t the standard one. The standard definition goes as follows:

\(h\) is surjective iff \(\forall y \cdot \exists x \cdot h~ x = y\)

I don’t like this definition, since I don’t find it easy to work with in proofs, and it doesn’t show the relationship between surjections and injections well.

The representables, or Hom-functors, are functions from arrows to arrows in a category. These functions come in two flavours, the covariant representable and the contravariant representable. Given objects \(S\) and \(T\) in a category \(ℂ\), we write \(ℂ(S, T)\) to denote the set of all arrows from \(S\) to \(T\).

Given a function \(h : X \rightarrow Y\), the covariant representable, \(ℂ(S,-)\), is defined as:

```
ℂ(S,—) :: (X -> Y) -> ℂ(S, X) -> ℂ(S, Y)
ℂ(S,—) h f = h . f
```

As shorthand for the above, we usually slot the argument \(h\) into the dash:

```
ℂ(S, h) = ℂ(S,—) h
```

Given a function \(h : X \rightarrow Y\), the contravariant representable, \(ℂ(-,T)\), is defined as:

```
ℂ(—,T) :: (X -> Y) -> ℂ(X, T) -> ℂ(Y, T)
ℂ(—,T) h f = f . h
```

Again, we use the following shorthand, where we slot \(h\) into the dash:

```
ℂ(h, T) = ℂ(—,T) h
```

One feature of these representables that I like particularly is that they give rise to a clean correspondence between injectivity and surjectivity.
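This correspondence can be explored concretely in Haskell, where the two representables act on a function by post- and pre-composition (the names `covariant` and `contravariant` are my own):

```
-- Covariant representable C(S,-): post-compose with h
covariant :: (x -> y) -> ((s -> x) -> (s -> y))
covariant h f = h . f

-- Contravariant representable C(-,T): pre-compose with h
contravariant :: (x -> y) -> ((y -> t) -> (x -> t))
contravariant h f = f . h
```

For example, `covariant (+1) (*2)` maps 3 to 7, while `contravariant (+1) (*2)` maps 3 to 8, since the order of composition flips.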

The representables give rise to a nice model for injective functions.

The following property holds of covariant representables:

\(ℂ(S, h)\) is injective iff \(h\) is injective.

First we show that if \(h\) is injective then \(ℂ(S, h)\) is injective:

```
ℂ(S, h) f == ℂ(S, h) g
== {- definition ℂ(S, h) -}
h . f == h . g
=> {- injective h -}
f == g
```

Then we show that if \(ℂ(S, h)\) is injective then \(h\) is injective:

```
h . f == h . g
== {- definition ℂ(S, h) -}
ℂ(S, h) f == ℂ(S, h) g
=> {- injective ℂ(S, h) -}
f == g
```

Here’s a property of the contravariant representable functor:

\(ℂ(h, S)\) is injective iff \(h\) is surjective.

We work with the contrapositive to show that if \(h\) is surjective, then \(ℂ(h, S)\) is injective:

```
f ≠ g
== {- definition inequality -}
∃ y . f y ≠ g y
== {- surjective h -}
∃ x . f (h x) ≠ g (h x)
== {- definition inequality -}
f . h ≠ g . h
== {- definition ℂ(h, S) -}
ℂ(h, S) f ≠ ℂ(h, S) g
```

Finally we show that if \(h\) is not surjective, then \(ℂ(h, S)\) is not injective:

```
let h 0 = 0 {- h : {0} -> {0,1} -}
let f 0 = 0, f 1 = 0
let g 0 = 0, g 1 = 1
```

then

```
f . h = g . h
== {- definition ℂ(h, S) -}
ℂ(h, S) f = ℂ(h, S) g
```

but

```
f ≠ g
```

Sadly, I don’t like this part of the proof, since it involves finding a counterexample to the injectivity of \(ℂ(h, S)\) when \(h\) is not surjective, and I prefer constructive proofs. Nevertheless, we’ve shown that \(ℂ(h, S)\) is injective iff \(h\) is surjective, which gives us the right cancellative definition of surjection.

The problem is to calculate how much money a roller coaster ride makes in a day, where each time a person completes a ride £1 is made. People queue up for the ride in groups, and groups cannot be split up or reordered. The ride has a capacity of \(k\) people, and is filled with as many groups as possible before it is started. As soon as the ride is completed, the groups in the ride rejoin the queue in the same order they went in. Over the course of the day, the ride will run a total of \(r\) times.

Since this post is a literate Haskell file which can be compiled, we first need to declare the imports that will be used:

```
> import Text.Printf (printf)
```

This imports the handy \(printf\) function which will help to output the solution in the format expected.

To solve this problem, we first create the infinite list of rides which would be taken if the ride went on indefinitely. The creation of infinite lists is an example of an \(unfold\), which we can define as follows:

```
> unfold :: (a -> (b, a)) -> a -> [b]
> unfold f x = y : unfold f x'
> where
> (y, x') = f x
```

Roughly speaking, you can think of an \(unfold\) as taking a grow function and a seed, and applying the grow function to the seed. The result is a value which is added to our list, along with a new seed used to produce the next value in the list. This process is repeated forever, thus generating a stream of values.
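As a quick illustration of my own before applying it to the problem, here is an \(unfold\) that counts upward from a seed (the definition is repeated so the snippet stands alone):

```
unfold :: (a -> (b, a)) -> a -> [b]
unfold f x = y : unfold f x'
  where
    (y, x') = f x

-- Each step emits the current seed and increments it for the next step.
naturals :: [Int]
naturals = unfold (\n -> (n, n + 1)) 0
```

Taking the first five elements of `naturals` gives `[0,1,2,3,4]`.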

The stream of rides is a function of the capacity of the ride, and the current order of the groups, and we can define this stream as follows:

```
> rides :: Int -> [Int] -> [[Int]]
> rides k ps = unfold (requeue . usher k) ps
```

The `usher` function takes a capacity and a list of groups to produce two lists: the first represents the groups which will take the ride next, and the second represents the remaining groups.

```
> usher :: Int -> [Int] -> ([Int], [Int])
> usher _ [] = ([], [])
> usher 0 ps = ([], ps)
> usher k ps@(p:ps')
> | k >= p = (p:qs, rs)
> | otherwise = ([], ps)
> where
> (qs, rs) = usher (k - p) ps'
```

To model the fact that groups who have just taken the ride want to come back for another go, we use the \(requeue\) function as follows:

```
> requeue :: ([a], [a]) -> ([a], [a])
> requeue (ps, qs) = (ps, qs ++ ps)
```

The profit that is made in a day becomes a simple calculation:

```
> profit :: (Int, Int, [Int]) -> Int
> profit (r, k, ps) = sum . concat . take r $ rides k ps
```

This takes some input parameters and a list of groups, and first organises the groups into a stream of potential rides. The first \(r\) rides in this stream are then taken and concatenated together. The sum of this list gives us our final result.
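As a small worked example of my own (not a dataset from the post): with a capacity of 6, groups of sizes 1, 4, 2 and 1, and 4 runs, the successive rides carry the groups [1,4], [2,1,1], [4,2] and [1,1,4], earning 5 + 4 + 6 + 6 = £21. The definitions above are repeated so the snippet stands alone:

```
unfold :: (a -> (b, a)) -> a -> [b]
unfold f x = y : unfold f x'
  where (y, x') = f x

usher :: Int -> [Int] -> ([Int], [Int])
usher _ []         = ([], [])
usher 0 ps         = ([], ps)
usher k ps@(p:ps')
  | k >= p    = (p:qs, rs)
  | otherwise = ([], ps)
  where (qs, rs) = usher (k - p) ps'

requeue :: ([a], [a]) -> ([a], [a])
requeue (ps, qs) = (ps, qs ++ ps)

rides :: Int -> [Int] -> [[Int]]
rides k = unfold (requeue . usher k)

profit :: (Int, Int, [Int]) -> Int
profit (r, k, ps) = sum . concat . take r $ rides k ps
```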

While the algorithm given above works fine for relatively small inputs, it is too inefficient for larger problems, where the number of runs, \(r\), is very large. One obvious inefficiency is in our \(unfold\), where at some point the seed group will already have been seen, but must be ushered all over again.

While the process can be made more efficient by using memoization, another approach is to consider how the result of the \(unfold\) can be decomposed. To do this, we’ll define a function \(rho\) which satisfies:

```
unfold f x = ys ++ cycle zs
where
(ys, zs) = rho f x
```

This decomposition uses a function \(rho\), named as such since the Greek symbol, \(ρ\), is written as an initial segment followed by a cycle (I first saw this use of \(ρ\) in Pollard’s rho algorithm, which is also a cycle-finding algorithm).

```
> rho :: Eq a => (a -> (b, a)) -> a -> ([b], [b])
> rho f x = rho' [] f x
```

The type signature of \(rho\) is very similar to that of an \(unfold\), but differs in that two lists are returned rather than one. The first list is the initial segment, and the second is the cyclic part.

The \(rho\) function makes use of \(rho'\), which accumulates pairs of seeds and their corresponding values. For efficiency, these are stored in reverse order (so the list \(zs\) gets named \(sz\)).

```
> rho' :: Eq a => [(b, a)] -> (a -> (b, a)) -> a -> ([b], [b])
> rho' sz f x
> | any ((== x) . snd) sz =
> diag (map fst) . break ((== x) . snd) . reverse $ sz
> | otherwise = rho' ((y, x):sz) f x'
> where
> (y, x') = f x
```

The key here is that we check the list of previously computed values, \(sz\), to see whether or not the current seed, \(x\), is present. If the seed is present in the list, then we have found a cycle and therefore, after recovering the order of the list, must break the list into two parts: elements which come before this seed, and those which come after. Once this is done, we extract the values we are interested in from the resulting pair of lists. This makes use of the \(diag\) function which simply applies a function to a pair of values:

```
> diag :: (a -> b) -> (a, a) -> (b, b)
> diag f (x, y) = (f x, f y)
```

If the seed value \(x\) does not appear in the list, we add the pair which consists of \(x\) and its corresponding value \(y\), and continue calculation with the new seed \(x'\).
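To see \(rho\) at work, here is a toy example of mine: iterating a counter that climbs to 2 and then stays there yields the initial segment [0, 1] followed by the one-element cycle [2] (the definitions are repeated for self-containment):

```
diag :: (a -> b) -> (a, a) -> (b, b)
diag f (x, y) = (f x, f y)

rho :: Eq a => (a -> (b, a)) -> a -> ([b], [b])
rho = rho' []

rho' :: Eq a => [(b, a)] -> (a -> (b, a)) -> a -> ([b], [b])
rho' sz f x
  | any ((== x) . snd) sz =
      diag (map fst) . break ((== x) . snd) . reverse $ sz
  | otherwise = rho' ((y, x):sz) f x'
  where (y, x') = f x

-- The seed sequence is 0, 1, 2, 2, 2, ...; the seed 2 recurs, so the
-- values split into the initial segment [0,1] and the cycle [2].
example :: ([Int], [Int])
example = rho (\n -> (n, min 2 (n + 1))) 0
```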

Finally, we can use this to define an optimised version of \(profit\). Given a list of groups \(ps\) and a number of rides \(r\) we wish to find the largest \(s\) and smallest \(t\) such that:

```
r == length qss + s * length rss + t
where
(qss, rss) = rho (requeue . usher k) ps
```

The idea here is that we can decompose the number of rides into the rides which are performed before the cycle, those which are part of the cycle, and a final number of rides which don’t make a complete cycle.

Once those values are calculated, we can work out the profit as the following function:

```
> profit' :: (Int, Int, [Int]) -> Int
> profit' (r, k, ps) = u + (s * v) + (sum . concat) (take t rss)
> where
> (qss, rss) = rho (requeue . usher k) ps
> u = sum . concat $ qss
> v = sum . concat $ rss
> (s, t) = head [(s', t') |
> t' <- [0 .. r - length qss],
> let s' = (r - t' - length qss) `div` (length rss),
> r == length qss + s' * length rss + t']
```

This works by decomposing the queues into \(qss\) and \(rss\), and then finding the constants \(s\) and \(t\) which satisfy the equation above. Once those values are known, the profit is simple arithmetic: the number of people, \(u\), carried by the initial rides \(qss\) before the cycle; plus the number of complete cycles, \(s\), multiplied by the number of people, \(v\), carried in one cycle; plus the number of people carried in the first \(t\) rides of \(rss\), which do not form a complete cycle.

With the algorithm in place, all that is needed to solve this problem is provide some functions that will decode the given input string into the datatypes we require.

The input format is a file where the first line gives the number of cases that follow. Each case is described by two lines: the first contains three space-separated values (the number of times the ride is started, the capacity of the ride, and the number of groups in the queue), and the second contains space-separated values representing the number of people in each group. While we could certainly use Parsec to do this, a full parser seems a little overkill, since decoding can be expressed in terms of the following function:

```
> decode :: String -> [(Int, Int, [Int])]
> decode cs = [(r, k, ps) | ([r, k, _],ps) <- pairs . tail . convert $ cs]
```

The `convert` function turns the input string into a list of lists of numbers. The first line of this list is ignored, and all consecutive lists are paired up. These pairs are then used to form the input type appropriate for the `profit` function.

The conversion from a string makes use of `lines` and `words` to find line-separated and space-separated values:

```
> convert :: String -> [[Int]]
> convert = (map (map read . words) . lines)
```

The `pairs` function is not standard, so here’s a simple definition:

```
> pairs :: [a] -> [(a,a)]
> pairs [] = []
> pairs [x] = [(x, undefined)]
> pairs (x:y:xs) = (x, y):pairs xs
```

Finally, we can put this all together to create a program that takes the appropriate input on the `stdin` pipe, and produces the output on the `stdout` pipe. The `interact` function provides much of the plumbing required, and is part of the Prelude:

```
interact :: (String -> String) -> IO ()
```

This takes a function that consumes the input from `stdin`, and its result is placed as output in `stdout`. By using `interact`, we can define a main function that puts our solution together:

```
> main = interact $
> concat . zipWith (printf "Case #%d: %d\n") [1 :: Int ..] .
> map profit' . decode
```

The key part is the function that `interact` takes as its argument, which takes a `String` and produces a `String`. The input string is first decoded into a list of datatypes that we map over using `profit'` to produce a list of results. These results are then zipped together with the list of solution numbers, using a printing function which produces a list of strings in the right format. Finally, these strings are appended together using `concat` to produce our desired output.

Running this algorithm against the large practice dataset is remarkably fast: it is solved in about 0.6 seconds on my machine, whereas the slower version hadn’t finished after a minute.

As a concluding remark, I started by tackling this using a cycle detecting algorithm, since we can decompose the list of group orderings as follows:

```
ps == qs ++ cycle rs
```

In particular, we are interested in the shortest \(qs\) and \(rs\) that satisfy this equation, and can construct a function \(uncycle\) which calculates these for us.

```
> uncycle :: Eq a => [a] -> ([a], [a])
> uncycle ps = uncycle' [] ps
```

```
> uncycle' :: Eq a => [a] -> [a] -> ([a], [a])
> uncycle' sp [] = (reverse sp, [])
> uncycle' sp (q:qs)
> | elem q sp = (takeWhile (/= q) . reverse $ sp, q : takeWhile (/= q) qs)
> | otherwise = uncycle' (q:sp) qs
```

This function makes the assumption that a cycle is formed whenever an element of the list appears more than once. While this would certainly be appropriate if the function applied in the unfold were a bijection, the \(usher\) function is not bijective, since two separate groups might have the same number of people and be the result of an ushering. This might be solved by either labelling the groups to make them unique, or by returning the seeding values along with the result: this is essentially how \(rho\) works.
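For example (my own, not from the post), a list whose elements begin repeating after an initial segment is split as follows (definitions repeated for self-containment):

```
uncycle :: Eq a => [a] -> ([a], [a])
uncycle = uncycle' []

uncycle' :: Eq a => [a] -> [a] -> ([a], [a])
uncycle' sp []     = (reverse sp, [])
uncycle' sp (q:qs)
  | q `elem` sp = (takeWhile (/= q) . reverse $ sp, q : takeWhile (/= q) qs)
  | otherwise   = uncycle' (q:sp) qs
```

Here `uncycle [3,1,2,1,2]` gives `([3],[1,2])`: the second occurrence of 1 is taken as the start of a cycle, illustrating exactly the assumption described above.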

For candidates who have never heard of the Fibonacci sequence, or can’t remember how it goes, I write out the following sequence, and explain that each new number is the sum of the previous two:

```
n     : 0, 1, 2, 3, 4, 5,  6, ...
fib n : 1, 1, 2, 3, 5, 8, 13, ...
```

Good candidates are able to write out the following recursive definition (though I have yet to encounter a candidate that says “that’s my program: it’s in Haskell!”):

```
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
```

So far, every good candidate has gone for something along the lines of the following procedural code:

```
int fib (int n) {
    if (n == 0 || n == 1) {
        return 1;
    }
    else {
        return fib (n - 1) + fib (n - 2);
    }
}
```

After asking for the complexity of this algorithm, which is almost invariably guessed, incorrectly, as \(O(n^2)\) when it is in fact closer to \(O(2^n)\), I usually ask whether they can derive a linear time version instead.

With help, good candidates have been able to come up with something along the following lines:

```
int fib (int n) {
    if (n == 0 || n == 1) {
        return 1;
    }
    int m = 2;
    int fib2 = 1;
    int fib1 = 1;
    int fibm = 2;
    while (m < n) {
        m = m + 1;
        fib2 = fib1;
        fib1 = fibm;
        fibm = fib1 + fib2;
    }
    return fibm;
}
```

We can prove that this algorithm works by looking at the \(invariant\), and the \(variant\) of the loop body. The invariant proves correctness, while the variant proves termination of this code.

The invariant is a condition that must be true before entering and after exiting the loop body. In our case, we establish the following conditions:

```
fib1 = fib (m - 1)
fib2 = fib (m - 2)
fibm = fib (m - 1) + fib (m - 2) = fib m
```

It’s easy to check that this is true when we first enter the body of the loop, and while it’s a little tedious, since we’re not making use of multiple assignment (where we can simultaneously change values), we can also show that the invariant holds in the loop body.

The loop variant is used to show that the algorithm is progressing towards its goal. In our case, it is enough to show that the distance from `m` to `n` is decreasing. By expressing the old value of `m` as `m₀`, we have:

```
| n - m | < | n - m₀ |
```

In terms of execution speed, it’s easy to see that this is a linear time algorithm: the main body of the loop is executed \(n\) times, and the operations in the body are all constant time.
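The invariant-carrying loop transcribes directly into a Haskell tail recursion, a sketch of my own in which the accumulating parameters carry the invariant explicitly:

```
fibIter :: Int -> Int
fibIter n
  | n == 0 || n == 1 = 1
  | otherwise        = go 2 1 1 2
  where
    -- Invariant: fib2 = fib (m - 2), fib1 = fib (m - 1), fibm = fib m
    go m fib2 fib1 fibm
      | m >= n    = fibm
      | otherwise = go (m + 1) fib1 fibm (fib1 + fibm)
```

Each recursive call plays the role of one pass through the while loop, shifting the accumulators exactly as the three assignments in the C body do.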

In Haskell, a linear time version is remarkably simple to express:

```
> fibs :: [Int]
> fibs = 1 : fibs'
> fibs' = 1 : fibs''
> fibs'' = zipWith (+) fibs fibs'
```

The \(n\)th Fibonacci number can be extracted using a simple list lookup:

```
> fib :: Int -> Int
> fib n = fibs !! n
```

This is a linear time algorithm because the `fibs` stream is constructed in linear time: each new element of the list is calculated by adding the previous two values, which is a constant time operation. Looking up values in a list is also done in linear time.

It’s interesting to see the procedural and the functional versions of these algorithms side by side, and I think it’s no exaggeration to say that the correctness of the functional ones is much easier to see than the correctness of the procedural ones.

As a means of discerning how good a candidate is, I’ve found that implementing fibs has been a particularly telling measure: bad candidates take a very long time to get to the end of the question, while only very good candidates have been able to derive a linear version of the algorithm.
