by Nicolas Wu
Posted on 16 January 2011
Last week I attended the London Haskell Hoodlums meetup, where we worked on solving one of the Google Code Jam problems. If you’re relatively new to Haskell, I can definitely recommend these kinds of meetings: they’re a great way of learning how to tackle problems in a functional style. This post is about the solution I came up with on the journey home, and uses an unusual characterisation of an unfold.
The problem is to calculate how much money a roller coaster ride makes in a day, where each time a person completes a ride £1 is made. People queue up for the ride in groups, and groups cannot be split up or reordered. The ride has a capacity of k people, and is filled with as many groups as possible before it is started. As soon as the ride is completed, the groups in the ride rejoin the queue in the same order they went in. Over the course of the day, the ride will run a total of r times.
Since this post is a literate Haskell file which can be compiled, we first need to declare the imports that will be used:
This imports the handy printf function which will help to output the solution in the format expected.
To solve this problem, we first create the infinite list of rides which would be taken if the ride went on indefinitely. The creation of infinite lists is an example of an unfold, which we can define as follows:
Roughly speaking, you can think of an unfold as taking a grow function and a seed, and applying the grow function to the seed. The result is a value which is added to our list, along with a new seed used to produce the next value in the list. This process is repeated forever, thus generating a stream of values.
The stream of rides is a function of the capacity of the ride, and the current order of the groups, and we can define this stream as follows:
usher function takes a capacity and a list of groups to produce two lists: the first represents the groups which will take the ride next, and the second represents the remaining groups.
To model the fact that groups who have just taken the ride want to come back for another go, we use the requeue function as follows:
The profit that is made in a day becomes a simple calculation:
This takes some input parameters and a list of groups, and first organises the groups into a stream of potential rides. The first r rides in this stream are then taken and concatenated together. The sum of this list gives us our final result.
Unfold and Rho
While the algorithm given above works fine for relatively small inputs, it is too inefficient for larger problems, where the number of runs, r, is very large. One obvious inefficiency is in our unfold, where at some point the seed group will already have been seen, but must be ushered all over again.
While the process can be made more efficient by using memoization, another approach is to consider how the result of the unfold can be decomposed. To do this, we’ll define a function rho which satisfies:
This decomposition uses a function rho, named as such since the greek symbol, ρ, is written as an initial segment followed by a cycle (I first saw this use of ρ in Pollard’s rho algorithm which is also a cycle finding algorithm).
The type signature of rho is very similar to that of an unfold, but differs in that two lists are returned rather than one. The first list is the initial segment, and the second is the cyclic part.
The rho function makes use of rho′, which accumulates pairs of seeds and their corresponding values. For efficiency, these are stored in reverse order (so the list zs gets named sz).
The key here is that we check the list of previously computed values, sz, to see whether or not the current seed, x, is present. If the seed is present in the list, then we have found a cycle and therefore, after recovering the order of the list, must break the list into two parts: elements which come before this seed, and those which come after. Once this is done, we extract the values we are interested in from the resulting pair of lists. This makes use of the diag function which simply applies a function to a pair of values:
If the seed value x does not appear in the list, we add the pair which consists of x and its corresponding value y, and continue calculation with the new seed x′.
Finally, we can use this to define an optimised version of profit. Given a list of groups ps and a number of rides r we wish to find the largest s and smallest t such that:
The idea here is that we can decompose the number of rides into the rides which are performed before the cycle, those which are part of the cycle, and a final number of rides which don’t make a complete cycle.
Once those values are calculated, we can work out the profit as the following function:
profit' :: (Int, Int, [Int]) -> Int profit' (r, k, ps) = u + (s * v) + (sum . concat) (take t rss) where (qss, rss) = rho (requeue . usher k) ps u = sum . concat $ qss v = sum . concat $ rss (s, t) = head [(s', t') | t' <- [0 .. r - length qss], let s' = (r - t' - length qss) `div` (length rss), r == length qss + s' * length rss + t']
This works by decomposing the queues into qss and rss, and then finding the constants s and t which satisfy the equation above. Once those values are known, the profit can be worked out by simple arithmetic: the profit is the number of people, u, in the groups which form the initial rides before a cycle, qss, added to the number of cycles, s, multiplied by the size of the cycle, v, added to the number of people in the final t groups which do not form a complete cycle in rss.
With the algorithm in place, all that is needed to solve this problem is provide some functions that will decode the given input string into the datatypes we require.
The input format is a file where the first line has a number which dictates the number of rides that proceed. Each ride is described in two lines. The first line contains three space separated values: the number of times the ride is started, the capacity of the ride, and the number of groups in the queue. The second line contains a list of space separated values which represents the number of people in each group. While we could certainly use Parsec to do this, using a parser seems a little overkill since decoding this can be expressed in terms of the following function:
The convert function turns the input string into a list of lists of numbers. The first line of this list is ignored, and all consecutive lists are paired up. These pairs are then used to form the input type appropriate for the
The conversion from a string makes use of
words to find line separated, and space separated values:
The pairs function is not standard, so here’s a simple definition:
Finally, we can put this all together to create a program that takes the appropriate input on the
stdin pipe, and produces the output on the
stdout pipe. The
interact function provides much of the plumbing required, and is part of the prelude:
This takes a function that is used to consume the input in
stdin and its result is placed as output in
stdout. By using
interact, we can define a main function that puts our solution together:
The key part is the function that
interact takes as its argument, which takes a
String and produces a
String. The input string is first decoded into a list of datatypes that we map over using
profit to produce a list of results. These results are then zipped together with the list of solution numbers, using a printing function which produces a list of strings in the right format. Finally, these strings are appended together using
concat to produce our desired output.
Running this algorithm against the large practice dataset is remarkably fast: it is solved in about 0.6 seconds on my machine, whereas using the slower version didn’t finish after a minute.
As a concluding remark, I started by tackling this using a cycle detecting algorithm, since we can decompose the list of group orderings as follows:
In particular, we are interested in the shortest qs and rs that satisfy this equation, and can construct a function uncycle which calculates these for us.
This function makes the assumption that a cycle is formed whenever an element of the list appears more than once. While this would certainly be appropriate if the function applied in the unfold were a bijection, the usher function is not bijective, since two separate groups might have the same number of people and be the result of an ushering. This might be solved by either labelling the groups to make them unique, or by returning the seeding values along with the result: this is essentially how rho works.