A Tron AI in Haskell
by Nicolas Wu
Posted on 9 September 2010
This week’s post is a reprint of an article I wrote on 10th March 2010 about the Google AI contest. Since I’m deprecating my blogspot account, I wanted to make sure that this article didn’t get lost in the waves of the web.
Recently Google released a wonderful AI contest, and I submitted an entry. The aim of the competition was to produce an AI that could play tron effectively; this reminded me of my undergrad days, where hours were spent with friends around a single keyboard playing a simple 3D tron game.
The day before the end results, my bot (called zenzike) was ranked 35th in the world, and top of the Haskell league. Sadly though, my final submission never made it through to the final cut; I was submitting from a hotel room with a flaky internet connection, and my final entry clocked in 10 minutes too late, thus disqualifying me. The Haskell champion, jaspervdj, (congratulations!) has written a lovely blog article about his code, and released it for the whole world to oggle. I thought I’d try putting my bot against his to see how it would fare, and it turns out that of the basic starter pack, my AI performs favourably on the majority of maps that I’ve tried.
So here’s my code just for kicks, this is the version of my code one release before my disqualified entry, plus some formatting, like removing couple functions that I never ended up using, as well as writing it up for a wider audience
Strategy
The basic strategy for this bot is a recursive minimax algorithm with alpha-beta pruning. In actual fact, this implements the negamax variant, which makes the code a lot simpler. As a scoring heuristic, I simply used the difference between the amount of territory that each player controls, where we define territory as any space in the board which a player can reach first.
Imports and Datatypes
First up, here’s the list of imports that I made use of for my entry, there’s nothing of much interest here.
import Data.Ix (inRange)
import Data.List (maximumBy, nub, (\\), unfoldr)
import Data.Array.Diff (Array, listArray, amap, assocs, (//), bounds, (!))
import Data.Tree (Tree(Node), rootLabel, subForest)
import qualified Data.Map as M
import System.Time (ClockTime, TimeDiff(TimeDiff), tdSec, tdPicosec, diffClockTimes, getClockTime, addToClockTime)
import System.IO (stdin, stdout, stderr, hPutStrLn, BufferMode(LineBuffering), hSetBuffering)
import System.Timeout (timeout)
import Control.Monad (replicateM)
The datatypes I make use of are pretty basic. First I define what a move might
be, and these are the cardinals are, as well as a move, None
that represents
an empty move.
data Move = North | East | South | West | None deriving (Eq,Ord,Show)
cardinals :: [Move]
= [North, East, South, West] cardinals
We use the type Move
internally since it’s easier to read than a bunch of
number strings, but ultimately the tournament is set up so that there is the
following correspondence between the moves we want to make, and the string that
indicates the move:
makeMove :: Move -> String
None = "1"
makeMove North = "1"
makeMove East = "2"
makeMove South = "3"
makeMove West = "4" makeMove
The basic type I use to identify each point on the board is a Point
which
holds the (y,x)
coordinate of the point in question. I’ve stored the points
in this unusual format since it makes IO a little easier later on in the
program.
type Point = (Int,Int)
Players are simply defined as a pair of points, and the walls in a given board
are simply an array of Bool
, which stores True
at every point where a wall
is present. The dimensions of the array are the dimensions of the game board
itself.
type Players = (Point, Point)
type Walls = Array Point Bool
Finally, the Score
type is a synonym for a pair of values (x,m)
, where x
holds the score that is associated to a particular board state (assigned
through our heuristic), and m
holds the move that a player would have had to
make to get the board in that state.
type Score = (Int,Move)
Utility Functions
Probably the most basic utility function I use is destination
, which, given a
current point and a move, returns the point where the move would end up at,
irrespective of walls or world boundaries.
destination :: Point -> Move -> Point
=
destination (y,x) move case move of
East -> (y,x+1)
South -> (y+1,x)
West -> (y,x-1)
North -> (y-1,x)
None -> (y,x)
The destination function alone doesn’t give us very much to work with, so we’ll
also define a function, adjacent
, that indicates the points that are adjacent
to the point we’re considering.
We parameterise this function with a condition predicate, which allows us to
specialise adjacent
to consider not only legal adjacent moves, but also those
that end in definite survival, or in suicide.
adjacent :: (Walls -> Point -> Bool) -> Walls -> Point -> [Point]
=
adjacent condition ws p filter (condition ws) $ map (destination p) cardinals
The idea behind adjacent is that we filter the moves that result from moving in each of the cardinal directions, starting from our given point.
A simple condition we might use here would filter only moves that are within the bounds of the game board:
bounded :: Walls -> Point -> Bool
=
bounded ws p inRange (bounds ws) p
From this condition we can build two more; one that guarantees survival, and the other that ends in (legal) suicide.
survive :: Walls -> Point -> Bool
=
survive ws p && not (ws ! p) bounded ws p
suicide :: Walls -> Point -> Bool
=
suicide ws p && ws ! p bounded ws p
I don’t think it’s surprising that I never ended up using the suicide
filter.
Scoring Heuristic
An intrinsic part of our algorithm is the scoring heuristic. The score is calculated by making use of a measure of the territory of each player, which we define as the points which a player can reach before the other.
To measure the territory, we first define distances
, which returns a mapping
that relates points to their distance from a particular point.
This function effectively unfolds an initial seed value, increasing the radius
at each level of the unfold.
The key function here is expand
, which we’ll describe in a second.
distances :: Walls -> Point -> M.Map Point Int
=
distances ws p $
M.fromList | (qs, d) <- zip (unfoldr (expand ws) ([p],[p])) [0 .. ], q <- qs] [(q,d)
The expand
function makes use of a seed value, where each seed (ps, qs)
holds a list of values that have been considered, ps
, and a subset of that
list qs
that represent the outer shell of those values.
If there are no values in that outer shell, then we’ve expanded as much as
possible, so there is nothing left to return.
Otherwise, we return the current outer shell qs
, paired with the new seed,
which is the old set of all values ps
alongside the new shell qs'
.
expand :: Walls -> ([Point],[Point]) -> Maybe ([Point], ([Point],[Point]))
= Nothing
expand _ (_ , []) = Just (qs, (ps ++ qs', qs'))
expand ws (ps, qs) where
= (nub $ concatMap (adjacent survive ws) qs) \\ ps qs'
An implicit assumption of this function is that ps
and qs
contain lists of
unique elements; this was true in our initialisation within distances
, and is
preserved whenever we create a new seed.
Given two distances maps, one for each player, we can calculate the territory mapping quite easily. We use positive values to indicate the distances reachable by our player first, and negative values to indicate the distances reachable by the enemy first.
The territory
function works by taking the union of the maps for each of the
players, after negating the map that belongs to the enemy.
If only one player has the key to a point in their map, then the value doesn’t
change under map union, so the only case we need to worry about is when the key
belongs to both players.
This behaviour is handled by the “with” function that we use in this union,
which calculates the player that would get there first.
territory :: M.Map Point Int -> M.Map Point Int -> M.Map Point Int
=
territory ps qs -> -(x + y)) ps (M.map negate qs) M.unionWith (\x y
The scoring heuristic is the first meaty part of my program we consider.
The heuristic simply calculates the score of a given board state, where
a board is basically just some Walls
and Players
, and is given by the
function score
.
We feed in an additional Maybe Int
, whose value is Just x
when the
territory owned by the enemy has an upper bound of x
.
score :: Walls -> Players -> Maybe Int -> (Int, Maybe Int)
= score ws (p,q) mx
First we look at the case of mx
, if it’s Nothing
, then we don’t know how
much territory the enemy has, and we’ll have to calculate it fully.
Here the values tp
and tq
are the territory sizes of players (p, q)
.
case mx of
Nothing ->
if p == q then (0, Nothing) else (attack $ tp - tq, mx')
Otherwise, if mx
is Just x
, then all we need to do is find out the size of
our territory, and remove x
. We also have to return Just (x-1)
, since we
assume that the enemy is playing optimally.
Just x ->
$ M.size dps - x, Just (x-1)) (defend
The value of (tp,tq)
is the result of folding the function tally
over all
the values in a territory map, which basically sums up the amount of
territory each player has.
where
= M.fold tally (0,0) (territory dps dqs)
(tp,tq) =
tally d (x,y) if d == 0 then (x,y) else
if d > 0 then (x+1,y) else (x,y+1)
= distances ws p
dps = distances ws q
dqs = if M.member p dqs then Nothing else Just (M.size dqs - 1)
mx' = (10 * s)
attack s = (10 * s) -- + (length . filter (suicide ws . destination p)) cardinals defend s
In fact, the commented part of the definition of defend
is the single change
I made for my late submission. Sad eh?
The Negamax Score Tree
In order to pick the best move, we make use of the value returned by scoreTree
.
The result of this function is the (very large) tree of all possible allowable
moves, and the scores that each of those moves generates.
Here’s where we see lazy evaluation (not) doing a huge amount of work for us;
once we have this function in place, we can crawl up and down the tree as we
require, and the values we need will be computed only on demand.
The scoreTree
function builds up two levels of the tree at a time,
alternating between states generated when our player goes first, and then the
states generated once the enemy has moved.
Since tron is played as a synchronous game, the scores of intermittent trees,
where only our player has moved are undefined
, but the existence of the node
tells us that the move is legal.
If we really wanted to calculate a value, we could do so, and I’ve left the
code required as a comment.
I actually left the undefined
value there, since it was a good way of making
sure that my algorithms were working (my program would crash if those values
were ever evaluated, which they should never be, in future I think I’d probably
value robustness over correctness!).
scoreTree :: Walls -> Players -> Move -> Maybe Int -> Tree Score
=
scoreTree ws (p, q) move mx Node (score', move)
Node (undefined, m) -- Node ((negate . fst) $ score ws' (destination p m, q) mx, m)
[
[ scoreTree ws' (destination p m, destination q n) n mx'| n <- ns ]
| m <- ms ]
This code makes use of several other values. In particular, ms
and ns
are
the moves that can be made made by our player, and the enemy, respectively.
We save ourselves quite a lot of time when the players are in disjoint parts of
the board by returning Just x
where x
is an upper bound on the number of moves.
If the enemy has more than 0
territory then he may still move, otherwise, no moves remain.
where
= score ws (p, q) mx
(score', mx') = ws // [(p, True), (q, True)]
ws' = filter (survive ws' . destination p) cardinals
ms = case mx of
ns Nothing -> filter (survive ws' . destination q) cardinals
Just x -> if x > 0 then [None] else []
For the truly observant, you’ll notice that I’m actually storing the negation of the scores in this tree; the reason for this is that I’ll be using the negamax variant of the minimax algorithm, which is more naturally expressed in terms of a negative score tree
The tree is initialised by providing an initial configuration for walls and players. Initially, we do not know whether or not the playrs are disjoint.
initTree :: Walls -> Players -> Tree Score
=
initTree ws pq None Nothing scoreTree ws pq
The function which actually decides which move within the tree is optimal is negamax
,
which is a variant of the standard minimax algorithm.
When the depth for the negamax
search is 0
, we simply return the score held
at the node we are inspecting.
negamax :: Tree Score -> Int -> Int -> Int -> Score
0 =
negamax tree _ _ rootLabel tree
Otherwise we must pick the highest scoring score of the negation of the level below. This effectively alternates between minimising and maximising the score at each level of the tree.
=
negamax tree alpha beta depth $ (alpha, None) :
maximumBy maxFst
betaPrune beta []-(fst $ negamax subTree (-beta) (-alpha) (depth-1))
[ (snd $ rootLabel subTree)
, | subTree <- subForest tree]
where
= compare x y maxFst (x,_) (y,_)
To speed things along a little, my algorithm does a beta prune, which takes the first value that is guaranteed to succeed. If I had any sense, I would order this in ascending order of tree depth, but I didn’t end up doing this.
betaPrune :: Int -> [Score] -> [Score] -> [Score]
= xs
betaPrune _ xs [] :xs') =
betaPrune beta xs (xcase beta < fst x of
True -> [x]
False -> betaPrune beta (x:xs) xs'
The negamax
function is pretty crude since we have no idea how long it will
take to return a value.
To get round this, I computed a sequence of moves, in ascending order of
negamax
depth. I called the function that did this bestScores
, since it
returns the list of best scores we can think of.
bestScores :: Tree Score -> [Score]
= (head (map (rootLabel) (subForest tree) ++ [(0, None)])) :
bestScores tree map (negamax tree (-100000) 100000) ([2, 4 .. ])
The first move I take is basically the first legal move that’s available,
which is only there incase we really have no time to compute anything sensible.
Since our tree is only defined properly in even levels, I iterate over only
even numbers, starting from depth 2
.
Recycling Trees
So far our discussion has only considered how to make the best move for a single board state. One cool feature of the infinite tree structure is that values computed can be reused in the next game turn. To do this, we need to assume that the moves we’re given are legal ones, since we only compute legal states in our score tree.
We provide the function inherit
, which basically returns the tree who
inherits the state after a given move.
inherit :: Tree Score -> Move -> Tree Score
=
inherit tree m matchTree m (subForest tree)
The right inheritance is just shorthand for matching the subtree that fits the move that was made. I’ve cheated a little bit in this implementation.
matchTree :: Move -> [Tree Score] -> Tree Score
= error $ "matchTree: Move not found in children!"
matchTree _ [] = t
matchTree _ [t] :ts) =
matchTree m (tcase m == (snd . rootLabel) t of
True -> t
False -> matchTree m ts
Keeping Track of Time
The next thing to consider is how to return a value from our infinite list of
moves within a given timeframe.
To do this I used a function that makes use of Haskell’s in built
timeout :: Int -> IO a -> IO (Maybe a)
function, which tries
to compute the value of some
type a
, and returns Nothing
if it fails.
The idea behind timeoutList
is to provide a default value to output if we run
past some time limit and otherwise we try a strict evaluation of the head of
the list we’re interested in.
If a value is computed, then we use this in the next iteration of
timeoutList
using the same absolute time limit.
The trick is to make sure that we terminate a computation if it’s too slow, and
this is handled perfectly by the standard timeout
function.
timeoutList :: ClockTime -> a -> [a] -> Int -> IO (a, Int)
= do
timeoutList timeLimit m ms n <- getClockTime
time <- if time < timeLimit
mx then do
-- hPutStrLn stderr $ show (diffClockTimes timeLimit time)
return $! head ms)
timeout (toMicrosec (diffClockTimes timeLimit time)) (else return Nothing
case mx of
Nothing -> return (m, n)
Just m' -> do
-- hPutStrLn stderr $ show m'
tail ms) (n+1)
timeoutList timeLimit m' (where
=
toMicrosec timeDiff * 1000000 +
(tdSec timeDiff) fromInteger (tdPicosec timeDiff `div` 1000000)
Strictly speaking, I didn’t need to pass around the Int
s that are there;
they’re just present because they give me some stats about how far down the
recursion tree the algorithm is able to go.
Plumbing and IO
The rest of the program is, well, pretty boring really. It’s all plumbing and
IO, and making sure that we construct the state in the appropriate way.
The real guts of the program are handled by the application of reapply
to the
nextTurn
function, which I explain further down.
The main function does everything you might expect, we start with the initialisation of some timing variables, as well as making sure our input and output buffers are behaving nicely.
main :: IO ()
= do
main let initialDelay = TimeDiff 0 0 0 0 0 0 2900000000000
let normalDelay = TimeDiff 0 0 0 0 0 0 900000000000
LineBuffering
hSetBuffering stdin LineBuffering hSetBuffering stdout
Our next task is to get the state of the board, and this is a good time to start our first timer.
<- getLine
l <- getClockTime time
First up, we read the x
and y
dimensions of the board, and then get all
the lines that make the board itself. The only important information that comes
from the board is the pair (ws, pq)
, where we hold the walls in the grid in
ws
, and the player positions in pq
.
let [x, y] = map read $ words l
<- replicateM y getLine
ls let (ws, pq) = getBoard x y ls
Once we’ve created our state, we initialise our tree, and calculate an initial
score, by making the most of our timeoutList
function.
let tree = initTree ws pq
<- timeoutList (addToClockTime initialDelay time)
((s,move), n) 0, None) (bestScores tree) 0 (
With the score and move computed, we simply output our decision. I also had some debugging code here that gave me vital stats, but this made my algorithm bork on large maps, since a score must be calculated (and the first level of our tree shouldn’t ever calculate a score).
-- hPutStrLn stderr $ "Depth: " ++ show n ++ " Score: " ++ show s
putStrLn $ makeMove move
Finally, we repeat the process for the next turns, where we inherit the tree and pass the position of the enemy.
snd pq)
reapply (nextTurn normalDelay) (inherit tree move, return ()
The reapply
function iterates a monad function over a value, to produce a new
monad. The resulting monad is then applied over the value that was just
computed.
reapply :: Monad m => (a -> m a) -> a -> m a
= f x >>= (\y -> reapply f y) reapply f x
If I wanted to be pointless I might have used the following, but I don’t find it helpful at all:
-- reapply = fix (liftM2 flip ((>>=) .))
Each turn starts off with the previous tree, where the move we computed has been inherited, and the position of the enemy.
nextTurn :: TimeDiff -> (Tree Score, Point) -> IO ((Tree Score, Point))
= do nextTurn timeDelay (tree, enemy)
We then get the new state of the board, which tells us where the enemy moved to, and use this to produce the tree that we must evaluate.
<- getLine
l <- getClockTime
time let [x, y] = map read $ words l
<- replicateM y getLine
ls let move = getEnemyMove x y ls enemy
let tree' = inherit tree move
Once this has been established, we can compute the new move that we want to make, perform the move, and return the tree that we’ve been working with, along with the position of the enemy.
<- timeoutList (addToClockTime timeDelay time)
((s,move'), n) 0, None) (bestScores tree') 0
(-- hPutStrLn stderr $ "Depth: " ++ show n ++ " Score: " ++ show s
putStrLn $ makeMove move'
return $ (inherit tree' move', destination enemy move)
Getting the State
The last few functions we need to define are used in main
and nextTurn
, and
basically construct the state that we use in our algorithm.
You can easily skip the details of these functions, since they’re really not
interesting.
The getBoard
function simply creates a new pair of walls and players, which
represents the state of the board. The only important thing to note is that the
size of ws
, which holds the walls must be of the right dimensions, since we
use this fact in our boundary checking functions.
getBoard :: Int -> Int -> [String] -> (Walls, Players)
=
getBoard x y ls
(ws, (p, q))where
= listArray ((0,0),(y-1,x-1)) (concat ls) :: Array Point Char
grid = amap (== '#') grid
ws = find '1'
p = find '2'
q = head [i | (i,e) <- assocs grid, e == c] find c
After the first turn the only detail we’re interested in is which way the enemy moved. I’m pretty sure that this function could be optimised, but it was pretty clear to me that this one was correct.
getEnemyMove :: Int -> Int -> [String] -> Point -> Move
=
getEnemyMove x y ls p
findMove p p'where
= listArray ((0,0),(y-1,x-1)) (concat ls) :: Array Point Char
grid = find '2'
p' = head [i | (i,e) <- assocs grid, e == c]
find c =
findMove (px,py) (qx,qy) case (qx-px,qy-py) of
0, 1) -> East
( 1, 0) -> South
( 0,-1) -> West
( -1, 0) -> North
(-> error "findMove: Given points not adjacent!" _
Conclusion
I’m really quite surprised that this little AI performed as well as it did, since nothing I’ve implemented is particularly exotic. There are two ideas that I think make this entry work well:
- The tree of all scores.
- The list of best scores.
Using a tree of all scores meant that I could reuse a lot of the expensive
computation that was done from one turn to another, and laziness in Haskell
made it so that I didn’t have to worry about ever computing the entire thing.
I think it’s pretty amazing that the structure of the tree can be expressed so
succinctly in the scoreTree
function.
Conceptually, I enjoyed the fact that my algorithm expressed every single
possible move in this tree, and it was then just a case of traversing it
effectively.
Using this form of structured programming really showed its strengths in the
simplicity of the negamax
algorithm, which rivals even pseudocode in terms of
its clarity and succinctness.
Coming up with the idea of a list of best scores generated at each turn was a
real “aha!” moment for me, and I think I struggled most with the definition of
timeoutList
. Once I realised that I could use an absolute point in time as a
time limit, though, everything seemed to fall into place.
In summary, the lessons I’ve learnt from this are:
- Lazy evaluation is fantastic.
- Winning competition entries don’t have to be rocket science.
- Always submit competition entries at least an our before the deadline.