A Tron AI in Haskell

by Nicolas Wu


Posted on 9 September 2010

Tags: ,


This week’s post is a reprint of an article I wrote on 10th March 2010 about the Google AI contest. Since I’m deprecating my blogspot account, I wanted to make sure that this article didn’t get lost in the waves of the web.

Recently Google released a wonderful AI contest, and I submitted an entry. The aim of the competition was to produce an AI that could play tron effectively; this reminded me of my undergrad days, where hours were spent with friends around a single keyboard playing a simple 3D tron game.

The day before the end results, my bot (called zenzike) was ranked 35th in the world, and top of the Haskell league. Sadly though, my final submission never made it through to the final cut; I was submitting from a hotel room with a flaky internet connection, and my final entry clocked in 10 minutes too late, thus disqualifying me. The Haskell champion, jaspervdj, (congratulations!) has written a lovely blog article about his code, and released it for the whole world to oggle. I thought I’d try putting my bot against his to see how it would fare, and it turns out that of the basic starter pack, my AI performs favourably on the majority of maps that I’ve tried.

So here’s my code just for kicks, this is the version of my code one release before my disqualified entry, plus some formatting, like removing couple functions that I never ended up using, as well as writing it up for a wider audience

Strategy

The basic strategy for this bot is a recursive minimax algorithm with alpha-beta pruning. In actual fact, this implements the negamax variant, which makes the code a lot simpler. As a scoring heuristic, I simply used the difference between the amount of territory that each player controls, where we define territory as any space in the board which a player can reach first.

Imports and Datatypes

First up, here’s the list of imports that I made use of for my entry, there’s nothing of much interest here.

import Data.Ix (inRange)
import Data.List (maximumBy, nub, (\\), unfoldr)
import Data.Array.Diff (Array, listArray, amap, assocs, (//), bounds, (!))
import Data.Tree (Tree(Node), rootLabel, subForest)
import qualified Data.Map as M
import System.Time (ClockTime, TimeDiff(TimeDiff), tdSec, tdPicosec, diffClockTimes, getClockTime, addToClockTime)
import System.IO (stdin, stdout, stderr, hPutStrLn, BufferMode(LineBuffering), hSetBuffering)
import System.Timeout (timeout)
import Control.Monad (replicateM)

The datatypes I make use of are pretty basic. First I define what a move might be, and these are the cardinals are, as well as a move, None that represents an empty move.

data Move  = North | East | South | West | None deriving (Eq,Ord,Show)
cardinals :: [Move]
cardinals = [North, East, South, West]

We use the type Move internally since it’s easier to read than a bunch of number strings, but ultimately the tournament is set up so that there is the following correspondence between the moves we want to make, and the string that indicates the move:

makeMove :: Move -> String
makeMove None  = "1"
makeMove North = "1"
makeMove East  = "2"
makeMove South = "3"
makeMove West  = "4"

The basic type I use to identify each point on the board is a Point which holds the (y,x) coordinate of the point in question. I’ve stored the points in this unusual format since it makes IO a little easier later on in the program.

type Point = (Int,Int)

Players are simply defined as a pair of points, and the walls in a given board are simply an array of Bool, which stores True at every point where a wall is present. The dimensions of the array are the dimensions of the game board itself.

type Players = (Point, Point)
type Walls = Array Point Bool

Finally, the Score type is a synonym for a pair of values (x,m), where x holds the score that is associated to a particular board state (assigned through our heuristic), and m holds the move that a player would have had to make to get the board in that state.

type Score = (Int,Move)

Utility Functions

Probably the most basic utility function I use is destination, which, given a current point and a move, returns the point where the move would end up at, irrespective of walls or world boundaries.

destination :: Point -> Move -> Point
destination (y,x) move =
  case move of
    East  -> (y,x+1)
    South -> (y+1,x)
    West  -> (y,x-1)
    North -> (y-1,x)
    None  -> (y,x)

The destination function alone doesn’t give us very much to work with, so we’ll also define a function, adjacent, that indicates the points that are adjacent to the point we’re considering. We parameterise this function with a condition predicate, which allows us to specialise adjacent to consider not only legal adjacent moves, but also those that end in definite survival, or in suicide.

adjacent :: (Walls -> Point -> Bool) -> Walls -> Point -> [Point]
adjacent condition ws p =
  filter (condition ws) $ map (destination p) cardinals

The idea behind adjacent is that we filter the moves that result from moving in each of the cardinal directions, starting from our given point.

A simple condition we might use here would filter only moves that are within the bounds of the game board:

bounded :: Walls -> Point -> Bool
bounded ws p =
  inRange (bounds ws) p

From this condition we can build two more; one that guarantees survival, and the other that ends in (legal) suicide.

survive :: Walls -> Point -> Bool
survive ws p =
  bounded ws p && not (ws ! p)
suicide :: Walls -> Point -> Bool
suicide ws p =
  bounded ws p && ws ! p

I don’t think it’s surprising that I never ended up using the suicide filter.

Scoring Heuristic

An intrinsic part of our algorithm is the scoring heuristic. The score is calculated by making use of a measure of the territory of each player, which we define as the points which a player can reach before the other.

To measure the territory, we first define distances, which returns a mapping that relates points to their distance from a particular point. This function effectively unfolds an initial seed value, increasing the radius at each level of the unfold. The key function here is expand, which we’ll describe in a second.

distances :: Walls -> Point -> M.Map Point Int
distances ws p =
  M.fromList $
    [(q,d) | (qs, d) <- zip (unfoldr (expand ws) ([p],[p])) [0 .. ], q <- qs]

The expand function makes use of a seed value, where each seed (ps, qs) holds a list of values that have been considered, ps, and a subset of that list qs that represent the outer shell of those values. If there are no values in that outer shell, then we’ve expanded as much as possible, so there is nothing left to return. Otherwise, we return the current outer shell qs, paired with the new seed, which is the old set of all values ps alongside the new shell qs'.

expand :: Walls -> ([Point],[Point]) -> Maybe ([Point], ([Point],[Point]))
expand _  (_ , []) = Nothing
expand ws (ps, qs) = Just (qs, (ps ++ qs', qs'))
  where
    qs' = (nub $ concatMap (adjacent survive ws) qs) \\ ps

An implicit assumption of this function is that ps and qs contain lists of unique elements; this was true in our initialisation within distances, and is preserved whenever we create a new seed.

Given two distances maps, one for each player, we can calculate the territory mapping quite easily. We use positive values to indicate the distances reachable by our player first, and negative values to indicate the distances reachable by the enemy first.

The territory function works by taking the union of the maps for each of the players, after negating the map that belongs to the enemy. If only one player has the key to a point in their map, then the value doesn’t change under map union, so the only case we need to worry about is when the key belongs to both players. This behaviour is handled by the “with” function that we use in this union, which calculates the player that would get there first.

territory :: M.Map Point Int -> M.Map Point Int -> M.Map Point Int
territory ps qs =
  M.unionWith (\x y -> -(x + y)) ps (M.map negate qs)

The scoring heuristic is the first meaty part of my program we consider. The heuristic simply calculates the score of a given board state, where a board is basically just some Walls and Players, and is given by the function score. We feed in an additional Maybe Int, whose value is Just x when the territory owned by the enemy has an upper bound of x.

score :: Walls -> Players -> Maybe Int -> (Int, Maybe Int)
score ws (p,q) mx =

First we look at the case of mx, if it’s Nothing, then we don’t know how much territory the enemy has, and we’ll have to calculate it fully. Here the values tp and tq are the territory sizes of players (p, q).

  case mx of
    Nothing ->
      if p == q then (0, Nothing) else (attack $ tp - tq, mx')

Otherwise, if mx is Just x, then all we need to do is find out the size of our territory, and remove x. We also have to return Just (x-1), since we assume that the enemy is playing optimally.

    Just x ->
      (defend $ M.size dps - x, Just (x-1))

The value of (tp,tq) is the result of folding the function tally over all the values in a territory map, which basically sums up the amount of territory each player has.

  where
    (tp,tq) = M.fold tally (0,0) (territory dps dqs)
    tally d (x,y) =
      if d == 0 then (x,y) else
        if d > 0 then (x+1,y) else (x,y+1)
    dps = distances ws p
    dqs = distances ws q
    mx' = if M.member p dqs then Nothing else Just (M.size dqs - 1)
    attack s = (10 * s)
    defend s = (10 * s) -- + (length . filter (suicide ws . destination p)) cardinals

In fact, the commented part of the definition of defend is the single change I made for my late submission. Sad eh?

The Negamax Score Tree

In order to pick the best move, we make use of the value returned by scoreTree. The result of this function is the (very large) tree of all possible allowable moves, and the scores that each of those moves generates. Here’s where we see lazy evaluation (not) doing a huge amount of work for us; once we have this function in place, we can crawl up and down the tree as we require, and the values we need will be computed only on demand.

The scoreTree function builds up two levels of the tree at a time, alternating between states generated when our player goes first, and then the states generated once the enemy has moved. Since tron is played as a synchronous game, the scores of intermittent trees, where only our player has moved are undefined, but the existence of the node tells us that the move is legal. If we really wanted to calculate a value, we could do so, and I’ve left the code required as a comment. I actually left the undefined value there, since it was a good way of making sure that my algorithms were working (my program would crash if those values were ever evaluated, which they should never be, in future I think I’d probably value robustness over correctness!).

scoreTree :: Walls -> Players -> Move -> Maybe Int -> Tree Score
scoreTree ws (p, q) move mx =
  Node (score', move)
    [ Node (undefined, m) -- Node ((negate . fst) $  score ws' (destination p m, q) mx, m)
      [ scoreTree ws' (destination p m, destination q n) n mx'
      | n <- ns ]
    | m <- ms ]

This code makes use of several other values. In particular, ms and ns are the moves that can be made made by our player, and the enemy, respectively. We save ourselves quite a lot of time when the players are in disjoint parts of the board by returning Just x where x is an upper bound on the number of moves. If the enemy has more than 0 territory then he may still move, otherwise, no moves remain.

  where
    (score', mx') = score ws (p, q) mx
    ws' = ws // [(p, True), (q, True)]
    ms  = filter (survive ws' . destination p) cardinals
    ns  = case mx of 
            Nothing -> filter (survive ws' . destination q) cardinals
            Just x  -> if x > 0 then [None] else []

For the truly observant, you’ll notice that I’m actually storing the negation of the scores in this tree; the reason for this is that I’ll be using the negamax variant of the minimax algorithm, which is more naturally expressed in terms of a negative score tree

The tree is initialised by providing an initial configuration for walls and players. Initially, we do not know whether or not the playrs are disjoint.

initTree :: Walls -> Players -> Tree Score
initTree ws pq =
  scoreTree ws pq None Nothing

The function which actually decides which move within the tree is optimal is negamax, which is a variant of the standard minimax algorithm. When the depth for the negamax search is 0, we simply return the score held at the node we are inspecting.

negamax :: Tree Score -> Int -> Int -> Int ->  Score
negamax tree _ _ 0 =
  rootLabel tree

Otherwise we must pick the highest scoring score of the negation of the level below. This effectively alternates between minimising and maximising the score at each level of the tree.

negamax tree alpha beta depth =
  maximumBy maxFst $ (alpha, None) :
    betaPrune beta []
    [ (-(fst $ negamax subTree (-beta) (-alpha) (depth-1))
      , snd $ rootLabel subTree)
    | subTree <- subForest tree]
  where
    maxFst (x,_) (y,_) = compare x y

To speed things along a little, my algorithm does a beta prune, which takes the first value that is guaranteed to succeed. If I had any sense, I would order this in ascending order of tree depth, but I didn’t end up doing this.

betaPrune :: Int -> [Score] -> [Score] -> [Score]
betaPrune _    xs []      = xs
betaPrune beta xs (x:xs') =
  case beta < fst x of
    True  -> [x]
    False -> betaPrune beta (x:xs) xs'

The negamax function is pretty crude since we have no idea how long it will take to return a value. To get round this, I computed a sequence of moves, in ascending order of negamax depth. I called the function that did this bestScores, since it returns the list of best scores we can think of.

bestScores :: Tree Score -> [Score]
bestScores tree = (head (map (rootLabel) (subForest tree) ++ [(0, None)])) :
  map (negamax tree (-100000) 100000) ([2, 4 .. ])

The first move I take is basically the first legal move that’s available, which is only there incase we really have no time to compute anything sensible. Since our tree is only defined properly in even levels, I iterate over only even numbers, starting from depth 2.

Recycling Trees

So far our discussion has only considered how to make the best move for a single board state. One cool feature of the infinite tree structure is that values computed can be reused in the next game turn. To do this, we need to assume that the moves we’re given are legal ones, since we only compute legal states in our score tree.

We provide the function inherit, which basically returns the tree who inherits the state after a given move.

inherit :: Tree Score -> Move -> Tree Score
inherit tree m =
  matchTree m (subForest tree)

The right inheritance is just shorthand for matching the subtree that fits the move that was made. I’ve cheated a little bit in this implementation.

matchTree :: Move -> [Tree Score] -> Tree Score
matchTree _ []     = error $ "matchTree: Move not found in children!"
matchTree _ [t]    = t
matchTree m (t:ts) =
  case m == (snd . rootLabel) t of
    True  -> t
    False -> matchTree m ts

Keeping Track of Time

The next thing to consider is how to return a value from our infinite list of moves within a given timeframe. To do this I used a function that makes use of Haskell’s in built timeout :: Int -> IO a -> IO (Maybe a) function, which tries to compute the value of some type a, and returns Nothing if it fails.

The idea behind timeoutList is to provide a default value to output if we run past some time limit and otherwise we try a strict evaluation of the head of the list we’re interested in. If a value is computed, then we use this in the next iteration of timeoutList using the same absolute time limit. The trick is to make sure that we terminate a computation if it’s too slow, and this is handled perfectly by the standard timeout function.

timeoutList :: ClockTime -> a -> [a] -> Int -> IO (a, Int)
timeoutList timeLimit m ms n = do
  time <- getClockTime
  mx   <- if time < timeLimit
    then do
      -- hPutStrLn stderr $ show (diffClockTimes timeLimit time)
      timeout (toMicrosec (diffClockTimes timeLimit time)) (return $! head ms)
    else return Nothing
  case mx of
    Nothing -> return (m, n)
    Just m' -> do
      -- hPutStrLn stderr $ show m'
      timeoutList timeLimit m' (tail ms) (n+1)
  where
    toMicrosec timeDiff =
      (tdSec timeDiff) * 1000000 +
      fromInteger (tdPicosec timeDiff `div` 1000000)

Strictly speaking, I didn’t need to pass around the Ints that are there; they’re just present because they give me some stats about how far down the recursion tree the algorithm is able to go.

Plumbing and IO

The rest of the program is, well, pretty boring really. It’s all plumbing and IO, and making sure that we construct the state in the appropriate way. The real guts of the program are handled by the application of reapply to the nextTurn function, which I explain further down.

The main function does everything you might expect, we start with the initialisation of some timing variables, as well as making sure our input and output buffers are behaving nicely.

main :: IO ()
main = do
  let initialDelay = TimeDiff 0 0 0 0 0 0 2900000000000
  let normalDelay  = TimeDiff 0 0 0 0 0 0  900000000000

  hSetBuffering stdin  LineBuffering
  hSetBuffering stdout LineBuffering

Our next task is to get the state of the board, and this is a good time to start our first timer.

  l <- getLine
  time <- getClockTime

First up, we read the x and y dimensions of the board, and then get all the lines that make the board itself. The only important information that comes from the board is the pair (ws, pq), where we hold the walls in the grid in ws, and the player positions in pq.

  let [x, y] = map read $ words l
  ls <- replicateM y getLine
  let (ws, pq) = getBoard x y ls

Once we’ve created our state, we initialise our tree, and calculate an initial score, by making the most of our timeoutList function.

  let tree = initTree ws pq
  ((s,move), n) <- timeoutList (addToClockTime initialDelay time)
    (0, None) (bestScores tree) 0

With the score and move computed, we simply output our decision. I also had some debugging code here that gave me vital stats, but this made my algorithm bork on large maps, since a score must be calculated (and the first level of our tree shouldn’t ever calculate a score).

  -- hPutStrLn stderr $ "Depth: " ++ show n ++ "  Score: " ++ show s
  putStrLn $ makeMove move

Finally, we repeat the process for the next turns, where we inherit the tree and pass the position of the enemy.

  reapply (nextTurn normalDelay) (inherit tree move, snd pq)
  return ()

The reapply function iterates a monad function over a value, to produce a new monad. The resulting monad is then applied over the value that was just computed.

reapply :: Monad m => (a -> m a) -> a -> m a
reapply f x = f x >>= (\y -> reapply f y)

If I wanted to be pointless I might have used the following, but I don’t find it helpful at all:

-- reapply = fix (liftM2 flip ((>>=) .))

Each turn starts off with the previous tree, where the move we computed has been inherited, and the position of the enemy.

nextTurn :: TimeDiff -> (Tree Score, Point) -> IO ((Tree Score, Point))
nextTurn timeDelay (tree, enemy) = do

We then get the new state of the board, which tells us where the enemy moved to, and use this to produce the tree that we must evaluate.

  l <- getLine
  time <- getClockTime
  let [x, y] = map read $ words l
  ls <- replicateM y getLine
  let move  = getEnemyMove x y ls enemy
  let tree' = inherit tree move

Once this has been established, we can compute the new move that we want to make, perform the move, and return the tree that we’ve been working with, along with the position of the enemy.

  ((s,move'), n) <- timeoutList (addToClockTime timeDelay time)
    (0, None) (bestScores tree') 0
  -- hPutStrLn stderr $ "Depth: " ++ show n ++ "  Score: " ++ show s
  putStrLn $ makeMove move'
  return $ (inherit tree' move', destination enemy move)

Getting the State

The last few functions we need to define are used in main and nextTurn, and basically construct the state that we use in our algorithm. You can easily skip the details of these functions, since they’re really not interesting.

The getBoard function simply creates a new pair of walls and players, which represents the state of the board. The only important thing to note is that the size of ws, which holds the walls must be of the right dimensions, since we use this fact in our boundary checking functions.

getBoard :: Int -> Int -> [String] -> (Walls, Players)
getBoard x y ls =
  (ws, (p, q))
  where
    grid   = listArray ((0,0),(y-1,x-1)) (concat ls) :: Array Point Char
    ws     = amap (== '#') grid
    p      = find '1'
    q      = find '2'
    find c = head [i | (i,e) <- assocs grid, e == c]

After the first turn the only detail we’re interested in is which way the enemy moved. I’m pretty sure that this function could be optimised, but it was pretty clear to me that this one was correct.

getEnemyMove :: Int -> Int -> [String] -> Point -> Move
getEnemyMove x y ls p =
  findMove p p'
  where
    grid   = listArray ((0,0),(y-1,x-1)) (concat ls) :: Array Point Char
    p'     = find '2'
    find c = head [i | (i,e) <- assocs grid, e == c]
    findMove (px,py) (qx,qy) =
      case (qx-px,qy-py) of
        ( 0, 1) -> East
        ( 1, 0) -> South
        ( 0,-1) -> West
        (-1, 0) -> North
        _       -> error "findMove: Given points not adjacent!"

Conclusion

I’m really quite surprised that this little AI performed as well as it did, since nothing I’ve implemented is particularly exotic. There are two ideas that I think make this entry work well:

  • The tree of all scores.
  • The list of best scores.

Using a tree of all scores meant that I could reuse a lot of the expensive computation that was done from one turn to another, and laziness in Haskell made it so that I didn’t have to worry about ever computing the entire thing. I think it’s pretty amazing that the structure of the tree can be expressed so succinctly in the scoreTree function. Conceptually, I enjoyed the fact that my algorithm expressed every single possible move in this tree, and it was then just a case of traversing it effectively. Using this form of structured programming really showed its strengths in the simplicity of the negamax algorithm, which rivals even pseudocode in terms of its clarity and succinctness.

Coming up with the idea of a list of best scores generated at each turn was a real “aha!” moment for me, and I think I struggled most with the definition of timeoutList. Once I realised that I could use an absolute point in time as a time limit, though, everything seemed to fall into place.

In summary, the lessons I’ve learnt from this are:

  • Lazy evaluation is fantastic.
  • Winning competition entries don’t have to be rocket science.
  • Always submit competition entries at least an our before the deadline.

Comments