Euler #7 : Benchmarked Primes
by Nicolas Wu
Posted on 12 August 2010
This week’s Project Euler question is:
Find the 10001st prime.
We already described a simple algorithm for finding primes in a previous post, so rather than repeat ourselves, in this article we’ll discuss benchmarking using Criterion to find the fastest prime number algorithm that doesn’t require too much magic. I’ll be taking implementations found on the Haskell wiki.
Imported modules
First we’ll need to import the Criterion modules, which provide our benchmarking suite:
{-# LANGUAGE BangPatterns #-}  -- needed for the strict `let !primes` binding in main
import System.Environment (getArgs, withArgs)
import Criterion (bgroup, bench, nf)
import Progression.Main (defaultMain)
import Data.List.Ordered (minus, union)
I’ll actually be using Progression in conjunction with Criterion, which just makes collecting the results of several benchmarks a little easier.
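For readers without the data-ordlist package to hand, here is a rough sketch of what the imported `minus` and `union` do on ordered lists. The names `minusSketch` and `unionSketch` are hypothetical stand-ins, not the library's own code, and the sketch assumes both inputs are strictly increasing.

```haskell
-- Sketch only: hypothetical stand-ins for Data.List.Ordered's `minus` and
-- `union`, assuming both input lists are strictly increasing.

-- Elements of the first list that do not occur in the second.
minusSketch :: Ord a => [a] -> [a] -> [a]
minusSketch (x:xs) (y:ys) = case compare x y of
  LT -> x : minusSketch xs (y:ys)   -- x cannot appear later in the second list
  EQ ->     minusSketch xs ys       -- x is removed
  GT ->     minusSketch (x:xs) ys   -- skip y, it is smaller than everything left
minusSketch xs _ = xs

-- Ordered merge of both lists, keeping one copy of shared elements.
unionSketch :: Ord a => [a] -> [a] -> [a]
unionSketch (x:xs) (y:ys) = case compare x y of
  LT -> x : unionSketch xs (y:ys)
  EQ -> x : unionSketch xs ys
  GT -> y : unionSketch (x:xs) ys
unionSketch xs [] = xs
unionSketch [] ys = ys
```

Both functions are lazy, so they work on infinite lists: for instance, `take 5 (minusSketch [3,5..] [9,15..])` removes the odd multiples of 3 from 9 upwards out of the odd numbers.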
Prime Algorithms
The prime number generator we used previously was Turner’s sieve, defined as follows:
turner :: [Int]
turner = sieve [2 .. ]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
The Haskell wiki documents a whole range of other algorithms that can be used to generate primes. Here are the definitions that I pulled from the wiki:
-- Postponed filters: filtering by p only starts at p*p.
postSieve :: [Int]
postSieve = 2 : 3 : sieve (tail postSieve) [5,7..]
  where
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
      where (h,~(_:t)) = span (< p*p) xs

-- Trial division of odd candidates by the odd primes up to sqrt n.
trialOdds :: [Int]
trialOdds = 2 : 3 : filter isPrime [5,7..]
  where
    isPrime n   = all (notDivs n)
                    $ takeWhile (\p-> p*p <= n) (tail trialOdds)
    notDivs n p = n `mod` p /= 0

-- Segments between squares of successive primes, each passed through
-- one nested filter per prime found so far.
nestedFilters :: [Int]
nestedFilters = 2 : 3 : sieve [] (tail nestedFilters) 5
  where
    notDivsBy d n     = n `mod` d /= 0
    sieve ds (p:ps) x = foldr (filter . notDivsBy) [x,x+2..p*p-2] ds
                        ++ sieve (p:ds) ps (p*p+2)

-- The same segments, but each candidate is tested against the first k odd primes.
spansPrimes :: [Int]
spansPrimes = 2 : 3 : sieve 0 (tail spansPrimes) 5
  where
    sieve k (p:ps) x = [n | n <- [x,x+2..p*p-2], and [n`rem`p/=0 | p <- fs]]
                       ++ sieve (k+1) ps (p*p+2)
      where fs = take k (tail spansPrimes)

-- Odd numbers minus the merged streams of odd multiples of each prime, from p*p.
bird :: [Int]
bird = 2 : primes'
  where
    primes' = [3] ++ [5,7..] `minus` foldr union' [] mults
    mults   = map (\p -> let q=p*p in (q,tail [q,q+2*p..])) $ primes'
    union' (q,qs) xs = q : union qs xs

-- Trial division over candidates of the form 6k+1 and 6k+5 only.
wheel :: [Int]
wheel = 2:3:primes'
  where
    1:p:candidates = [6*k+r | k <- [0..], r <- [1,5]]
    primes'        = p : filter isPrime candidates
    isPrime n      = all (not . divides n)
                       $ takeWhile (\p -> p*p <= n) primes'
    divides n p    = n `mod` p == 0
I won’t go into the details of explaining these different algorithms, since I want us to focus on how we might benchmark these implementations.
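Before timing anything, it seems worth checking that the generators agree with each other. The following is a minimal sketch of such a check using two of the definitions from this article; `agreeOn` is a hypothetical helper introduced here for illustration, and the extra `sieve []` clause is only there to keep the pattern match total.

```haskell
-- Turner's sieve, as defined in the article.
turner :: [Int]
turner = sieve [2 ..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []     = []   -- unreachable for an infinite input

-- Trial division over odd candidates, as defined in the article.
trialOdds :: [Int]
trialOdds = 2 : 3 : filter isPrime [5,7..]
  where
    isPrime n   = all (notDivs n)
                    $ takeWhile (\p -> p*p <= n) (tail trialOdds)
    notDivs n p = n `mod` p /= 0

-- Hypothetical helper: do two generators agree on their first n elements?
agreeOn :: Int -> [Int] -> [Int] -> Bool
agreeOn n ps qs = take n ps == take n qs
```

In GHCi, `agreeOn 500 turner trialOdds` compares the first 500 primes from each generator. The same helper can be pointed at any pair of the other definitions.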
Benchmarking
To compare these algorithms, we construct a program that takes as its argument the name of the function that should be used to produce prime numbers. Once the user has provided this input, the benchmark is executed using Criterion to produce the first 101, 1001, and 10001 primes.
main = do
  args <- getArgs
  -- Pick the generator named by the first command-line argument.
  let !primes = case head args of
        "turner"        -> turner
        "postSieve"     -> postSieve
        "trialOdds"     -> trialOdds
        "nestedFilters" -> nestedFilters
        "spansPrimes"   -> spansPrimes
        "bird"          -> bird
        "wheel"         -> wheel
        _               -> error "prime function unknown!"
  -- Hand the name to Progression via -n, followed by the remaining arguments.
  withArgs (("-n" ++ (head args)) : tail args) $ do
    defaultMain . bgroup "Primes" $
      [ bench "101"   $ nf (\n -> primes !! n) 101
      , bench "1001"  $ nf (\n -> primes !! n) 1001
      , bench "10001" $ nf (\n -> primes !! n) 10001
      ]
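To get a feel for what Criterion automates, here is a crude single-shot timing written with nothing but base. This is a sketch, not a substitute for the benchmark above: `timeOnce` and `timedLookup` are hypothetical helpers, and one measurement carries none of the repeated sampling or statistics that Criterion provides.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Trial division over odd candidates, as defined in the article.
trialOdds :: [Int]
trialOdds = 2 : 3 : filter isPrime [5,7..]
  where
    isPrime n   = all (notDivs n)
                    $ takeWhile (\p -> p*p <= n) (tail trialOdds)
    notDivs n p = n `mod` p /= 0

-- Hypothetical helper: run an action once and return its result together
-- with the CPU time it consumed, in seconds.
timeOnce :: IO a -> IO (a, Double)
timeOnce action = do
  start  <- getCPUTime                 -- reported in picoseconds
  result <- action
  end    <- getCPUTime
  return (result, fromIntegral (end - start) / 1e12)

-- Hypothetical helper: time a single lookup of the element at index n.
-- `evaluate` forces the Int so the work happens inside the timed region.
timedLookup :: Int -> IO (Int, Double)
timedLookup n = timeOnce (evaluate (trialOdds !! n))
```

Calling `timedLookup 10001` from `main` or GHCi returns the prime at that index along with the elapsed CPU time. Note that `trialOdds` is a top-level list, so a second lookup in the same process will likely reuse the prefix that has already been computed and report a much smaller time.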
We then run this code with each prime function name as an argument individually, and the Progression library puts the results together. Here’s a bar chart generated from the data:
These results have been normalised against the turner function, and show how long the various algorithms took to find the 10001st, 1001st and 101st primes.
Solving this week’s problem is a simple case of running any one of these algorithms on our magic number:
-- Lists are zero-indexed, so the 10001st prime sits at index 10000.
euler7 = spansPrimes !! 10000
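Since `!!` counts from zero, it is easy to be off by one when asking for the "nth" prime. A small wrapper makes the intent explicit. This is a sketch using the `trialOdds` generator from this article; `nthPrime` is a hypothetical name introduced here.

```haskell
-- Trial division over odd candidates, as defined in the article.
trialOdds :: [Int]
trialOdds = 2 : 3 : filter isPrime [5,7..]
  where
    isPrime n   = all (notDivs n)
                    $ takeWhile (\p -> p*p <= n) (tail trialOdds)
    notDivs n p = n `mod` p /= 0

-- Hypothetical helper: 1-based access into the 0-indexed list of primes.
-- Assumes n >= 1.
nthPrime :: Int -> Int
nthPrime n = trialOdds !! (n - 1)
```

With this definition `nthPrime 6` is 13, matching the example given in the Project Euler problem statement, and `nthPrime 10001` is 104743.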
Summary
Collecting benchmark information with Criterion and Progression is really quite simple! The best thing about Criterion is that the benchmarking is very robust: detailed statistics are returned regarding the benchmarking process, and whether the results are likely to be accurate. Progression makes the collation of several runs of benchmarks very simple, and means that different versions of a program can be compared with ease.