"Euler #3 : Sieving Prime Factors"

by Nicolas Wu


Posted on 15 July 2010

Tags: Haskell, Euler


This week’s challenge:

Find the largest prime factor of a composite number.

What’s the prime factor of a number? Well, the short answer is that it’s the unique bag of primes that, when multiplied together, gives you the number you’re looking for.

The composite number we’ve been challenged with is 600851475143, which is pretty massive. In fact, since an Int can only promise an upper bound of 2^29-1 = 536870911, we’ll have to use a different type, called Integer in our calculations, since these are unbounded.

Values of type Integer work just like Int, and also make use of functions like (+) and div. You might be wondering how this works in terms of the types and the truth is that I simplified things earlier on, when we discussed the type of those functions.

The type of (+) is actually:

(+) :: (Num a) => a -> a -> a

This basically says that (+) is polymorphic and takes values of type a, where each a must be part of the Num class. We haven’t talked about classes yet, but they’re basically a way of expressing the fact that some types are related, like Int and Integer, and other numbers like Float that represent fractions. Other functions like div and mod are defined similarly. I’ll talk more about this in another article.

Sieving Primes

To solve our problem, we’ll need a definition that gives us the primes. To make our version reasonably efficient, we’ll implement a version of the Sieve of Eratosthenes, and pay tribute to the old codger while we strike off the primes from our list as we encounter them (though, apparently, the variant used here is actually due to Turner).

> primes :: [Integer]
> primes =  sieve [2 .. ]
>   where
>     sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]

This definition makes use of a where clause, where part of the definition is given after the main body of our function, and restricted to the scope of the parent function. Here a where is really what we want because sieve is defined in terms of itself, as a recursive function, and we don’t particularly want to expose its definition to the rest of the program.

The idea behind sieve is that we start with some prime number p and the rest of the numbers xs that we haven’t considered yet. The list of primes becomes the value of p tagged onto the front of the rest of xs where we remove multiples of p.

Integer Conversions

Our next task is to use our list of primes and find out the largest one which factors into our number. Using basic arithmetic, it’s obvious to see that our factor will only be as big as the square root of our number, unless it is prime. We’ll start by defining a function that calculates integer square roots for us.

The prelude gives us a function sqrt that gives us the square root of a floating point number, and has this type:

sqrt :: (Floating a) => a -> a

Here, the class Floating is used for types that represent numbers with a “floating point”, like 1.5, or 4.125.

Since we want to work with values of type Integer, or Int, we want to convert to and from these types. To do so, we’ll make use of fromIntegral:

fromIntegral :: (Integral a, Num b) => a -> b

This basically takes a value of type a that is a member of the Integral class, which are just whole numbers, and produces a value of type b that is a Num. To convert from our number that is fractional, we make use of the function floor, which basically removes the fractional remainder of a number.

We can then compose these functions together to get a version of sqrt that works for values of type Integer:

> isqrt :: (Integral a) => a -> a
> isqrt = floor . sqrt . fromIntegral

Producing Factors

We can now write a function, pfactors, that solves the problem for us by finding the prime factors. This definition uses a list comprehension to generate all the primes less than the square root of the number, x, that we’re interested in, such that each of those primes is a factor of x:

> pfactors :: Integer -> [Integer]
> pfactors x
>   | null ps   = [x]
>   | otherwise = ps
>   where ps = [p | p <- takeWhile (<= isqrt x) primes, x `mod` p == 0]

Sadly, using this version is very inefficient, since we’re not making the most of the fact that when we find a factor, p, we can reduce the problem to finding the factors of our original number divided by p. In other words, we know that the following property holds when p is a prime factor:

pfactors x == p : pfactors (x `div` p)

So let’s write a program that does make the most of this fact:

> pfactors' :: Integer -> [Integer] -> [Integer]
> pfactors' x (p:ps)
>   | p > x          = []
>   | x `mod` p == 0 = p : pfactors' (x `div` p) (p:ps)
>   | otherwise      = pfactors' x ps

This version takes an additional argument, (p:ps), which is the increasing list of all primes. The first condition p > x has the same effect as our takeWhile, and returns the empty list when we know that no more factors can be found.

The next condition makes use of our fact about factors directly: if p is a multiple of x, then we add p to the list of prime factors found, and we continue factoring using x `div` p. The only tricky part is that we need to remember to pass on the whole list of primes (p:ps), and include p, since that prime might factor more than once into the new number.

If none of the other conditions hold, then we know that p is not a factor, and we keep searching the rest of ps for factors of x.

To answer our problem, we want to find the last value in our list, since factors are produced in increasing order, which we can do with the function last, found in the prelude:

last :: [a] -> a

This function just takes a list, and returns its last value. Putting this together, we have the following:

> euler3 = last (pfactors' 600851475143 primes)

Summary

In this article we took a quick look at classes, and how they can be used to relax some of functions, like (+), so that they can be used in related types, like Int and Integer. We’ll talk more about classes in another article.

We’ve also seen the following points:

  • Using a where clause allows us to create local function definitions.
  • Values of different types can be converted.
  • Exploiting problem properties makes algorithms much faster!

Comments