Sunday, June 7, 2015

Sieve of Eratosthenes

I noticed that the Crystal Programming Language homepage has an example of the Sieve of Eratosthenes algorithm for finding prime numbers. I thought it would be fun to show a few simple variations of that algorithm in Factor.

The algorithm works by iteratively marking as not prime the multiples of each prime, starting with the multiples of 2. In pseudocode, it looks like this:

Input: an integer n > 1
 
Let A be an array of Boolean values, indexed by integers 2 to n,
initially all set to true.
 
 for i = 2, 3, 4, ..., not exceeding n:
  if A[i] is true:
    for j = i2, i2+i, i2+2i, i2+3i, ..., not exceeding n :
      A[j] := false
 
Output: all i such that A[i] is true.

We will be looking at the performance and memory usage to calculate the 664,579 prime numbers between 1 and 10,000,000.

Version 1

Our first version, is a straightforward implementation from the pseudocode:

:: sieve1 ( n -- primes )
    n 1 + t <array> :> sieve
    f 0 sieve set-nth
    f 1 sieve set-nth

    2 n sqrt [a,b] [| i |
        i sieve nth [
            i sq n i <range> [| j |
                f j sieve set-nth
            ] each
        ] when
    ] each

    sieve [ drop ] selector [ each-index ] dip ;

We can show that it works for the first few prime numbers:

IN: scratchpad 25 sieve1 .
V{ 2 3 5 7 11 13 17 19 23 }

Calculating prime numbers up to 10 million takes less than half a second:

IN: scratchpad [ 10,000,000 sieve1 ] time
Running time: 0.408065579 seconds

The memory used is basically an array with 10 million elements in it, each element of the array is a boolean (64 bits on my laptop). Total memory used is 80 MB.

Version 2

Storing booleans can be more memory-efficient using bit-arrays, where each boolean is stored as a single bit. The logic is the same, however primes are marked by false rather than true.

:: sieve2 ( n -- primes )
    n 1 + <bit-array> :> sieve
    t 0 sieve set-nth
    t 1 sieve set-nth

    2 n sqrt [a,b] [| i |
        i sieve nth [
            i sq n i <range> [| j |
                t j sieve set-nth
            ] each
        ] unless
    ] each

    sieve [ drop not ] selector [ each-index ] dip ;

Calculating prime numbers up to 10 million is faster:

IN: scratchpad [ 10,000,000 sieve2 ] time
Running time: 0.241894524 seconds

The memory used is now one bit per number, so only 10 million bits or 1.25 MB.

Version 3

Since we know that even numbers (except 2) are not prime, we can only work with odd numbers and start our search from 3. For each number found, we can check only the odd multiples by marking off i2 + k*i for only even k.

:: sieve3 ( n -- sieve )
    n dup odd? [ 1 + ] when 2/ <bit-array> :> sieve
    t 0 sieve set-nth

    3 n sqrt 2 <range> [| i |
        i 2/ sieve nth [
            i sq n i 2 * <range> [| j |
                t j 2/ sieve set-nth
            ] each
        ] unless
    ] each

    V{ 2 } clone :> primes

    sieve [
        swap [ drop ] [ 2 * 1 + primes push ] if
    ] each-index

    primes ;
Note: the convenience of selector is not available because we want to initialize the list of primes with 2. Instead, we just do it manually.

Calculating prime numbers up to 10 million is much faster!

IN: scratchpad [ 10,000,000 sieve3 ] time
Running time: 0.110387127 seconds

As for memory usage, it is storing one bit for each odd number, so only 5 million bits or 625 KB.

1 comment:

Unknown said...

Note that you can even store them more efficiently, as is done in the standard math.primes.erato vocabulary. Using it, math.primes stores the primes up to 8999999 using only 300kB.