Dreamer of Worlds: Algorithms: Prime Number Revisited, The Sieve of Eratosthenes

The last time we played with prime numbers, we created an algorithm that would start with the first prime as a seed, and then discover each subsequent prime in the sequence by testing it against all known primes. The problem with this algorithm is that although it works, it needs a large number of computations once the prime list gets high enough.

What if we could eliminate many of those computations, and use a boolean array to immediately tell if a number was prime or not?

The good news is, we can. And this method has been known since the time of the ancient Greeks. It's easy enough to do by hand, but when working with very large numbers, computers can do a much faster job than a human. The method is known as the Sieve of Eratosthenes.

In order to use the method, you must first have a target. Let's say that you want to find all the prime numbers up to 10. You start by making a list of all numbers from 2 to 10.

2, 3, 4, 5, 6, 7, 8, 9, 10

The first number on your list is 2, and that's because 2 is the first prime number. I will underline it to mark it as prime.

2, 3, 4, 5, 6, 7, 8, 9, 10

From here, you cross off all the numbers that are multiples of 2.

2, 3, 4, 5, 6, 7, 8, 9, 10

The next number that is not crossed off is the next prime in the series, which is 3. We underline 3 to mark it as prime, and cross off the multiples of 3.

2, 3, 4, 5, 6, 7, 8, 9, 10

The next number not crossed off is 5, which is prime. We repeat the process with 7, and we are left with our list of primes.

2, 3, 4, 5, 6, 7, 8, 9, 10

The algorithm is straightforward, with a base case and a finite number of steps. We can easily code this to accept a target number, generate an array of the numbers between 2 and the target, and then run the sieve method on the list to generate all primes. We can even optimize it so that we are not working with a number type at all, but rather a boolean type using a hash-table.

In my implementation, I create a hash-table as a simple array of booleans. The index of the array represents the actual number. If the value is true, then the number is prime. If the value is false, then the number is non-prime. Once the hash-table has computed, the algorithm will walk the table and add each prime to a list of numbers, returning it to the calling method.



        static List<int> SieveOfEratosthenes_Full(int x)

        {

            //The sieve will generate a list of primes between 0 and x.



            //Array of booleans to determine whether or not a given index is a prime number.

            //For example, primes[2] == true because 2 is a prime number while primes[4] == false because 4 is not prime.

            bool[] sieve = new bool[(int)(x + 1.0)];



            //Seed the starting values. 0 and 1 are not considered prime and are seeded as false.

            sieve[0] = false;

            sieve[1] = false;



            //2 is the first prime number and is seeded as true. Each number after 2 is considered prime until ruled out by the sieve

            //and is seeded as true.

            for (int y = 2; y <= x; y++)

                sieve[y] = true;



            //The algorithm walks the array. If the number is prime, then all multiples of that number are marked as false.

            //The algorithm continues ruling out numbers based on their status as a multiple of a prime.

            //The algorithm starts at 2, the first prime.

            for (int y = 2; y <= x; y++)

                if (sieve[y])

                    for (int z = 2 * y; z <= x; z += y)

                        sieve[z] = false;



            List<int> primes = new List<int>();



            for (int y = 2; y < sieve.Length; y++)

                if (sieve[y])

                    primes.Add(y);



            return primes;

        }

The algorithm is in place and runs quite fast. While discussing ways to improve it, several programmers suggested to only test odd numbers as we know all even numbers except 2 are non-prime. By using a hash-table with no gaps, we would cut the physical memory required down by half.

The second implementation I came up with does just that. Simple formulas will convert a number to an index and back again. The hash-table holds all odd values from 3 to the target number. The issue with the implementation is that the code becomes more complex. Even with comments, it's not immediately clear why some of the in-line calculations are happening.

static List<int> SieveOfEratosthenes_Odd(int x)

{

//The sieve will generate a list of primes between 3 and x, only odd numbers.

bool[] sieve = new bool[(int)(Math.Ceiling(x / 2.0) - 1)];

for (int z = 0; z < sieve.Length; z++)

sieve[z] = true;

            //The sieve only needs the odd numbers from 3 to x. The prime's index in the sieve is: Math.floor(n / 2) - 1

//The index can be converted to the prime it represents by the formula: index * 2 + 3

for (int z = 0; z < sieve.Length; z++)

if (sieve[z])

{

int number = z * 2 + 3;

//When incrementing, only investigate the odd numbers.

for (int y = 3 * number; y <= x; y += (2 * number))

sieve[(int)Math.Floor(y / 2.0) - 1] = false;

}

//Create the list of primes. Seed with 2 and pull the odd primes from the sieve.

List<int> primes = new List<int>();

primes.Add(2);

for (int z = 0; z < sieve.Length; z++)

if (sieve[(int)z])

//The index can be converted to the prime it represents by the formula: index * 2 + 3

primes.Add(z * 2 + 3);

return primes;

}

The first implementation is much easier to read and follow than the second. The second implementation uses half the memory, but an array of over 2,000,000 boolean values still takes up less space than the average video or music file. The footprint is theoretically 1.9MB for the full hash implementation and 0.95MB for the odds-only implementation. What about speed? I ran a simple test harness to determine the time used to run both implementations. The speed saved by only testing odd numbers is then lost during the additional calculations. Here are results from a few runs:

The odds-only implementation saves a small fraction of the time. In one case above, it even took longer than the full implementation. Unless processing speed and memory footprint of the hash-table are tight constraints, I'd opt to use the full implementation as it would be easier to read and maintain in a team-environment.

Dreamer of Worlds

Labels

Search This Blog

Saturday, November 30, 2013

Algorithms: Prime Number Revisited, The Sieve of Eratosthenes

No comments:

Post a Comment