Monday, 22 July 2013

Generating list of primes


I thought I'd share some basic ideas to make faster codes for some simple problems. One simple problem every post.

Generating list of all primes less than or equal to N

Simple enough problem, and I'm sure that Googling it will probably yield a host of solutions, but let's try coming up with a basic solution ourselves. So, to get started, a brute force solution would be to check divisibility of every number by numbers smaller than it.
This is Solution 0, which in C++ looks like this :-

vector <int> Primes;
forint i = ; i <= N ; i++) {
      bool isPrime = true;
      forint j = ; j < i ; j++)
            if( i % j == ) {
                  isPrime = false;
                  break;
            }
      if( isPrime ) Primes.push_back( i );
}

On my poor decade-old computer (2.x GHz, 500MB RAM), when compiled with "gcc -O2", the above code takes 7s for N = 10^5. That is a crappy solution. We can improve it a lot.

To make it faster, we can ignore the even numbers and concern ourselves with just the odd numbers, but that only makes it slightly faster.
(Why only slightly faster? Aren't we removing half the numbers? Well yes, but it's not like an equal amount of time is spent on every number. Even numbers get discarded very quickly in the inner loop, the almost-primes take a lot more time to discard, and the primes take the most amount of time to confirm that there are no divisors.)
Also, since we are generating and saving a list of primes, we can use this list to speed things up - we don't need to divide by every single number but just smaller primes.
So we have Solution 1, which in C++ looks like this :-

vector <int> Primes;
Primes.push_back( );
forint i = ; i <= N ; i += ) {
      bool isPrime = true;
      for( vector <int> :: iterator j = Primes.begin() + ; j != Primes.end() ; j++ )
            if( i % *j == ) {
                  isPrime = false;
                  break;
            }
      if( isPrime ) Primes.push_back( i );
}

If we compare the run-times, we get:
N = 10^5N = 10^6
Solution 07s~12 min
Solution 10.7s~48s
This code is faster than the previous one, but this is still a terrible solution.

The above two solutions are very slow because their time complexity is essentially quadratic in N. (Actually, other than N^2, we also have some complicated terms of log(N) here, but the basic idea is that for every number, we are checking with all the primes smaller than it, which is (sort of) as bad as two loops of N).
This can be easily improved, since we only need to check with primes smaller than the square root of a number e.g. to check whether 101 is a prime, we only need to check with all primes up to 10 i.e. 2, 3, 5 and 7.
This gives us Solution 2, for which the C++ code is the same as above, except this line :

for( vector <int> :: iterator j = Primes.begin() + ; j != Primes.end() && *j * *j <= i ; j++ )

Now we get:
N = 10^5N = 10^6N = 10^7N = 10^8
Solution 07s~12 min
Solution 10.7s~48s
Solution 20.2s4.5s~100s
The run-times show a sudden improvement as the time complexity comes down by a factor of N^1/2. (The asymptotic time complexity is an even more complicated expression now, and I'm not even gonna look into it now).

Now, how do we make this faster? Well, to reduce time from Solution 0 to Solution 2, instead of dividing by every smaller number, we were dividing only by primes smaller than the square root. But why go through this entire list every single time? I mean for every single number we first divide by 3, then 5, then 7, then 11, and so on until we find a divisor. For a number like 391 = 17*23, 5 iterations (3, 5, 7, 11, 13) are tried before the sixth iteration (17) marks this number as composite. Can't we save this effort? Isn't there any easier way to directly jump to its divisors 17 and 23?
Well, not really. It is a hard problem to find the factors of a given number. But we can approach this from another direction. It may be difficult to find divisors of a given number but it is easy to find multiples of a given divisor. For every prime number we find, we can find all its multiples in a single loop and discard these numbers as composite. Thus, now, to eliminate 391 = 17*23, we save the time of unnecessarily checking it with 3, 5, 7, 11 and 13 before it is eliminated by 17. As there are a lot of composite numbers to eliminate, this makes it considerably faster.
This solution is basically Sieve of Eratosthenes which was taught to us in middle school.




Our Solution 3 looks like this in C++ :-

vector <int> Primes;
Primes.push_back( );
vector <bool> Sieve( N + 1 );
forint i = ; i <= N ; i += ) Sieve[i] = 1;
forint i = ; i <= N ; i += )
      if( Sieve[i] ) {
            Primes.push_back( i );
            forint j = 3 * i ; j <= N ; j += 2 * i ) if( Sieve[j] ) Sieve[j] = 0;
      }

Note how we are still ignoring all the even numbers in our solution. Unlike our first solution, this actually makes a non-negligible difference right now. This is because unlike our first solution, the overhead involved with discarding each composite number is much lower, and getting rid of half the numbers makes it almost twice as fast.
Now we get:
N = 10^5N = 10^6N = 10^7N = 10^8N = 10^9
Solution 07s~12 min
Solution 10.7s~48s
Solution 20.2s4.5s~100s
Solution 30.2s4.2s~60s

Now, how do we make this faster? Well, earlier in Solution 2, when we were checking divisibility by 2, 3, 5, 7 etc we noted that we only needed to check up to the square root of the number. Along similar lines of reasoning, we note that when we are discarding multiples of a prime p, we only need to consider the multiples greater than or equal to p^2, since the ones below have already been discarded by smaller primes.
This gives us our Solution 4, which in C++ looks the same as the previous one, the only change being inside this if block :

if( Sieve[i] ) {
      Primes.push_back( i );
      if1LL * i * i > N ) continue;
      forint j = i * i ; j <= N ; j += 2 * i ) if( Sieve[j] ) Sieve[j] = 0;
}

Now we get:
N = 10^5N = 10^6N = 10^7N = 10^8N = 10^9
Solution 07s~12 min
Solution 10.7s~48s
Solution 20.2s4.5s~100s
Solution 30.2s4.2s~60s
Solution 40.14s2.4s~28s

Now, the same question that we ask each time : how do we make this faster? Consider what happens when we discard multiples of 23. We start at 23x23, then 23x25, then 23x27, and so on. However, after 23x23, many of its multiples have already been discarded by smaller primes. 23x25 is discarded by 5, 23x27 by 3, 23x33 by 3, 23x35 by 5 and so on. So we could potentially save some time if we only focus on its multiples which have not already been discarded.These are numbers which are only divisible by 23 and larger primes.
This gives us our Solution 5, which in C++ looks like this :-

vector <int> Primes;
Primes.push_back( );
vector <bool> Sieve( N + );
forint i = ; i <= N ; i += ) Sieve[i]=1;
forint i = ; i <= N ; i += )
      if( Sieve[i] ) {
            Primes.push_back( i );
            if1LL * i * i > N ) continue;
            forint j = N / i ; j >= i ; j-- ) if( Sieve[j] ) Sieve[i * j] = 0;
      }

Now we get:
N = 10^5N = 10^6N = 10^7N = 10^8N = 10^9
Solution 07s~12 min
Solution 10.7s~48s
Solution 20.2s4.5s~100s
Solution 30.2s4.2s~60s
Solution 40.14s2.4s~28s
Solution 52s~21s

Now, how do we make this faster? I'll continue this in Part 2.

No comments: