Top Quality Programming Team Speaks Out
Factoring exteremely large numbers
Written by Adrian Pavone
EFF are running a long-standing competition whereby they will pay large sums of money to the first person/group to find prime numbers of exteremly large numbers of digits. The first prize awarded was for finding a one million digit prime number, and was for $50,000. This award went on Apr 6, 2000. However, the 10 million digit award of $100,000 is still up for grabs, and hence that is what I am working towards.
The problem of determining large prime numbers stems from a number of issues. The first, and most obvious for a programmer, is how you get to a number of that size in the first place. It is impossible for the CPU to use a number of that size, with a 32 bit CPU only being able to handle numbers up to 4 billion and change (10 digits), and then only using unsigned integers, and a 64bit computer can only handle numbers of up to 18 million million (19 digits), again with unsigned integers. Now, it is possible for a computer to use numbers larger than these limits, but this involves floating point numbers, which lose the precision required for this task. As an exteremely exaggerated example, as an integer 100 is stored as exactly 100, whereas a floating point number could easily store it as 99 or 101 (this isn't really the case, for such small numbers it would only be fractions off at worst).So the first problem is, how do you store integers of ten million digits? This actually turned out to be much easier done than said, using the Math::BigInt package. With that problem resolved, you get to the fact of generating a ten million digit number.
My first attempt at generating a ten million digit number was to create the BigInt, store the value 1 in it, and then loop 10,000,000 times, multiplying the value by 10 each time. Unfortunately, around 55,000 digits, this really started to slow down, to the point where the number was only increasing by a few hundred digits per seconds (previously it had been a few thousand digits per second). So a new approach was required. I then actually bothered to look further at the perldoc, and noticed a method named "bpow". By calling $bigNum->bpow(1000000), this would allow for the class itself to use it's own method to generate the number.
Not expecting much in the way of results, I was exteremely surprised when this resulted in the generation of the number within 10 seconds. At this point, I actually started to think that maybe this was possible. Not only this, but within 1m19s, the script was able to determine which of the first 20 numbers didn't have factors under 10.
I should probably go a little bit more into the implementation that I have used to determine what numbers to try. Frankly, it was quite simple, all I did was create a standard for loop that would execute next whenever the next number to be generated was divisible by 2 or 5. This cut down 6 out of every 10 numbers generated, saving the script from attempting to find factors for every number.
Once the numbers were generated, they are then passed into a script aptly named isValuePrimeToHundred, which obviously checks whether the first 100 prime number can be divided into it cleanly (I am only checking the prime numbers as divisors, for the simple reason that if the number cannot be divided by 3, then 9 definitely can't divide it either).
The subroutine named above looks like:
sub isValuePrimeToHundred {
$currentNum = ${+shift};
%lowPrimes = %{+shift};
for (sort {$a <=> $b} keys %lowPrimes) {
$testNum = $currentNum->copy()->bdiv($_);
if ($testNum->bmul($_) == $currentNum) {
return 0;
}
}
return 1;
}
As can be seen, I am using references for passing around the large variables. This is for obvious reasons, were I not to do this, the script would create duplicates in memory of every variable each time, and given the size of these numbers (each representing about 2MB of ram minimum), this would generate a large performance degredation over time.
Anyway, the %lowPrimes associative array (hash) is the result of:
sub generatePrimeHash {
%lowPrimes = qw/2 1 3 1 5 1 7 1 11 1 13 1 17 1 19 1 23 1 29 1 31 1 37 1 41 1 43 1 47 1 53 1
59 1 61 1 67 1 71 1 73 1 79 1 83 1 89 1 97 1/;
return \%lowPrimes;
}
All this hash does is store as its keys the prime numbers under the value of 100 (an arbitrary limit set by myself).
In order to distribute the workload, I have broken the script into two parts, one that performs the above task of determining which numbers cannot be factored with values under 100, and the real serious problem, which is determining which of the numbers output by the above part are actually prime. As such, the first part outputs its results to a file, with each number it returns being placed on a seperate line. The worker computers that need to determine which values are actually prime of these merrily ping a script on the server (using http), which returns a single number to be tested. The computer then gets to work determining if this value is actually prime, by testing all numbers between 100 and the square root of the number, excluding any numbers that can be divided by one of the lowPrimes already discussed. When it does find a factor, it contacts the server again, including informing it of the number that it was working on, and the server removes that number from the file, assigning the computer a new number.
In another post I may go into the complexity of talking with the server via HTTP using Perl, but for now, this is Adrian signing off.
UPDATE: One of the computers appears to be stuck finding a factor for 1e1,000,000+39. Who knows, maybe we found a winner (doubtful, this number has probably already been checked and just has a huge factor with thousands of digits, but still there's always hope).


