!lm10
!rm76
Sifting Primes Faster and Faster

Benchmark programs are sometimes useful for selecting between various processors.  Quite a few articles have been published which compare and rank the various Z-80, 8080, 6800, and 6502 systems based on the speed with which they execute a given BASIC program.  Some of us cannot resist the impulse to show them up by recoding the benchmark in our favorite language on our favorite processor, using our favorite secret tricks for trimming microseconds.

"A High-Level Language Benchmark" (by Jim Gilbreath, BYTE, September, 1981, pages 180-198) is just such an article.  Jim compared execution time in Assembly, Forth, Basic, Fortran, COBOL, PL/I, C, and other languages; he used all sorts of computers, including the above four, the Motorola 68000, the DEC PDP 11/70, and more.  He used a short program which finds the 1899 primes between 3 and 16384 by means of a sifting algorithm (Sieve of Eratosthenes).

His article includes table after table of comparisons.  Some of the key items of interest to me were:

!lm15
Language and Machine                    Seconds

Assembly Language 68000 (8 MHz)            1.12
Assembly Language Z80                      6.80
Digital Research PL/I (Z80)               14.0
Microsoft BASIC Compiler (Z80)            18.6
FORTH 6502                               265.
Apple UCSD Pascal                        516.
Apple Integer BASIC                     2320.
Applesoft BASIC                         2806.
Microsoft COBOL Version 2.2 (Z80)       5115.

!lm10
There is a HUGE error in the data above; I don't know whether it is the only one.  The time I measured for the Apple Integer BASIC version was only 188 seconds, not 2320 seconds!  How could he be so far off?  His figure is obviously wrong, because it puts Integer BASIC at nearly the same speed as Applesoft.

I also don't know why he neglected to show what the 6502 could do with an assembly language version.  Or maybe I do... was he ashamed?

William Robert Savoie, an Apple owner from Tennessee, sent me a copy of the article along with his program.  He "hand-compiled" the BASIC version of the benchmark program, with no special tricks at all.  His program runs in only 1.39 seconds!  That is almost as fast as the 8 MHz Motorola 68000 system!  The letter that accompanied his program challenged anyone to try to speed up his program.

How could I pass up a challenge like that?  I wrote my own version of the program, and cut the time to .93 seconds!  Then I made one small change to the algorithm, and produced exactly the same results in only .74 seconds!

Looking back at Jim Gilbreath's article, I see that he concludes that efficient, powerful high-level languages are THE way to go.  He eschews the use of assembly language for any except the most drastic requirements, because he could not see a clear speed advantage.  He points out the moral that a better algorithm is superior to a faster CPU.  (Note, by the way, that his algorithm is by no means the fastest one.)

Here is Gilbreath's algorithm, in Integer BASIC:

<program#1>

The REM tagged onto the end of line 70, if changed to a real PRINT statement, will print the list of prime numbers as they are generated.  Of course printing them was not included in any of the time measurements.  According to my timing, printing adds 12 seconds to the program.

I modified the algorithm to take advantage of some more prior knowledge about sifting:  There is no need to go through the loop in lines 50 and 60 if P is greater than 127 (the largest prime no bigger than the square root of 16384).  Testing against 130 instead of 127 gives the same result, because there is no prime between 127 and 131.  This means changing line 40 to read:

     40 P=I+I+3 : IF P>130 THEN 70 : K=I+P

This change cut the time for the BASIC program from 188 seconds to 156 seconds.  My assembly language version of the original algorithm ran in .93 seconds, or 202 times faster than the 188-second BASIC version; the better algorithm ran in .74 seconds, or almost 211 times faster than the 156-second BASIC version.
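
For readers who want to check that the shortcut really produces the same primes, here is a rough sketch in Python (not Integer BASIC, and not the listing from the article).  It assumes that flag number I stands for the odd number I+I+3, as line 40 suggests, and that 8191 flags are used so the numbers 3 through 16383 are covered; the names sift, size, and cutoff are made up for this illustration.

```python
def sift(size=8190, cutoff=None):
    """Return the odd primes found by the sieve; flags[i] stands for 2*i + 3.

    size and cutoff are illustrative parameters, not names from the article.
    """
    flags = [True] * (size + 1)
    primes = []
    for i in range(size + 1):
        if not flags[i]:
            continue                      # already struck out as a multiple
        p = i + i + 3                     # the prime this flag stands for
        if cutoff is None or p <= cutoff:
            # Strike out 3p, 5p, 7p, ...: a step of p flags is a step of 2p.
            for k in range(i + p, size + 1, p):
                flags[k] = False
        primes.append(p)
    return primes


original = sift()              # always runs the striking loop
shortcut = sift(cutoff=130)    # skips the loop when P > 130, like the new line 40
print(len(original), len(shortcut), original == shortcut)   # 1899 1899 True
```

The shortcut is safe because every composite number below 16384 has a prime factor no larger than 127, so it has already been struck out by the time P passes 130.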

William Savoie has done a magnificent job in hand-compiling the first program.  He ran the program 100 times in a loop, so that he could get an accurate time using his Timex watch.  Here is the listing of his program.

<Bill Savoie's program>
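
Bill's trick of running the program 100 times and timing the whole batch works in any language.  The following Python sketch shows the idea under a few assumptions: the sieve body is a stand-in written for this illustration (not Bill's code), and the helper name time_per_run is hypothetical.

```python
import time


def sieve_count(size=8190):
    """Stand-in workload: count the odd primes, flags[i] standing for 2*i + 3."""
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            p = i + i + 3
            for k in range(i + p, size + 1, p):
                flags[k] = False
            count += 1
    return count


def time_per_run(func, repeats=100):
    """Time `repeats` back-to-back runs and return the average seconds per run."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


per_run = time_per_run(sieve_count, repeats=100)
print(f"{per_run:.6f} seconds per run")
```

Timing the batch and dividing by the number of runs spreads any start-and-stop error over all 100 runs, which is why a wristwatch is good enough.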

Here is a listing of my fastest version.  If you delete lines .... through ...., you get my code for the original algorithm.

<my program>

Michael R. Laumer, of Carrollton, Texas, has been working for about a year on a full-scale compiler for the Integer BASIC language.  He has it nearly finished now, so just for fun he used it to compile the algorithm from Gilbreath's article.  Mike used a slightly different form of the Integer BASIC program than I did, which took 238 seconds to execute.  But the compiled version ran in only 20 seconds!  If you are interested in compiling Integer BASIC programs, you can write to Mike at Laumer Research, 1832 School Road, Carrollton, TX 75006.

If you want to, you can easily cut the time of my program from .74 to about .69 seconds.  Lines 1600-1650 in my program set each byte in ARRAY to $01.  If you don't mind the extra program length, you can rewrite this loop to run in about 42 milliseconds instead of the 90-plus milliseconds it now takes.  Here is how I would do it:

!lm15
.1     STA ARRAY,Y
       STA ARRAY+$100,Y
       STA ARRAY+$200,Y
       STA ARRAY+$300,Y          TOTAL OF 32
        .                       LINES LIKE THESE
        .
        .
       STA ARRAY+$1E00,Y
       STA ARRAY+$1F00,Y
       INY
       BNE .1
!lm10
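
The 42 millisecond figure can be checked by counting cycles.  The short Python calculation below is only a sketch: the cycle counts are the standard 6502 timings (STA absolute,Y takes 5 cycles, INY takes 2, BNE takes 3 when taken and 2 when not), while the clock rate of about 1.023 MHz is an assumed round figure for the Apple II, and the constant and function names are made up for the illustration.

```python
STA_ABS_Y = 5          # cycles for STA absolute,Y
INY = 2                # cycles for INY
BNE_TAKEN = 3          # branch taken, no page crossing
BNE_NOT_TAKEN = 2      # final pass, branch falls through
CLOCK_HZ = 1_023_000   # assumed approximate Apple II clock rate


def unrolled_fill_cycles(stores=32, passes=256):
    """Cycles for `passes` trips through a loop of `stores` STA lines."""
    per_pass = stores * STA_ABS_Y + INY + BNE_TAKEN
    # The last BNE is not taken, so it costs one cycle less.
    return per_pass * passes - (BNE_TAKEN - BNE_NOT_TAKEN)


cycles = unrolled_fill_cycles()
millis = 1000 * cycles / CLOCK_HZ
print(cycles, "cycles, about", round(millis, 1), "ms")
```

With 32 store instructions and 256 passes the loop writes 8192 bytes in 42239 cycles, which at the assumed clock rate comes to a little over 41 milliseconds, in line with the estimate above.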

If you can find a way to implement the same program in less than .69 seconds, you are hereby challenged to do so!
