Faster Cyclic Redundancy Checking..........Bob Sander-Cederlof

In the April 1984 issue of AAL I showed how to compute a cyclic redundancy check code (CRC) for a buffer full of data.  I also tried to explain a little of the theory, as much as I understood.  In the June 1984 issue Bruce Love explained how to work backward from the computed CRC of a received buffer to correct a single bit error.  Both of these programs were written in plain 6502 code.

In the February 1986 "Dr. Dobb's Journal", Terry Ritter writes about "The Great CRC Mystery".  He also presents Pascal programs and 8088 machine code programs for calculating the CRC in various ways.  Terry very briefly describes a table-driven method (the fastest way) and a byte-oriented method (almost as fast as the table-driven one).

I translated Terry's machine-coded byte-oriented method from 8088 to 65802 code, but even after twiddling and tweaking for half a day I could not make it give the correct answers.  I don't know if his method is correct or not, but of course it MUST be, since it is printed in Dr. Dobb's and since he claims it works and since he even tells how many milliseconds it takes.

Anyway, I decided to derive my own byte-oriented method.  The CRC algorithm is basically a "long division" of the entire bit stream in the buffer, as though it were one long binary word.  The divisor is $11021 in the CCITT scheme (the polynomial x^16 + x^12 + x^5 + 1).  The check code we use is the remainder of the division.  The normal algorithm does the "long division" one bit at a time; the byte-oriented algorithm does it one byte at a time.
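The idea of treating the buffer as one long binary word can be sketched in a few lines of Python (chosen here only for brevity; it is not the language of the original programs).  This is a minimal sketch, not the routine from either article: the name crc_long_division is made up for illustration, and the "subtraction" is done with exclusive-or, as the next paragraph explains.

```python
POLY = 0x11021  # CCITT divisor given in the text


def crc_long_division(data: bytes) -> int:
    """Remainder of the whole buffer, read as one big binary number,
    after "dividing" by POLY.  Hypothetical helper, for illustration only."""
    value = int.from_bytes(data, "big")
    # Walk from the leading bit down to bit 16.  Whatever is left
    # below bit 16 at the end is the 16-bit remainder.
    for bit in range(value.bit_length() - 1, 15, -1):
        if (value >> bit) & 1:
            # Line the divisor's top bit up with this bit and cancel it.
            value ^= POLY << (bit - 16)
    return value


# The three bytes $E1, $F0, $CC leave the remainder $1DC3.
print(hex(crc_long_division(bytes([0xE1, 0xF0, 0xCC]))))  # 0x1dc3
```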

I put long division in quotation marks above because it is not EXACTLY long division.  The difference is that the subtraction steps are replaced with exclusive-or operations.  The exclusive-or is performed whenever the leading bit of the new dividend is a 1-bit.  Here is a fully worked out example, with a CRC-so-far of $E1F0 and a next byte of $CC:

       "divide"   $E1F0CC by $11021, "quotient bits" down
       the left edge.  Next CRC is the "remainder"

       1110 0001 1111 0000 1100 1100   (E1F0CC in binary)
1  eor 1000 1000 0001 0000 1           (11021 in binary)
       ---------------------
        110 1001 1110 0000 01
1   eor 100 0100 0000 1000 01
        ---------------------
         10 1101 1110 1000 000
1    eor 10 0010 0000 0100 001
         ---------------------
0         0 1111 1110 1100 0010
            1111 1110 1100 0010 1
1     eor   1000 1000 0001 0000 1
            ---------------------
             111 0110 1101 0010 01
1      eor   100 0100 0000 1000 01
             ---------------------
              11 0010 1101 1010 000
1       eor   10 0010 0000 0100 001
              ---------------------
               1 0000 1101 1110 0010
1        eor   1 0001 0000 0010 0001
               ---------------------
                 0001 1101 1100 0011 = $1DC3
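
The same steps can be traced mechanically.  The following Python sketch (an illustration only, not the 6502 code from the earlier articles; the name divide_step is an assumption of this sketch) shifts one new byte into a 16-bit CRC a bit at a time, and also collects the "quotient bits" shown down the left edge.

```python
POLY = 0x11021  # CCITT divisor given in the text


def divide_step(crc, byte):
    """Bit-by-bit "division" of (old CRC, new byte) by POLY.

    Returns (remainder, quotient).  Hypothetical helper for illustration.
    """
    work = (crc << 8) | byte            # 24-bit dividend: old CRC, then new byte
    quotient = 0
    for shift in range(7, -1, -1):      # one pass per bit of the leading byte
        quotient <<= 1
        if work & (1 << (shift + 16)):  # leading bit is a 1-bit?
            work ^= POLY << shift       # exclusive-or in place of subtraction
            quotient |= 1
    return work, quotient               # the top eight bits of work are now zero


remainder, quotient = divide_step(0xE1F0, 0xCC)
print(f"${remainder:04X} ${quotient:02X}")  # $1DC3 $EF
```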

Note that the "quotient" is $EF.  This "quotient" can always be computed exactly from just the first byte of the dividend (the high byte of the old CRC code):  quotient = crchi .eor. crchi/16, where crchi/16 means crchi shifted right four bits (the remainder of the division by 16 is discarded).  Here crchi = $E1, so the quotient is $E1 .eor. $0E = $EF.  If you carefully study the worked out example above, you should be able to see why this is true.  Now, if we use the exclusive-or rather than addition to perform a multiplication of the quotient times $11021, it will look like this:

                      uuuu.vvvv  (symbolic quotient in binary)
                       x $11021  (multiplier in hexadecimal)
                       --------
                      uuuu.vvvv
               u.uuuv.vvv0
       uuuu.vvvv
  uuuu.vvvv
  -----------------------------
    whatever........

There are several significant things to notice about the multiplication above.  First, we only need to save the rightmost 16 bits of the "product".  If we exclusive-or those bits with the rightmost 16 bits of the original dividend (which means the low byte of the old CRC followed by the new byte), we will get the next CRC.  (This trick relies on the fact that exclusive-or is a reversible operation, so that "adding" and "subtracting" give the same result!)
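
As a check on the arithmetic, here is a small Python sketch of that exclusive-or multiplication, using the numbers from the worked example.  The helper name xor_multiply is invented for this illustration.

```python
def xor_multiply(q, multiplier=0x11021):
    """Multiply q by multiplier, combining the partial products with
    exclusive-or instead of addition.  Hypothetical helper."""
    product = 0
    shift = 0
    while multiplier >> shift:
        if (multiplier >> shift) & 1:
            product ^= q << shift   # one partial product per 1-bit
        shift += 1
    return product


dividend = 0xE1F0CC                 # old CRC $E1F0 followed by new byte $CC
quotient = 0xEF                     # from the worked example
product = xor_multiply(quotient)

# Only the rightmost 16 bits matter for the next CRC.
next_crc = (product ^ dividend) & 0xFFFF
print(f"{product:06X} {next_crc:04X}")  # E1ED0F 1DC3
```

Notice that the top byte of the product is $E1, the same as the top byte of the dividend, which is why those eight bits cancel.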

Furthermore, we can organize those "partial products" in a more efficient way for computation.  Now, let's write the original CRC symbolically as "aaaa.bbbb.cccc.dddd", and the next data byte as "eeee.ffff".  The "quotient" after "dividing" by $11021 will be "aaaa.bbbb exclusive-or 0000.aaaa"; let's write that symbolically as "aaaa.gggg".  Then we can compute the next CRC code by the following very simple steps:

       cccc.dddd.eeee.ffff
  eor  gggg.0000.aaaa.gggg
  eor  000a.aaag.ggg0.0000
       -------------------
       wwww.xxxx.yyyy.zzzz

Believe it or not!
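
Those steps translate almost line for line into a high-level language.  The Python sketch below is one way to write them.  It is not the 65802 listing; the names crc_next and crc_buffer are assumptions made for this example, and the starting CRC of zero in crc_buffer is likewise an assumption.

```python
def crc_next(crc, byte):
    """Byte-oriented update: fold one new byte into a 16-bit CRC."""
    crchi = crc >> 8                          # aaaa.bbbb
    q = crchi ^ (crchi >> 4)                  # aaaa.gggg, the "quotient"
    result = ((crc & 0xFF) << 8) | byte       # cccc.dddd.eeee.ffff
    result ^= ((q << 12) & 0xFFFF) | q        # gggg.0000.aaaa.gggg
    result ^= q << 5                          # 000a.aaag.ggg0.0000
    return result                             # wwww.xxxx.yyyy.zzzz


def crc_buffer(data, crc=0):
    """Run crc_next over every byte of data (starting value assumed to be 0)."""
    for byte in data:
        crc = crc_next(crc, byte)
    return crc


print(f"${crc_next(0xE1F0, 0xCC):04X}")  # $1DC3, as in the worked example
```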

The program that follows implements this algorithm, in lines 1550-1760.  I used 65802 code, but it really could be done quite nicely in plain 6502 as well.  I leave it as "an exercise for the reader" (as college textbooks are wont to say), should you wish to try the algorithm in a plain-vanilla 6502.

The SEND and RECV programs simulate sending and receiving a buffer-full of data.  I chose to put my buffer at $4000, for 258 bytes.  This is the same as in the April 1984 article.
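
For readers who want to experiment without an Apple, the send-and-receive idea can be imitated in Python.  This is only a sketch under assumptions that the text does not confirm: that the 258 bytes are 256 data bytes followed by two check bytes, that the check bytes are stored high byte first, and that the receiver accepts the buffer when the remainder over all 258 bytes is zero.  The names send and receive_ok are hypothetical.

```python
def crc_next(crc, byte):
    """Byte-oriented CRC update, following the three-line scheme in the text."""
    crchi = crc >> 8
    q = crchi ^ (crchi >> 4)
    return (((crc & 0xFF) << 8) | byte) ^ (((q << 12) & 0xFFFF) | q) ^ (q << 5)


def crc_buffer(data, crc=0):
    for byte in data:
        crc = crc_next(crc, byte)
    return crc


def send(data):
    """Append two check bytes (assumed layout: high byte first)."""
    # Compute the remainder with two zero bytes in the check positions,
    # then put the remainder there.  The whole buffer then divides evenly.
    crc = crc_buffer(data + b"\x00\x00")
    return data + crc.to_bytes(2, "big")


def receive_ok(buffer):
    """Assumed acceptance test: remainder over data plus check bytes is zero."""
    return crc_buffer(buffer) == 0


buffer = send(bytes(range(256)))
print(len(buffer), receive_ok(buffer))       # 258 True

damaged = bytearray(buffer)
damaged[100] ^= 0x08                         # flip a single bit
print(receive_ok(damaged))                   # False
```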

The FIND.BAD.BIT program is simply a translation of Bruce Love's 1984 program into 65802 code.  Thanks to 16-bit registers, it is significantly faster and shorter.

Speaking of speed, the code for computing the next CRC code for one new byte takes (if I counted correctly) 57 clock cycles.  In a normal Apple that means about 56 microseconds.  The time for 8088 machine code in Terry Ritter's article was 17 microseconds for the equivalent steps.  He was running with a 7.16 MHz clock.  If you ran the 65802 code in an Applied Engineering Transwarp card or a Titan Accelerator card with a 4-MHz 65802 (running at 3.58 MHz), the time would be only 15.9 microseconds in an Apple.
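
The timing figures are simple division of cycles by clock rate, as this short Python sketch shows.  The 1.023 MHz figure for a normal Apple is an assumption of the sketch (the text does not state the clock rate); the 57 cycles and 3.58 MHz are the values quoted above.

```python
def microseconds(cycles, clock_mhz):
    """Time for a given number of clock cycles at a clock rate in MHz."""
    return cycles / clock_mhz


CYCLES_PER_BYTE = 57   # cycle count quoted in the text
APPLE_MHZ = 1.023      # assumed clock of a stock Apple; not stated in the text
FAST_MHZ = 3.58        # accelerator clock quoted in the text

print(round(microseconds(CYCLES_PER_BYTE, APPLE_MHZ), 1))  # 55.7
print(round(microseconds(CYCLES_PER_BYTE, FAST_MHZ), 1))   # 15.9
```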
