Redundancy in Tables for Faster Lookups....Bob Sander-Cederlof

When speed is the main objective, you can sometimes use table lookups to great advantage.  You trade program size for speed.

Here is an easy example.  Suppose I want to convert the two nybbles of a byte to ASCII characters.  I can do it all with code, like this:

CONVERT
       PHA             Save original byte
       LSR             Position first nybble
       LSR
       LSR
       LSR
       JSR MAKE.ASCII
       STA XXX
       PLA             Original byte
       AND #$0F        Isolate second nybble
       JSR MAKE.ASCII
       STA XXX+1
       RTS

MAKE.ASCII
       ORA #$B0        Make B0...BF
       CMP #$BA
       BCC .1          It is 0-9
       ADC #6          Carry is set, so adds 7: A-F codes
.1     RTS

That takes 30 bytes and 75-77 cycles, including the JSR CONVERT to call it:  75 cycles if both nybbles are 0-9, 77 cycles if both are A-F, and 76 cycles if there is one of each.  If I move the code from MAKE.ASCII in-line, it saves 24 cycles (two JSRs, two RTSs), and lengthens the program by only one byte.
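
For readers who want to check the arithmetic in MAKE.ASCII, here is a minimal Python sketch that models the routine (it is a model, not 6502 code, and the function names are only illustrative).  The point it shows is that CMP #$BA leaves carry set for A-F, so ADC #6 really adds 7.

```python
def make_ascii(nybble):
    """Model of MAKE.ASCII: nybble 0..15 -> Apple high-bit ASCII code."""
    a = nybble | 0xB0                 # ORA #$B0 gives $B0..$BF
    carry = 1 if a >= 0xBA else 0     # CMP #$BA sets carry when A >= $BA
    if carry:                         # BCC .1 skips the ADC when carry is clear
        a = (a + 6 + carry) & 0xFF    # ADC #6 adds 6 plus carry, a total of 7
    return a


def convert(byte):
    """Model of CONVERT: returns the codes stored at XXX and XXX+1."""
    return make_ascii(byte >> 4), make_ascii(byte & 0x0F)


if __name__ == "__main__":
    hi, lo = convert(0x3F)
    print("%02X %02X" % (hi, lo))
```

Calling convert with $3F gives the codes for "3" and "F" with the high bit set.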

Or I can do a table lookup by substituting these two lines for both JSR MAKE.ASCII lines above:

       TAX
       LDA ASCII.TABLE,X

and making a little table like this:

ASCII.TABLE .AS -/0123456789ABCDEF/

In this form, the program takes 49 cycles, and uses a total of 39 bytes including the table.  A possible advantage is that the number of cycles is always constant, regardless of the value being converted (provided the table does not cross a page boundary).
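
The 49-cycle and 39-byte figures can be tallied by hand; the Python sketch below just does the bookkeeping.  It assumes standard 6502 timings with no page crossing on the indexed loads, and it counts the caller's JSR CONVERT in the cycles but not in the bytes.  The list name and layout are my own.

```python
# (instruction, bytes, cycles) for the 16-byte table version of CONVERT.
# Timings assume no page crossing on LDA ASCII.TABLE,X.
TABLE_VERSION = [
    ("JSR CONVERT",       0, 6),   # caller's JSR: cycles counted, bytes not
    ("PHA",               1, 3),
    ("LSR",               1, 2),
    ("LSR",               1, 2),
    ("LSR",               1, 2),
    ("LSR",               1, 2),
    ("TAX",               1, 2),
    ("LDA ASCII.TABLE,X", 3, 4),
    ("STA XXX",           3, 4),
    ("PLA",               1, 4),
    ("AND #$0F",          2, 2),
    ("TAX",               1, 2),
    ("LDA ASCII.TABLE,X", 3, 4),
    ("STA XXX+1",         3, 4),
    ("RTS",               1, 6),
]

TABLE_BYTES = 16  # ASCII.TABLE itself

total_cycles = sum(cycles for _, _, cycles in TABLE_VERSION)
total_bytes = sum(size for _, size, _ in TABLE_VERSION) + TABLE_BYTES

if __name__ == "__main__":
    print(total_cycles, "cycles,", total_bytes, "bytes")
```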

You can make it even faster by using two whole pages for table space, like this:

CONVERT
       TAX
       LDA HI.TABLE,X
       STA XXX
       LDA LO.TABLE,X
       STA XXX+1
       RTS

HI.TABLE
       .AS -/0000000000000000/
       .AS -/1111111111111111/
       .
       .
       .AS -/FFFFFFFFFFFFFFFF/

LO.TABLE
       .AS -/0123456789ABCDEF/
       .AS -/0123456789ABCDEF/
       .
       .
       .AS -/0123456789ABCDEF/

The program itself is 14 bytes long, but there are 512 bytes of tables.  The conversion, including JSR and RTS, now takes only 30 cycles.  And since the program is now so short, it would probably get placed in line, saving the JSR-RTS, converting in only 18 cycles.  And if the in-line routine already had the byte in the X-reg, whack off another two cycles.

The redundancy in the tables gives a huge speed increase.
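
The listing above abbreviates the two 256-byte tables with dots.  As a rough illustration of what they contain, this Python sketch builds both pages and looks a byte up in them.  It is a model rather than assembler source; the names HI_TABLE, LO_TABLE and high_ascii are mine, and it assumes that .AS -/.../ stores each character with bit 7 set.

```python
DIGITS = "0123456789ABCDEF"


def high_ascii(ch):
    """Apple 'high' ASCII: the character code with bit 7 set."""
    return ord(ch) | 0x80


# Entry N of HI_TABLE is the digit for the upper nybble of N, so each
# digit repeats 16 times in a row.  Entry N of LO_TABLE is the digit for
# the lower nybble, so the 16-digit string repeats 16 times.
HI_TABLE = bytes(high_ascii(DIGITS[n >> 4]) for n in range(256))
LO_TABLE = bytes(high_ascii(DIGITS[n & 0x0F]) for n in range(256))


def convert(byte):
    """Model of the two-table CONVERT: one indexed load per character."""
    return HI_TABLE[byte], LO_TABLE[byte]


if __name__ == "__main__":
    hi, lo = convert(0xA7)
    print(chr(hi & 0x7F) + chr(lo & 0x7F))
```

Every possible byte value has its answer sitting in the table already, which is why no shifting or masking is left in the routine.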

I have been tearing into the super fast copy utility that comes with Locksmith 5.0, and I discovered some of these redundancy tricks in their disk I/O tables.  For example, the table for converting a six-bit value into a disk-code normally takes 64 bytes.  The table looks like this:

TABLE  .HS 96979A9B9D9E9FA6
       .
       .
       .HS F7F9FAFBFCFDFEFF

Code to access the table might look like this:

       LDA BUFFER,X
       AND #$3F        Mask to 6 bits
       TAY
       LDA TABLE,Y

When you are writing to a disk, every single cycle counts, so it is pleasant to discover redundant tables.  By making four copies of the table, using 256 bytes rather than 64, we no longer need to strip off the top two bits.  The code can be shortened to this:

       LDY BUFFER,X
       LDA TABLE,Y

It saves only 4 cycles (2 each for the AND and the TAY), but those four cycles can and do make the whole difference in the fast copy program.  That is part of Locksmith's secret to reading a whole disk into RAM in only 8 seconds.
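
To see why the four-copy table makes the mask unnecessary, here is a small Python sketch.  It is a model only: BASE is a placeholder 64-entry table of made-up values, not the real disk codes, and the function names are mine.

```python
# Placeholder 64-entry table.  These values are invented for the
# illustration and are NOT the actual disk codes.
BASE = bytes(range(0x40, 0x80))

# Four copies back to back fill one 256-byte page.
REDUNDANT = BASE * 4


def lookup_masked(value):
    """Models LDA BUFFER,X / AND #$3F / TAY / LDA TABLE,Y."""
    return BASE[value & 0x3F]


def lookup_direct(value):
    """Models LDY BUFFER,X / LDA TABLE,Y using the 256-byte table."""
    return REDUNDANT[value]


if __name__ == "__main__":
    same = all(lookup_masked(v) == lookup_direct(v) for v in range(256))
    print("identical for all 256 byte values:", same)
```

Whatever the top two bits of the buffer byte are, the index lands in one of the four copies at the same offset, so both lookups return the same code.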


Speaking of Locksmith........................Warren R. Johnson

Did you know that Locksmith 5.0 can nearly be copied by plain old COPYA?  Or with its own fast backup copier?  All but the last few tracks copy, and they may not be necessary.

The only problem is, the resulting copy will not boot until you make a small patch using some sort of disk ZAP utility.  (For example, you can use Omega's Inspector/Watson team, Bag of Tricks, Disk Fixer, or CIA.)  Patch Track-0F Sector-0E Byte-6F: change it from 6C to 0F.  [ Editor's note:  in my copy, Locksmith had C6 in that byte rather than 6C.  And I have not tried the resulting disk to see if all functions work. ]

I have modified my Apple a little to make my life easier.  I have 2732's in the motherboard ROM sockets, with bank switch selection.  Applesoft is in one bank, and a modified version of Applesoft in the other.  My modifications include replacing the old cassette commands (LOAD/SAVE/SHLOAD etc.) with an INWAT command.  INWAT downloads the Inspector and Watson from some expansion chassis ROM boards.

1
