Line Number Cross Reference.........................Bill Morgan

Have you ever had to modify a BASIC program written by someone who didn't seem to know what he was doing?  Deciphering several hundred undocumented lines of split FOR/NEXTs and tangled GOTOs can lead to a severe headache.  We recently had a consulting job that involved just such a project:  one program to be altered was about a hundred sectors of spaghetti-plate Applesoft.  Two of the biggest problems were figuring out which lines used a particular variable, and which lines called which other lines.

Back in November of 1980, AAL published a Variable Cross Reference program which neatly took care of the first problem by producing a listing in alphabetical order of all the variables used and all the lines using them.  At the end of that article, Bob S-C pointed out that the program could, with some effort, be modified into just the sort of Line Number Cross Reference we now needed.  Well, I drew the job of making that modification, and here's what I came up with.


The Basis

These Cross Reference programs use a hash-chain data structure to store the called and calling line numbers.  Each called line has its own list of lines which refer to it.  We locate these lists by using the upper six bits of the 16-bit called line number as an index into a table located at $280.  This table contains the address of the beginning of each of the 64 possible chains.  Each chain holds the data for a range of 1024 possible called line numbers:  the first one has called lines 0-1023, the second has 1024-2047, and so on.
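As a rough illustration of the indexing just described, here is a minimal sketch in Python rather than 6502 assembly.  The names hash_index and table_slot are hypothetical, and the two-byte slot size is an assumption based on each slot holding an address.

```python
HASH_TABLE = 0x280   # table address given in the article
SLOT_SIZE = 2        # assumption: each slot holds one 16-bit address


def hash_index(line_number):
    """Return the chain number (0-63): the upper six bits of a 16-bit line number."""
    return (line_number >> 10) & 0x3F


def table_slot(line_number):
    """Return the address of the hash table slot that heads this line's chain."""
    return HASH_TABLE + SLOT_SIZE * hash_index(line_number)
```

With these definitions, lines 0 and 1023 both land in chain 0, and line 1024 is the first line in chain 1.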

The entry for each called line is made up of a pointer to the next called line in that chain, this called line number, a pointer to the next calling line, and the number of this calling line.  Each subsequent calling line entry has only the last four bytes.  A pointer with a value of zero marks the end of each chain and each list.
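The entry layout can be sketched by packing the fields into bytes.  This is only an illustration in Python:  the function names are hypothetical, the pointers are assumed to be stored low byte first (usual 6502 practice), and the line numbers are assumed to be stored high byte first.

```python
import struct


def pack_calling_entry(next_calling, calling_line):
    """Four bytes: pointer to the next calling line, then this calling line number."""
    # Assumption: pointer low byte first, line number high byte first.
    return struct.pack("<H", next_calling) + struct.pack(">H", calling_line)


def pack_called_entry(next_called, called_line, next_calling, calling_line):
    """Eight bytes: chain pointer and called line, followed by the first calling entry."""
    head = struct.pack("<H", next_called) + struct.pack(">H", called_line)
    return head + pack_calling_entry(next_calling, calling_line)


# Called line 1000, referenced from line 120, with another calling entry at $0900.
entry = pack_called_entry(0, 1000, 0x0900, 120)
```

A pointer value of zero in either pointer field marks the end of that chain or list.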

VCR used three characters for each variable:  the first two letters of the variable name and a type designator of "$", "%" or " ".  The first character was the hash index and the last two characters were stored at the beginning of each variable's chain.  LCR uses the high-order 6 bits of the called line number for the hash index and stores both bytes of the number in the chain.  This is slightly redundant, since the hash index already determines those six bits, so if you want to store more information about the called line, you can use the upper six bits of the line number stored in the chain entry.

VCR stored the calling line numbers with the high byte first, backwards from usual 6502 practice.  This was done so the same search-compare code could handle both variable names and line numbers.  To simplify the conversion I kept the same structure, even though it's no longer strictly necessary.


The Program

LCR, the overall control level, is identical to VCR and just calls the other routines.

INITIALIZATION prepares a couple of pointers and zeroes the hash table.  The only difference here is the size of the hash table.

PROCESS.LINE is also the same as in VCR.  This routine steps through the lines of the Applesoft program, moving the calling line number into our data area and JSRing to SCAN.FOR.CALLS to work on each line.

SCAN.FOR.CALLS is the first really new section of code.  We start by initializing ONFLAG, the flag used to mark ON ... GOTO and ON ... GOSUB statements.  Then we step through the bytes of the line, looking for tokens that call another line.  GOTO and GOSUB are processed immediately.  For a THEN token we check to see if the next character is a digit.  If it is, we deal with the line number that follows; if not, we go on.  If we find an ON token, we set the flag and keep looking.  After a GOTO or GOSUB we check ONFLAG.  If there was an ON, we look for a comma to mark another called line number.
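The scanning logic might be sketched as follows, in Python rather than assembly.  The token values are the standard Applesoft codes ($AB GOTO, $B0 GOSUB, $B4 ON, $C4 THEN), not values quoted from the LCR listing.  The function names are hypothetical, and clearing the flag after each list of line numbers is an assumption.

```python
GOTO, GOSUB, ON, THEN = 0xAB, 0xB0, 0xB4, 0xC4   # standard Applesoft tokens


def is_digit(byte):
    return 0x30 <= byte <= 0x39


def read_number(line, pos):
    """Convert ASCII digits starting at pos; return (value, position after them)."""
    value = 0
    while pos < len(line) and is_digit(line[pos]):
        value = value * 10 + (line[pos] - 0x30)
        pos += 1
    return value, pos


def scan_for_calls(line):
    """Return the called line numbers found in one tokenized line (zero-terminated)."""
    calls = []
    on_flag = False
    pos = 0
    while pos < len(line) and line[pos] != 0:
        byte = line[pos]
        pos += 1
        if byte == ON:
            on_flag = True
        elif byte in (GOTO, GOSUB) or (
            byte == THEN and pos < len(line) and is_digit(line[pos])
        ):
            number, pos = read_number(line, pos)
            calls.append(number)
            # After ON ... GOTO/GOSUB, each comma introduces another called line.
            while (on_flag and byte != THEN
                   and pos < len(line) and line[pos] == ord(",")):
                number, pos = read_number(line, pos + 1)
                calls.append(number)
            on_flag = False   # assumption: the flag covers one list only
    return calls


# IF A THEN 100 : ON X GOTO 200,300 : GOSUB 400
sample = (bytes([0xAD]) + b"A" + bytes([THEN]) + b"100:" + bytes([ON]) + b"X"
          + bytes([GOTO]) + b"200,300:" + bytes([GOSUB]) + b"400\x00")
found = scan_for_calls(sample)
```

The real routine hands each number to PROCESS.CALL as it is found; collecting them in a list here just keeps the sketch easy to check.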

PROCESS.CALL first converts the ASCII line number of the called line into a two-byte binary number and then searches the data structure for that line.  If it is there, we simply add this calling line to the list.  If we don't find the called line we create a new entry for it.

CONVERT.LINE.NUMBER is lifted straight from Applesoft's LINGET, at $DA0C.
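The conversion itself is the familiar multiply-by-ten-and-add loop.  The following is a minimal Python sketch of that idea, not a transcription of LINGET.  The function name is hypothetical, and the ValueError stands in for Applesoft's own error when a number exceeds 63999, the largest legal line number.

```python
MAX_LINE = 63999   # largest line number Applesoft accepts


def convert_line_number(text, pos=0):
    """Convert ASCII digits at text[pos:] to an integer.

    Returns (value, position of the first non-digit byte).
    """
    value = 0
    while pos < len(text) and 0x30 <= text[pos] <= 0x39:
        value = value * 10 + (text[pos] - 0x30)
        if value > MAX_LINE:
            raise ValueError("line number out of range")
        pos += 1
    return value, pos
```

For example, convert_line_number(b"1000:") stops at the colon and yields the number 1000, which fits in the two bytes the chain entry reserves for it.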

NEXT.CHAR is a utility routine to get the next byte from the program and advance the pointer.

SEARCH.CALL.TABLE starts the search pointer on the appropriate chain.

CHAIN.SEARCH uses the pointer in an entry to step to the next entry.  If the pointer is zero, then there is no next entry and the search has failed.  Otherwise we compare the line number in the entry to the one we're looking for.  If the entry is less than the search key, we go on.  If it is equal, we update the pointer and report success.  If we hit an entry greater than the key, the search fails and we return; since entries are kept in ascending order, the key cannot appear any further along.
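Here is a rough Python model of that search over a block of memory.  It assumes each entry begins with a two-byte pointer (low byte first) followed by a two-byte line number (high byte first), and that chains are kept in ascending order.  The function name and return convention are hypothetical.

```python
def chain_search(memory, link, key):
    """Search a chain for key, starting from the pointer field at address link.

    Returns (True, entry address) on a match.  Otherwise returns
    (False, address of the pointer field where a new entry would be linked).
    """
    while True:
        entry = memory[link] | (memory[link + 1] << 8)
        if entry == 0:
            return False, link            # end of chain: not found
        entry_key = (memory[entry + 2] << 8) | memory[entry + 3]
        if entry_key == key:
            return True, entry
        if entry_key > key:
            return False, link            # passed the spot where key would be
        link = entry                      # the pointer is the entry's first field


# Tiny test chain: hash slot at $280 -> line 100 at $800 -> line 300 at $808.
memory = bytearray(0x1000)
memory[0x280:0x282] = bytes([0x00, 0x08])
memory[0x800:0x804] = bytes([0x08, 0x08, 0x00, 100])
memory[0x808:0x80C] = bytes([0x00, 0x00, 0x01, 0x2C])
```

Because the pointer is the first field of every entry, a hash table slot can be treated as the pointer field of an imaginary predecessor, so one loop handles both the head of the chain and the entries within it.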

SEARCH.LINE.CHAIN is called after SEARCH.CALL.TABLE has found a match.  Here we move the pointer to the calling line field of the matching entry and use the current calling line for a search key.

ADD.NEW.ENTRY first updates the pointers in the previous entry and this new entry, and the end-of-table pointer.  We then make sure there is room for the new entry and move the data up into the new space.
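The pointer juggling can be sketched as follows, again in Python over a model of memory.  The names are hypothetical, pointers are assumed to be stored low byte first, and this sketch checks for room before touching any pointers, which is a simplification of the order described above.

```python
def add_new_entry(memory, link, end_of_table, data, limit):
    """Append a new entry at end_of_table and splice it in after pointer field link.

    data holds the bytes that follow the entry's two-byte pointer.
    Returns the updated end-of-table address.
    """
    new = end_of_table
    size = 2 + len(data)
    if new + size > limit:
        raise MemoryError("cross reference table is full")
    memory[new:new + 2] = memory[link:link + 2]   # new entry inherits the old link
    memory[new + 2:new + size] = data             # move the data into the new space
    memory[link] = new & 0xFF                     # previous entry now points at us
    memory[link + 1] = new >> 8
    return new + size


memory = bytearray(0x1000)
end = 0x800
# Add line 300 to the empty chain headed at $280, then line 100 in front of it.
end = add_new_entry(memory, 0x280, end, bytes([0x01, 0x2C]), 0x1000)
end = add_new_entry(memory, 0x280, end, bytes([0x00, 0x64]), 0x1000)
```

Entries are always appended at the end of the table; only the pointers decide where an entry falls in its chain, so nothing already stored has to move.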
!np
Now we are done with the routines devoted to building the Cross Reference tables.  Interestingly, SEARCH.CALL.TABLE, CHAIN.SEARCH, SEARCH.LINE.CHAIN, and ADD.NEW.ENTRY are the real heart of this program, and the only change I had to make in these routines from VCR to LCR was to alter the method of figuring the hash index in SEARCH.CALL.TABLE.  Next we come to getting the data back out of the tables and onto a display.

PRINT.REPORT first sets a pointer we'll be using later on and then steps through the hash table, calling PRINT.CHAIN for each entry found.

PRINT.CHAIN starts out by checking for a pause or abort signal from the keyboard.  It then moves the current called line number into LINNUM, checks to see if it really exists, and prints it, followed by an asterisk if it is undefined.  Now we move a pointer up to the start of the calling line list and call PRINT.LINNUM.CHAIN to display all the entries.  The last step is to move the pointer up to the next called line in this chain, if any, and go back to do that one.

CHECK.DEFINITION keeps its own pointer into the program and steps along checking each called line to see if it actually exists.  It provides a space or an asterisk to be printed after the line number.

PRINT.LINNUM.CHAIN displays the calling lines stored for each called line.  We first tab to the next column (or line if necessary), then get the line number out of the list and print it.  Lastly, we move the pointer up to the next entry, if any, and loop back.
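The report traversal can be sketched as three nested walks:  hash table slots, called-line chains, and calling-line lists.  This Python sketch assumes eight-byte called entries whose last four bytes form the first calling entry, with pointers low byte first and line numbers high byte first.  The names are hypothetical, and the pause check, definition check, and column formatting are left out.

```python
def word_le(memory, addr):
    return memory[addr] | (memory[addr + 1] << 8)


def word_be(memory, addr):
    return (memory[addr] << 8) | memory[addr + 1]


def build_report(memory, table=0x280, slots=64):
    """Return a list of (called line, [calling lines]) in table order."""
    rows = []
    for slot in range(slots):
        called = word_le(memory, table + 2 * slot)
        while called:
            callers = []
            calling = called + 4          # first calling entry sits inside the called entry
            while calling:
                callers.append(word_be(memory, calling + 2))
                calling = word_le(memory, calling)
            rows.append((word_be(memory, called + 2), callers))
            called = word_le(memory, called)
    return rows


memory = bytearray(0x1000)
memory[0x280:0x282] = bytes([0x00, 0x08])                          # chain 0 -> $800
memory[0x282:0x284] = bytes([0x0C, 0x08])                          # chain 1 -> $80C
memory[0x800:0x808] = bytes([0, 0, 0x00, 100, 0x08, 0x08, 0, 10])  # line 100, from 10
memory[0x808:0x80C] = bytes([0, 0, 0, 20])                         # ... and from 20
memory[0x80C:0x814] = bytes([0, 0, 0x05, 0xDC, 0, 0, 0, 30])       # line 1500, from 30
rows = build_report(memory)
```

Since the chains are visited in hash order and each chain is sorted, the called lines come out in ascending order without a separate sort.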

TAB.NEXT.COLUMN prints enough blanks to move over to the next output position.  If a new line is necessary, it checks the line number to see if the new line should go to the screen only, or also to a printer.  This is Louis Pitz's addition, designed to automatically handle either 40- or 80-column output.

PRINT.LINE.NUMBER and CHECK.FOR.PAUSE are pretty standard routines to convert a two-byte binary number into five decimal characters, and to provide for pause/abort during display.

Well, now we have a Line Number Cross Reference to go along with the Variable Cross Reference.  All that remains is to integrate the two programs into one master Applesoft Cross Reference Utility.  Maybe you could call it with "&V" for VCR, or "&L" for LCR, and simply "&" to get both listings.  Any takers out there?


PS:  Bob suggested that I add a diagram of the hash chain structure, and a summary of the search process.  OK, here they are...
