!pr0
!lm12
!rm75
6502 Mini-Assembler in Applesoft...........Bob Sander-Cederlof

The original Apple II came with a built-in mini-assembler.  By typing "F666G" in the monitor, you entered a new realm.  The prompt changed from "*" to "!"; errors not only earned a "beep" but also a printed "?"; and monitor commands were still available by typing an initial "$".  I learned 6502 programming using this little tool, together with the handy "L" disassembly command.  At the time, none of the other computer systems on the market came with either a mini-assembler or an "L" command.

A mini-assembler allows you to type in mnemonics rather than converting them "by hand".  It also will translate branch addresses to the relative offsets needed in relative branch instructions.  It will not retain the source code on a file, and will not handle labels.  If you want to modify a program, you have to use patches or retype the whole thing.  A full assembler will accept labels and comments, and will have some method for working with stored source programs.  The S-C Macro Assembler, for example, includes a co-resident source program editor.  The extra features a full assembler can include are limited only by the potential market.  But mini-assemblers are free.
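The branch translation mentioned above is simple arithmetic, and a short sketch may make it concrete.  This is written in Python purely for illustration (the program discussed in this article is in Applesoft), and the function name `branch_offset` is hypothetical.  It relies only on the standard 6502 rule that a relative branch is two bytes long and that its displacement is measured from the address of the following instruction.

```python
def branch_offset(branch_addr, target_addr):
    """Return the one-byte operand for a 6502 relative branch.

    branch_addr is the address of the branch opcode itself.  The
    displacement is measured from the next instruction, which sits
    at branch_addr + 2 because a branch occupies two bytes.
    """
    displacement = target_addr - (branch_addr + 2)
    if not -128 <= displacement <= 127:
        raise ValueError("branch target out of range")
    # Keep the low eight bits: negative values become two's complement.
    return displacement & 0xFF
```

For example, a branch at $0300 aimed at $0310 needs the operand $0E, while a branch at $0310 aimed back at $0300 needs $EE.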

A long time ago MICRO published a 6502 mini-assembler written in Commodore or OSI BASIC.  I started converting it, just for fun, into Applesoft.  It wasn't long before I realized that my thought processes were totally incompatible with the author's programming techniques.  So I essentially started over.  Last month the partially finished listing appeared out of some long-forgotten crack, so I dusted it off and finished the program.

It operates a lot like the old "F666G" mini-assembler by Steve Wozniak.  (And, even though it is in Applesoft, it is almost as fast.)  The initial display is the address "0300" at the left margin, and the cursor in column 20 of the top line on an otherwise empty screen.  You can type RETURN to quit, a colon followed by a hex address to change the assembly address, or an instruction mnemonic to be assembled.
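The three kinds of input line described above could be told apart roughly as follows.  This is only a sketch in Python, not a transcription of the Applesoft listing; the function name `classify_line` and the returned tags are assumptions made for illustration.

```python
def classify_line(line):
    """Sort one input line into the three cases the prompt accepts.

    Returns a (kind, value) pair.  The kind names are hypothetical.
    """
    text = line.strip()
    if text == "":
        # A bare RETURN ends the session.
        return ("quit", None)
    if text.startswith(":"):
        # A colon introduces a new assembly address in hex.
        # int() raises ValueError if the digits are not valid hex.
        return ("set_address", int(text[1:], 16))
    # Anything else is treated as an instruction to assemble.
    return ("assemble", text.upper())
```

Here `classify_line(":0800")` yields a request to move the assembly address to $0800, and `classify_line("lda #$00")` is passed on as an instruction.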

I could go into a long-winded explanation of how the program works, describing each subroutine.  But you can probably read the listing easily enough, and there are identifying REM statements with each subroutine.  The really interesting part to me is the structure of the opcode tables, which are held as strings in OP$, F$, and E$.  These tables are set up in lines 2030 through 2050.

OP$ contains the opcode names.  OP$(1) holds the names of all the single-byte opcodes.  If the input line has no operand data after the opcode mnemonic, the program will search through OP$(1) and had better find your mnemonic.  If not, it is "???" for you!  Note that the opcode names are three characters each, packed into one long string.  Also note that ASL, LSR, ROL, and ROR appear in this string.  These four opcodes can have an operand-less mode, as well as any of four modes with operands.

OP$(2) contains the mnemonics for the relative branches.  OP$(3) holds "JMP" and "JSR".  And OP$(4) holds all the rest, which I call the complex opcodes.  These are the ones which can have a variety of addressing modes.

F$(1) through F$(4) correspond to OP$(1) through OP$(4).  Each three-digit group in one of the F$ strings is the opcode value (in decimal) for the mnemonic at the same position in OP$.  For the complex opcodes, F$(4) holds only a base value, which will be augmented to obtain a specific value for the particular address mode chosen.
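The pairing of a packed name string with a packed string of three-digit decimal values can be sketched like this.  The code is Python rather than Applesoft, and the two strings are illustrative stand-ins for the relative branch tables: the opcode values are the standard 6502 ones, but the order of the names is an assumption and need not match the actual listing.  The function name `lookup` is likewise hypothetical.

```python
# Illustrative stand-ins for a name table and its value table.
# The ordering is assumed, not copied from the Applesoft listing.
BRANCH_NAMES = "BPLBMIBVCBVSBCCBCSBNEBEQ"
BRANCH_CODES = "016048080112144176208240"


def lookup(mnemonic, names, codes):
    """Return the decimal opcode for mnemonic, or None if absent.

    Stepping three characters at a time keeps the search aligned to
    name boundaries, so a run such as "PLB" that straddles two names
    is never matched by accident.
    """
    for i in range(0, len(names), 3):
        if names[i:i + 3] == mnemonic:
            # The value occupies the same three positions in codes.
            return int(codes[i:i + 3])
    return None
```

With these strings, `lookup("BNE", BRANCH_NAMES, BRANCH_CODES)` gives 208, which is $D0, and an unknown name gives `None`, the case that earns a "???".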

The complex opcodes can be classified in many different ways...I tried so many I lost count.  I finally settled on the scheme shown in the two tables below:

!lm+5
          Imm Zp  Abs Z,X A,X Z,Y A,Y (X) ()Y
       +  08  04  0C  14  1C  --  18  00  10    Base
------------------------------------------------------
ADC    0  69  65  6D  75  7D  --  79  61  71   61  097
AND    0  29  25  2D  35  3D  --  39  21  31   21  033
CMP    0  C9  C5  CD  D5  DD  --  D9  C1  D1   C1  193
EOR    0  49  45  4D  55  5D  --  59  41  51   41  065
LDA    0  A9  A5  AD  B5  BD  --  B9  A1  B1   A1  161
ORA    0  09  05  0D  15  1D  --  19  01  11   01  001
SBC    0  E9  E5  ED  F5  FD  --  F9  E1  F1   E1  225

STA    1  --  85  8D  95  9D  --  99  81  91   81  129



          Imm Zp  Abs Z,X A,X Z,Y A,Y (X) ()Y
       +  00  04  0C  14  1C  14  1C  --  --    Base
------------------------------------------------------
ASL    2  --  06  0E  16  1E  --  --  --  --   02  002
LSR    2  --  46  4E  56  5E  --  --  --  --   42  066
ROL    2  --  26  2E  36  3E  --  --  --  --   22  034
ROR    2  --  66  6E  76  7E  --  --  --  --   62  098

BIT    3  --  24  2C  --  --  --  --  --  --   20  032

CPX    4  E0  E4  EC  --  --  --  --  --  --   E0  224
CPY    4  C0  C4  CC  --  --  --  --  --  --   C0  192

DEC    5  --  C6  CE  D6  DE  --  --  --  --   C2  194
INC    5  --  E6  EE  F6  FE  --  --  --  --   E2  226

LDX    6  A2  A6  AE  --  --  B6  BE  --  --   A2  162

LDY    7  A0  A4  AC  B4  BC  --  --  --  --   A0  160

STX    8  --  86  8E  --  --  96  --  --  --   82  130

STY    9  --  84  8C  94  --  --  --  --  --   80  128

!lm-5
The first column of numbers is the opcode class number.  These numbers are stored in E$ (see line 2050).  The next nine columns show the hex opcode values for each valid combination of opcode and address mode.  The last two columns show the "base" value in both hex and decimal.

The top row of numbers (above the dashed lines) shows the augment needed to transform a "base" opcode value into the value for a specific address mode.  I broke the data into two separate tables because the Imm and A,Y columns have one pair of values for class 0 and 1 opcodes and another for classes 2 through 9.  The class number is used to select which address modes are legal for a given opcode, as well as in selecting the augment values.
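To show how the class number, base value, and augment fit together, here is a small sketch built directly from the two tables above.  It is in Python, not Applesoft, and the names `AUG_LOW`, `AUG_HIGH`, `LEGAL`, `BASE`, and `encode` are hypothetical; the mode labels are simply the column headings.  How the real program represents address modes internally is not shown here, so treat this as one possible arrangement.

```python
# Augments for classes 0 and 1 (first table).
AUG_LOW = {"Imm": 0x08, "Zp": 0x04, "Abs": 0x0C, "Z,X": 0x14,
           "A,X": 0x1C, "A,Y": 0x18, "(X)": 0x00, "()Y": 0x10}

# Augments for classes 2 through 9 (second table).
AUG_HIGH = {"Imm": 0x00, "Zp": 0x04, "Abs": 0x0C, "Z,X": 0x14,
            "A,X": 0x1C, "Z,Y": 0x14, "A,Y": 0x1C}

# Address modes permitted for each opcode class, read off the tables.
LEGAL = {
    0: {"Imm", "Zp", "Abs", "Z,X", "A,X", "A,Y", "(X)", "()Y"},
    1: {"Zp", "Abs", "Z,X", "A,X", "A,Y", "(X)", "()Y"},
    2: {"Zp", "Abs", "Z,X", "A,X"},
    3: {"Zp", "Abs"},
    4: {"Imm", "Zp", "Abs"},
    5: {"Zp", "Abs", "Z,X", "A,X"},
    6: {"Imm", "Zp", "Abs", "Z,Y", "A,Y"},
    7: {"Imm", "Zp", "Abs", "Z,X", "A,X"},
    8: {"Zp", "Abs", "Z,Y"},
    9: {"Zp", "Abs", "Z,X"},
}

# mnemonic -> (class, base value), from the first and last columns.
BASE = {
    "ADC": (0, 0x61), "AND": (0, 0x21), "CMP": (0, 0xC1),
    "EOR": (0, 0x41), "LDA": (0, 0xA1), "ORA": (0, 0x01),
    "SBC": (0, 0xE1), "STA": (1, 0x81),
    "ASL": (2, 0x02), "LSR": (2, 0x42), "ROL": (2, 0x22),
    "ROR": (2, 0x62), "BIT": (3, 0x20),
    "CPX": (4, 0xE0), "CPY": (4, 0xC0),
    "DEC": (5, 0xC2), "INC": (5, 0xE2),
    "LDX": (6, 0xA2), "LDY": (7, 0xA0),
    "STX": (8, 0x82), "STY": (9, 0x80),
}


def encode(mnemonic, mode):
    """Return the opcode byte, or None if the mode is not legal."""
    opclass, base = BASE[mnemonic]
    if mode not in LEGAL[opclass]:
        return None
    # Classes 0 and 1 use the first augment row, the rest the second.
    augment = (AUG_LOW if opclass <= 1 else AUG_HIGH)[mode]
    return base + augment
```

For instance, `encode("LDA", "A,Y")` adds $18 to the base $A1 and gives $B9, whereas `encode("LDX", "A,Y")` adds $1C to $A2 and gives $BE.  `encode("STA", "Imm")` returns `None`, since class 1 has no immediate mode.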

If you have ever studied the listing of Wozniak's mini-assembler, you know that his approach was entirely different.  If you look inside the S-C Macro Assembler you will find yet another approach.  I suppose there are more approaches than existing assemblers.  In our line of Cross Assemblers we use about five or six different techniques.  The choice depends on the syntax of the operands and the bit structure of the opcodes, as well as whim.

I have also written a disassembler in Applesoft, and the beginnings of a simulator for 6502 code.  Maybe they will see print in the near future.  There is a lot to be learned from studying or even writing these kinds of programs, and they can even be useful.
