The 65802 is Here!.........................Bob Sander-Cederlof

I think it was last December that I learned of the new 16-bit versions of our old friend, the 6502.  You will remember my enthusiastic description in the Jan 84 issue.  People at Western Design Center were optimistic about shipping chips in a month or so.  Very optimistic.  Way too optimistic.  Nevertheless, they followed the tradition of our whole industry by continuing to stick by their commitment.  Every time we called, it was always "in a month or so"!

But yesterday (Oct 12th) it arrived.  Nice shiny new COD sticker on top, for $98.05, and nice new 40-legged bug inside.  I plugged the 65802 into my //e, after carefully removing the 65C02 I had just put in a week before.  Power on, the drive whirrs, RESET works, hurray!

So far I have spent about six hours exploring the new opcodes.  I used the new but as yet unreleased version 2.0 of the S-C Macro Assembler, naturally.  The literature available up till now has been very sketchy on the details of some of the new opcodes and addressing modes.  Anyway, no matter how well the printed word is used, the chip itself will always have the last word.

Which reminds me that I have already had to correct one misunderstanding (bug?).  I was not computing the relative offsets for the 16-bit relative address mode.  There are two opcodes which use this mode:  BRL, Branch Relative Long; and PER, Push Effective address Relative.

BRL can branch anywhere within a 64K memory, using an offset of 16-bits.  Compare this with the other relative branches, which use only an 8-bit offset and can only branch inside a 256-byte space centered around the instruction.  BRL's offset ranges from -32768 to +32767.

PER pushes two bytes onto the stack.  The two bytes pushed are the high byte and then the low byte of the address calculated by adding the 16-bit offset to the current PC-register.  For example,

       0800- 62 FD FF    PER $0800
       0803-

pushes first $08 and then $00 onto the stack.  Voila!  Now we really can write position-independent code!  Using the 16-bit mode, I can PER the address of a data item or table onto the stack, then PLX (Pull to X-register) that address, and access data by LDA 0,X or the like.
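For anyone who wants to check the offset arithmetic without a chip in hand, here is a minimal sketch in Python (a model, not 65802 code).  It assumes only what the example above shows:  the offset is measured from the address of the byte following the three-byte instruction, and it is stored low byte first.  The function names are my own inventions for illustration.

```python
def rel16_offset(instr_addr, target):
    """16-bit offset that a 3-byte BRL or PER at instr_addr needs to reach target."""
    next_pc = (instr_addr + 3) & 0xFFFF   # PC after the opcode and its two offset bytes
    return (target - next_pc) & 0xFFFF    # wrap to 16 bits (two's complement)

def assemble_per(instr_addr, target):
    """Bytes for PER: opcode $62, then offset low byte, then offset high byte."""
    off = rel16_offset(instr_addr, target)
    return [0x62, off & 0xFF, off >> 8]

def per_pushes(instr_addr, offset):
    """The two bytes PER pushes, in push order: high byte first, then low byte."""
    ea = (instr_addr + 3 + offset) & 0xFFFF
    return [ea >> 8, ea & 0xFF]

code = assemble_per(0x0800, 0x0800)
print(" ".join(f"{b:02X}" for b in code))   # 62 FD FF
print(per_pushes(0x0800, 0xFFFD))           # [8, 0]
```

The offset $FFFD is -3, which backs up from $0803 to $0800, matching the assembled bytes in the listing.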

Another favorite pair are the two block move instructions:  MVN and MVP.  With these I can move any block of memory from 1 byte up to 64K bytes from anywhere to anywhere.  With the 65802, anywhere is still limited to the 64K address space, but with the 65816 it can be anywhere in 16 megabytes.

To get full advantage of MVP and MVN, you need to be in the 16-bit mode.  You get there in two steps:  first you turn on the 65802 mode, as opposed to the 6502-emulation mode; and then you set some status bits which select 16-bit memory references and 16-bit indexing.

You turn on the 65802 mode by clearing the new E-bit in the status register.  The E-bit hides behind the Carry bit, and you access it with the XCE (Exchange C and E) instruction.

       CLC
       XCE     turns on 65802 mode

       SEC
       XCE     turns on 6502 emulation mode

Then REP #$30 turns on the 16-bit mode.  REP stands for Reset P-bits.  Wherever there are one bits in the immediate value, the corresponding status bits will be cleared.  Where there are zero bits in the immediate value, the corresponding status bits will be unaffected.  The two bits cleared by REP #$30 are the M- and X-bits.  The M-bit governs the accumulator and memory references, and the X-bit governs the index registers.  When the M-bit is zero, the immediate mode of LDA, CMP, ADC, SBC, AND, ORA, and EOR becomes a three-byte instruction; when the X-bit is zero, the same is true of LDX and LDY.  For example,

       LDA ##$1234

loads $1234 into the extended 16-bit A-register.  The long A-reg gets a new name or two.  The high byte is called the B-register, the low byte is still the A-register, and the pair together are called the C-register.
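The two-step mode switch can be modeled in a few lines of Python.  This is only a sketch of the bit manipulation, not of the chip.  It assumes the M-bit is $20 and the X-bit is $10 in the status register, and the function names are hypothetical.

```python
M_BIT = 0x20   # accumulator/memory width select (0 = 16-bit); assumed bit position
X_BIT = 0x10   # index register width select (0 = 16-bit); assumed bit position

def xce(carry, emulation):
    """XCE swaps the Carry bit with the hidden E-bit; returns (new_carry, new_e)."""
    return emulation, carry

def rep(p, mask):
    """REP clears every status bit that has a one in the immediate mask."""
    return p & ~mask & 0xFF

def lda_immediate_length(p):
    """LDA immediate is three bytes when the M-bit is zero, otherwise two."""
    return 3 if (p & M_BIT) == 0 else 2

def ldx_immediate_length(p):
    """LDX immediate is three bytes when the X-bit is zero, otherwise two."""
    return 3 if (p & X_BIT) == 0 else 2

carry, e = xce(0, 1)          # CLC, XCE: E becomes 0, so the 65802 mode is on
p = rep(0x34, 0x30)           # REP #$30 on a status byte with M and X both set
print(e, f"${p:02X}", lda_immediate_length(p), ldx_immediate_length(p))  # 0 $04 3 3
```

Note that REP #$20 alone would lengthen LDA immediate but leave LDX and LDY at two bytes.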

Okay.  Now back to the block movers.  Both of the moves require some setting up first.  You put the 16-bit address of the source block into the X-register, the destination address in Y, and the move count (the number of bytes minus one) in C.  For example, suppose I want to move the block $0800-$0847 up to $0912:

       LDX ##$0800     source
       LDY ##$0912     destination
       LDA ##$0047     # bytes - 1
       MVN 0,0         move it

As each byte is moved, X and Y are incremented and A is decremented.  After all is complete, A will have $FFFF, X=$0848, and Y=$095A.
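Here is a small Python model of that move, for checking the final register values on paper.  It is a sketch which assumes everything stays inside one 64K bank (as on the 65802) and that the move stops when A wraps around to $FFFF.  The function name mvn is just a label for the model.

```python
def mvn(mem, x, y, a):
    """Model of MVN in one 64K bank: copy, bump X and Y, count A down to $FFFF."""
    while True:
        mem[y] = mem[x]
        x = (x + 1) & 0xFFFF
        y = (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:          # A has wrapped: all bytes have been moved
            return x, y, a

mem = bytearray(0x10000)
mem[0x0800:0x0848] = bytes(range(0x48))      # recognizable data in $0800-$0847
x, y, a = mvn(mem, 0x0800, 0x0912, 0x0047)   # same setup as the listing above
print(f"A=${a:04X} X=${x:04X} Y=${y:04X}")   # A=$FFFF X=$0848 Y=$095A
```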

MVP, on the other hand, decrements the A-, X- and Y-registers for each byte moved.  If the block source and destination overlap, you can use the one which moves in the order that prevents mis-copying.
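To see why the direction matters, here is a minimal Python sketch which moves an 8-byte block up by two bytes, so that source and destination overlap.  It models both instructions with one hypothetical helper:  step=+1 behaves like MVN, and step=-1 behaves like MVP started from the END addresses of the two blocks.

```python
def block_move(mem, x, y, a, step):
    """step=+1 models MVN, step=-1 models MVP; A holds the byte count minus one."""
    for _ in range(a + 1):
        mem[y] = mem[x]
        x = (x + step) & 0xFFFF
        y = (y + step) & 0xFFFF
    return x, y, 0xFFFF          # A always finishes at $FFFF

def fresh_memory():
    mem = bytearray(0x10000)
    mem[0x0800:0x0808] = bytes([1, 2, 3, 4, 5, 6, 7, 8])
    return mem

# Move $0800-$0807 up to $0802 with the incrementing move: early bytes get reused.
mem = fresh_memory()
block_move(mem, 0x0800, 0x0802, 7, +1)
print(list(mem[0x0802:0x080A]))   # [1, 2, 1, 2, 1, 2, 1, 2]

# Same move with the decrementing move, starting at the ends of both blocks.
mem = fresh_memory()
block_move(mem, 0x0807, 0x0809, 7, -1)
print(list(mem[0x0802:0x080A]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

For a move DOWN in memory the situation reverses, and the incrementing move is the safe one.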

Those two zeroes after the MVN instruction above are two 8-bit values.  In the 65802 they don't mean anything, but in the 65816 they are the high 8 bits of the 24-bit addresses of source and destination.  In the 65816, you could copy one entire 64K bank to another with just four instructions!  And it only takes 7 cycles per byte moved!

The 65802 plugs directly into the 6502 socket in your Apple //e.  It may or may not work in older Apples ... I haven't tried it yet.  The 65816 will not plug into any current Apple II, even though it also has forty pins.  The extra 8-bits of address are multiplexed on the 8 data lines, and the meaning of the other pins is somewhat changed.

Please don't get the idea that plugging in this new chip will speed up your old software.  Old software will stay in the 6502 emulation mode, and will run at exactly the same pace as before.  New software can be written which will take advantage of the new features, and it can be a little faster, more compact, and so on.  The exciting future of the 65802 and 65816 lies not inside old Apples, but in the Apples yet to be born.  I am dreaming of a 4-megahertz, 1- to 8-megabyte Apple ...

Meanwhile, here is a REAL example.  Way back in the January 1981 issue of Apple Assembly Line I published a General Move Subroutine.  It was set up as a control-Y command for the monitor.  As an improvement over the monitor M-command, it could move blocks which overlapped either up or down in memory without repeating the leading bytes.

The following program takes advantage of the MVN and MVP commands to greatly speed up and shrink my previous effort.  The old one took 149 bytes, the new one only 80.  Disregarding all the setup time, which also improved, the time to move a single byte changed from a minimum of 16 cycles to a consistent 7 cycles.

The lines up through 1090 describe how to set up and run the program, but don't even TRY it until you get a 65802 chip into your Apple!  The new opcodes will do amazing things in an old 6502 chip, but nothing at all like what was intended.

Line 1100, the .OP 65816 directive, tells version 2.0 that it should allow and assemble the full 65816 instruction set.

Lines 1180-1250 are executed if you use $300G after assembling, or if you BRUN it from a type-B file.

A1, A2, and A4 are monitor variables which are set up by the control-Y command.  When you type, for example, 800<900.957^Y (where by ^Y I mean control-Y), $800 is stored in A4, $900 in A1, and $957 in A2.

Lines 1270-1290 save the three registers, and these will be restored later at lines 1500-1520.  Lines 1320-1340 get us into the 16-bit mode described above.  Just before returning to the monitor we will switch back to 6502 emulation mode, at lines 1480-1490.

Lines 1360-1390 calculate the "#bytes-1" to be moved, by using 16-bit subtraction.  Note that the opcodes assembled are exactly the same as they would be for 8-bit operations; the cpu does 16-bit steps here because we set the 16-bit mode.

Lines 1410-1460 determine which direction the block is to be moved:  up toward higher memory addresses, or down toward lower addresses.  By using two separate routines we prevent garbling the move of an overlapping block.

Lines 1610-1660 move a block down.  It is as easy as rolling off a log....  Just load up the registers, and do an MVN command.

Lines 1680-1760 move a block up.  Here we need the addresses of the ends of the blocks, so lines 1690-1720 calculate the end address for the destination.  Then we do the MVP command, and zzaappp! it's done.
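The logic of those steps can be sketched as a short Python model.  This is NOT the listing itself, only my paraphrase of the steps described above:  compute "#bytes-1" from A1 and A2, pick a direction, load the registers, and move.  The test "destination below source" used to pick the direction is an assumption, as is the function name.

```python
def general_move(mem, a1, a2, a4):
    """Model of the move: A1 = source start, A2 = source end, A4 = destination start."""
    count = (a2 - a1) & 0xFFFF                    # "#bytes-1" by 16-bit subtraction
    if a4 < a1:                                   # moving down (assumed test)
        x, y, step = a1, a4, +1                   # MVN from the start addresses
    else:                                         # moving up
        x, y, step = a2, (a4 + count) & 0xFFFF, -1    # MVP from the end addresses
    for _ in range(count + 1):
        mem[y] = mem[x]
        x = (x + step) & 0xFFFF
        y = (y + step) & 0xFFFF
    return count, step

# The monitor command 800<900.957^Y: A4=$800, A1=$900, A2=$957
mem = bytearray(0x10000)
mem[0x0900:0x0958] = bytes(range(0x58))
count, step = general_move(mem, 0x0900, 0x0957, 0x0800)
print(f"count=${count:04X} step={step}")          # count=$0057 step=1
```

With A4=$910 instead, the model takes the other path, starting at X=$957 and Y=$967 and working downward, so the overlapping bytes are copied before they are overwritten.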
