How to Move Memory

One of the most common tasks in assembly language programming is moving data from one place in memory to another.

Moving Little Blocks:  If you only need to move one or two bytes of data from one place to another in memory, it is easy.  You might do it like this:

       LDA SOURCE
       STA DEST
       LDA SOURCE+1
       STA DEST+1

Or, if the A-register was busy but X and Y were not, you might write:

       LDX SOURCE
       LDY SOURCE+1
       STX DEST
       STY DEST+1

If you know ahead of time exactly how many bytes you want to move, and exactly where you want them copied from and to, you can write a very fast loop.  For example, suppose I know that I want to copy 20 bytes from BUFFER1 into BUFFER2, and that there is no overlap.  Then I can write:

       LDX #19
LOOP   LDA BUFFER1,X
       STA BUFFER2,X
       DEX
       BPL LOOP
       ...

The loop moves the last byte first, then the next-to-last, and so on until the first byte in BUFFER1 is moved into BUFFER2.  If it is important to move them in the opposite direction (first byte first, last byte last), you can change the loop this way:

       LDX #0
LOOP   LDA BUFFER1,X
       STA BUFFER2,X
       INX
       CPX #20
       BCC LOOP
       ...

Terminating the loop can be done in various ways.  The two examples above do it with a count in the X-register.  Another way is to use a data sentinel.  For example, the last byte to be moved, and only the last byte, might contain the value $00, or $FF, or anything you choose.  Then after moving a byte, you can check to see if the sentinel byte was just moved.  If it was, you are finished moving.  Here is an example using a sentinel of $00:

       LDX #-1
LOOP   INX
       LDA BUFFER1,X
       STA BUFFER2,X
       BNE LOOP
       ...

Pascal Language promoters often recommend the sentinel technique; however, in Assembly Language, you must be very careful if you plan to use it.  The sentinel you choose today may become a valid data value tomorrow!


Moving Bigger Blocks:  All of the examples so far will only work if the total number of bytes to be moved is less than 256.  What if you need to move a larger block?

When I need to move a large block of data from one place to another, I frequently use the MOVE subroutine in the Apple Monitor ROM.  It starts at $FE2C, and looks like this:

FE2C- B1 3C    MOVE LDA (A1L),Y  MOVE (A1...A2)
FE2E- 91 42         STA (A4L),Y    TO (A4)
FE30- 20 B4 FC      JSR NXTA4
FE33- 90 F7         BCC MOVE
FE35- 60            RTS

The subroutine NXTA4 (at $FCB4) increments A4L,A4H ($42,43), which is the destination address.  Then it compares A1L,A1H ($3C,3D) to A2L,A2H ($3E,3F); the result of the comparison is left in the Carry Status bit: Carry is set if A1 is greater than or equal to A2.  Finally, the subroutine increments A1L,A1H ($3C,3D), which is the source address.
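
For reference, here is a sketch of what NXTA4 does, written out from the description above and from my recollection of the Monitor listing.  Check it against your own ROM before relying on the details.  The labels NXTA1 and RTS4B are the names I remember from the listing.

NXTA4  INC A4L        BUMP DESTINATION POINTER
       BNE NXTA1
       INC A4H        CARRY INTO HIGH BYTE
NXTA1  LDA A1L        COMPARE A1 TO A2 (16 BITS)
       CMP A2L
       LDA A1H
       SBC A2H        CARRY SET IF A1 >= A2
       INC A1L        BUMP SOURCE POINTER
       BNE RTS4B      (INC DOES NOT CHANGE CARRY)
       INC A1H
RTS4B  RTS

Because the comparison is made before A1 is incremented, Carry is first set just after the byte at the address in A2 has been moved.  That is why A2 must hold the address of the last byte to be moved, not the address after it.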

To use the MOVE subroutine, you have to set the starting address of the block to be copied into $3C,3D; the last address of the block to be copied into $3E,3F; and the starting address of the destination into $42,43.  You also need to be sure that the Y-register contains zero before you start.  Here is an example:

       LDY #0          CLEAR Y-REGISTER
       LDA #BUFFER1    START ADDRESS OF SOURCE
       STA $3C
       LDA /BUFFER1
       STA $3D
       LDA #BUFFER1.END  END ADDRESS OF SOURCE
       STA $3E
       LDA /BUFFER1.END
       STA $3F
       LDA #BUFFER2    START ADDRESS OF DESTINATION
       STA $42
       LDA /BUFFER2
       STA $43
       JSR $FE2C
       ...

Because it is there, the Monitor MOVE subroutine is handy.  But it is not a general subroutine.  If the source and destination blocks overlap, you may get funny results.  For example, if I try to move the data between $1000 and $10FF up one byte in memory, so that it runs from $1001 to $1100, the MOVE subroutine will not work.  Instead, it will copy the contents of $1000 into every location from $1001 through $1100.
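
For that particular case, a loop that moves the last byte first avoids the trouble.  Here is a minimal sketch, assuming the block is exactly the 256 bytes from $1000 through $10FF and that X and A are free to use:

       LDX #0
LOOP   DEX            $FF FIRST TIME THROUGH
       LDA $1000,X
       STA $1001,X    ONE BYTE HIGHER
       CPX #0         JUST MOVED $1000?
       BNE LOOP       NO, KEEP GOING
       ...

Each byte is picked up before anything is stored on top of it, so the data arrives at $1001 through $1100 intact.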

The MOVE subroutine is also not very fast.  Anyway, it is not as fast as it could be.  Steve Wozniak evidently wrote it with size in mind (to make it fit in ROM) rather than speed.

The Applesoft ROMs contain several subroutines for moving data around in memory.  Here is one used during execution to move the array table up to make room for a new simple variable:

<<<<listing of BLTU, $D393...D3D5>>>>

Since this code moves from the end of the block backwards, it will safely move a block up in memory.  However, it would not be safe to use with an overlapping range down in memory; it will do the same thing as the Monitor MOVE subroutine.

The Applesoft subroutine is faster than the Monitor subroutine, because the least significant half of the pointer is kept in the Y-register instead of in page-zero of memory.  The INY instruction takes only two cycles, whereas an INC instruction takes five.  The three cycles saved in moving each byte add up to nearly 25 milliseconds in moving 8K bytes.  The extra overhead of setting up the pointers is more than paid for.

Additional time is saved in the termination test.  Instead of testing after moving every byte with a LDA, CMP, LDA, SBC sequence, the number of full 256-byte blocks to be moved is put in the X-register; only a DEX instruction once out of every 256 bytes is needed.  This saves over 100 milliseconds in moving an 8K block.  By putting the incrementing and testing code in line, rather than in a subroutine like NXTA4, we save the JSR and RTS time.  This amounts to another 100 milliseconds in moving an 8K block.
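
The same two ideas (index in the Y-register, page count in the X-register) can be sketched in a simple forward-moving loop.  This is not the Applesoft code, which moves backwards; it is only an illustration.  The names SRC, DST, LENL, and LENH are my own, and keeping a byte count in $3E,3F is an assumption made for this example.

SRC    .EQ $3C        SOURCE POINTER (SAME BYTES AS A1L,A1H)
DST    .EQ $42        DESTINATION POINTER (A4L,A4H)
LENL   .EQ $3E        BYTE COUNT, LOW (ASSUMED LOCATION)
LENH   .EQ $3F        BYTE COUNT, HIGH
COPY   LDY #0
       LDX LENH       NUMBER OF FULL 256-BYTE PAGES
       BEQ REST       NONE, DO THE PARTIAL PAGE
PAGE   LDA (SRC),Y
       STA (DST),Y
       INY            TWO CYCLES, NO PAGE-ZERO INC
       BNE PAGE       UNTIL Y WRAPS TO ZERO
       INC SRC+1      NEXT PAGE
       INC DST+1
       DEX            ONE DEX PER 256 BYTES
       BNE PAGE
REST   LDX LENL       BYTES LEFT OVER
       BEQ DONE
LAST   LDA (SRC),Y
       STA (DST),Y
       INY
       DEX
       BNE LAST
DONE   RTS

Note that the sketch changes the high bytes of both pointers as it runs, and because it moves the first byte first, it is not safe for moving an overlapping block up in memory.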


A General Move Subroutine:  Can we write a subroutine which will move a block of data from one place to another regardless of overlap and direction?  Of course!  All we have to do is test at the beginning for direction, and choose which method to use accordingly.
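
The direction test itself can be as simple as a 16-bit comparison of the destination start against the source start.  Here is a minimal sketch of one way to do it.  The labels CHOOSE, MOVE.UP, and MOVE.DOWN are hypothetical, and the two RTS lines only stand in for the real moving code.

A1L    .EQ $3C        SOURCE START
A1H    .EQ $3D
A4L    .EQ $42        DESTINATION START
A4H    .EQ $43
CHOOSE LDA A4L        COMPARE DESTINATION TO SOURCE
       CMP A1L
       LDA A4H
       SBC A1H        CARRY SET IF A4 >= A1
       BCS MOVE.UP    DESTINATION AT OR ABOVE SOURCE
MOVE.DOWN RTS         (STAND-IN) MOVE FIRST BYTE FIRST
MOVE.UP RTS           (STAND-IN) MOVE LAST BYTE FIRST

When the destination is below the source, moving the first byte first is safe; when it is at or above the source, moving the last byte first is safe.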

Here is a fast subroutine which will move any block of memory anywhere you want.  You call it by putting the starting address of the source block in A1L,A1H; the end address of the source in A2L,A2H; and the start address of the destination in A4L,A4H.  (This is the same way you set up the Monitor MOVE subroutine.)  I wrote it to be used with the control-Y monitor command.

<<<<listing of general move subroutine>>>>

