!pr0      Paul Schlyter's DOS patch
!lm12
!rm75
Speeding-up Text File I/O......................Paul Schlyter

In the April 1983 AAL (pages 2-8), Bob Sander-Cederlof presented a small patch that I had sent him almost a year earlier.  The patch greatly speeded up LOAD/BLOAD of long files.  At that time, I had recreated a lot of very long assembler source files, such as the source code to DOS and Applesoft.  The long assembly times grew annoying, especially when I realized how much time was wasted inside RWTS just waiting for the right sector to pass under the R/W head of the disk drive!

Just one note about what was written on the bottom of page 2 of that issue:  my patch does not influence SAVE/BSAVE at all.  The read-after-write during a SAVE/BSAVE is made using the VERIFY command, and that command already works at top speed; in fact, VERIFY's speed was a major inspiration for my LOAD/BLOAD patch.

Next I tried to speed up SAVE/BSAVE with an equally simple patch.  I found it was not so easy, mainly because SAVE/BSAVE might have to allocate new sectors for the file.  I also felt it wasn't worth the trouble writing a more complicated patch, since SAVE/BSAVE isn't really used that often.

Next in line was a speedup of text file read and write.  Here I found a great "time-hog" in DOS.  The innocent-looking routines at $AE68 and $AE7E each require about 800 cycles to execute.  All they do is swap a 45-byte area back and forth between the file buffers and a local workarea inside the file manager.  This is of course necessary when you open/close files or switch from file to file.  But if you keep reading the same text file, the swapping may not be needed.  Nevertheless, the file manager swaps the buffer in and out for each and every character you read or write!  This amounts to 256*(800+800) = roughly 410,000 cycles or 0.4 seconds for each sector you read or write!  This is about six seconds for each track!  And all it does during those six seconds is needlessly swap the same 45 bytes back and forth!
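
The estimate above can be checked with a few lines of arithmetic.  This is only a sketch: the clock rate of about 1.023 MHz and the figure of 16 sectors per track (DOS 3.3) are assumptions not stated in the text, and the variable names are made up for illustration.

```python
# Rough check of the swap overhead quoted above.
CYCLES_PER_SWAP = 800        # from the text: each routine takes about 800 cycles
BYTES_PER_SECTOR = 256       # one pair of swaps per character read or written
CLOCK_HZ = 1_023_000         # assumption: nominal Apple II clock rate
SECTORS_PER_TRACK = 16       # assumption: DOS 3.3 disk format

cycles_per_sector = BYTES_PER_SECTOR * (CYCLES_PER_SWAP + CYCLES_PER_SWAP)
seconds_per_sector = cycles_per_sector / CLOCK_HZ
seconds_per_track = seconds_per_sector * SECTORS_PER_TRACK

print(cycles_per_sector)              # cycles spent swapping per sector
print(round(seconds_per_sector, 2))   # seconds per sector
print(round(seconds_per_track, 1))    # seconds per track
```

Under those assumptions the result is 409,600 cycles, or about 0.4 seconds per sector and a little over six seconds per track, in line with the figures above.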

The principle of my patch is this:  When entering or exiting the file manager, first check to see if you're doing something else besides reading/writing.  If so, just go on as usual.  If you are reading/writing, check to see if the local workarea belongs to the file being read/written.  If so, just exit and save 800 clock cycles.  If not, check to see if it belongs to another file.  If the workarea contains another file's data, put it back into the file buffer where it belongs and then get the workarea for the current file.  All this occurs when you enter the file manager.

Upon exit from the file manager, if you're reading/writing, just set a flag to mark that the local workarea is being used, and save the address of the file buffer it came from.  This always saves 800 cycles.
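
The entry and exit rules just described can be modeled in a few lines of Python.  This is a toy sketch, not the 6502 patch itself: the class, the opcode names, and the idea of representing the 45-byte workarea by a single number are all assumptions made for illustration.  It only counts how many 45-byte copies each scheme performs.

```python
READ, WRITE, OPEN, CLOSE = "read", "write", "open", "close"  # hypothetical opcode names


class FileManagerModel:
    """Toy model: the workarea and each file buffer hold one number (a file position)."""

    def __init__(self, patched):
        self.patched = patched
        self.buffers = {}      # buffer name -> saved copy of the workarea
        self.workarea = 0      # stands in for the 45-byte local workarea
        self.flag = False      # True while the workarea holds data not yet copied back
        self.saved = None      # buffer the workarea was loaded from
        self.copies = 0        # number of 45-byte copies performed

    def _copy_in(self, buf):
        self.workarea = self.buffers.setdefault(buf, 0)
        self.copies += 1

    def _copy_out(self, buf):
        self.buffers[buf] = self.workarea
        self.copies += 1

    def _enter(self, opcode, buf):
        if self.patched:
            if opcode in (READ, WRITE) and self.flag and self.saved == buf:
                return                      # workarea already belongs to this file
            if self.flag:                   # workarea still holds deferred data
                self._copy_out(self.saved)  # put it back where it belongs
                self.flag = False
        self._copy_in(buf)

    def _leave(self, opcode, buf):
        if self.patched and opcode in (READ, WRITE):
            self.flag, self.saved = True, buf   # defer the copy back
            return
        self.flag = False
        self._copy_out(buf)

    def call(self, opcode, buf):
        self._enter(opcode, buf)
        if opcode in (READ, WRITE):
            self.workarea += 1              # advance the file position by one character
        self._leave(opcode, buf)


def run(patched, chars=256):
    """Open one file, read `chars` characters, close it; return (copies, final position)."""
    fm = FileManagerModel(patched)
    fm.call(OPEN, "A")
    for _ in range(chars):
        fm.call(READ, "A")
    fm.call(CLOSE, "A")
    return fm.copies, fm.buffers["A"]


print(run(patched=False))
print(run(patched=True))
```

For one sector's worth of reads from a single file, the unpatched model makes 516 copies and the patched model makes 6, while both end with the same file position.  When reads alternate between two files, the model falls back to copying on every call, which matches the description above.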

Practical tests show that text file reading/writing is done up to about 40% faster with this patch installed.  This is slower than Diversi-DOS, but on the other hand this patch is compatible with S-C assemblers (and almost everything else in sight).  Also, this patch works equally well for all file types; it even speeds up the loading of type-R files with RBOOT/RLOAD (from DOS Tool Kit).  Diversi-DOS treats T type files in a special way, but does nothing to speed up type-R files.  And mine is free!

I put the patch at $300, because there's no free area large enough inside DOS where you can put it...especially if you have already installed the LOAD/BLOAD speedup described by Bob last April.  The listing which follows includes code to hook in the patches by overwriting the file manager where it calls the two workarea transfer subroutines.
!np
!pr1
Making Paul's Patches Fit in DOS...........Bob Sander-Cederlof

Don't tell me it won't fit!  It is so good, it MUST fit!

Let's see...there are 74 bytes available from $B6B3 thru $B6FC.  But Paul's patches are 93 bytes long.  Maybe if I twist it sideways and then hold my mouth just right....

Ha!  It worked!

Let me tell you how, but please don't think I am trying to pick Paul apart.  His analysis and creative programming are terrific!  He has taught me a lot.

First I noticed some common code in PATCH1 and PATCH2.  I made a subroutine called CHECK.OPCODE to test for the read or write command.  I used the carry status to pass back the answer to the caller.  Then I put the call to POINT.TO.WORKAREA (which loads an address into $42 and $43) at the top of the subroutine.  There's no need to duplicate it in the two callers.  These changes saved two or three bytes, for a tiny penalty in speed.

I noticed Paul used CLC, ROR FLAG to clear the sign bit of FLAG.  I saved one byte in each of two places by replacing these with LSR FLAG.  I set up the carry status info in CHECK.OPCODE so that carry SET means read/write...this lets me omit the SEC before ROR FLAG when I want to turn on the sign bit.
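
A small Python model of the two shift instructions shows why the substitution is safe.  The function names are made up for illustration; the byte counts are the standard 6502 sizes for implied and absolute addressing.

```python
def lsr(value):
    """Logical shift right: bit 7 becomes 0, old bit 0 goes to carry."""
    return (value >> 1) & 0x7F, value & 1


def ror(value, carry_in):
    """Rotate right: old carry enters bit 7, old bit 0 goes to carry."""
    return ((value >> 1) | (carry_in << 7)) & 0xFF, value & 1


# Instruction sizes in bytes (absolute addressing for the shifts).
SIZES = {"CLC": 1, "SEC": 1, "ROR abs": 3, "LSR abs": 3}

# CLC + ROR gives the same result as LSR alone, for every possible FLAG value.
same = all(ror(v, 0) == lsr(v) for v in range(256))

# ROR with carry already set always turns the sign bit on, so no SEC is needed.
sign_on = all(ror(v, 1)[0] & 0x80 for v in range(256))

print(same, sign_on)
print(SIZES["CLC"] + SIZES["ROR abs"] - SIZES["LSR abs"])   # bytes saved per use
```

Since only the sign bit of FLAG is ever tested, it does not matter that the remaining bits shift down each time.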

I noticed that both patches used the current contents of PNTR:  PATCH1 compared PNTR to PNTR.SAVE, while PATCH2 copied PNTR into PNTR.SAVE.  So I loaded up the contents of PNTR into the A- and X-registers inside my CHECK.OPCODE subroutine.  This saves a few more bytes.

At lines 1320-1330 in Paul's program he uses BNE to jump around an RTS.  I changed that to BEQ to an existing RTS further down in the program, saving one byte.

I moved the PNTR.SAVE variable, two bytes, to another area.  $B5CF and $B5D0 are unused, at the end of the file manager parameter list.  Conveniently, the subroutines which load addresses into PNTR refer to three such addresses inside the parameter list.  (See the code at $AF08-$AF1C.)  The X-register is loaded with 0, 2, or 4 to index into the list.  By putting PNTR.SAVE at the end of the list, I can load the X-register with 8 (PNTR.SAVE-$B5C7) and use the same subroutine, entering at $AF12.  This takes five bytes instead of twelve for LDA-STA-LDA-STA.

The final shortener I applied was to make the code which clears FLAG and copies the workarea to a buffer into a subroutine.  This is called PATCH4 in my listing.  The two lines at PATCH4 look just like what was in-line inside the PATCH1 code, but differ from what was done by the PATCH2 code.
!np
PATCH2 falls into PATCH4 if the opcode was not read/write.  PATCH2 used to clear the flag and call $AE7E; now the target is $AE81.  Since the difference between $AE7E and $AE81 is a JSR to set up PNTR with the workarea address, and since that was already arranged by CHECK.OPCODE, I can safely enter at $AE81.

No doubt if you followed me this far, you can see even more ways to save bytes.  In fact, I see one extra byte myself!  But the program is now just the right size for that hole at $B6B3, so enough is enough.

My listing includes some code to install the patches.  If you assemble my version, and BSAVE it on a binary file (A$300,L$6A), you can BRUN it whenever you want to install the patches.  Or, with version 1.1 of the Macro Assembler just add these lines:

!lm+5
1195       .TF B.FAST TEXT
1380       .PH $B6B3
1790       .EP
!lm-5

I also worked out the code for using Applesoft POKEs to patch it all in, and here it is:

!lm+5
100  REM  TEXT FILE SPEEDUP PATCH
110  READ N
   : IF N = 0 THEN  END 
120  READ A
   : FOR I = 0 TO N - 1
   : READ X
   : POKE A + I,X
   : NEXT
   : GOTO 110
200  DATA 74,46771,32,210,182,144,10,205,207,181,
          208,5,236,208,181,240,51,44,252,182,16,
          8,162,8,32,18,175,32,246,182,76,106,174,
          32,8,175,173
210  DATA 187,181,56,73,3,240,5,73,7,240,1,24,165,
          66,166,67,96,32,210,182,144,10,110,252,182,
          141,207,181,142,208,181,96,78,252,182,76,
          129,174,0
220  DATA 2,43787,179,182
230  DATA 2,45967,231,182
240  DATA 0
!lm-5
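
For readers who want to check the DATA statements before POKEing them into a live DOS, here is a sketch in Python that walks the same records (count, address, bytes) into a dictionary instead of memory.  The function name and the dictionary are assumptions for illustration; the numbers are copied from lines 200-240 above.

```python
# The DATA items from lines 200-240, as one flat stream.
DATA = [
    74, 46771,
    32, 210, 182, 144, 10, 205, 207, 181, 208, 5, 236, 208, 181, 240, 51,
    44, 252, 182, 16, 8, 162, 8, 32, 18, 175, 32, 246, 182, 76, 106, 174,
    32, 8, 175, 173, 187, 181, 56, 73, 3, 240, 5, 73, 7, 240, 1, 24, 165,
    66, 166, 67, 96, 32, 210, 182, 144, 10, 110, 252, 182, 141, 207, 181,
    142, 208, 181, 96, 78, 252, 182, 76, 129, 174, 0,
    2, 43787, 179, 182,
    2, 45967, 231, 182,
    0,
]


def apply_records(data):
    """Mimic the READ/POKE loop: returns {address: byte} for every POKE."""
    memory = {}
    items = iter(data)
    while True:
        count = next(items)
        if count == 0:              # a zero count ends the list
            break
        address = next(items)
        for offset in range(count):
            memory[address + offset] = next(items)
    return memory


memory = apply_records(DATA)
main = [a for a in memory if 0xB6B3 <= a <= 0xB6FC]
print(len(memory), len(main))
print(hex(min(main)), hex(max(main)))
print(hex(43787), hex(45967))       # addresses of the two hook operands
```

The first record fills exactly the 74 bytes from $B6B3 through $B6FC, and the two short records change the operands at $AB0B and $B38F to point at $B6B3 and $B6E7.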

I tested the patches on a 24-sector text file.  The file was created by using the TEXT command in the S-C Macro Assembler.  I used EXEC to read it back in.  I also wrote a short Applesoft program which read the whole file with GET A$ in a loop.  Here are the results:

!lm+5
       NORMAL  PATCHED  CHANGE
       ---------------------------
TEXT   24 sec  18 sec   25% faster
EXEC   52 sec  34 sec   35% faster
GET A$ 30 sec  21 sec   30% faster
!lm-5
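
The CHANGE column appears to be the reduction in elapsed time relative to the unpatched time, rounded to the nearest 5%.  That reading is an assumption; the short sketch below simply recomputes it from the two timing columns.

```python
# (normal seconds, patched seconds) taken from the table above
timings = {"TEXT": (24, 18), "EXEC": (52, 34), "GET A$": (30, 21)}


def percent_saved(normal, patched):
    """Reduction in elapsed time as a percentage of the unpatched time."""
    return 100.0 * (normal - patched) / normal


for name, (normal, patched) in timings.items():
    print(name, round(percent_saved(normal, patched), 1))
```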
!np
I think you get the most benefit if the un-patched DOS has to work so long between calls to RWTS that the disk motor stops, but the patched DOS keeps the motor alive.  You save 0.4 seconds per sector anyway, but you can also save waiting for the motor to come up to speed.

Warning:  One danger I noted, and which I am wary of, is that FLAG could get out of sync with reality.  For example, if somehow FLAG was set with the sign bit on before ever calling the file manager, it could try to copy the workarea to any-old-place in RAM (or ROM, or I/O space).  If you install the patches after booting, there should be no problem.  But what happens if you initialize a disk with the patched DOS?  I think the flag MIGHT turn out wrong.  Maybe a little patch is needed to ensure FLAG starts out clear, and is cleared after abnormal exits from the file manager (such as RESET).
