!pr0
!lm12
!rm75
Patch DOS 3.3 for Fast LOAD and BLOAD......Bob Sander-Cederlof

There must be at least a dozen products on the market now to speed up DOS 3.3:  Diversi-DOS, David-DOS, The DOS Enhancer, QuickDOS, FastDOS, Hyper-DOS, et cetera.  Some of these are unfortunately not compatible with the everyday programs we like to use, such as the S-C Assembler, ES-CAPE, or our favorite word processor.  And it can sometimes be quite difficult to determine the degree of compatibility.

For the record, S&H Software's DOS Enhancer is completely compatible with the S-C Macro Assembler.  David-DOS works well until you try to use the .TF directive.

Most of the speed-up systems only improve the speed of LOAD, BLOAD, RUN, BRUN, SAVE, and BSAVE.  Some also speed up booting into the language card.  And two (Diversi-DOS and David-DOS) speed up READing and WRITEing TEXT files, as well as offering a lot of minor enhancements in pursuit of more "user-friendliness".

It seems that the more the speed-up system does, the more compatibility problems you can expect.  After all, to add a feature you do have to change some code.  And many programs on the market expect the DOS image to be un-modified so they can jump into DOS subroutines in strange unexpected places and make their own custom patches to the DOS image.

Paul Schlyter (a subscriber in Sweden) sent me a small patch for DOS 3.3 early in April, 1982.  Paul's patch speeds up only RUN, BRUN, LOAD and BLOAD, but it is such a small patch that it will almost fit into the interstices (unused bytes) inside DOS.  In fact, after I removed one bug and reorganized the code a little, I was able to fit it entirely within two unused areas:  $BA69-BA95 and $BCDF-BCFF.  I believe the result is completely compatible with all the programs I use around here, except for the ones that use their own modified and protected DOS.

Paul's patch turns out to be functionally equivalent to the much longer patch proposed in HardCore Magazine's HyperDOS, but it leaves the INIT command intact.

I ran some timing tests:

     LOAD   40 sectors  standard 10 sec
                        patched   3.5 sec

     BLOAD  37 sectors  standard 11 sec
                        patched   4 sec

     LOAD  132 sectors  standard 32 sec
                        patched   7.5 sec
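
To put those timings in perspective, here is a small Python sketch that turns them into speed-up ratios.  The numbers are copied from the table; the names TIMINGS and speedup are only for illustration.

```python
# Timings copied from the table: (command, sectors, standard sec, patched sec).
TIMINGS = [
    ("LOAD", 40, 10.0, 3.5),
    ("BLOAD", 37, 11.0, 4.0),
    ("LOAD", 132, 32.0, 7.5),
]

def speedup(standard, patched):
    """Return how many times faster the patched DOS loaded the file."""
    return standard / patched

for command, sectors, standard, patched in TIMINGS:
    ratio = speedup(standard, patched)
    saved = standard - patched
    print(f"{command:5} {sectors:3} sectors: {ratio:.1f}x faster, {saved:.1f} sec saved")
```

The two shorter files come out a little under three times faster, and the 132-sector file a little over four times faster.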

I didn't try measuring times, but I suspect that SAVE and BSAVE may be just a little faster with this patch installed (during the read-after-write phase).
!np
Since the S-C Assemblers use the LOAD command to process .IN directives, large assemblies with large included files will assemble about three times faster when you install this speed-up patch.

The patch is really rather simple.  But before examining the patch, let's review the normal flow inside DOS for LOADing and BLOADing.

DOS is constructed in three layers:  the outer layer accepts your commands from the keyboard or from your program.  The inner layer, called RWTS, handles the intimate details of reading or writing a specified sector on a specified track.  RWTS also does the raw disk initialization when you use the INIT command.  The layer between commands and RWTS is called the File Manager (FM).

The command layer calls FM to open, close, rename, lock, unlock, verify, or delete a file; to print a catalog; to initialize a disk; or to position within a file.  There are also four kinds of calls for reading and writing files, to read or write one byte or a range of bytes.

When you use the RUN or LOAD command, the command layer calls FM to read the first two bytes.  These bytes contain the length of your program.  For Integer BASIC or S-C Assembler source files, the length is subtracted from HIMEM to get a loading address.  The loading address for Applesoft programs is found in $67,68.  Then FM is called to read a range of bytes of that length, to be stored starting at the loading address just determined.

When you use the BRUN or BLOAD command, the first four bytes are read off the front of the file.  The first two bytes are the loading address, and the next two are the length.  (Of course, you can override the loading address with the "A" parameter after the file name.)
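
The two header layouts just described can be sketched in Python.  This is only a model of the byte layout, not DOS code; the function names are made up for the illustration, and the values are stored low byte first, as usual on the 6502.

```python
import struct

def parse_load_header(data):
    """LOAD/RUN: the first two bytes hold the program length, low byte first."""
    (length,) = struct.unpack_from("<H", data, 0)
    return length

def parse_bload_header(data):
    """BLOAD/BRUN: two-byte loading address, then two-byte length."""
    address, length = struct.unpack_from("<HH", data, 0)
    return address, length

def himem_load_address(himem, length):
    """Integer BASIC and S-C source files load just below HIMEM."""
    return himem - length

# A binary file that loads at $0800 and is $1234 bytes long.
address, length = parse_bload_header(bytes([0x00, 0x08, 0x34, 0x12]))
print(f"A${address:04X}, L${length:04X}")   # A$0800, L$1234
```

In every case the data proper starts right after the header, so the range read begins at byte 2 or byte 4 of the first sector.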

After winding our way through the front end of FM, we finally get to this subroutine (where the range is read):

!lm+5
                1000 READ.RANGE
AC96- 20 B5 B1  1010       JSR DECR.TEST.LENGTH
AC99- 20 A8 AC  1020       JSR READ.BYTE
AC9C- 48        1030       PHA              SAVE THE BYTE
AC9D- 20 A2 B1  1040       JSR GET.ADDRESS.INC
ACA0- A0 00     1050       LDY #0
ACA2- 68        1060       PLA              GET THE BYTE
ACA3- 91 42     1070       STA ($42),Y      STORE IN BUFFER
ACA5- 4C 96 AC  1080       JMP READ.RANGE
!lm-5

The subroutine DECR.TEST.LENGTH breaks out of this loop when the range has been completely read.  The READ.BYTE subroutine picks bytes out of the DOS buffer, and reads a sector into that buffer when the buffer is empty.
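
For readers who think more easily in a high-level language, here is a rough Python model of that loop.  It is a sketch under simplifying assumptions:  the list called sectors stands in for the disk, memory is a bytearray standing in for Apple RAM, and the comments only indicate roughly which subroutine each step corresponds to.

```python
SECTOR_SIZE = 256

def read_range(sectors, start_offset, length, memory, address):
    """Copy `length` bytes one at a time, refilling a one-sector buffer as needed.

    `sectors` is a list of 256-byte data sectors (a hypothetical stand-in for
    the file on disk); `start_offset` is the file position where the range
    begins.  Returns the number of sector reads performed.
    """
    buffer = None
    buffered_sector = -1
    sector_reads = 0
    position = start_offset
    while length > 0:                       # roughly DECR.TEST.LENGTH
        index, offset = divmod(position, SECTOR_SIZE)
        if index != buffered_sector:        # READ.BYTE refills an empty buffer
            buffer = sectors[index]
            buffered_sector = index
            sector_reads += 1
        byte = buffer[offset]               # READ.BYTE picks the next byte
        memory[address] = byte              # STA ($42),Y
        address += 1                        # roughly GET.ADDRESS.INC
        position += 1
        length -= 1
    return sector_reads

# Three sectors filled with 0, 1 and 2; read 600 bytes starting at file byte 4.
sectors = [bytes([n]) * SECTOR_SIZE for n in range(3)]
memory = bytearray(0x1000)
print(read_range(sectors, 4, 600, memory, 0x0800))   # 3
```

Every byte makes a full trip around the loop, which is why the next sector has usually gone past the head by the time the buffer is empty again.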
!np
To understand the speed-up patch, break the reading process into three parts:  the first sector, the last sector, and all the in-between sectors.  We will let the loop shown above handle the first sector and possibly the last sector, and read the in-between sectors using a faster method.  Short files with only one or two data sectors will not have any in-between sectors, and so there will be no improvement in speed.

First we need to read the rest of the first sector of the file.  The first two or four bytes were already read to get address and length information.  We can let the loop shown above do that job.  But we need a way to break into the loop when it is our turn.  Let's patch the JMP on the last line to jump to our patch.

Our patch will get control after the loop above has read and stored a byte of data.  At that time our patch can look at the current file position in $B5E6; if $B5E6 is non-zero, then there are still bytes in the DOS buffer.  As long as there are bytes in the DOS buffer, we will branch back to $AC96 and let FM handle the bytes in its normal way.

Once the first sector has been read and stored, a byte at a time, $B5E6 will have a zero value.  Then our patch can look at the remaining length.  If the remaining length is at least one whole sector, we can read it faster.  If not, FM can read the last partial sector in its normal fashion.

To read a sector faster, we bypass the DOS buffer.  We can temporarily patch the actual destination address where the sector must go into the RWTS call block.  RWTS can put the entire sector directly into its final destination, rather than into the DOS buffer to be later moved by the rather slow loop above.
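
The three-way split can be sketched as a little Python function.  This is a simplified model rather than a transcription of the patch:  plan_read is a made-up name, and start_offset is assumed to be 2 (LOAD) or 4 (BLOAD), the position just past the header.

```python
SECTOR_SIZE = 256

def plan_read(start_offset, length):
    """Split a range read into (first, whole_sectors, last).

    first:         bytes copied one at a time until the buffer offset wraps to 0
    whole_sectors: full sectors that can go straight to their destination
    last:          bytes of a final partial sector, copied one at a time again
    """
    first = min(length, (SECTOR_SIZE - start_offset) % SECTOR_SIZE)
    remaining = length - first
    whole_sectors, last = divmod(remaining, SECTOR_SIZE)
    return first, whole_sectors, last

# A 9000-byte binary file: data starts at byte 4 of the first sector.
print(plan_read(4, 9000))   # (252, 34, 44)
# A 100-byte program never leaves the first sector, so nothing is gained.
print(plan_read(2, 100))    # (100, 0, 0)
```

Only the middle number benefits from the patch, which is why very short files load no faster than before.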

Eliminating the middle man saves an entire revolution of the drive for each sector read this way, provided the next sector is in the same track (and it usually is).  A 40 sector file laid out sequentially on three tracks will save 38 revolutions of the disk, one for each in-between sector.  The disk spins at 5 revolutions per second, so we will save roughly 7.6 seconds.  (If the file is not laid out sequentially, the savings will be less.)

The bigger the file, the bigger the percentage improvement.  We can save 3 seconds per track.  It normally takes FM about 18 revolutions to read a track; with our patch, a track can be read in about 3 revolutions.  We save 15 revolutions or 3 seconds on each full track.  That is, a full track can be read in .6 seconds instead of 3.6 seconds.  The rest of the time required to read the file is spent moving the head from track to track, and reading the catalog and VTOC sectors.
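
The arithmetic behind those figures is simple enough to check in a few lines of Python.  The revolution counts are the estimates given in the text; nothing here is measured.

```python
REVS_PER_SECOND = 5          # disk speed stated in the text

def seconds(revolutions):
    """Convert a number of disk revolutions into seconds."""
    return revolutions / REVS_PER_SECOND

# 40-sector file: one revolution saved for each of the 38 in-between sectors.
print(seconds(38))                       # 7.6

# Full track: about 18 revolutions normally, about 3 with the patch.
print(seconds(18), seconds(3))           # 3.6 0.6
print(seconds(18 - 3))                   # 3.0
```

The measured saving on my 40-sector LOAD was smaller than the first figure, which is what you would expect if the file was not laid out perfectly sequentially.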

If all 16 sectors of a track are to be read, and if the sectors were allocated the normal DOS 3.3 way, I think this is the way it happens with my patch installed:

!lm-7
F     E   D   C   B   A   9   8     7   6   5   4   3   2   1     0
F 0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F 0
!lm+7

The bottom line of numbers shows the physical sector numbers.  As you move across the page from left to right, you simulate the disk read head.  It may take up to a full revolution of the disk before sector F appears, but once it does we proceed to pick off approximately every other sector as they come by.  The top line of numbers shows the DOS 3.3 logical sector numbers.  Logical sector E is actually physical sector 2, and so on.  So it takes two full revolutions, plus two more sectors, to read all 16.
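
The diagram can be checked with a short Python simulation.  The table is the standard DOS 3.3 logical-to-physical sector mapping; the sketch assumes, as the diagram does, that counting starts when physical sector F arrives and that each wanted sector is caught the first time it comes around.

```python
# DOS 3.3 logical-to-physical sector mapping (index = logical sector).
LOGICAL_TO_PHYSICAL = [0x0, 0xD, 0xB, 0x9, 0x7, 0x5, 0x3, 0x1,
                       0xE, 0xC, 0xA, 0x8, 0x6, 0x4, 0x2, 0xF]
SECTORS_PER_TRACK = 16

def sector_slots_to_read_track():
    """Count physical sector slots passing the head while reading logical F..0."""
    head = LOGICAL_TO_PHYSICAL[0xF]
    slots = 1                               # the slot holding logical F itself
    for logical in range(0xE, -1, -1):      # E, D, C, ... 0
        target = LOGICAL_TO_PHYSICAL[logical]
        slots += (target - head) % SECTORS_PER_TRACK
        head = target
    return slots

slots = sector_slots_to_read_track()
print(slots, divmod(slots, SECTORS_PER_TRACK))   # 34 (2, 2)
```

That is 34 sector slots, or two revolutions plus two sectors, in agreement with the diagram.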

If you are trying to figure out where the rest of the time is used, keep in mind that DOS first reads the VTOC (track 17, sector 0); then the first catalog sector (track 17, sector 15); if the file specified is not in the first catalog sector, it reads another; and so on.  If the file is far down in the catalog, it might have to read all 15 catalog sectors to find the file.  Then the track/sector list is read; it is usually in sector 15 of the same track containing the first 15 sectors of data.  On the other hand, as the disk fills up the sectors get splattered all over the disk.

Here is the patch code, arranged so that it squeezes into those two interstices I mentioned earlier:
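
One way to inspect the patch bytes is to disassemble them.  The Python sketch below takes the bytes from the DATA statements in the Applesoft installer further on and decodes just the eleven opcodes that occur in them.  It is a reading aid, not a general 6502 disassembler, and it prints raw addresses rather than the labels of an assembler listing.

```python
PATCH1 = (0xBA69, [173,230,181,208,36,173,194,181,240,31,173,203,181,72,
                   173,204,181,72,173,195,181,141,203,181,173,196,181,141,
                   204,181,32,182,176,176,3,76,223,188,76,111,179,76,150,172])
PATCH2 = (0xBCDF, [238,228,181,208,3,238,229,181,238,196,181,238,204,181,
                   206,194,181,208,11,104,141,204,181,104,141,203,181,
                   76,150,172,76,135,186])

ABSOLUTE = {0xAD: "LDA", 0x8D: "STA", 0xEE: "INC", 0xCE: "DEC",
            0x20: "JSR", 0x4C: "JMP"}
RELATIVE = {0xD0: "BNE", 0xF0: "BEQ", 0xB0: "BCS"}
IMPLIED = {0x48: "PHA", 0x68: "PLA"}

def disassemble(origin, code):
    """Return one text line per instruction; only the opcodes above are known."""
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        addr = origin + pc
        if op in ABSOLUTE:                   # three bytes, operand low byte first
            operand = code[pc + 1] | (code[pc + 2] << 8)
            lines.append(f"{addr:04X}- {ABSOLUTE[op]} ${operand:04X}")
            pc += 3
        elif op in RELATIVE:                 # two bytes, signed offset from next op
            offset = code[pc + 1]
            if offset >= 0x80:
                offset -= 0x100
            lines.append(f"{addr:04X}- {RELATIVE[op]} ${addr + 2 + offset:04X}")
            pc += 2
        elif op in IMPLIED:                  # one byte, no operand
            lines.append(f"{addr:04X}- {IMPLIED[op]}")
            pc += 1
        else:
            raise ValueError(f"opcode ${op:02X} not handled by this sketch")
    return lines

for origin, code in (PATCH1, PATCH2):
    print("\n".join(disassemble(origin, code)))
    print()
```

The first patch begins with LDA $B5E6 followed by a BNE to a JMP $AC96, which is the "still bytes in the DOS buffer" test described above; the second patch ends with a JMP back into the first at $BA87.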
!np
To install the patches, you need to BLOAD PATCH1 and BLOAD PATCH2.  Then patch locations $ACA6-7 to 69 BA, to change the JMP READ.RANGE instruction to a JMP PATCH1.  Note that you must BLOAD the patches before changing $ACA6-7.  If you change $ACA6-7 first, the system will crash as soon as you try to execute a BLOAD.

Here is an Applesoft program (which you could append to your HELLO program) to poke the patches into DOS.

!lm+5

20000  REM INSTALL FAST DOS LOAD AND BLOAD PATCHES
20010  READ N: IF N = 0 THEN  END
20020  READ A
20030  FOR I = 1 TO N: READ P: POKE A,P:A = A + 1: NEXT 
20040  GOTO 20010
20100  DATA  44,47721,173,230,181,208,36,173,194,181,
       240,31,173,203,181,72,173,204,181,72,173,195,
       181,141,203,181,173,196,181,141,204,181,32,
       182,176,176,3,76,223,188,76,111,179,76,150,172
20110  DATA  33,48351,238,228,181,208,3,238,229,181,
       238,196,181,238,204,181,206,194,181,208,11,
       104,141,204,181,104,141,203,181,76,150,172,
       76,135,186
20120  DATA  2,44198,105,186
20130  DATA  0
!lm-5
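
If you would like to check the DATA records before trusting them to a HELLO program, the following Python sketch mirrors the READ/POKE loop in lines 20010-20040 and converts the decimal addresses to hex.  The install function and the dictionary standing in for memory are made up for the illustration; only the last record and the terminator are fed through it here.

```python
def install(data, poke):
    """Mirror lines 20010-20040: records of N, A, then N bytes; N = 0 ends."""
    values = iter(data)
    while True:
        n = next(values)
        if n == 0:
            return
        address = next(values)
        for _ in range(n):
            poke(address, next(values))
            address += 1

memory = {}
install([2, 44198, 105, 186, 0], memory.__setitem__)   # line 20120, then 20130
print({f"${a:04X}": f"{v:02X}" for a, v in memory.items()})
# {'$ACA6': '69', '$ACA7': 'BA'}

# Start address and byte count of each record, shown as a hex range.
for start, count in [(47721, 44), (48351, 33), (44198, 2)]:
    print(f"${start:04X}-${start + count - 1:04X}")
# $BA69-$BA94
# $BCDF-$BCFF
# $ACA6-$ACA7
```

Note that the DATA records are in the same order recommended above:  both patches go in first, and the JMP at $ACA6-7 is redirected last.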

Paul mentioned he was working on an equally simple patch to speed up SAVE and BSAVE, but I haven't heard any more from him on that subject.
