New DP18 Square Root Subroutine............Bob Sander-Cederlof

Even after bending over backwards to be certain I had the best possible SQR implementation in the October AAL, I still found some ways to improve it.  Last night I found some more information in a book called "Software Manual for the Elementary Functions", by William Cody and William Waite, Prentice-Hall, 1980.

They pointed out that in general an extra Newton iteration took less time than a complex method of getting an initial approximation which would be accurate enough to avoid one iteration.  In other words, using a cubic polynomial like I did in October is just not worth it.  Not worth the time, and not worth the space.

They further pointed out that it is best to compute the last Newton iteration in a slightly different fashion, to avoid shifting out the last significant digit.  The normal iteration computes (x/y + y)*.5.  Rearranging this to y+(x/y-y)*.5 is better.  Since the rearranged form takes an extra step, it should only be used the last time.

To see the difference, consider the example below.  I have used a precision of just 3 digits (instead of 18 or 20) to simplify the illustration:

     let x=.253, and y=.5
     then x/y=.506

     x/y+y=1.00 (truncating to 3 places)
     (x/y+y)*.5 = .500, which is wrong

     x/y-y=.006
     (x/y-y)*.5=.003
     y+(x/y-y)*.5 = .503, which is correct.
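
The same illustration can be reproduced with a short Python sketch.  This is not DP18 code.  It assumes that Python's decimal module, set to 3 significant digits with truncation, is a fair stand-in for the 3-digit arithmetic above, and the function names are made up for the example:

```python
from decimal import Decimal, Context, ROUND_DOWN

# Assumption: 3 significant digits, truncating, as in the illustration.
ctx = Context(prec=3, rounding=ROUND_DOWN)
HALF = Decimal("0.5")

def newton_plain(x, y):
    """Normal step: (x/y + y) * .5, truncating after every operation."""
    q = ctx.divide(x, y)
    return ctx.multiply(ctx.add(q, y), HALF)

def newton_rearranged(x, y):
    """Rearranged step: y + (x/y - y) * .5, truncating after every operation."""
    q = ctx.divide(x, y)
    correction = ctx.multiply(ctx.subtract(q, y), HALF)
    return ctx.add(y, correction)

x = Decimal("0.253")
y = Decimal("0.5")
print(newton_plain(x, y))       # the sum x/y+y loses its last digit
print(newton_rearranged(x, y))  # the small difference x/y-y keeps it
```

The two forms are algebraically identical.  They differ only in where the truncation happens: the sum x/y+y needs a fourth digit, while the difference x/y-y is small and loses nothing.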

My new SQR version uses a much faster method for getting the first approximation.  The first two digits of the argument (in DAC.HI) must be in the range from 10 to 99.  I convert them to an index between $02 and $13 by shifting the first digit over three bits, and adding one if the second digit is 5 or more.  In other words, 10-14 become $02, 15-19 become $03, on up to 95-99 becoming $13.  Then I use that value as an index into a table which gives a good approximation to the first two digits of the square root.  For example, any number between .10 and .14999...9 will get a first approximation of .35.  I store those two digits into DAC.HI, letting the remaining digits stay as they were.  This method gives a first approximation which in the worst case still has at least the first digit correct.
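
The index calculation can be sketched in Python as follows.  This is only an illustration, not the 6502 code.  It assumes DAC.HI holds the two digits as packed BCD (first digit in the high nibble), and the function name is hypothetical.  The table of approximations itself is not reproduced here, since only its first entry (.35) is given in the text:

```python
def sqr_table_index(dac_hi):
    """Map the first two mantissa digits to a table index ($02..$13).

    Assumption: dac_hi is a packed-BCD byte in the range $10..$99.
    """
    first_shifted = (dac_hi & 0xF0) >> 3   # first digit shifted over three bits = 2 * digit
    second = dac_hi & 0x0F                 # second digit, 0..9
    if second >= 5:
        return first_shifted + 1           # upper half of the decade
    return first_shifted                   # lower half of the decade

# Each decade of leading digits splits into two table entries.
for bcd in (0x10, 0x14, 0x15, 0x19, 0x95, 0x99):
    print("$%02X -> $%02X" % (bcd, sqr_table_index(bcd)))
```

Splitting each decade in two gives 18 table entries, indexed $02 through $13.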

It turns out the worst case is for numbers with odd exponents and the mantissa=1, such as 1 (which is .1*10^1), 100 (which is .1*10^3), and so on.  Even in this worst case, four iterations give 20 digits of precision.

The end result of these changes is a faster and shorter program which is more accurate.  Here is the new listing:
