Martin Brown
On 08/09/2023 22:14, Joe Gwinn wrote:
On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:
NR = Newton Raphson?
Yes. Although these days my production code actually uses Halley's
method in preference since it is almost the same speed to execute on
modern pipeline CPUs and gives cubic rather than quadratic convergence.
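For concreteness, here is a minimal C sketch of the two update rules
(plain textbook Newton and Halley applied to f(x) = x^3 - a, i.e. a
cube root - not the production code, just the shape of each step):

#include <stdio.h>

/* One Newton step and one Halley step for f(x) = x*x*x - a.
   Newton:  x1 = x0 - f/f'                        (quadratic convergence)
   Halley:  x1 = x0 - 2*f*f' / (2*f'*f' - f*f'')  (cubic convergence)   */
static double newton_step(double x, double a)
{
    double f  = x*x*x - a;
    double fp = 3.0*x*x;
    return x - f/fp;
}

static double halley_step(double x, double a)
{
    double f   = x*x*x - a;
    double fp  = 3.0*x*x;
    double fpp = 6.0*x;
    return x - 2.0*f*fp/(2.0*fp*fp - f*fpp);
}

int main(void)
{
    double a = 2.0, x0 = 1.25;            /* deliberately rough guess */
    printf("Newton: %.17g\n", newton_step(x0, a));
    printf("Halley: %.17g\n", halley_step(x0, a));
    return 0;
}

As written each step still contains exactly one division; the extra
cost of Halley is only the f'' evaluation and a couple of multiplies,
which is why it is nearly free on a modern pipelined CPU.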
Interesting. I had not heard of Halley's Method, and cubic
convergence is worthwhile.
Indeed. Halley was pretty clever but forever overshadowed by Newton.
Some of the latest codes use Delta_4 and Delta_5 corrections provided
that the function is sufficiently differentiable. They have to be
computed very carefully with considerable attention to detail!
One curiosity I have seen with them is that for me D_5 is anomalously
fast despite containing D_4! On certain problems D_5 is *faster* than NR
(I suspect it is some sort of pipeline stall slowing the others down).
In the past they all got slower because there was a division for each
step down the chain. I have a new computational scheme which avoids that
(at an increased risk of under/overflow in some edge cases).
The codes all run as expected on simple starting guesses but when the
fancy cubic ones are used they show this peculiar speed variation. It is
most noticeable with the Intel compiler (less so on MSC or GCC).
The reason these high order methods are fashionable right now is that
the iterative methods require extra function evaluations and for that
you have to crystallise the next value of x1 and recompute f(x1), f'(x1)
etc. Historically they would iterate 3-5 times to solve it. Having to
wait for that new input value is an unavoidable hard pipeline stall.
But if you can get the initial guess with a relative error below
10^(-16/N) then in an ideal world the Nth order difference corrector
gets the answer in a single step and so the next value can be computed.
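As a rough numerical check of that budget (constants ignored, and
assuming a well-behaved f): with N = 3 a seed good to about 1e-6
relative error is comfortably inside 10^(-16/3), so a single Halley
step should land at full double precision. A small C sketch:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double a     = 2.0;
    double exact = cbrt(a);
    double x     = exact*(1.0 + 1.0e-6);  /* seed: ~1e-6 relative error */

    /* one Halley (third order) step for f(x) = x*x*x - a */
    double f   = x*x*x - a;
    double fp  = 3.0*x*x;
    double fpp = 6.0*x;
    double x1  = x - 2.0*f*fp/(2.0*fp*fp - f*fpp);

    printf("seed error     : %.3e\n", fabs(x  - exact)/exact);
    printf("after one step : %.3e\n", fabs(x1 - exact)/exact);
    return 0;
}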
Some practitioners prefer what they call modified NR which is a sort of
hybrid of NR with Halley with slightly better convergence than Halley.
However it involves a sqrt and isn't always better.
See equation (4) in this paper (which includes the derivation) - a method
originally due to Laguerre but now seeing a revival in some codes.
https://academic.oup.com/mnras/article/467/2/1702/2929272
The gotcha is that although, as written with abs(), it never actually
fails, a sufficiently bad guess will still give poor answers - but it is
much less likely to explode or oscillate in the way that NR does.
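For reference, a C sketch of a generic Laguerre-style step (the
textbook form with a fixed degree-like parameter n, of which n = 5 is a
common choice; I have not checked that this matches equation (4) in
the paper character for character):

#include <stdio.h>
#include <math.h>

/* One Laguerre-style step:
   x1 = x - n*f / (f' +/- sqrt(|(n-1)^2*f'^2 - n*(n-1)*f*f''|)),
   taking the sign that maximises the size of the denominator.  The
   fabs() means the square root can never go complex, so the step never
   fails outright - a bad seed just gives a poor answer.               */
static double laguerre_step(double x, double n,
                            double f, double fp, double fpp)
{
    double disc = fabs((n - 1.0)*(n - 1.0)*fp*fp - n*(n - 1.0)*f*fpp);
    double root = sqrt(disc);
    double den  = (fp >= 0.0) ? (fp + root) : (fp - root);
    return x - n*f/den;
}

int main(void)
{
    double x = 1.25;                       /* rough guess for cbrt(2)  */
    double f = x*x*x - 2.0, fp = 3.0*x*x, fpp = 6.0*x;
    printf("%.15g\n", laguerre_step(x, 5.0, f, fp, fpp));
    return 0;
}

The sqrt in the denominator is the extra cost being weighed against
plain NR and Halley.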
So NR is better because it always warns?
No. NR can go completely AWOL far too easily and diverge to infinity or
oscillate chaotically between a set of values. Halley's method is the
best of the three for stability, but MNR will get you slightly better
precision if you can be sure your guesses are always close enough.
I'm not convinced that it is worth the cost of a sqrt but with CPUs
improving all the time it is now very close. The latest i5-12600 I have
can do a sqrt in about the same time as it takes to do a divide!
Wonder what algorithm it uses?
My guess is some sort of cunning hardware shift, multiply and add. The
latest generation of chips stand out in this respect. If only they would
implement a hardware cube root then I would be very happy.
And then there is co-plotting and staring - the Mark I eyeball is a
very good pattern and anomaly detector, especially of concentricity or
linearity.
Looking at the plot of residual errors can be very enlightening. So can
looking at the histogram of the error distribution over many test
cases: solve, test, and then measure the error in the resulting answer.
It is remarkably easy to home in on any weak spots and edge cases. My
normal reporting rule is to print out exceptions whenever the worst-case
high or low error is exceeded. Verbose mode shows when it is equalled.
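A minimal sketch of that reporting rule, with a stand-in solver (a few
Halley steps for a cube root) and a made-up random test generator -
the harness shape, not the real code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double solver(double a)             /* stand-in solver under test */
{
    double x = 1.0 + (a - 1.0)/3.0;        /* crude seed                 */
    for (int i = 0; i < 3; i++) {          /* three Halley steps         */
        double f = x*x*x - a, fp = 3.0*x*x, fpp = 6.0*x;
        x -= 2.0*f*fp/(2.0*fp*fp - f*fpp);
    }
    return x;
}

int main(void)
{
    double worst_hi = 0.0, worst_lo = 0.0;
    for (long i = 0; i < 1000000; i++) {
        double a   = 1.0 + 7.0*rand()/(double)RAND_MAX;   /* a in [1,8] */
        double ref = cbrt(a);
        double err = (solver(a) - ref)/ref;    /* signed relative error */
        if (err > worst_hi) {                  /* new worst-case high   */
            worst_hi = err;
            printf("case %ld: new high %+.3e (a = %.17g)\n", i, err, a);
        }
        if (err < worst_lo) {                  /* new worst-case low    */
            worst_lo = err;
            printf("case %ld: new low  %+.3e (a = %.17g)\n", i, err, a);
        }
    }
    return 0;
}

Histogramming err over all the cases gives the error-distribution
plot; the exception lines are the worst-case reporting.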
Yes. But I always make plots and stare at them.
I only do that if something catches my attention as potentially odd.
My initial reaction was that it was tested library code so it must be my
problem - until I traced into it and saw how it failed. It gives 8
instead of 16 sig fig in double precision for these pathological data.
That's a good one.
I was a bit surprised. It required an astonishingly exact combination of
factors for it to trigger and one of my real problems hit it square on.
So biased probing paid off here? Or only for diagnosis?
It confirmed that given the right awkward parameters the solver would
fail but also that it was a very low risk event. It took an overnight
run just to find a single example using random data.
I just happened to be unlucky in my exact choice of problem. Expansion
around pi/4 was absolutely fine but the expansion around 3pi/4 went
completely haywire even though it differed only in a handful of sign
changes to coefficients. I was utterly convinced it was my user error.
In the 1980s I hit a similar problem, but it wasn't hard to find. In
scaled integer arithmetic (no floating point then), it's very common
to multiply two 32-bit integers together, yielding a 64-bit product,
and then divide that by a 64-bit scaling factor, yielding a 32-bit
quotient. In the Fortran of the day, if one just wrote this as
(a*b)/c, the compiler would truncate the product (a*b) to 32 bits
before dividing by c, yielding a wildly wrong answer, without any
warnings or drama. The solution was to define a 64-bit intermediate
variable, and do the operation in two lines: p=a*b, result = p/c.
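The same trap exists in C, and it shows the shape of the fix. A small
sketch with made-up values (unsigned types so the truncated version is
at least well-defined behaviour):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint32_t a = 100000, b = 300000, c = 65536;  /* made-up scaled values */

    /* product computed in 32 bits (typical int width): wraps mod 2^32
       before the divide, giving a wildly wrong quotient               */
    uint32_t wrong = (a*b)/c;

    /* widen to a 64-bit intermediate first - the two-line fix          */
    uint64_t p     = (uint64_t)a*b;
    uint32_t right = (uint32_t)(p/c);

    printf("truncated product   : %" PRIu32 "\n", wrong);
    printf("64-bit intermediate : %" PRIu32 "\n", right);
    return 0;
}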
ISTR there being a library function MULDIV(a,b,c) for that at least on
the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I
think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).
--
Martin Brown