Non-deterministic CPUs

omattos

Guest
The first rule of digital electronics is that circuits should be
predictable - i.e. when the same data is fed into a circuit twice, the
output should be the same every time.

I'm trying to do a thought experiment to see what could happen if this
restriction were relaxed for CPUs.

Current CPUs are expected to process every instruction with 100%
accuracy. What if I remove that restriction and say an error may be made
in one in every 1,000 instructions? The error could affect anything,
from a simple incorrect result of an arithmetic operation to incorrect
branching in the flow of control. After an "incorrect" instruction, I
understand that any further instructions executed could depend on the
"faulty" one, and therefore also produce unexpected results.

My question is: what optimizations and speedups could be applied to CPU
design if CPUs were allowed to occasionally produce "wrong" results?

For example, I suspect a higher clock speed would be possible, a higher
operating temperature range would be tolerable, and on-die defects would
be acceptable (provided they only affect a few operations). With more
defects allowed, feature size could be reduced on the same manufacturing
process, and clock speeds increased further. Finally, I'm guessing the
operating voltage could be reduced, cutting power consumption.

My main question: could a CPU design expert give an "out of the air"
estimate of how much faster a CPU could be made if it only had to
produce mostly-correct results, rather than perfect results?


Before anyone asks why I want such a CPU: it's just a thought experiment
to see how an algorithm that can't be multi-threaded could be run
fastest. By having multiple CPUs run the same code and data and be
"synched" every 100 instructions or so, the current state of each
processor could be compared and a majority decision taken. Any CPU that
isn't in the correct state would be reset by copying the state of
another CPU, and execution would continue.
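
A minimal sketch of that voting loop in Python, purely as an
illustration (the step() update and the single-bit fault injection are
stand-ins for a real core, not a model of actual hardware):

import random
from collections import Counter

def step(state, fault_rate):
    # One simulated "instruction": a deterministic update, occasionally corrupted.
    state = (state * 6364136223846793005 + 1442695040888963407) & (2**64 - 1)
    if random.random() < fault_rate:
        state ^= 1 << random.randrange(64)  # inject a single-bit upset
    return state

def run_interval(start, n_instr, fault_rate):
    s = start
    for _ in range(n_instr):
        s = step(s, fault_rate)
    return s

def lockstep(intervals=1000, n_cores=3, sync_every=100, fault_rate=0.001):
    agreed = 0  # last state every core agreed on
    for _ in range(intervals):
        while True:
            results = [run_interval(agreed, sync_every, fault_rate)
                       for _ in range(n_cores)]
            value, votes = Counter(results).most_common(1)[0]
            if votes > n_cores // 2:
                agreed = value  # majority wins; disagreeing cores are resynced
                break
            # no majority at all: roll every core back and rerun the interval
    return agreed

print(hex(lockstep()))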
 
'omattos' wrote, in part:

The first rule of digital electronics is that circuits should be
predictable - i.e. when the same data is fed into a circuit twice, the
output should be the same every time.

I'm trying to do a thought experiment to see what could happen if this
restriction were relaxed for CPUs.
_____

Predictable performance is not the same as perfect performance. The
aggregate outcome of many spins of a roulette wheel is predictable; the
outcome of any one spin is not. The smaller components become
physically, and the smaller the number of electrons involved in a
calculation, the more the chimera of '100% accuracy' recedes. As that
number becomes smaller, the output becomes more granular.

Think a little more before engaging in your 'thought experiment'.



Redundant computer systems already exist for critical real-time
applications: three systems perform the same calculation and, if the
results differ, the majority result is taken as correct. In less
time-critical applications the same calculation can be run three times
serially. This is most useful in environments where random events may
affect calculation outputs - high-energy ionizing radiation, for
example. At the component level, parity and self-correcting error
detection in RAM and caches are examples.
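
The component-level version of the idea in a toy Python sketch: a parity
bit stored alongside each word lets a single-bit flip be detected on
readback (real ECC memory uses SECDED codes rather than plain parity,
and can also correct the flipped bit):

def parity(word):
    # Even parity over a 64-bit word.
    return bin(word & (2**64 - 1)).count("1") & 1

def store(word):
    return word, parity(word)      # keep the parity bit alongside the data

def load(word, p):
    if parity(word) != p:
        raise RuntimeError("single-bit error detected on readback")
    return word

word, p = store(0xDEADBEEF)
load(word, p)                   # passes
# load(word ^ (1 << 7), p)      # would raise: the flipped bit is detected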

This kind of redundancy is always more expensive in time and material
than reducing the error rate by simply operating components 'in spec'.

Now, if by running a system in a 'non-deterministic' regime you mean one
where quantum states are collapsed to obtain output, then that's a horse
of an entirely different color.

Finally, you will be running your thought experiment on a system that is
already 'non-deterministic' to some extent, prone to errors, and 'just
good enough' rather than perfect.

Phil Weldon




"omattos" <omattos@gmail.com> wrote in message
news:21afd9c0-c7a8-41e0-ace5-cdb6e1b8fdef@w1g2000prm.googlegroups.com...
The first rule of digital electronics is they should be predictable -
ie. when the same data is fed into a circuit twice, the output should
be the same every time.

I'm trying to do a thought experiment to see what could happen if this
restriction was to be relaxed for CPU's.

Current CPU's should process every instruction with 100% accuracy.
What if I remove that restriction and say an error may be made in one
in every 1000 instructions. The error could effect anything, from
simple incorrect results of an arithmetic operation to incorrect
branching in the flow of control. After an "incorrect" instruction, I
understand that any further instructions executed could depend on the
"faulty" one, and therefore also produce unexpected results.

My question is what optimizations and speedups could be applied to CPU
design if they were allowed to occasionally produce "wrong" results?

For example I suspect a higher clock speed would be ok, higher
operating temperature range would be possible, on-die defects would be
ok (provided they only affect a few operations), due to more defects
being allowed, feature size could be reduced with the same
manufacturing process, and therefore clock speeds increased further,
and finally I'm guessing operating voltage could be reduced, reducing
power consumption.

My main question is could a CPU design expert take an "out of the air"
estimate how much faster a CPU could be made if it only had to produce
mainly-correct results, and not perfect results?


Before anyone asks why I want such a CPU, it's just a thought
experiment to see how an algorithm that can't be multi-threaded could
be run fastest. By having multiple CPU's running on the same code and
data data and being "synched" every 100 instructions or so, the
current state of each processor could be compared, and a majority
decision taken. Any CPU which isn't in the correct state would be
reset by copying the state of another CPU, and execution would
continue.
 
This kind of redundancy is always more expensive in time and material
than reducing the error rate by simply operating components 'in spec'.
I was thinking more of redundancy like this for performance. Say I have
a task that must be completed in 1 second but takes 2 seconds on the
best currently available processor, and the task can't be parallelized
(i.e. every part of it depends on the previous part). By running
processors faster than they were designed for, I can get the job done in
1 second; by having several of them run the same job simultaneously, I
can check which result is correct (by a majority vote).

This uses the theory that as the clock speed of the part moves out of
its designed working region the reliability goes down, and the further
out of the working region it is, the further the reliability goes down.
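
A rough way to put numbers on that trade-off, assuming each core
independently produces a wrong state in a given checkpoint interval with
probability p (the figures below are only illustrative):

from math import comb

def p_majority_correct(p_err, n=3):
    # Probability that a strict majority of n independent cores is still
    # correct, if each errs with probability p_err per checkpoint interval.
    need = n // 2 + 1
    return sum(comb(n, k) * (1 - p_err)**k * p_err**(n - k)
               for k in range(need, n + 1))

# A 1-in-1000 error rate per instruction gives roughly p = 0.1 over a
# 100-instruction interval; three such cores still vote correctly about
# 97% of the time, and more cores or shorter intervals push that higher.
print(p_majority_correct(0.10))        # ~0.972
print(p_majority_correct(0.10, n=5))   # ~0.991

The caveat is that the vote only helps when the cores' wrong answers
rarely coincide; a systematic error shared by all cores defeats it.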
 
On Wed, 10 Dec 2008 10:21:41 -0800 (PST), omattos <omattos@gmail.com> wrote:
: This uses the theory that as the clock speed of the part moves out of
: its designed working region the reliability goes down, and the further
: out of the working region it is, the further the reliability goes down.

Can you fill in a blank please? What's the application?
 
'Howard' wrote:
Can you fill in a blank please? What's the application?
_____

Gedankenexperiment

Phil Weldon

"Howard" <bit-bucket@queue.to> wrote in message
news:slrngk24qe.mnv.bit-bucket@individual.net...
On Wed, 10 Dec 2008 10:21:41 -0800 (PST), omattos <omattos@gmail.com
wrote:
: > This kind of redundancy is always be more expensive in time and
material
: > than reducing the error rate by merely operating components 'in spec'.
:
: I was more thinking of redundancy like this for performance. Say I
: have a task that must be completed in 1 second, but takes 2 seconds on
: the best currently available processor. The task can't be
: parallelized (ie. every part of the task depends on the previous). By
: using multiple processors running faster than designed to get the task
: done I can get the job done in 1 second. By having multiple
: processors running the same job simultaneously, I can check which
: result is correct (by a majority vote)
:
: This uses the theory that as the clock speed of the part moves out of
: it's designed working region the reliability goes down, and the
: further out of the working region, the further it goes down.
:

Can you fillin a blank please? What's the application?
 
On Thu, 11 Dec 2008 08:26:37 -0500, Phil Weldon <notdisclosed@example.com> wrote:
: 'Howard' wrote:
: > Can you fill in a blank please? What's the application?
: _____
:
: Gedankenexperiment

I was sort of hoping there'd be a use for it.

 
Can you fill in a blank please? What's the application?
The main one is experimentation - taking a fresh look at the design of
modern high-performance digital electronics. One example of a mechanism
that is already used and can be "wrong" is branch prediction in modern
CPUs. In most cases it predicts the branch correctly and performance is
improved; in some cases it predicts the branch incorrectly and
performance is reduced, but the overall effect is an increase in average
performance.

My suggestion is to take this to the next logical step: since the
outcome of the branch prediction doesn't affect the result of the
calculation, only the performance, there is no reason it has to be 100%
deterministic. The same can be applied to other parts of the CPU
provided that, as with branch prediction, faults can be retrospectively
detected and corrected. The key to this system would be effective fault
detection (which is easy if you think about it, since checking the
execution history of a core is a parallel problem, not a serial one) and
effective fault recovery. Recovery is harder, since you effectively need
a way to "roll back" to the state just before the error occurred - this
could include rolling back main memory - but I believe it's still
possible.
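
A toy sketch of that detect-and-roll-back loop, with a deliberately
faulty "fast" adder, a trusted slow one, and a checkpoint taken before
every chunk (the register-file model is nothing like a real pipeline,
and the fault injection is only for illustration):

import random

def fast_add(a, b):
    # Fast but occasionally faulty execution unit.
    r = (a + b) & (2**32 - 1)
    if random.random() < 0.01:
        r ^= 1 << random.randrange(32)
    return r

def safe_add(a, b):
    # Slow, trusted fallback path.
    return (a + b) & (2**32 - 1)

def run_chunk(regs, chunk, add):
    trace = []
    for dst, a, b in chunk:
        va, vb = regs[a], regs[b]
        regs[dst] = add(va, vb)
        trace.append((va, vb, regs[dst]))
    return trace

def checked_run(program, regs):
    for chunk in program:
        checkpoint = dict(regs)                 # state to roll back to
        trace = run_chunk(regs, chunk, fast_add)
        # checking each recorded operation is independent of the others,
        # so the verification itself is a parallel problem
        if any(safe_add(va, vb) != r for va, vb, r in trace):
            regs.clear()
            regs.update(checkpoint)             # roll back
            run_chunk(regs, chunk, safe_add)    # redo on the trusted path
    return regs

program = [[("r2", "r0", "r1"), ("r3", "r2", "r2")] for _ in range(50)]
print(checked_run(program, {"r0": 1, "r1": 2, "r2": 0, "r3": 0}))
# Always prints the correct final state despite the faulty fast path.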

The same technique would also have advantages for reducing CPU testing
and development time - it wouldn't matter if you happened to introduce a
"pentium divide bug" (http://www.google.co.uk/search?q=pentium+divide+bug)
since the detection code would catch the error and recalculate the
result using a slower failsafe mechanism. Equally, bugs could be
introduced on purpose to increase speed for the majority of cases -
effectively turning on optimizations that don't work for corner cases.
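
For a single operation, detection can be much cheaper than the operation
itself: a quotient can be checked by multiplying back, and only
recomputed on a slow failsafe path when the check fails. A toy sketch
with a simulated "divide bug" (not how FDIV verification was actually
done, and assuming non-negative operands):

import random

def fast_divmod(n, d):
    # Stand-in for a fast divider that occasionally returns a wrong quotient.
    q, r = divmod(n, d)
    if random.random() < 0.001:   # simulated "divide bug"
        q += 1
    return q, r

def slow_divmod(n, d):
    # Trusted but slower failsafe path.
    return divmod(n, d)

def checked_div(n, d):
    q, r = fast_divmod(n, d)
    if q * d + r == n and 0 <= r < d:   # cheap multiply-back check
        return q, r
    return slow_divmod(n, d)            # recompute only when the check fails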

I'm really looking to see whether this idea has been investigated before
and whether it has already been shown not to add significantly to
performance.

If there isn't a consensus that it simply won't work, I might have a go:
build a simple CPU on an FPGA, then rebuild it with error detection and
recovery, and intentionally introduce both random and systematic errors
into the processing core to see how it behaves.
 
