David Brown
On 02/04/13 19:20, glen herrmannsfeldt wrote:
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote:
On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote:
For Itanium, the different units do different things. There are
instruction formats that divide up the bits in different ways to make
optimal use of the bits. I used to have the manual nearby, but I don't
see it right now.
Yes, the array processor I worked on was coded from scratch, very
laboriously. The Itanium is trying to run existing code as fast as
possible. So they have a number of units to do similar things, but also
different types, all working in parallel as much as possible. Also the
parallelism in the array processor was all controlled by the programmer.
In regular x86 processors the parallelism is controlled by the chip
itself. I'm amazed sometimes at just how much they can get the chip to
do; no wonder there are hundreds of millions of transistors on the thing.
I assume parallelism in the Itanium is back to the compiler smarts to
control since it needs to be coded into the VLIW instructions.
Seems to me that the big problem with the original Itanium was the
need to also run x86 code. That delayed the release for some time, and
in that time other processors had advanced. I believe that later
versions run x86 code in software emulation, maybe with some hardware
assist.
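Regarding the instruction formats mentioned above - this is a rough
sketch from memory rather than from the manual: IA-64 packs three 41-bit
instruction slots plus a 5-bit template into one 128-bit "bundle". The
template selects which execution-unit types (memory, integer,
floating-point, branch) the slots target, and where the "stops" between
groups of independent instructions fall - so the parallelism really is
spelled out explicitly by the compiler. In C, the raw layout is roughly:

    #include <stdint.h>

    /* One IA-64 "bundle": 128 bits = a 5-bit template plus three 41-bit
     * instruction slots.  The template picks the unit types for the
     * slots and marks the "stops" that end a group of instructions the
     * compiler has declared safe to issue in parallel.
     */
    typedef struct {
        uint64_t lo;   /* bits  0..4   template
                          bits  5..45  slot 0 (41 bits)
                          bits 46..63  low 18 bits of slot 1  */
        uint64_t hi;   /* bits 64..86  high 23 bits of slot 1
                          bits 87..127 slot 2 (41 bits)       */
    } ia64_bundle;

    static unsigned bundle_template(const ia64_bundle *b)
    {
        return (unsigned)(b->lo & 0x1f);
    }

    static uint64_t bundle_slot0(const ia64_bundle *b)
    {
        return (b->lo >> 5) & ((1ULL << 41) - 1);
    }

    static uint64_t bundle_slot1(const ia64_bundle *b)
    {
        return (b->lo >> 46) | ((b->hi & ((1ULL << 23) - 1)) << 18);
    }

    static uint64_t bundle_slot2(const ia64_bundle *b)
    {
        return b->hi >> 23;
    }

The different template values are what let the same 128 bits be divided
up in different ways, depending on which mix of units a group of
instructions needs.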
x86 compatibility was not the "big" problem with the Itanium (though it
didn't help). There were two far bigger problems. One is that the chip
was targeted at maximising throughput with little regard for power
efficiency, since it was for the server market - so all of the logic was
running all of the time to avoid latencies, and it had massive caches
that ran as fast as possible. The result was that the original devices
had a power density exceeding that of the core of a nuclear reactor (it
was probably someone from AMD who worked that out...).
The big problem, however, is that the idea with VLIW is that the
compiler does all the work scheduling instructions in a way that lets
them run in parallel. This works in some specialised cases - some DSPs
have this sort of architecture, and some types of mathematical
algorithms suit it well. But when Intel started work on the Itanium,
compilers were not up to the task - Intel simply assumed they would work
well enough by the time the chips were ready. Unfortunately for Intel,
compiler technology never made it - and in fact, it will never work
particularly well for general code. There are too many unpredictable
branches and conditionals to predict parallelism at compile time. So
most real-world Itanium code uses only about a quarter or so of the
processing units in the CPU at any one time (though some types of code
can work far better). Thus Itanium chips run at half the real-world
speed of "normal" processors, while burning through at least twice the
power.
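To make that concrete - a contrived illustration, not measurements from
real Itanium code - compare a straight-line function, where a compiler
can see that the operations are independent and pack them into the same
instruction groups, with a pointer-chasing loop, where every load and
the loop exit depend on data that only exists at run time:

    /* Straight-line code: four independent read-modify-writes on
     * distinct array elements.  A VLIW/EPIC compiler can see there are
     * no dependences and schedule them in the same group, keeping
     * several execution units busy at once.
     */
    void bump4(int v[4])
    {
        v[0] += 1;
        v[1] += 1;
        v[2] += 1;
        v[3] += 1;
    }

    /* Pointer chasing: each load depends on the previous one, and the
     * loop exit depends on loaded data, so a static schedule has almost
     * nothing to run in parallel - most of the issue slots end up as
     * no-ops.
     */
    struct node { struct node *next; int value; };

    int list_sum(const struct node *p)
    {
        int sum = 0;
        while (p) {
            sum += p->value;
            p = p->next;
        }
        return sum;
    }

Real programs contain far more of the second kind of code than the
first, which is why the bet on compile-time scheduling never paid off.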