OT: Fast Circuits


Chris Maryan

A coworker and I were debating what the likes of Intel, IBM and AMD do differently that allows them to design circuits at 3 GHz+. Contrast that with FPGAs, which for the most part are built on a similar process node (i.e. 65 or 40 nm), but where even the major hardened blocks (e.g. DSP blocks) are only capable of around 500 MHz. Also compare the fastest ARM chips, graphics chips, most ASICs and other chips, which may get up to 1.5 GHz but rarely faster (yes, faster chips do exist, but they are the exception rather than the rule).

So we had some theories about the cause of the difference:
- Intel/IBM are way ahead in their technology development over the likes of TSMC and UMC. Doesn't AMD use UMC?
- The 3.5GHz logic (i.e. the execution unit pipeline) in an Intel CPU doesn't actually run at 3.5GHz. There is a 3.5 GHz clock, but it turns into a mess of clock enables and logic effectively running at a much slower rate, though effective 3GHz performance is still achieved through parallelism.
- The difference is dynamic logic/domino logic/etc. Most common logic designs (ASICs, FPGAs, ARM processors) use static logic - a mess of conventional CMOS gates separated by flops. High performance chips use dynamic logic, lots of latches and similar tricks to avoid the overhead of static logic. This idea may not stand up to scrutiny, as I understand that the latest Intel architectures (Nehalem) are fully static.
- The designers of ASICs/GPUs/FPGAs knowingly make the tradeoff to lower speeds to reduce power consumption. That is, you could get a 3.5 GHz ARM processor, but it'd be 100 W.

Anyone have any ideas or knowledge to clarify the issue? Why can Intel, AMD, and IBM create 3-4 GHz chips, when most other chips seem to be limited to somewhere between 500 MHz and 1.5 GHz?

Chris
 
Chris Maryan <kmaryan@gmail.com> wrote:

(snip)

- Intel/IBM are way ahead in their technology development over the likes of TSMC and UMC. Doesn't AMD use UMC?
Just compare the power consumption and there is your answer.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
 
On 01/07/2011 11:49 AM, Chris Maryan wrote:
(snip)

Well, one of the differences is that the CPUs are predetermined logic.
They only have one "configuration" to get timing closure on, not an
infinitely variable number of possibilities. I think that makes a HUGE
difference. When they complete the design of some particular functional
block, they can know EVERYTHING about it, such as setup and hold times,
clock loading, clock skew within the module, etc. With an FPGA, there
are a number of variables that add a large "fuzz factor" to the timing
margins and make it a lot harder to operate every FF at the maximum
rate. FPGAs are designed to WORK correctly, but are clearly not
completely optimized for speed. If you want max speed, you may need a
custom part. Because the CPU has only one config, they can optimize the
speed to the utmost.

This only explains part of the difference, of course.
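To put that fixed-vs-fuzzy margin argument in rough numbers, here is a toy clock-period budget; every delay value below is invented for illustration, not taken from any datasheet:

```python
# Toy register-to-register timing budget. A fixed (custom) design knows its
# path exactly; an FPGA's programmable routing adds delay plus a large
# uncertainty margin. Every number below is invented for illustration.

def f_max_ghz(clk_to_q_ps, logic_ps, routing_ps, setup_ps, margin_ps):
    """Max clock frequency (GHz) for one flop-to-flop path."""
    period_ps = clk_to_q_ps + logic_ps + routing_ps + setup_ps + margin_ps
    return 1000.0 / period_ps  # 1000 ps per ns, so 1/ns = GHz

# Custom CPU path: tight, fully characterized margins.
custom = f_max_ghz(clk_to_q_ps=50, logic_ps=150, routing_ps=50,
                   setup_ps=30, margin_ps=20)

# FPGA path: same logic depth, but programmable routing and a fat "fuzz factor".
fpga = f_max_ghz(clk_to_q_ps=80, logic_ps=150, routing_ps=1200,
                 setup_ps=60, margin_ps=200)

print(f"custom ~{custom:.2f} GHz, FPGA ~{fpga:.2f} GHz")
```

With these made-up numbers the same logic closes at about 3.3 GHz in the fixed case versus about 0.6 GHz in the programmable one, roughly the ratio the original post asks about.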

Jon
 
On 7 Jan., 18:49, Chris Maryan <kmar...@gmail.com> wrote:
(snip)
The same question came to my mind a few days ago... For sure they
really are running at basically the quoted clock rates. And of course
the difference to an FPGA is clear (DSP blocks at 2 GHz would simply
make no sense when the logic fabric is not fast enough). But why is
Intel faster than e.g. ARM in terms of maximum clock rate? Better RTL
design (i.e. fewer gates between the flip-flops)? Better process
technology? Better use of dynamic logic? Preferring speed over power in
the process? Something else? My guess is that it is a little bit of
all...

I have crossposted this to comp.arch, as we may get a better
answer there.

Thomas
 
On Jan 7, 6:03 pm, Thomas Entner <thomas.entne...@gmail.com> wrote:
(snip)
Whip out your Virtex-6 datasheet to find some answers. In the FPGA,
the logic cells, slices, whatever you want to call them, are pretty
fast. Internal time values may be sub-100 ps for clock-to-Q or setup.
All of this is completely swamped by the multi-nanosecond routing
delays in the same architecture. It's a bit like a miniature version
of a PC board full of tiny ASICs: every time you leave an ASIC, you
get hit with big IO buffer delays and board routing delays. Clearly
the same process can do quite well timing-wise when you insert "hard"
blocks like PowerPC processors or PCIe endpoint blocks. Altera is
touting 25 Gb/s SERDES on their latest process. So really the big
culprit is fabric interconnect. You pay a big price for
programmability, and a bigger price for finer-grain programmability.
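The "fast cells, slow fabric" observation can be sketched with round numbers (invented for illustration, not Virtex-6 datasheet figures):

```python
# Rough breakdown of an FPGA register-to-register path: even with fast
# cells, a few programmable-routing hops dominate the cycle. All delay
# values are invented round numbers, not datasheet figures.

CLK_TO_Q_PS = 100   # flip-flop clock-to-Q
LUT_PS      = 100   # one logic cell
HOP_PS      = 500   # one programmable-routing hop
SETUP_PS    = 100   # flip-flop setup

def fabric_fraction(levels_of_logic, routing_hops):
    """Fraction of the critical path spent in routing rather than logic."""
    logic = CLK_TO_Q_PS + levels_of_logic * LUT_PS + SETUP_PS
    route = routing_hops * HOP_PS
    return route / (logic + route)

# A modest 4-LUT-deep path with a routing hop between each level:
print(f"{fabric_fraction(4, 4):.0%} of the cycle is interconnect")
```

With these assumed numbers roughly three quarters of the cycle goes to the fabric, which is why hard blocks on the same die can run so much faster.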

-- Gabor
 
In comp.arch.fpga Thomas Entner <thomas.entner99@gmail.com> wrote:
(snip)

The same question came to my mind a few days ago... For sure they are
really running basically on the mentioned clock-rate. And of course
the difference to an FPGA is clear (DSP blocks with 2GHz would simply
make no sense when the logic fabric is not fast enough).
The FPGA routing fabric is slower than direct wiring, and that
comes through to the final speed. But if you do things in parallel,
you can get enough done in a given time.

But why is Intel faster than e.g. ARM in terms of maximum clock rate?
Clock rate is not a good measure of processor speed. You have to
also see how much gets done each clock cycle. For a pipelined
design, clock rate is determined by the logic between pipeline
registers, and faster is usually better. The tradeoffs are not
easy, though, and sometimes the slower clock gets more done.
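That tradeoff can be made concrete with a toy pipeline model: deeper pipelines raise the clock, but each stage pays register overhead and each stall flushes more stages. All parameters here are illustrative assumptions, not measurements:

```python
# Toy model: splitting a fixed amount of logic across more pipeline stages
# raises the clock rate, but register overhead and stall (flush) penalties
# grow with depth, so throughput peaks at a finite depth.

def clock_ghz(total_logic_ps, stages, reg_overhead_ps=50):
    """Clock rate when the logic is split evenly across `stages` stages."""
    return 1000.0 / (total_logic_ps / stages + reg_overhead_ps)

def instr_per_ns(total_logic_ps, stages, stall_prob=0.1):
    """Throughput: one instruction per cycle, minus a whole-pipeline
    refill whenever a stall occurs (deeper pipe, bigger flush)."""
    f = clock_ghz(total_logic_ps, stages)       # cycles per ns
    cycles_per_instr = 1 + stall_prob * stages
    return f / cycles_per_instr

best = max(range(1, 40), key=lambda n: instr_per_ns(2000, n))
print("best depth:", best, "at", round(clock_ghz(2000, best), 2), "GHz")
```

With these assumptions the optimum is a finite depth: pushing the clock higher past that point actually gets less done per nanosecond, which is glen's point.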

(snip)
-- glen
 
The FPGA routing fabric is slower than direct wiring, and that
comes through to the final speed.  But if you do things in parallel,
you can get enough done in a given time.
As the OP wrote, the question is a little bit off-topic for
comp.arch.fpga: I think it is clear to all that an ASIC will always be
faster than an FPGA in the same process, for various reasons.

But why is Intel faster than e.g. ARM in terms of maximum clock rate?

Clock rate is not a good measure of processor speed.  You have to
also see how much gets done each clock cycle.  For a pipelined
design, clock rate is determined by the logic between pipeline
registers, and faster is usually better.  The tradeoffs are not
easy, though, and sometimes the slower clock gets more done.
But to my knowledge, modern x86 CPUs, with all their out-of-order
stuff, etc., are still more complex than the latest ARM CPUs. Still
they achieve higher clock rates...

Thomas
 
On 1/7/2011 3:03 PM, Thomas Entner wrote:
(snip)

The main difference: full custom VLSI is faster than ASIC cell based
design is faster than FPGA design.

I am often amazed the other way, at how fast FPGAs are: every time I
look at FPGAs from first principles, I see them as 10X to 16X slower
than full custom design. Yet they are actually closer than that.

Ditto ASIC cell-based design. Now, Intel and AMD both do cell-based
design, but not necessarily everywhere, and/or are quite willing to
rework the cell library in critical areas. You can often see the
difference in a die photo: the sort of stack of boxes and then routing
that is typical of cell-based design, versus the really dense datapaths
typical of full custom.

Other differences: Intel's big design teams. There's a lot more manual
work at Intel than at many other places. Given Intel's manufacturing
runs, it's worth spending a lot of money to make the chip 10% smaller,
because that almost directly translates to profit.


Other points from the original poster and crossposter.

- Intel/IBM are way ahead in their technology development over the
likes of TSMC and UMC. Doesn't AMD use UMC?

Intel fabs are often ahead of TSMC and UMC and GF. But this doesn't
explain the whole difference, not by a long shot, especially given AMD's
situation.

By the way, an interesting conversation is why IBM is above 5GHz,
whereas Intel is not.


- The 3.5GHz logic (i.e. the execution unit pipeline) in an Intel
CPU doesn't actually run at 3.5GHz. There is a 3.5 GHz clock, but it turns
into a mess of clock enables and logic effectively running at a much slower
rate. Though effective 3GHz performance is still achieved through
parallelism.

While this trick certainly should be in every designer's toolbox, it is
not true overall. Large parts of the CPUs do run at the 3+ GHz frequency.

(Way back in Willamette, the important parts of the chip ran at 2X the
published frequency. Not so much any more.)


- The difference is dynamic logic/domino logic/etc. Most common
logic designs (ASICs, FPGAs, ARM processors) use static logic - a mess
of conventional CMOS gates separated by flops. High performance chips
use dynamic logic, lots of latches and similar tricks to avoid the
overhead of static logic. This idea may not stand up to scrutiny as I
understand that the latest Intel architectures (Nehalem) are fully static.

Again, while domino, etc., are an option for every project, many recent
Intel projects have been full static.

E.g. googling "intel nehalem static cmos"

IDF: Inside Nehalem - HotHardware
Aug 22, 2008 ... Another way Intel managed to keep the power
requirements for Nehalem relatively low (130 watts TDP) was by using
static CMOS for all of the ...
hothardware.com/Articles/IDF-Inside-Nehalem/?page=3

[PDF] Intel and Core i7 (Nehalem) Dynamic Power Management
File Format: PDF/Adobe Acrobat - Quick View
hungry. To save power, Intel circuit designers decided to switch
from domino logic to static CMOS based logic circuits when
implementing Nehalem. ...
cs466.andersonje.com/public/pm.pdf - Similar

(I haven't found similarly clear statements for Intel Sandybridge, Atom,
or AMD. But I haven't bothered to look at ISSCC papers. Yet.)
 
On Jan 7, 8:38 pm, "Andy \"Krazy\" Glew" <a...@SPAM.comp-arch.net>
wrote:
Intel fabs are  often ahead of TSMC and UMC and GF.  But this doesn't
explain the whole difference, not by a long shot, especially given AMD's
situation.
When I was at AMD and AMD still owned the Dresden fabs, we would look
at the process technologies from (say) TSMC, ... and find that if we
dumped our chips into that fab they would run about half as fast. So
there is about a factor of 2X in the fab technology.

By the way, an interesting conversation is why IBM is above 5GHz,
whereas Intel is not.
The market limits Intel to a 100-watt air-cooled envelope; IBM is not so
limited.

- The difference is dynamic logic/domino logic/etc. Most common
logic designs (ASICs, FPGAs, ARM processors) use static logic - a mess
of conventional CMOS gates separated by flops. High performance chips
use dynamic logic, lots of latches and similar tricks to avoid the
overhead of static logic. This idea may not stand up to scrutiny as I
understand that the latest Intel architectures (Nehalem) are fully static.
Note: except for RAMs and ROMs, there is almost no dynamic logic in
one x86 manufacturer's products. Dynamic logic is hard and takes
a lot more designers to get right (some of them in the fab). Dynamic
logic is sensitive to process-window swings. In many cases,
dynamic logic is not really faster once you consider not being able to
use the logic in the other half of the clock cycle and the added skew
on the falling edge of the clock.
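That half-cycle argument can be put in toy numbers; the gate and skew delays below are invented purely to illustrate the bookkeeping, not taken from any real process:

```python
# Domino gates switch faster than static CMOS gates, but in a classic
# domino pipeline the logic only evaluates during one phase of the clock
# and loses extra margin to skew on the precharge edge. With invented
# delays, the half-cycle constraint can eat the per-gate speed advantage.

def static_freq_ghz(levels, gate_ps=60):
    """Static CMOS: the whole clock period is available for the logic."""
    return 1000.0 / (levels * gate_ps)

def dynamic_freq_ghz(levels, gate_ps=40, skew_ps=80):
    """Domino: faster gates, but they must fit in half the period,
    minus the skew lost on the falling (precharge) edge."""
    half_period = levels * gate_ps + skew_ps
    return 1000.0 / (2 * half_period)

for levels in (4, 12):
    print(levels, "levels:",
          round(static_freq_ghz(levels), 2), "GHz static vs",
          round(dynamic_freq_ghz(levels), 2), "GHz dynamic")
```

In this simplified model dynamic logic never wins; real designs mitigate this (e.g. using both phases), but the sketch shows why the raw gate speedup alone is not the whole story.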

E.g. googling "intel nehalem static cmos"

(snip)
A good decision, based on the power envelope not letting the speed of
dynamic logic be utilized to its fullest.

Mitch
 
In comp.arch.fpga MitchAlsup <MitchAlsup@aol.com> wrote:
On Jan 7, 8:38 pm, "Andy \"Krazy\" Glew" <a...@SPAM.comp-arch.net
wrote:
Intel fabs are  often ahead of TSMC and UMC and GF.  But this doesn't
explain the whole difference, not by a long shot, especially given AMD's
situation.
(snip)

Note: excepting for RAMs and ROMs, there is almost no dynamic logic in
one of the x86 manufactures products. Dynamic logic is hard and takes
a lot more designers to get right (some of them in the FAB.) Dynamic
logic is sensitive to the process window swings. In many cases,
dynamic logic is not really faster once you consider not being able to
use the logic in the other 1/2 of the clock cycle and the added skew
on the falling edge of the clock.
The 8080 and 8086 used dynamic logic. (Possibly only for registers.)

One reason the Z80 became more popular than the 8080 was its use
of static logic, and the ability to debug with a slow clock.

The processors with built-in PLL can't be slow clocked, even if
the logic is static.

-- glen
 
On Sat, 8 Jan 2011 22:08:03 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

(snip)

The 8080 and 8086 used dynamic logic. (Possibly only for registers.)

One reason the Z80 became more popular than the 8080 was its use
of static logic, and the ability to debug with a slow clock.

The processors with built-in PLL can't be slow clocked, even if
the logic is static.
What if the PLL is programmable and can be told to generate a slow
clock?
--
Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services

http://www.dspia.com
 
On Jan 9, 12:08 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
(snip)

The 8080 and 8086 used dynamic logic.  (Possibly only for registers.)

One reason the Z80 became more popular than the 8080 was its use
of static logic, and the ability to debug with a slow clock.

The processors with built-in PLL can't be slow clocked, even if
the logic is static.

-- glen
IIRC, somebody here (or was it Paul DeMone in one of his excellent RWT
articles) explained that the dynamic logic of today is different from
the dynamic logic of the late '70s. From a high-level perspective,
today's dynamic logic could be described as pseudo-static. It works
just fine with slow clocks.
 
On Jan 8, 11:29 pm, MitchAlsup <MitchAl...@aol.com> wrote:
A good decision, based on the power envelope not letting the speed of
dynamic logic be utilized to its fullest.

Mitch
What you are saying sounds right for quad- and hexa-cores. But what
about duals?
Here is the fastest Westmere-based dual-core: http://ark.intel.com/Product.aspx?id=48504
73W at 3.6 GHz. Taking into account that the same-generation hexacore
runs at 130W/3.3 GHz, things look either timing-path limited or limited
by the marketers' desire to avoid too big a speed difference in favor
of the cheaper part.
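A crude cross-check, assuming dynamic power scales as n_cores · f · V² with supply voltage roughly proportional to f (so P ~ n · f³); only the two ARK figures quoted above are real data, the model itself is an assumption:

```python
# Scale the 6-core figure (130 W at 3.3 GHz) down to a 2-core part at
# 3.6 GHz, using the rough model P ~ n_cores * f^3 (voltage tracking
# frequency). This is a back-of-envelope model, not vendor data.

def scale_power(p_ref_w, n_ref, f_ref_ghz, n, f_ghz):
    """Predicted power for n cores at f_ghz, scaled from a reference part."""
    return p_ref_w * (n / n_ref) * (f_ghz / f_ref_ghz) ** 3

predicted = scale_power(130, 6, 3.3, 2, 3.6)
print(f"~{predicted:.0f} W predicted vs 73 W actual for the dual-core")
```

The model puts the 2-core slice at roughly 56 W at 3.6 GHz, comfortably under the 73 W rating, which is consistent with the dual-core not being power-limited at that clock.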
 
In comp.arch.fpga Muzaffer Kal <kal@dspia.com> wrote:
(snip, I wrote)

The processors with built-in PLL can't be slow clocked, even if
the logic is static.

What if the PLL is programmable and can be told to generate
a slow clock?
What do you mean by slow?

Unless one is allowed to supply an external capacitor, there is
no way to make a PLL go as slow as you can clock the Z80.

How about 1uHz? (or 1nHz or 1pHz?)

At 1GHz, you can make an RC circuit with R=1Mohm, C=1fF,
such that RC=1e-9s

For 1uHz, R=1Mohm, C=1F. That isn't easy, even with an external
capacitor!
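The arithmetic above, as a runnable check of the RC time scales involved:

```python
# RC time constants for the two cases discussed: an on-chip-feasible
# capacitor versus what a 1 uHz time scale would demand.

R = 1e6                # 1 Mohm
tau_fast = R * 1e-15   # 1 fF  -> ~1 ns, i.e. a ~1 GHz time scale
tau_slow = R * 1.0     # 1 F (!) -> 1e6 s, i.e. a ~1 uHz time scale

print("fast:", tau_fast, "s   slow:", tau_slow, "s")
```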

-- glen
 
On Jan 9, 1:53 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
(snip)

How about 1uHz? (or 1nHz or 1pHz?)

(snip)
I can imagine the utility of 1 mHz, maybe even 0.1 mHz, in some
extreme cases where you want to take a deep thought between
clocks. But 1 uHz? Can't see how it helps debugging.

Normally, when one needs a very slow clock, one gets it by dividing
the VCO with a counter. Today many off-the-shelf PLLs have counters
built in. Although I have to admit that I have never seen a PLL with
a built-in counter wide enough to produce 1 mHz.
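For reference, dividing a VCO at f_vco down to f_out takes a counter of ceil(log2(f_vco/f_out)) bits; a quick sketch:

```python
# Counter width needed to divide a fast VCO down to the very slow
# debug clocks discussed above.

import math

def divider_bits(f_vco_hz, f_out_hz):
    """Bits in a binary divider taking f_vco_hz down to f_out_hz."""
    return math.ceil(math.log2(f_vco_hz / f_out_hz))

print(divider_bits(1e9, 1e-3), "bits for 1 GHz -> 1 mHz")
print(divider_bits(1e9, 1e-6), "bits for 1 GHz -> 1 uHz")
```

A 1 GHz VCO needs a 40-bit divider to reach 1 mHz and 50 bits to reach 1 uHz, well beyond what typical PLL post-dividers offer.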
 
On Jan 8, 4:08 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
(snip)

The 8080 and 8086 used dynamic logic.  (Possibly only for registers.)
I was talking about processors designed and fabbed in the 2003-2010 time
frame, not everything made by that company over the entire life of the
architecture.

Yes, all the old stuff used dynamic logic. In a 2.5-micron single-metal
process you had to use dynamic logic just to get the chip to fit on
the die. Times have changed.

Mitch
 
On Jan 9, 3:42 am, Michael S <already5cho...@yahoo.com> wrote:
On Jan 8, 11:29 pm, MitchAlsup <MitchAl...@aol.com> wrote:



A good decision based on the power envelope not letting the speed of
dynamic logic be utilized to its fullest.

Mitch

What you are saying sounds right for quad and hexacores. But what
about duals?
Q: How could a company afford to do another core design to be
used only in a dual processor?

A: They cannot. The dual is the same core as the quad; it just gets to
run in a more power-rich environment.

Mitch
 
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

In comp.arch.fpga Muzaffer Kal <kal@dspia.com> wrote:
(snip, I wrote)

The processors with built-in PLL can't be slow clocked, even if
the logic is static.

What if the PLL is programmable and can be told to generate
a slow clock?

What do you mean by slow?

Unless one is allowed to supply an external capacitor, there is
no way to make a PLL go as slow as you can clock the Z80.
Bypassing the PLL is a common trick / feature. Many modern processors
allow clocking through the JTAG interface for flash programming and
debugging purposes.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
 
On 1/7/2011 8:38 PM, Andy "Krazy" Glew wrote:
snip
Intel fabs are often ahead of TSMC and UMC and GF. But this doesn't
explain the whole difference, not by a long shot, especially given AMD's
situation.

By the way, an interesting conversation is why IBM is above 5GHz,
whereas Intel is not.


snip

My first guess would be that SOI had something to do with it. Second
would be packaging and a higher power budget. Third might be that it
was a no-holds-barred high-end design while Intel had other constraints.

Don't know for sure.
And I know that there are differences of opinion about SOI and its
leverage at various process nodes.

--
del
 
Michael S wrote:
glen herrmannsfeldt wrote:
How about 1uHz? (or 1nHz or 1pHz?)
...
For 1uHz, R=1Mohm, C=1F.  That isn't easy, even with an external
capacitor!
...
I can imagine the utility of 1mHz, may be, even 0.1 mHz in some
extreme cases where you want to take a deep thought between the
clocks. But 1uHz? Can't see how it helps debugging.
Reading 1uHz as "full stop"...

A few geological ages ago, I was debugging a quite complex system
based on a Z80 CPU (a fully static design). Things like an ICE or a
logic analyzer were luxuries not available to me, so I had to
improvise.

One of the things I did was to hook up a couple of logic gates and a
flip-flop between the RD & WR lines and the WAIT line, forcing the
CPU into a wait state each time it tried to access the bus.

The address and data lines were connected to 3 x 8-bit buffers,
multiplexing the 24 bits into a single 8-bit lane.

That lane plus a few control lines went to a CP/M computer's printer
port, where an interpreted BASIC program read the data and then
toggled the flip-flop, allowing the Z80 to do one more memory cycle.

The BASIC program would disassemble the current operation and, if it
involved an external memory transfer, show exactly what was being
read/written and where.

Voila! With an investment of a few hours of work I had a system that
allowed me to fully trace the program flow and memory access.
I could have any number of breakpoints (just keep toggling the
flip-flop until reaching a given address, then stop for manual
control), count the number of times a branch was taken, etc.
Everything in slow motion, of course, but the information I gathered
was not available otherwise.

I was forcing wait states, but could have accomplished the same thing
by gating the CPU clock. Either technique would have been impossible
with chips like the 6800/6502, which would lose state if the clock
went below a certain minimum frequency, and which could not be kept
in a wait state for more than a few microseconds.

You cannot do the same with a modern controller with on-chip memory,
etc., but still, slowing down the processor so that you can, for
example, check the state of 10 GPIO pins with your 2-channel scope is
a very valuable feature.
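The trace loop described above, sketched in Python standing in for the interpreted BASIC; the printer-port accesses are hypothetical here, so a tiny simulated bus is included to make the sketch self-contained:

```python
# Single-step bus tracer in the spirit of the WAIT-state rig above:
# read the 24 latched address/data bits as three multiplexed 8-bit
# slices, log the cycle, then pulse the flip-flop to release exactly
# one more bus cycle. The I/O helpers are stand-ins (hypothetical),
# so a simulated bus drives the demo.

def trace(read_slice, release_one_cycle, breakpoint_addr, max_cycles=100_000):
    """Log (address, data) per bus cycle until breakpoint_addr is seen.

    read_slice(i): 8-bit slice i of the latched bus
                   (0 = address low, 1 = address high, 2 = data).
    release_one_cycle(): toggle the WAIT flip-flop for one more cycle.
    """
    log = []
    for _ in range(max_cycles):
        addr = read_slice(0) | (read_slice(1) << 8)
        data = read_slice(2)
        log.append((addr, data))
        if addr == breakpoint_addr:
            break                    # hold here for manual control
        release_one_cycle()
    return log

# Simulated stand-in for the real hardware:
cycles = [(0x0100, 0xC3), (0x0101, 0x00), (0x0200, 0x3E)]
pos = {"i": 0}
read_slice = lambda s: [cycles[pos["i"]][0] & 0xFF,
                        (cycles[pos["i"]][0] >> 8) & 0xFF,
                        cycles[pos["i"]][1]][s]
release = lambda: pos.update(i=pos["i"] + 1)

for addr, data in trace(read_slice, release, breakpoint_addr=0x0200):
    print(f"{addr:04X}: {data:02X}")
```

Counting branch executions or adding more breakpoints is just extra bookkeeping in the same loop, exactly as described in the post.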
 
