Source of Dynamic Power Consumption in FPGAs

rickman · Apr 12, 2011

In considering the nature of power consumption in FPGA devices it
occurred to me to ask what components in the FPGA are responsible for
most of the power. The candidates are clock trees, routing, LUT and
misc logic and finally, FFs.

In CMOS devices the power consumed comes from charging and discharging
capacitance. So I would expect the clock trees with their constant
toggling to be a likely candidate for the most power consumption.
Second on my list is the routing since I would expect the capacitance
to be significant. I expect the LUTs to be next, though they may be fairly
close to the power consumed in the FFs.
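The ranking above rests on the usual first-order CMOS switching relation, P = alpha * C * V^2 * f, where alpha is the average number of full charge/discharge cycles a node goes through per clock cycle. Below is a minimal Python sketch of that relation. The capacitances, voltage, frequency and activity factors are hypothetical placeholders, not measurements of any device. It shows why a clock net (alpha = 1) tends to dominate a data net of similar capacitance that changes only occasionally.

```python
from dataclasses import dataclass


@dataclass
class Net:
    """A group of nodes that switch together (all values hypothetical)."""
    name: str
    capacitance_f: float  # total switched capacitance, farads
    activity: float       # full charge/discharge cycles per clock cycle


def dynamic_power_w(net: Net, vdd: float, f_clk_hz: float) -> float:
    """First-order switching power P = alpha * C * V^2 * f.

    Ignores leakage and short-circuit (shoot-through) current.
    """
    return net.activity * net.capacitance_f * vdd ** 2 * f_clk_hz


if __name__ == "__main__":
    vdd, f_clk = 1.2, 32e6  # hypothetical core voltage and clock frequency
    nets = [
        Net("clock tree", 50e-12, 1.0),    # rises and falls every cycle
        Net("data routing", 50e-12, 0.1),  # changes on ~1 cycle in 5 (0.1 full cycles)
        Net("LUT internals", 10e-12, 0.1),
    ]
    for net in nets:
        print(f"{net.name:14s} {dynamic_power_w(net, vdd, f_clk) * 1e6:8.1f} uW")
```

Keeping C and alpha as separate fields mirrors the two levers discussed in this thread: shorter routes (less C) and fewer toggles (lower alpha).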

With that in mind, I think the typical way of reducing power by the
use of clock enables on registers at the outputs of logic
blocks may not be optimal. This is the part that I have not fully
analyzed, but I think it could be significant.

When a register on the output of a logic block is enabled, routing and
logic feeding the register inputs will have dissipated power but after
the clock, routing and logic fed by the output will also dissipate
power regardless of whether the next register will be enabled on the next
clock or not! In other words, the routing and logic can dissipate
power just because the inputs to the logic are changing even when that
logic is not needed.

If the registers are placed at the inputs to a function block, the
routing and logic will only dissipate power when the registers are
enabled allowing the register outputs and the logic inputs to change.
Why is this different from output registers? If your design is a
linear pipeline then it is not different. But that is the exception.
With branching and looping of logic flow an output can feed multiple
other logic blocks. If multiple inputs to logic change at different
times this will also increase dissipation. When those other logic
blocks do not need this new data the power used in the routing and
logic is wasted.
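The argument can be made concrete with a small cycle-level toggle count. This is a minimal Python sketch under stated assumptions, not an analysis of a real netlist: data[t] is the value held by the producer's register in cycle t, several consumers each have their own hypothetical enable, and we count bit transitions seen at the consumers' logic inputs. In the output-register arrangement every consumer sees every change; in the input-register arrangement each consumer's private register loads only when that consumer is enabled. The routing from the producer to the input registers' D pins still toggles in both cases and is not counted here.

```python
import random
from typing import Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two words."""
    return bin(a ^ b).count("1")


def consumer_input_toggles(data: Sequence[int],
                           enables: Sequence[Sequence[bool]]) -> Tuple[int, int]:
    """Return (output_register_toggles, input_register_toggles).

    data[t]    : value held by the producer's register in cycle t
    enables[t] : per-consumer enable in cycle t (hypothetical control signals)
    """
    if not enables:
        return 0, 0
    n = len(enables[0])
    shared = 0       # output-register style: one register feeds all consumers
    local = [0] * n  # input-register style: one register per consumer
    out_style = 0
    in_style = 0
    for word, en in zip(data, enables):
        # Every consumer's logic sees each change of the shared register.
        out_style += hamming(shared, word) * n
        shared = word
        # A private register loads only when its consumer needs the data.
        for i in range(n):
            if en[i]:
                in_style += hamming(local[i], word)
                local[i] = word
    return out_style, in_style


if __name__ == "__main__":
    random.seed(1)
    cycles, consumers = 1000, 3
    data = [random.getrandbits(16) for _ in range(cycles)]
    # Each consumer needs new data on roughly 10% of cycles (hypothetical).
    enables = [[random.random() < 0.1 for _ in range(consumers)] for _ in range(cycles)]
    print(consumer_input_toggles(data, enables))
```

When every consumer is enabled every cycle (a linear pipeline), the two counts are equal, which matches the point that the arrangements only differ once an output fans out to blocks that do not always need it.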

I guess the part I'm unclear on is whether this is truly significant
in a typical design. If the branching is not a large part of a design
or if the branching is only in the control logic and not the data
paths I would expect the difference to be small or negligible.

I don't think I am the first person to think of this. Since this is
not part of the vendors' recommendations, I think that is a pretty good
indicator that it is not a large enough factor to be useful. Has
anyone seen an analysis on this?

Rick

glen herrmannsfeldt · Apr 12, 2011

rickman <gnuarm@gmail.com> wrote:

In considering the nature of power consumption in FPGA devices it
occurred to me to ask what components in the FPGA are responsible for
most of the power. The candidates are clock trees, routing, LUT and
misc logic and finally, FFs.

In CMOS devices the power consumed comes from charging and discharging
capacitance. So I would expect the clock trees with their constant
toggling to be a likely candidate for the most power consumption.
Second on my list is the routing since I would expect the capacitance
to be significant. I expect the LUTs to be next, though they may be fairly
close to the power consumed in the FFs.

When I first knew about CMOS, it was the transition time when
both transistors were on that was the most significant power
drain, but then logic was slower. As I understand it, in the
faster, smaller devices, it is now tunneling current that is a
large fraction of the power.

With that in mind, I think the typical way of reducing power by the
use of clock enables on registers at the outputs of logic
blocks may not be optimal. This is the part that I have not fully
analyzed, but I think it could be significant.

The complication in the capacitance calculation is that the metal
widths can change a lot. The input to a buffer can be narrow, and
drive a wider output line. Most FPGAs now include many buffers in
the routing, where routing lines used to be passive. (There are
no internal tristate lines, though the tools will still simulate them.)

When a register on the output of a logic block is enabled, routing and
logic feeding the register inputs will have dissipated power but after
the clock, routing and logic fed by the output will also dissipate
power regardless of whether the next register will be enabled on the next
clock or not! In other words, the routing and logic can dissipate
power just because the inputs to the logic are changing even when that
logic is not needed.

Logic design is hard enough without trying to worry about every
last bit of power. I suppose for designs that specifically need
to last a long time on a battery, such as digital watches, one should worry about it.

If the registers are placed at the inputs to a function block, the
routing and logic will only dissipate power when the registers are
enabled allowing the register outputs and the logic inputs to change.
Why is this different from output registers? If your design is a
linear pipeline then it is not different. But that is the exception.

I happen to like linear pipelines, but, yes, that is rare.

With branching and looping of logic flow an output can feed multiple
other logic blocks. If multiple inputs to logic change at different
times this will also increase dissipation. When those other logic
blocks do not need this new data the power used in the routing and
logic is wasted.

There are stories about the Cray-1, and how many lines were
carefully measured to be the same length, such that signals would
arrive at the right time. (That was ECL, so I don't believe that
power was the reason.)

I guess the part I'm unclear on is whether this is truly significant
in a typical design. If the branching is not a large part of a design
or if the branching is only in the control logic and not the data
paths I would expect the difference to be small or negligible.

Well, there isn't that much you can do about it.

-- glen

rickman · Apr 12, 2011

On Apr 11, 9:16 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

rickman <gnu...@gmail.com> wrote:

I guess the part I'm unclear on is whether this is truly significant
in a typical design. If the branching is not a large part of a design
or if the branching is only in the control logic and not the data
paths I would expect the difference to be small or negligible.

Well, there isn't that much you can do about it.

I'm not sure what you mean. The point is that if you need to reduce
power in your design and it has certain features, this may be a useful
technique for reducing the power. I would like to work with the
Silicon Blue devices to measure some power figures for a variety of
designs I've done. When I get to that I may try this technique and
see if it gives useful results.

Rick

Hal Murray · Apr 12, 2011

In article <48a5df08-d39a-4c4e-bb4c-99b37bc624b1@l18g2000yqm.googlegroups.com>,
rickman <gnuarm@gmail.com> writes:

In CMOS devices the power consumed comes from charging and discharging
capacitance.

That was true in the old days.

With modern (really) thin oxide, you have to consider leakage currents.

--
These are my opinions, not necessarily my employer's. I hate spam.

backhus · Apr 12, 2011

On 12 Apr., 00:24, rickman <gnu...@gmail.com> wrote:

(snip)

Hi Rick,
have you taken a look at the Xilinx XPower Analyzer?
For a given design it calculates the power consumption with regard to
many variables.
It's interesting to see the impact of these variables on the power
consumption.
Regardless of the brand, the tendency should be similar for other FPGAs
too.
And maybe other companies have similar tools too.

Have a nice synthesis
Eilert

glen herrmannsfeldt · Apr 12, 2011

rickman <gnuarm@gmail.com> wrote:

On Apr 11, 9:16 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
rickman <gnu...@gmail.com> wrote:

I guess the part I'm unclear on is whether this is truly significant
in a typical design. If the branching is not a large part of a design
or if the branching is only in the control logic and not the data
paths I would expect the difference to be small or negligible.

Well, there isn't that much you can do about it.

I'm not sure what you mean. The point is that if you need to reduce
power in your design and it has certain features, this may be a useful
technique for reducing the power.

Say you have an XOR gate where the two inputs come from different
FF's clocked on the same clock, but through different paths.
If you can make those two paths the same length, then there won't
be extra transitions.
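The XOR case can be checked with a tiny event model. This is a hypothetical, ideal zero-delay XOR whose two inputs each switch once at a given arrival time; the function counts transitions on the output. When both inputs rise together the output never moves; when one arrives late the output pulses, costing two extra transitions that charge and discharge whatever the gate drives.

```python
def xor_output_transitions(a_old: int, b_old: int, a_new: int, b_new: int,
                           t_a: float, t_b: float) -> int:
    """Count output transitions of an ideal (zero-delay) XOR gate.

    Input a switches from a_old to a_new at time t_a, and b from b_old to
    b_new at time t_b. Inputs arriving at the same instant are applied together.
    """
    a, b = a_old, b_old
    out = a ^ b
    transitions = 0
    for t in sorted({t_a, t_b}):
        if t == t_a:
            a = a_new
        if t == t_b:
            b = b_new
        if (a ^ b) != out:
            out = a ^ b
            transitions += 1
    return transitions


if __name__ == "__main__":
    # Matched path lengths: both inputs rise at the same instant.
    print(xor_output_transitions(0, 0, 1, 1, t_a=1.0, t_b=1.0))  # prints 0
    # Mismatched paths: b arrives 0.7 time units late, output pulses 0->1->0.
    print(xor_output_transitions(0, 0, 1, 1, t_a=1.0, t_b=1.7))  # prints 2
```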

Now, with ASIC logic you get a lot of control over the wiring,
and could work to get path lengths equal. With FPGAs, you don't
have so much control. Even more, the routing paths are often
buffered, even though you don't see them.

But often the circuits are designed for high speed, and very
rarely low power. I do remember the 74L series TTL, slower and
lower power. If you make the metal traces narrower (reduce
capacitance) you increase the resistance, slowing down the signal.
You can trade speed for power in many ways.

I would like to work with the
Silicon Blue devices to measure some power figures for a variety of
designs I've done. When I get to that I may try this technique and
see if it gives useful results.

So they give you more control than the usual FPGA tools?

-- glen

shyam · Apr 12, 2011

On Apr 12, 9:50 am, rickman <gnu...@gmail.com> wrote:

(snip)

I'm not sure what you mean. The point is that if you need to reduce
power in your design and it has certain features, this may be a useful
technique for reducing the power. I would like to work with the
Silicon Blue devices to measure some power figures for a variety of
designs I've done. When I get to that I may try this technique and
see if it gives useful results.

Rick

Hi Rick, as a regular subscriber to the comp arch group and an
engineer interested in reconfigurable hardware, I started reading your
post. Incidentally, reading about SiliconBlue FPGAs here gives me a great
sense of happiness, since I work on IP development for SiliconBlue.

Primarily, clock gating is the most prominent approach to
reducing dynamic power.
Operating on a bank-wise basis is another.
Banks supporting multiple voltages are a feature of SiliconBlue that I
feel should be exploited. For instance, one can always run an mDDR bank
at 1.8 V and an SD bank at 3.3 V and get a substantial reduction in power.
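The saving from running a bank at a lower I/O voltage follows from the V^2 term in CV^2f. A minimal Python sketch, assuming the same load capacitance and toggle rate at both voltages (in practice each bank's voltage is dictated by the device attached to it, so this is only a rough comparison); the load and frequency values are hypothetical.

```python
def io_switching_power_w(c_load_f: float, vio: float, f_toggle_hz: float) -> float:
    """CV^2f power for a pin completing f_toggle_hz full charge/discharge cycles per second."""
    return c_load_f * vio ** 2 * f_toggle_hz


if __name__ == "__main__":
    c_load, f = 10e-12, 50e6  # hypothetical pin-plus-trace load and toggle rate
    p33 = io_switching_power_w(c_load, 3.3, f)
    p18 = io_switching_power_w(c_load, 1.8, f)
    print(f"3.3 V: {p33 * 1e3:.2f} mW, 1.8 V: {p18 * 1e3:.2f} mW, ratio {p18 / p33:.3f}")
```

The ratio depends only on (1.8/3.3)^2, roughly 0.3, so under these assumptions the 1.8 V bank's switching power is about 70% lower for the same load and activity.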

Using clock enables is very tricky, though I think they can do much as
far as power reduction is concerned. But this is usually
architecture specific. For instance, in SiliconBlue, if you have
a clock enable on a register and that register is implemented in a
logic block, then the clock enable is shared by the other registers in
that logic block. If the other registers in the design do not use the
same clock enable, they must be mapped to other logic blocks. That
wastes resources and also increases dynamic power, because additional
switches are used to route outputs from those other logic blocks.
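The packing cost of mixed clock enables can be estimated by grouping registers by their enable net. A minimal Python sketch, assuming every register in a logic block must share one clock enable and assuming 8 registers per block (a hypothetical parameter for illustration; check the device datasheet). It counts only register packing and ignores shared clock and set/reset constraints.

```python
from collections import Counter
from math import ceil
from typing import Iterable, Optional


def logic_blocks_needed(register_ces: Iterable[Optional[str]], ffs_per_block: int = 8) -> int:
    """Count logic blocks when all registers in a block must share one clock enable.

    register_ces: clock-enable net name for each register (None = always enabled).
    ffs_per_block: registers per logic block; 8 is an assumed value.
    """
    groups = Counter(register_ces)
    return sum(ceil(count / ffs_per_block) for count in groups.values())


if __name__ == "__main__":
    same_ce = ["en"] * 8
    mixed_ce = ["en_a"] * 3 + ["en_b"] * 3 + ["en_c"] * 2
    print(logic_blocks_needed(same_ce), logic_blocks_needed(mixed_ce))
```

The same 8 registers fit in one block when they share an enable but spread over three blocks when they use three different enables, and the extra blocks mean extra routing between them.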

I have been trying to measure the dynamic power with some designs of
mine, but I couldn't get a real setup with software to track the
power consumption in real time (rendering the numbers in
software).

Marc Jet · Apr 12, 2011

Your thoughts about where the power goes seem mostly correct.

My experience is the following. You can't do much about static power,
except by playing with the external voltages. Then there is I/O
consumption, like driving your outputs to the PCB, and hopefully no
floating input pins. Sometimes you can tune those things. But once
this is done, you have to live with the hardware.

In software, the ONE major cause for current consumption is toggling
interconnect.

Each route in reality consists of buffers, tracks, receivers, etc.
But the most important thing you can do to reduce the consumption of
the route is:

a) make the route "shorter", and/or
b) make it toggle less often.

Little else can be done, and therefore it's usually not necessary to
go into the details about what elements physically make up the route.

Clocks have a high toggle rate, and high fanout converts it into lots
of routes used. Therefore the clock tree obviously is one of the
major consumers (if not the top one). Unfortunately the clock tree is
also one of the most difficult things to tune or get rid of. Slow
clocks (instead of clock enables) for the slow portions of your design
are probably the most fruitful thing to do with regard to clocks.
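The slow-clock suggestion follows from the fact that a clock net charges and discharges once per cycle whether or not the flip-flops it feeds are enabled. A rough Python comparison with hypothetical values: the same region clocked at the fast rate with an enable, versus a clock divided down to the rate the logic actually needs. It ignores the cost of generating and distributing the second clock (divider or PLL, an extra global buffer) and any clock-domain-crossing logic.

```python
def clock_tree_power_w(c_tree_f: float, vdd: float, f_hz: float) -> float:
    """Clock net power: one full charge/discharge per cycle (alpha = 1)."""
    return c_tree_f * vdd ** 2 * f_hz


if __name__ == "__main__":
    c_region = 20e-12  # hypothetical clock-tree capacitance of the slow region
    vdd, f_fast = 1.2, 32e6
    divide = 16        # the slow logic needs one update every 16 fast cycles
    with_enable = clock_tree_power_w(c_region, vdd, f_fast)  # tree still runs at f_fast
    slow_clock = clock_tree_power_w(c_region, vdd, f_fast / divide)
    print(f"clock enable: {with_enable * 1e6:.1f} uW, slow clock: {slow_clock * 1e6:.1f} uW")
```

Under these assumptions the region's clock-tree power drops by the division ratio; whether that survives the overhead of the extra clock depends on the device.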

Data routes however, can often be influenced at the HDL level (by
restructuring the design). Look at your design in the floorplanner
and FPGA editor, and at the CLB description in the datasheet. Figure
out a good way of mapping the functionality onto the available
hardware. Express that in your HDL, and the tools will (to some
degree) follow you through without explicit floorplanning or vendor
specific primitives.

The only other thing worth mentioning is LUT flicker. The different
input signals each have distinct arrival times (stemming from their
individual route delays). Each time an input arrives, the LUT output
may toggle (depending on terms). If the LUT drives a "long" route,
this causes high consumption. Several approaches can be used to
reduce it. A simple one is to use few logic levels. Registering
after the LUT is usually done in the same slice, thus a very short route
with little consumption during flicker. It has a good net result for
dense data with a high toggle rate, despite the bigger clock tree.
Obviously, the opposite can be true for slow data.
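LUT flicker, and the effect of registering right after the LUT, can be sketched in Python. Assumptions: an ideal zero-delay LUT, each input settling once at a hypothetical arrival time, and a 4-input parity function standing in for an arbitrary LUT. The register does not remove flicker inside the LUT; it only keeps it off the long route the register drives, which then changes at most once per clock.

```python
from typing import Callable, Sequence

Lut = Callable[[Sequence[int]], int]


def flicker_transitions(lut: Lut, old: Sequence[int], new: Sequence[int],
                        arrival: Sequence[float]) -> int:
    """Transitions on an unregistered LUT output as its inputs arrive one by one."""
    state = list(old)
    out = lut(state)
    transitions = 0
    for t in sorted(set(arrival)):
        for i, when in enumerate(arrival):
            if when == t:
                state[i] = new[i]
        value = lut(state)
        if value != out:
            out = value
            transitions += 1
    return transitions


def registered_transitions(lut: Lut, old: Sequence[int], new: Sequence[int]) -> int:
    """A register after the LUT samples only the settled value."""
    return int(lut(old) != lut(new))


def parity4(bits: Sequence[int]) -> int:
    """Example 4-input LUT function: XOR of all inputs."""
    return sum(bits) % 2


if __name__ == "__main__":
    old, new = [0, 0, 0, 0], [1, 1, 1, 1]
    skewed = [1.0, 1.3, 1.9, 2.4]  # hypothetical per-input route delays
    print(flicker_transitions(parity4, old, new, skewed))  # prints 4
    print(registered_transitions(parity4, old, new))       # prints 0
```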

Best regards,
Marc

rickman · Apr 12, 2011

On Apr 12, 3:31 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

rickman <gnu...@gmail.com> wrote:
On Apr 11, 9:16 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
rickman <gnu...@gmail.com> wrote:
I guess the part I'm unclear on is whether this is truly significant
in a typical design. If the branching is not a large part of a design
or if the branching is only in the control logic and not the data
paths I would expect the difference to be small or negligible.
Well, there isn't that much you can do about it.
I'm not sure what you mean. The point is that if you need to reduce
power in your design and it has certain features, this may be a useful
technique for reducing the power.

Say you have an XOR gate where the two inputs come from different
FF's clocked on the same clock, but through different paths.
If you can make those two paths the same length, then there won't
be extra transitions.

Now, with ASIC logic you get a lot of control over the wiring,
and could work to get path lengths equal. With FPGAs, you don't
have so much control. Even more, the routing paths are often
buffered, even though you don't see them.

I'm not saying this is not a good idea, but it is not what I was
suggesting. Controlling the delay of paths so that edges line up
would be pretty hard to do in my opinion. I expect most devices focus
on just getting the design routed.

But often the circuits are designed for high-speed, and very
rarely low power. I do remember the 74L series TTL, slower and
lower power. If you make the metal traces narrower (reduce
capacitance) you increase the resistance, slowing down the signal.
You can trade speed for power in many ways.

I'm not suggesting a change to chips. In fact, from what I see, most
designs on most chips would get little or no benefit because of the
high static power. But on low power devices there may be some
advantage to moving the registers to logical block inputs rather than
the outputs. This may require more registers, but that shouldn't be
an issue since FPGAs are so register rich.

BTW, when I say "move the registers", I don't mean changing the chip
design. This is just a way of looking at a design and how you
implement the clock enables although it may require some register
duplication.

I would like to work with the
Silicon Blue devices to measure some power figures for a variety of
designs I've done. When I get to that I may try this technique and
see if it gives useful results.

So they give you more control than the usual FPGA tools?

I'm not sure what you are thinking here. What sort of control are you
looking for?

Rick

rickman · Apr 12, 2011

On Apr 12, 2:44 am, backhus <goous...@googlemail.com> wrote:

On 12 Apr., 00:24, rickman <gnu...@gmail.com> wrote:

(snip)

Hi Rick,
have you taken a look at the Xilinx XPower Analyzer?
For a given design it calculates the power consumption with regard to
many variables.
It's interesting to see the impact of these variables on the power
consumption.
Regardless of the brand, the tendency should be similar for other FPGAs
too.
And maybe other companies have similar tools too.

Have a nice synthesis
Eilert

One big difference between Xilinx and SiBlue is that the Xilinx parts
have such enormous static power. They do have their CoolRunner CPLD
series, but those are very limited in size and the prices go up very
quickly with size. So I don't consider them to be practical for the
sort of work where power is a major issue. Small designs will
use small amounts of power anyway.

The SiBlue parts are shown going as high as 16 kLUTs which may not be
large in a Xilinx sense, but it is big enough to use in a lot of
apps.

Rick

rickman · Apr 12, 2011

On Apr 12, 3:08 am, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal
Murray) wrote:

In article <48a5df08-d39a-4c4e-bb4c-99b37bc62...@l18g2000yqm.googlegroups.com>,
rickman <gnu...@gmail.com> writes:
In CMOS devices the power consumed comes from charging and discharging
capacitance.

That was true in the old days.

With modern (really) thin oxide, you have to consider leakage currents.

We are not on the same page. I am tossing out devices like the
monster Xilinx and Altera parts where your main concern is just
getting enough power into the part to allow it to boot without
glitching itself.

The SiBlue devices have a static current in the low double-digit µA
range. The dynamic current is in the single-digit mA range for a chip
filled with 16-bit counters running at 32 MHz. So clearly the dynamic
power consumption is much more significant than the static for these
devices.
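The comparison can be put into numbers with a simple duty-cycle model. A minimal Python sketch, assuming 20 µA static and 5 mA dynamic (values picked from within the ranges quoted above, not measurements), a constant static current, and dynamic current that scales with the fraction of time the design is actively clocking.

```python
def average_current_a(i_static_a: float, i_dynamic_a: float, duty: float) -> float:
    """Average supply current when the design is active a fraction `duty` of the time."""
    return i_static_a + duty * i_dynamic_a


def crossover_duty(i_static_a: float, i_dynamic_a: float) -> float:
    """Duty cycle at which the average dynamic current equals the static current."""
    return i_static_a / i_dynamic_a


if __name__ == "__main__":
    i_static = 20e-6  # assumed, within "low double-digit uA"
    i_dynamic = 5e-3  # assumed, within "single-digit mA"
    print(f"crossover duty: {crossover_duty(i_static, i_dynamic):.3%}")
    for duty in (1.0, 0.1, 0.01):
        print(f"duty {duty:>5.0%}: {average_current_a(i_static, i_dynamic, duty) * 1e6:.1f} uA")
```

Under these assumptions dynamic current stays the larger term unless the design is clocking less than about 0.4% of the time.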

Rick

rickman · Apr 12, 2011

On Apr 12, 7:15 am, Marc Jet <jetm...@hotmail.com> wrote:

(snip)

The only other thing worth mentioning is LUT flicker. The different
input signals each have distinct arrival times (stemming from their
individual route delays). Each time an input arrives, the LUT output
may toggle (depending on terms). If the LUT drives a "long" route,
this causes high consumption. Several approaches can be used to
reduce it. A simple one is to use few logic levels. Registering
after the LUT is usually done in the same slice, thus a very short route
with little consumption during flicker. It has a good net result for
dense data with a high toggle rate, despite the bigger clock tree.
Obviously, the opposite can be true for slow data.

Thanks for your comments, Marc. The issue I am addressing is what you
call "LUT flicker". When an output register is not enabled, there is
no need for the inputs to be changing, which is what causes LUT flicker
power consumption. By moving the registers to the inputs of a logic
block, the inputs will only change when needed, reducing LUT flicker
and routing power consumption.

Rick

glen herrmannsfeldt · Apr 12, 2011

rickman <gnuarm@gmail.com> wrote:

(snip, I wrote)

Now, with ASIC logic you get a lot of control over the wiring,
and could work to get path lengths equal. With FPGAs, you don't
have so much control. Even more, the routing paths are often
buffered, even though you don't see them.

I'm not saying this is not a good idea, but it is not what I was
suggesting. Controlling the delay of paths so that edges line up
would be pretty hard to do in my opinion. I expect most devices focus
on just getting the design routed.

(snip)

I'm not suggesting a change to chips. In fact, from what I see, most
designs on most chips would get little or no benefit because of the
high static power. But on low power devices there may be some
advantage to moving the registers to logical block inputs rather than
the outputs. This may require more registers, but that shouldn't be
an issue since FPGAs are so register rich.

I believe the high leakage is still only on the high-end chips,
which are not likely the ones you want for low power designs.

BTW, when I say "move the registers", I don't mean changing the chip
design. This is just a way of looking at a design and how you
implement the clock enables although it may require some register
duplication.

That is an interesting idea. I know I have seen the tools
do strange things with registers. I believe I have seen them
combine registers when two had the same inputs and same clocks.
That would make it harder to do what you say in the design, and
have it stick all the way through.

For this to make a large difference, you have to have a very
large fraction of the signals registered at each LUT output,
but systolic arrays do tend to do that.

-- glen

Jon Elson · Apr 12, 2011

On 04/11/2011 05:24 PM, rickman wrote:

In considering the nature of power consumption in FPGA devices it
occurred to me to ask what components in the FPGA are responsible for
most of the power. The candidates are clock trees, routing, LUT and
misc logic and finally, FFs.

I had a problem recently where I had extreme EMI from transitions of
some CMOS chips, single-gate parts. When I measured the shoot-through
transient, I was astounded that it was in the neighborhood of 2 A per
chip! I switched to a different family, and it was reduced to the point
that I couldn't make a valid measurement.

What I take away from that is that some digital designs are made with no
consideration of dynamic current, and others take pains to reduce it.
This, of course, is built into the chips, and you can't affect it much
with the FPGA configuration. The dynamic loading of the FFs and LUTs is
kind of at the mercy of the routing tools, unless you are going to
hand-route (shudder!) Presumably, clock-enable won't affect this part
of the dynamic power, as the FFs weren't going to change state either
way, so you'd only save a little power in the FF itself.

Jon

Phil Jessop · Apr 12, 2011

"rickman" <gnuarm@gmail.com> wrote in message
news:48a5df08-d39a-4c4e-bb4c-99b37bc624b1@l18g2000yqm.googlegroups.com...

(snip)

As far as I know, the clocks on FFs are not actually enabled; that is,
the clocks are not gated on and off.
The enable works by routing the Q of the FF back through a mux whose
other input is the incoming D data. The 'enable' controls the mux.
This gets round the clock skew problems that would exist with a gated
clock system.
Therefore all FFs on a clock tree will be clocked permanently, and no
power saving can be made.
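Phil's description of the enable as a feedback mux can be written as a cycle-level sketch. The function name is hypothetical and the model ignores timing entirely; it only shows that the flip-flop receives a clock edge every cycle, while the value it captures is just its own old Q whenever the enable is low.

```python
from typing import List, Sequence, Tuple


def enabled_ff(d: Sequence[int], en: Sequence[bool], init: int = 0) -> Tuple[List[int], int]:
    """Clock-enable FF built as a mux in front of a plain D flip-flop.

    Returns the Q trace (one sample per clock) and the number of clock
    edges the flip-flop received.
    """
    q = init
    trace = []
    clock_edges = 0
    for d_in, enable in zip(d, en):
        mux_out = d_in if enable else q  # the enable only steers the mux
        clock_edges += 1                 # the clock itself is never gated
        q = mux_out                      # FF captures the mux output on every edge
        trace.append(q)
    return trace, clock_edges


if __name__ == "__main__":
    q, edges = enabled_ff([1, 0, 1, 1], [False, True, False, True])
    print(q, edges)  # [0, 0, 0, 1] 4
```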

Phil

Andy · Apr 12, 2011

Some FPGAs have configurable clock trees that are only enabled in
areas where they need to serve FFs. There are also some FPGAs with
enablable clock buffers that can be used to safely gate off the clock
to a (usually large) group of FFs instead of deasserting their clock
enables.
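For a rough feel of what gating off a clock region buys, the usual first-order CMOS estimate P = α·C·V²·f can be applied per clock region, with α = 1 for a clock net since it charges and discharges once per cycle. The region capacitances, core voltage, and frequency below are made-up illustrative numbers, not figures for any real device.

```python
from typing import Sequence


def clock_tree_power(region_caps_farads: Sequence[float], enabled: Sequence[bool],
                     vdd: float, f_hz: float) -> float:
    """First-order dynamic power (watts) of a clock tree, summed over enabled regions.

    Uses P = alpha * C * V^2 * f with alpha = 1 (a clock net goes through one
    full charge/discharge cycle every clock period). Gated-off regions are
    assumed to be held static and contribute nothing.
    """
    return sum(c * vdd ** 2 * f_hz for c, on in zip(region_caps_farads, enabled) if on)


if __name__ == "__main__":
    # Hypothetical example: four clock regions of 20 pF each, 1.2 V core, 100 MHz.
    caps = [20e-12] * 4
    all_on = clock_tree_power(caps, [True] * 4, vdd=1.2, f_hz=100e6)
    half_on = clock_tree_power(caps, [True, True, False, False], vdd=1.2, f_hz=100e6)
    print(f"{all_on * 1e3:.2f} mW vs {half_on * 1e3:.2f} mW")
```

The estimate only covers the clock network itself; whatever the gated FFs would have toggled downstream is a separate term.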

Andy

glen herrmannsfeldt · Apr 12, 2011

Jon Elson <jmelson@wustl.edu> wrote:

On 04/11/2011 05:24 PM, rickman wrote:
In considering the nature of power consumption in FPGA devices it
occurred to me to ask what components in the FPGA are responsible for
most of the power. The candidates are clock trees, routing, LUT and
misc logic and finally, FFs.

I had a problem recently where I had extreme EMI from transitions of
some CMOS chips, single-gate parts. When I measured the shoot-through
transient, I was astounded that it was in the neighborhood of 2 A per
chip! I switched to a different family, and it was reduced to the point
that I couldn't make a valid measurement.

It depends on where you set the thresholds on the FETs.
If I understand one page that comes up when searching Google for
"shoot through" CMOS, it is easier to avoid on lower-voltage processes,
where VthN + |VthP| can be more than Vdd.
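The threshold condition can be checked with a first-order static model: in a CMOS inverter the NMOS conducts once Vin rises above VthN, and the PMOS conducts while Vin is below Vdd - |VthP|, so both are on (shoot-through) only in the window between those two voltages. That window disappears when VthN + |VthP| >= Vdd. The threshold values below are illustrative assumptions, and the model ignores subthreshold conduction and input edge rates.

```python
def shoot_through_window(vdd: float, vth_n: float, vth_p: float) -> float:
    """Width (volts) of the input range where both inverter transistors conduct.

    NMOS is on for Vin > vth_n; PMOS is on for Vin < vdd - |vth_p|.
    A result of 0.0 means the two ranges never overlap (no static shoot-through).
    """
    return max(0.0, (vdd - abs(vth_p)) - vth_n)


if __name__ == "__main__":
    # Illustrative (made-up) threshold values:
    print(shoot_through_window(5.0, 0.7, -0.9))   # 5 V logic: wide overlap window
    print(shoot_through_window(1.0, 0.5, -0.55))  # low-voltage core: 0.0
```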

What I take away from that is that some digital designs are made with no
consideration of dynamic current, and others take pains to reduce it.

I am not sure about it, but I believe that as you make the gate
faster, which means that the driving transistor comes on sooner,
it gets worse. Again, optimize for speed or power.

-- glen

rickman · Apr 13, 2011

On Apr 12, 4:27 pm, Andy <jonesa...@comcast.net> wrote:

Some FPGAs have configurable clock trees that are only enabled in
areas where they need to serve FF's. There are also some FPGAs with
enablable clock buffers that can be used to safely gate off the clock
to a (usually large) group of FFs instead of disabling their clock
enables.

I remember parts from Concurrent Logic, which was acquired by Atmel,
that had this feature. The clock line of an entire column could be
turned off. I want to say this was a configuration feature, not under
control of the active design, but it was a long time ago.

Still, the fact that a FF is receiving a clock does not mean it is
dissipating power because of it. The power consumption comes from the
gates changing state. If the enable to the FF is not set and the
output won't change, then there is no power consumption in the FF. I
would expect that any power consumption caused by the capacitance of
the clock input charging and discharging would be lumped into the
clock tree dissipation.
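Rick's split can be expressed per flip-flop with the usual α·C·V² energy bookkeeping: the clock-pin term is paid on every cycle whether or not the FF is enabled (he proposes booking it to the clock tree), while the output term scales with how often Q actually rises. The function name and capacitance values are hypothetical placeholders, and internal FF nodes are not modeled.

```python
from typing import Sequence, Tuple


def ff_energy(q_trace: Sequence[int], c_clk_pin: float, c_out: float,
              vdd: float, init: int = 0) -> Tuple[float, float]:
    """Split per-FF switching energy (joules) into clock-pin and output parts.

    Each 0->1 transition on a node of capacitance C draws C * V^2 from the
    supply (half is dissipated while charging, the stored half on the
    following 1->0 transition), so:
      - the clock pin costs c_clk_pin * V^2 every cycle, enabled or not;
      - the output costs c_out * V^2 per rising edge of Q.
    """
    rising = sum(1 for prev, cur in zip([init, *q_trace], q_trace) if prev == 0 and cur == 1)
    clock_part = len(q_trace) * c_clk_pin * vdd ** 2
    output_part = rising * c_out * vdd ** 2
    return clock_part, output_part


if __name__ == "__main__":
    # Hypothetical values: 2 fF clock pin, 5 fF output load, 1.2 V.
    held = ff_energy([0, 0, 0, 0], 2e-15, 5e-15, 1.2)  # enable low: output part is 0.0
    busy = ff_energy([1, 0, 1, 0], 2e-15, 5e-15, 1.2)  # two rising edges of Q
    print(held, busy)
```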

Rick

jc · Apr 13, 2011

Great discussion and interesting to hear others viewpoints.

My take on what has been mentioned so far is that there are two major
sources of "controllable" power consumption in a typical large-scale
FPGA design: combinatorial "flicker" (good word, Marc) and clock
enables (although the word "controllable" is used loosely here).

It seems that the most control a designer has in reducing flicker is
to reduce logic levels. The first obvious step to this approach is to
add pipeline regs, though I suppose one could even influence the
actual combinatorial implementation in some creative way. But I
caution that the tools can undermine any attempts at reducing flicker
to a certain degree through optimization techniques such as resource
sharing and reg re-timing, just to name two. Also, it would appear to
me extremely hard to get a grasp on how much power can be saved by
such measures, especially if a design is re-routed over and over again,
causing the tool to use a different "path" of optimization steps for
each iteration.
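The "flicker" (glitching) mentioned above, and the earlier point about logic inputs changing at different times, can be illustrated with a zero-delay gate whose inputs arrive at different times. The sketch applies each input change in arrival order and counts how many times the output moves. The parity (XOR) function and the arrival times are assumptions, chosen only because XOR makes every individual input change visible at the output.

```python
from functools import reduce
from operator import xor
from typing import Callable, Sequence


def output_transitions(old: Sequence[int], new: Sequence[int], arrival: Sequence[float],
                       f: Callable[[Sequence[int]], int]) -> int:
    """Count output changes of a zero-delay gate f as its inputs switch from
    `old` to `new`, each input changing at its own arrival time.
    Inputs with equal arrival times switch together."""
    current = list(old)
    out = f(current)
    count = 0
    for t in sorted(set(arrival)):
        for i, a in enumerate(arrival):
            if a == t:
                current[i] = new[i]
        nxt = f(current)
        if nxt != out:
            count += 1
            out = nxt
    return count


def parity(bits: Sequence[int]) -> int:
    return reduce(xor, bits, 0)


if __name__ == "__main__":
    old, new = [0, 0, 0, 0], [1, 1, 1, 1]
    print(output_transitions(old, new, [0, 0, 0, 0], parity))  # 0: all arrive together
    print(output_transitions(old, new, [1, 2, 3, 4], parity))  # 4: every arrival flips the output
```

In both cases the settled output is the same, so all four transitions in the skewed case are wasted switching. Fewer logic levels tend to mean a smaller spread of arrival times, but this toy model says nothing about how much that saves in a real placed-and-routed design.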

Clock enables are useful as an additional functional input. Of course,
they are preferred to actually gating the clock. But clock enables
could also be used for "don't care" data paths to reduce power.
Imagine a 64-bit data bus transitioning through a pipeline of regs.
Propagating a data valid signal that arrives in parallel with the data
bus into the pipeline is a convenient way to generate clock enables at
each subsequent pipe stage.
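Propagating a data-valid flag alongside the bus to drive each stage's clock enable can be sketched at cycle level. The pipeline depth, the use of `None` for "no valid data this cycle", and the function name are illustrative assumptions; the point is that a stage's data register only loads (so its downstream logic only sees a change) on cycles when valid data reaches it, while the single valid bit is updated every cycle.

```python
from typing import List, Optional, Sequence


def valid_pipeline(inputs: Sequence[Optional[int]], depth: int = 3) -> List[List[int]]:
    """Cycle-level model of a pipeline whose stage enables come from a valid bit.

    `inputs[t]` is the bus value offered at clock t, or None when it is not
    valid. Each stage holds (valid, data); data loads only when the previous
    stage's valid bit is set, otherwise the data register keeps its old value.
    Returns, per stage, the list of cycles on which its data register loaded.
    """
    valid = [False] * depth
    data = [0] * depth
    loads: List[List[int]] = [[] for _ in range(depth)]
    for t, x in enumerate(inputs):
        # Work from the last stage back so each stage sees the previous
        # stage's values from before this clock edge.
        for s in reversed(range(depth)):
            in_valid = (x is not None) if s == 0 else valid[s - 1]
            in_data = x if s == 0 else data[s - 1]
            if in_valid:              # valid bit acts as this stage's clock enable
                data[s] = in_data
                loads[s].append(t)
            valid[s] = in_valid       # the valid bit itself is loaded every cycle
    return loads


if __name__ == "__main__":
    print(valid_pipeline([10, None, 20, None, None]))  # [[0, 2], [1, 3], [2, 4]]
```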

John

Jon Elson · Apr 13, 2011

On 04/12/2011 04:04 PM, glen herrmannsfeldt wrote:

I am not sure about it, but I believe that as you make the gate
faster, which means that the driving transistor comes on sooner,
it gets worse. Again, optimize for speed or power.

I can imagine a structure where you have a separate driver for each of
the output pad transistors. Then, you could design each of those
drivers to have just the right delay on each edge to assure almost
break-before-make operation of the final transistors. This might make
more sense in the case of simple logic parts like single gates and FFs
than in an FPGA.
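The break-before-make idea comes down to timing: shoot-through current flows only while the transistor being switched on overlaps the one being switched off. A minimal sketch, assuming each output transistor's switching can be reduced to a single instant set by its own driver delay (the function name and delay values are hypothetical):

```python
def conduction_overlap(t_on_incoming: float, t_off_outgoing: float) -> float:
    """Time during which both output transistors conduct on one edge.

    The incoming transistor starts conducting at t_on_incoming; the outgoing
    one stops at t_off_outgoing. Break-before-make means the outgoing device
    turns off first, giving zero overlap.
    """
    return max(0.0, t_off_outgoing - t_on_incoming)


if __name__ == "__main__":
    # Hypothetical driver delays in nanoseconds for a rising output edge
    # (NMOS turning off, PMOS turning on):
    print(conduction_overlap(t_on_incoming=1.0, t_off_outgoing=1.5))  # 0.5 ns of overlap
    print(conduction_overlap(t_on_incoming=1.5, t_off_outgoing=1.0))  # 0.0: break-before-make
```

With a separate driver per output transistor, the delays for the rising and falling edges could in principle be tuned independently so that both edges land in the zero-overlap case.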

Jon
