Clock Enables and Power

Jon

Guest
Hi all,
I have a question about Xilinx Virtex-II FPGAs. In general, is there
an easy way to estimate the power increase from using clock enables vs.
generating multiple internal clocks? Has anyone had any experience
with coding a design both ways and looking at the power increase? I
assume that there must be some power increase because the clock is now
driving the input stage of all the flops, even though internally it is
gated by the enable.

Thanks

Jon
 
Hi Jon,
You could try using the Xilinx Power Estimator tool. I'm not sure that
you're right that several separate clocks are better than a clock-enabled
design. Here's my reasoning.
A major part of the power consumption is the energy used when a flip-flop
changes state, so this is the same for both designs. So, the only difference
between the two methods is the difference in power to charge and discharge
the global clock networks for the former case, and the power to charge and
discharge the clock enable signals in the latter case. I doubt that there's
much difference.
Why not try the power estimator and report back?
cheers, Syms.
 
Actually, it may be the other way around. Driving the global nets is
likely to take more power than driving the inputs of the FFs. By having
multiple clocks, multiple sets of clock lines will require more power
vs. the extra power of driving the FF inputs. I guess it may depend on
the relative speeds of the clocks.

Which will take more power, a x N x F1 or (b + a x N) x F2 where a is
the power coefficient for a FF input only, N is the number of FFs
enabled at the lower speed and F1 is the high speed clock frequency; b
is the power coefficient for driving a clock line and F2 is the low
speed clock? Another way to express this is

\[ \frac{a}{b} \, N \;\overset{?}{\gtrless}\; \frac{F_2}{F_1 - F_2} \]

Or the breakeven point would be

\[ N = \frac{b \, F_2}{a \, (F_1 - F_2)} \]

If N is greater than this, the enabled FFs will use more power. If N is
less than this, the enabled FFs will use less power.
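
As a rough numerical illustration of the comparison above, here is a minimal Python sketch of the same model. The function names and the coefficient values are invented for illustration; real values of a and b would have to come from measurements or vendor data for a specific device.

```python
def enabled_power(a, n, f1):
    """Power of n FF clock inputs toggling at the fast clock f1 (CE design)."""
    return a * n * f1


def separate_clock_power(a, b, n, f2):
    """Power of a dedicated slow clock line plus n FF clock inputs at f2."""
    return (b + a * n) * f2


def breakeven_ffs(a, b, f1, f2):
    """Number of FFs at which both approaches cost the same in this model."""
    if f1 <= f2:
        raise ValueError("f1 must be the faster clock")
    return (b * f2) / (a * (f1 - f2))


if __name__ == "__main__":
    # Hypothetical coefficients: one clock line costs as much as 1000 FF inputs.
    a, b = 1.0, 1000.0
    f1, f2 = 200e6, 50e6
    n = breakeven_ffs(a, b, f1, f2)
    print(f"breakeven N = {n:.1f} flip-flops")
```

With more FFs than the breakeven count, the enabled design costs more in this model; with fewer, the separate slow clock costs more.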

Obviously it is not a simple choice, but depends on several aspects of
your design and the FPGA. The design will even affect a and b somewhat
since FFs in more columns will require more column lines to be driven.
I am not sure the calculator will consider all these effects. But the
timing analyzer will. Too bad they can't combine the timing analysis
with power estimation. I think there is a lot more design info
available in the timing analyzer that could be used to calculate power
consumption.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
Expanding on this, there was data posted here not long ago about the
relative power of a 'true clock net' vs. a signal used as a clock.
ISTR someone from Altera also mentioned a tool/floorplan approach
that tries to pack logic onto physical clock branches/stubs, and so
avoids driving unused clock lines.
That would suit a stable design, and one where the saving was worth the effort.

It is also good to see IC vendors starting to quote clock power
figures for enabled and disabled counters - that gives a feel for the
ratios of .CLK and .Q power capacitances.
They could easily add this to the power estimator/post-route analyser.
Probably just needs customer demand.... :)
-jg
 
Clock enables vs. multiple clocks is a trade-off.
If you are not concerned about power, then a single low-skew global clock
and a "sloppier" network of CEs requires the least amount of thinking.
Multiple derived clocks mean that you have to think about clock transfer
from one clock domain to the next, and you may have to use multiple
PLL/DLL/DCMs.

In the extreme case, avoiding the use of CEs will always save power.
Think of a design with 10 flip-flops clocked at 200 MHz and the remaining 500
flip-flops clocked at 1 MHz. It sure would reduce power when the fast clock
is only routed to the 10 flip-flops and the remaining 500 get a slow
clock (vs. 200 MHz all over the chip, plus a 1 MHz CE signal to most
flip-flops).
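
To put rough numbers on that example, the following Python sketch simply counts flip-flop clock-pin events per second for the two arrangements. This is a deliberately crude assumption: it ignores the clock tree itself, the CE routing, and data activity, and the function name is invented for illustration.

```python
def clock_pin_events(groups):
    """Sum of (flip-flop count x clock frequency) over all groups."""
    return sum(count * freq_hz for count, freq_hz in groups)


FAST_HZ = 200e6
SLOW_HZ = 1e6

# Two clocks: 10 FFs on the fast clock, 500 FFs on the slow clock.
two_clocks = clock_pin_events([(10, FAST_HZ), (500, SLOW_HZ)])

# One clock plus CE: all 510 FFs see the fast clock on their clock pins.
one_clock = clock_pin_events([(510, FAST_HZ)])

print(f"two clocks:     {two_clocks:.3g} clock-pin events/s")
print(f"one clock + CE: {one_clock:.3g} clock-pin events/s")
print(f"ratio:          {one_clock / two_clocks:.1f}x")
```

Under these assumptions the single-clock arrangement toggles clock pins roughly 40 times as often, which is the intuition behind the example; how much power that actually represents depends on the device.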

Peter Alfke

From: "Symon" <symon_brewer@hotmail.com>
Newsgroups: comp.arch.fpga
Date: Mon, 19 Apr 2004 13:39:48 -0700
Subject: Re: Clock Enables and Power
 
"Peter Alfke" <peter@xilinx.com> wrote in message
news:BCA9A92A.5F50%peter@xilinx.com...
Clock Enables vs multiple clocks is a trade-off.
If you are not concerned about power, then a single low-skew global clock
and a "sloppier" network of CEs requires the least amount of thinking.
Multiple derived clocks mean that you have to think about clock transfer
from one clock domain to the next, you may have to use multiple
PLL/DLL/DCMs.
Indeed, your customer/boss/well-being is almost always far better served by
the
1) reduced time to market
2) improved design accuracy and stability
3) design portability
4) simplicity
of having a single clock with CEs for the slower stuff. It's hard to imagine
a situation where a multiple clock system would be worth the hassle. (Maybe
the use of legacy stuff would be one reason?)
IMO, Syms.
 
How about battery operation where power consumption is critical?
(It turns into battery life.)

On the high speed/performance end, it might save a bit of heat.
Going from X watts to X-2 might be critical. (Look at the games
modern CPUs are playing to balance heat and performance.)

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
Peter Alfke wrote:
Clock Enables vs multiple clocks is a trade-off.
If you are not concerned about power, then a single low-skew global clock
and a "sloppier" network of CEs requires the least amount of thinking.
Multiple derived clocks mean that you have to think about clock transfer
from one clock domain to the next, you may have to use multiple
PLL/DLL/DCMs.
That reminds me of another issue. When you use a single clock with
clock enables, you have to produce timing constraints to allow the
enabled parts of the circuit to be routed with lesser constraints. But
this is one part of the design process that is prone to error and has no
method of verification that I am aware of. Of course some would say
that you only need to "pay careful attention" to the timing constraints,
but you can make that argument about *any* part of the design process.
The point is that if you use separate clocks, the timing constraints are
very simple and much harder to mess up. With a single clock and
multiple clock enables, the timing constraints are not so simple and
it is very easy to make mistakes.

What is really needed is a method of verification of timing constraints,
just like we have verification of other aspects of the design process.

--

Rick "rickman" Collins

 
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:408551E5.132C60F3@yahoo.com...
The point is that if you use separate clocks, the timing constraints are
very simple and much harder to mess up. With a single clock and
multiple clock enables, the timing constraints are not so simple and
very easy to make mistakes.
Hey Rick,
Very true! The only exception could be that bit in the multi-clock design
where the signals travel from one clock domain to another. If you go the
'enabled' route you need only worry about getting that enable correct. The
multi-clock route could have a lot of places where signals cross domains,
each needing attention in the timing constraints, both delay and skew.
So, I'm a convert to the 'enabled' way. I use that circuit you stuck on here
a few months back (ta very much!) to generate my enables, and I have a Perl
script to work out the MAXDELAYs from the two clock rates, net delays and
Tckos, Ticks. I've found that the
NET "CLK_EN" TNM=FFS "CLK_EN_FFS";
is pretty reliable these days, especially if you use the 'direct_enable'
directive in Synplify. It would be even nicer if you could group stuff into
timing groups in the source code, but that's not really there yet. Which
makes me think, why aren't easy timing constraints part of the RTL HDL
languages?
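
The Perl script mentioned above is not shown in the thread. As a hedged sketch of the kind of arithmetic involved, the Python below derives the time between enable pulses from the fast clock rate and formats a FROM-TO constraint on the CLK_EN_FFS group. The function names, the single lumped margin (standing in for the net delay and Tcko allowances), and the choice of a FROM-TO TIMESPEC rather than MAXDELAY are all assumptions, not the poster's actual method.

```python
def enable_period_ns(fast_clock_mhz, cycles_per_enable):
    """Time between clock-enable pulses, in nanoseconds."""
    if cycles_per_enable < 1:
        raise ValueError("cycles_per_enable must be at least 1")
    return cycles_per_enable * 1000.0 / fast_clock_mhz


def from_to_timespec(group, period_ns, margin_ns=0.0):
    """Format a UCF-style FROM-TO constraint for paths inside one timing group."""
    budget = period_ns - margin_ns
    if budget <= 0:
        raise ValueError("margin exceeds the enable period")
    return f'TIMESPEC "TS_{group}" = FROM "{group}" TO "{group}" {budget:g} ns;'


# Hypothetical numbers: 100 MHz clock, enable asserted every 4th cycle,
# 2 ns held back as a lumped allowance for enable distribution.
period = enable_period_ns(fast_clock_mhz=100.0, cycles_per_enable=4)
print(from_to_timespec("CLK_EN_FFS", period, margin_ns=2.0))
```

Any constraint generated this way would still need to be checked against the tool's own constraint documentation and the timing report.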
cheers, Syms.
 
Good question.

Recently I've been planning how to incorporate timing constraints in
Confluence. Because Confluence has implicit clock enables, I think
there is an opportunity for semi-automated timing constraints. For
example, designers would only have to place constraints on multi-cycle
enables; the tools would then automatically determine which paths are
multi-cycle. And since clock domains are also implicit, it should be
fairly straightforward to issue warnings on unconstrained
false paths.

Regards,
Tom

--
Launchbird Design System, Inc.
http:www.launchbird.com
 
