Area Optimization

On Jun 15, 2:35 am, Christopher Head <ch...@is.invalid> wrote:
Lots of interesting advice here! In particular I read the Xilinx
whitepaper with interest. Unfortunately, a lot of the advice seemed to
be inapplicable to my problem. I can't look for the individual
submodule that's taking up most of the area, because my application is
a single long pipeline with a large number of very similar stages: the
area isn't taken up by any one stage, but more by the number of stages.
And because the design is a pipeline with general logic (mostly
bitwise, plus a small bit of basic arithmetic) between registers, I
don't really see any opportunities for special primitives like SRLs,
DSPs, or the like that would reduce area. I can probably solve my
problem by building a smaller pipeline and reusing it; I preferred not
to do that as it will decrease system performance but it looks like I
don't have much choice now.

Thanks anyway!
Chris

"General" logic is always ripe for optimization, or maybe I should
say, de-unoptimization. If I were you, I would code each stage as a
separate module and measure the size to compare to what you think it
should be.

Many times I have seen the tools take what I thought was pretty
straightforward code and blow it up into something ugly. Obviously they
were doing what I told them to, but I could have done better
than the machine because I understood the logic better. So I had to
change my code to indicate how it could be simplified.

Don't worry about the special features of a chip. First figure out if
the tools did an ok job...

Rick
 
jt_eaton <z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote:

(snip)
You can't avoid 100% of async reset flops, but you can easily cover the
99.999% where sync will give you a smaller, faster design and your design
is still a black-box equivalent to one using the async reset.

With Xilinx parts every flop with an async reset wastes 1 LUT over a sync
reset. In ASIC design every async reset flop doubles the number of
endpoints needing timing closure from 1 to 2.

I thought that, at least for some families, a global async reset
uses the same reset as configuration does. If you drive the reset from
a LUT output, then it needs something different. I am not sure
now which family that is for, though.

If you do a really lousy job
in designing your reset distribution then these async paths could become
critical paths and start taking routing resources away from your other more
important paths.
-- glen
 
After tweaking my pipeline a bit and discovering that even after
getting it down to well under 100% LUT utilization, it still utterly
fails to PAR, I'm going to reassess the overall algorithm.

Interesting reads from everyone though, thanks!
Chris

 
On Thu, 16 Jun 2011 23:58:16 -0700, Christopher Head wrote:

After tweaking my pipeline a bit and discovering that even after getting
it down to well under 100% LUT utilization, it still utterly fails to
PAR, I'm going to reassess the overall algorithm.

Interesting reads from everyone though, thanks!
Chris

At this point, try to PAR with a ridiculously slow target clock.
If that works, increase the clock until it fails, and let the timing
report tell you which part of the pipeline is failing. Re-engineer that,
and repeat...
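
The loop described above can be sketched roughly as follows. This is only an illustration in Python: `sweep_clock` and `fake_par` are hypothetical names, and `fake_par` is a stand-in for a real place-and-route run (a real version would invoke the vendor tools and check the timing report, which is not shown here).

```python
from typing import Callable, Optional, Tuple


def sweep_clock(meets_timing: Callable[[float], bool],
                start_mhz: float,
                step_mhz: float,
                limit_mhz: float) -> Tuple[Optional[float], Optional[float]]:
    """Raise the target clock in fixed steps until timing fails.

    Returns (last_passing_mhz, first_failing_mhz). Either value is None
    if no run passed or no run failed within the limit.
    """
    last_pass = None
    target = start_mhz
    while target <= limit_mhz:
        if not meets_timing(target):
            # This is the run whose timing report is worth reading.
            return last_pass, target
        last_pass = target
        target += step_mhz
    return last_pass, None


def fake_par(target_mhz: float) -> bool:
    """Hypothetical stand-in for PAR: pretend timing closes up to 62 MHz."""
    return target_mhz <= 62.0


print(sweep_clock(fake_par, start_mhz=10.0, step_mhz=10.0, limit_mhz=200.0))
```

The first failing target is the interesting one: its timing report points at the stage to re-engineer before sweeping again.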

- Brian
 
On Jun 15, 8:40 pm, "jt_eaton"
<z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote:
As to the philosophical avoidance of async resets, I can't say I share
that belief.  As you point out, there is one async reset on the chip
that you can't eliminate, the PROGRAM pin.  Even if it doesn't reset
the FFs, it will stop the design from working and reload all the LUTs
and memory.

Rick

You can't avoid 100% of async reset flops, but you can easily cover the
99.999% where sync will give you a smaller, faster design and your design
is still a black-box equivalent to one using the async reset.

With Xilinx parts every flop with an async reset wastes 1 LUT over a sync
reset. In ASIC design every async reset flop doubles the number of
endpoints needing timing closure from 1 to 2. If you do a really lousy job
in designing your reset distribution then these async paths could become
critical paths and start taking routing resources away from your other more
important paths.

Async resets on flops are nothing but trouble.

John

Actually you miss the point. There is no 99.999% issue. When you hit
the PROGRAM pin, it is an async input and your entire design stops
while the chip reconfigures. So any "analog" issues you may have
with async reset inputs apply to the PROGRAM pin too. Not much you can
do but tie it off hard, but according to your description of the
problems an async input has, this won't address your concerns.

Also, your analysis of the LUT utilization is flawed. There is only a
LUT savings in some cases where using the set and/or reset inputs to
the FF as sync inputs will save you a LUT. There are plenty of logic
cases where this is not true. Heck, there are plenty of cases where
no LUTs are used with a FF. So how can you save a LUT then?
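
To make the "some cases" concrete, here is a small behavioral sketch in Python (not HDL, and not tied to any particular family). It assumes a flip-flop whose synchronous reset has priority over its synchronous set, and that the inverted polarity of `b` is available for free; both are assumptions of this model, and all function names are hypothetical. Under those assumptions, a registered AND or OR can be absorbed into the FF's sync reset or set pin, which is the kind of case where a LUT is saved. A flop fed directly from another flop has no LUT in front of it to begin with, so nothing is saved there.

```python
from itertools import product


def ff_next(d: int, sync_set: int = 0, sync_reset: int = 0) -> int:
    """Next state of a D flip-flop with sync set/reset (reset wins here)."""
    if sync_reset:
        return 0
    if sync_set:
        return 1
    return d


def and_via_lut(a: int, b: int) -> int:
    # q <= a & b, with the AND built in a LUT ahead of the D input.
    return ff_next(d=a & b)


def and_via_reset(a: int, b: int) -> int:
    # Same function with no LUT: a drives D, (not b) drives sync reset.
    return ff_next(d=a, sync_reset=1 - b)


def or_via_lut(a: int, b: int) -> int:
    # q <= a | b, with the OR built in a LUT ahead of the D input.
    return ff_next(d=a | b)


def or_via_set(a: int, b: int) -> int:
    # Same function with no LUT: a drives D, b drives sync set.
    return ff_next(d=a, sync_set=b)


def equivalent(f, g) -> bool:
    """Exhaustively compare two 2-input next-state functions."""
    return all(f(a, b) == g(a, b) for a, b in product((0, 1), repeat=2))


print(equivalent(and_via_lut, and_via_reset), equivalent(or_via_lut, or_via_set))
```

If the same pins are already taken by an async reset, this mapping is not available, which is one way the two positions in this thread can both be partly right.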

Rick
 
