FF Replication with Xilinx ISE

Leo

Guest
Hi, I have a design that needs to stall a long pipeline (using the CE input of its registers). The module receiving data from this pipeline sends a busy signal, which must completely stall the pipeline within 3 clock cycles. The long routing delays from the end of the pipeline back to its beginning cause my design to fail timing.

Now, since the STALL signal comes from a shift register made of three flip-flops, how can I get the tool to replicate this flip-flop chain, i.e. build a flip-flop tree so that the design meets timing?

I have tried putting a MAX_FANOUT constraint on the high-fanout signal, but the only thing this does is replicate the buffers. I also tried applying "Register Duplication" and "Register Re-Timing" to no avail (the registers aren't duplicated, only moved around a little).

I want this done automatically because the pipeline can vary in size, but I don't know if that is practical anymore.
 
The last time I tried to play with this in the Xilinx tools, I found
that I had to both turn on register duplication, and turn off equivalent
register removal. However, in the end I think even that didn't really
fix the timing and I ended up replicating the signal myself. Again
that gets fun because XST really wants to remove redundant logic,
so I usually tricked it by using a staggered startup (different
delay of reset to each of the replicated flops), which didn't
really affect operation in my system because nothing was happening
right after configuration anyway.

--
Gabor
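
For illustration, replicating the stall chain by hand as a three-stage flip-flop tree might look roughly like the sketch below. The entity name `stall_tree`, the generics `N_MID` and `N_LEAF`, and the port names are all hypothetical. The sketch also uses the XST `equivalent_register_removal` attribute to stop the identical copies from being merged, which is a substitute for the staggered-reset trick described above, not a reproduction of it. Whether the result actually meets timing still depends on placement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: a 3-stage stall chain replicated by hand as a tree.
-- Stage 1 has one flop, stage 2 has N_MID flops, stage 3 has N_MID*N_LEAF.
entity stall_tree is
  generic (
    N_MID  : positive := 4;  -- second-stage replicas (assumed value)
    N_LEAF : positive := 4   -- leaf replicas per second-stage flop (assumed value)
  );
  port (
    clk       : in  std_logic;
    busy      : in  std_logic;
    stall_out : out std_logic_vector(N_MID * N_LEAF - 1 downto 0)
  );
end entity stall_tree;

architecture rtl of stall_tree is
  signal root : std_logic := '0';
  signal mid  : std_logic_vector(N_MID - 1 downto 0) := (others => '0');
  signal leaf : std_logic_vector(N_MID * N_LEAF - 1 downto 0) := (others => '0');

  -- Ask XST not to merge the logically identical copies.
  attribute equivalent_register_removal : string;
  attribute equivalent_register_removal of mid  : signal is "no";
  attribute equivalent_register_removal of leaf : signal is "no";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      root <= busy;               -- stage 1: single flop next to the source
      mid  <= (others => root);   -- stage 2: each copy loads only the root
      for i in leaf'range loop
        leaf(i) <= mid(i / N_LEAF);  -- stage 3: N_LEAF leaves per mid flop
      end loop;
    end if;
  end process;

  stall_out <= leaf;  -- one bit per group of pipeline registers
end architecture rtl;
```

Each bit of `stall_out` would then drive the CE of one group of pipeline registers, so no single flop sees more than `N_MID` or `N_LEAF` flop loads inside the tree.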
 
On Tuesday, July 23, 2013 5:38:10 PM UTC-3, Gabor wrote:
The last time I tried to play with this in the Xilinx tools, I found
that I had to both turn on register duplication, and turn off equivalent
register removal. However, in the end I think even that didn't really
fix the timing and I ended up replicating the signal myself. Again
that gets fun because XST really wants to remove redundant logic,
so I usually tricked it by using a staggered startup (different
delay of reset to each of the replicated flops), which didn't
really affect operation in my system because nothing was happening
right after configuration anyway.

--
Gabor
OK, now the problem would be how to do it dynamically. I mean, if the pipeline gets bigger or there is more than one pipeline running in parallel, how can I set the appropriate number of replicas of the signal (without trial-and-error)?
 
I think the attributes "equivalent_register_removal" and "shreg_extract"
(getting SRL16s doesn't help with timing...) could help you there.
Replicate the registers 'by hand' in your pipeline entity and set the
above attributes. By instantiating this module several times each
instance will have its own set of registers....
You can set the attributes in the VHDL code. Example:
attribute equivalent_register_removal : string;
attribute equivalent_register_removal of [signal_name] : signal is "no";

HTH
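
As a rough sketch of the suggestion above, both attributes could sit next to a local copy of the stall register inside the pipeline entity, so that every instance carries its own flop. The entity `pipe_segment` and its ports are hypothetical, and the sketch assumes the parent taps the stall chain one stage early so the extra local register does not lengthen the overall stall latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical pipeline segment that owns a local copy of the stall flop.
entity pipe_segment is
  generic (WIDTH : positive := 8);
  port (
    clk      : in  std_logic;
    stall_in : in  std_logic;  -- assumed to be tapped one stage early
    d        : in  std_logic_vector(WIDTH - 1 downto 0);
    q        : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity pipe_segment;

architecture rtl of pipe_segment is
  signal stall_local : std_logic := '0';

  -- Keep this flop even if another instance holds an identical one.
  attribute equivalent_register_removal : string;
  attribute equivalent_register_removal of stall_local : signal is "no";

  -- Keep it a real flip-flop rather than part of an SRL16.
  attribute shreg_extract : string;
  attribute shreg_extract of stall_local : signal is "no";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      stall_local <= stall_in;   -- per-instance replica of the last stall stage
      if stall_local = '0' then  -- acts as the CE for this segment's register
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```

Instantiating `pipe_segment` several times then gives one `stall_local` flop per instance, each with a fanout limited to that instance's registers.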
 
On Wednesday, July 24, 2013 11:13:47 AM UTC-3, rndhro wrote:
I think the attributes "equivalent_register_removal" and "shreg_extract"
(getting SRL16s doesn't help with timing...) could help you there.
Replicate the registers 'by hand' in your pipeline entity and set the
above attributes. By instantiating this module several times each
instance will have its own set of registers....
You can set the attributes in the VHDL code. Example:
attribute equivalent_register_removal : string;
attribute equivalent_register_removal of [signal_name] : signal is "no";

HTH
I have tried assigning attributes to signals/entities before. My question would be: if I have an entity (in VHDL, or a module in Verilog) with a Clock Enable (CE) input that goes to all the registers inside it, how can I make the entity portable and generic enough to be used in a different design without replicating the CE signal for each (large) set of registers, and hence ending up with as many CE inputs as there are groups of registers inside?
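
One possible shape for such an entity, offered only as a sketch: keep a single CE port and let the entity derive the number of internal CE copies from its own generics, so the replica count follows the pipeline size without trial and error. The names `ce_pipeline`, `DEPTH`, `GROUP_SIZE` and `ce_copy` are hypothetical, and `GROUP_SIZE` is an assumed tuning knob. Each copy drives `GROUP_SIZE * WIDTH` flops, and the internal register adds one cycle of CE latency that the surrounding design has to budget for.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical pipeline with one CE port and internally replicated CE flops.
entity ce_pipeline is
  generic (
    WIDTH      : positive := 8;
    DEPTH      : positive := 32;  -- number of pipeline stages
    GROUP_SIZE : positive := 8    -- stages served by each CE copy (assumed knob)
  );
  port (
    clk : in  std_logic;
    ce  : in  std_logic;  -- single CE input, registered once more inside
    d   : in  std_logic_vector(WIDTH - 1 downto 0);
    q   : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity ce_pipeline;

architecture rtl of ce_pipeline is
  -- Ceiling division: enough CE copies to cover every stage.
  constant N_COPIES : positive := (DEPTH + GROUP_SIZE - 1) / GROUP_SIZE;

  type stage_array is array (0 to DEPTH - 1)
    of std_logic_vector(WIDTH - 1 downto 0);
  signal stages  : stage_array;
  signal ce_copy : std_logic_vector(N_COPIES - 1 downto 0) := (others => '0');

  -- Ask XST to keep the identical CE copies as separate flops.
  attribute equivalent_register_removal : string;
  attribute equivalent_register_removal of ce_copy : signal is "no";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      ce_copy <= (others => ce);  -- one registered CE copy per group
      for i in 0 to DEPTH - 1 loop
        if ce_copy(i / GROUP_SIZE) = '1' then  -- stage i uses its group's copy
          if i = 0 then
            stages(i) <= d;
          else
            stages(i) <= stages(i - 1);
          end if;
        end if;
      end loop;
    end if;
  end process;

  q <= stages(DEPTH - 1);
end architecture rtl;
```

With the defaults above, `N_COPIES` works out to 4, and changing `DEPTH` alone rescales the number of CE flops while the port list stays the same.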
 
