Fast Carry Chains in Xilinx Spartan-II FPGAs

A Beaujean

I want to use the fastest possible paths within a Spartan-II FPGA to
create internal signals that are simple copies of each other, linked
in a chain. The delay between successive signals should be on the
order of a few tens to hundreds of picoseconds.
All of the created signals must nevertheless be usable by other
internal logic (in fact, they drive the D inputs of a chain of
flip-flops that all share the same clock).
My first idea was to define a chain of BUF "components" and see what
happens.
As feared, the (Foundation) development tool simply merged all the
signals (no BUF was generated).
Forcing a KEEP attribute on all the signals did not help.
I then tried LUT1s. This works, but is much too slow for the
application.
Looking at the FPGA Editor gave me the idea of using the MUXCY,
MUXCY_L or MUXCY_D components of the Spartan-II library. Some sort of
miracle then happened: the dedicated carry chain was used, running
through the expected number of CLBs, and the speed was excellent. But
to my great surprise, only one flip-flop out of two connected to the
outputs of the MUXCY components was placed in the same cell as its
MUXCY. The second FF was placed in a totally different CLB. This is
not what I expected, since the application requires very closely
matched delays.
Any idea why this happens? Possible corrections? Thank you in
advance.
 
There may be other control signals on your registers that make packing
both registers into one slice illegal. A slice has only one clock enable
and one set/reset, shared by both of its registers.

You can extend your chain to twice the length and use one register per slice
and everything should flow.
 
There are a couple of things that can cause that. First, the clock enable
and reset signals have to be common to both registers in a slice; if they
are not, the placer won't allow both to go in the same slice. If that is
not the cause (check the EDIF netlist to make sure the CEs were not
duplicated and therefore given different signal names), then it is possible
the mapper aligned the carry chain and the flip-flops differently. This can
happen if the first MUXCY has something coming into its CI input that causes
the mapper to stick another MUXCY at the bottom of the chain. It can also
happen if the LSB of either the carry chain or the flip-flop chain gets
optimized out, leaving one or the other misaligned. You can fix that by
instantiating primitives and putting placement constraints (RLOCs) on them.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
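
The packing rule mentioned in the replies above (both registers in a slice must share the same clock enable and set/reset) can be illustrated with a small Python sketch. This is only a simplified model for reasoning about a netlist, not how the Xilinx tools work internally; the `FlipFlop` record, the net names and the pairing of consecutive registers are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class FlipFlop(NamedTuple):
    """Hypothetical netlist record for one register."""
    name: str
    clk: str
    ce: Optional[str]  # clock-enable net, None if unused
    sr: Optional[str]  # set/reset net, None if unused


def can_share_slice(a: FlipFlop, b: FlipFlop) -> bool:
    """True if both registers use identical clock, CE and set/reset nets."""
    return (a.clk, a.ce, a.sr) == (b.clk, b.ce, b.sr)


def unpackable_pairs(chain):
    """Pair registers (0,1), (2,3), ... and list the pairs that cannot share a slice."""
    return [(a.name, b.name)
            for a, b in zip(chain[0::2], chain[1::2])
            if not can_share_slice(a, b)]


# Example: the synthesizer duplicated the CE net for the last register,
# so "ce" and "ce_1" look like different control signals.
chain = [
    FlipFlop("ff0", "clk", "ce", None),
    FlipFlop("ff1", "clk", "ce", None),
    FlipFlop("ff2", "clk", "ce", None),
    FlipFlop("ff3", "clk", "ce_1", None),
]
problem_pairs = unpackable_pairs(chain)
```

Here `problem_pairs` flags the pair whose clock-enable nets differ by name, which is the situation the EDIF check is meant to catch.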
 
Another thought: If you're instantiating the MUXCY primitives, you may need
to add an XORCY between the MUXCY and the register. Take a look at the
slice configuration - the Virtex functional data sheet illustration is more
detailed than that found in the Spartan-II's - and you'll see that the MUXCY
output from one "stage" of your carry chain feeds the XORCY of the register
one half slice above.

Also, rather than instantiating 12 MUXCY and XORCY primitives, try using the
13-bit result from 13'h1fff+In where you may need to add some directives to
keep a smart synthesizer from reducing your equation to an equivalent 1-bit
result. Use the top 12 of the 13 bits for your "deterministic" delay and
you're there. It may be cleaner and easier to implement in the end.
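
To see why the sum 13'h1fff+In behaves as a chain of copies of In, here is a purely logical Python model of a ripple adder built from the usual LUT, MUXCY and XORCY arrangement. It is a sketch under assumptions: the function names are made up for illustration, the mapping of the adder onto the carry logic is the conventional one rather than something confirmed for a particular synthesizer, and no timing is modelled.

```python
def carry_chain_add(a_bits, b_bits, carry_in=0):
    """Ripple add, LSB first. Returns (sum_bits, carry_bits), one entry per stage."""
    sums, carries = [], []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                  # LUT output, drives the MUXCY select
        sums.append(propagate ^ carry)     # XORCY: sum bit of this stage
        carry = carry if propagate else a  # MUXCY: pass carry-in, else take operand bit
        carries.append(carry)
    return sums, carries


def delay_line_taps(inp, width=13):
    """Model all-ones + In for a 1-bit input, e.g. 13'h1fff + In for width 13."""
    a_bits = [1] * width                   # constant operand, all ones
    b_bits = [inp] + [0] * (width - 1)     # In, zero-extended, LSB first
    return carry_chain_add(a_bits, b_bits)


sums_hi, carries_hi = delay_line_taps(1)
sums_lo, carries_lo = delay_line_taps(0)
```

In this model every carry output equals In and every sum bit equals the inverse of In, so each registered sum bit is an inverted copy of the input taken one carry stage further along the chain.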


In reply to "John_H" <johnhandwork@mail.com>, message news:<yzEcc.7$mx3.84@news-west.eli.net>:

OK: I had a bit more time to work on this idea again.
Your idea of defining the result of, say, "111111" + ("00000"&Input) and
registering it is the most straightforward choice. It works fine.
All FFs now sit directly next to their corresponding carry logic. Timing
is now practically deterministic.

Thanks.