Spartan 3A counter speed?

Jon Elson

Guest
Hello,

Does anybody have a very rough estimate of how fast
you can run a 32-bit counter in a Spartan 3AN FPGA?

Thanks,

Jon
 
[This followup was posted to comp.arch.fpga and a copy was sent to the
cited author.]

In article <XtKdnbZLV6lQVPHSnZ2dnUVZ_oadnZ2d@giganews.com>,
jmelson@wustl.edu says...
> Hello,
>
> Does anybody have a very rough estimate of how fast
> you can run a 32-bit counter in a Spartan 3AN FPGA?
>
> Thanks,
>
> Jon

The answer to your question will be had by reading the Xilinx Product
Data Sheets for the basic information which comes in a modular format
including timing for I/O, logic cells, routing and globally distributed
networks. The actual timing achievable will be determined by the
specific design of the counter and how that design is instantiated in
the part.

--

Michael Karas
Carousel Design Solutions
http://www.carousel-design.com
 
Michael Karas wrote:

> [This followup was posted to comp.arch.fpga and a copy was sent to the
> cited author.]
>
> In article <XtKdnbZLV6lQVPHSnZ2dnUVZ_oadnZ2d@giganews.com>,
> jmelson@wustl.edu says...
>
>> Hello,
>>
>> Does anybody have a very rough estimate of how fast
>> you can run a 32-bit counter in a Spartan 3AN FPGA?
>>
>> Thanks,
>>
>> Jon
>
> The answer to your question will be had by reading the Xilinx Product
> Data Sheets for the basic information which comes in a modular format
> including timing for I/O, logic cells, routing and globally distributed
> networks. The actual timing achievable will be determined by the
> specific design of the counter and how that design is instantiated in
> the part.

Thanks, that's as close to a non-answer as you can get. The tricky
part is the carry chain for long counters, and they really don't
give you much info there, unless there's a secret manual I have
not been able to find.

Jon
 
On Mon, 26 Mar 2012 14:33:33 -0500
Jon Elson <jmelson@wustl.edu> wrote:

> Thanks, that's as close to a non-answer as you can get. The tricky
> part is the carry chain for long counters, and they really don't
> give you much info there, unless there's a secret manual I have
> not been able to find.
>
> Jon

If I recall correctly, carry chain propagation was on the order of
700 ps per 2 bits, but don't quote me on that.

Part of the problem is that I don't think your question is answerable
in the general case of "Spartan 3A". Different device sizes may or may
not allow you to run 32 bits all on one carry chain. If you have to
use two columns instead of just one, the additional performance hit of
that next bit would be substantial.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
 
Rob Gaddi wrote:
> On Mon, 26 Mar 2012 14:33:33 -0500
> Jon Elson <jmelson@wustl.edu> wrote:
>
>> Thanks, that's as close to a non-answer as you can get. The tricky
>> part is the carry chain for long counters, and they really don't
>> give you much info there, unless there's a secret manual I have
>> not been able to find.
>>
>> Jon
>
> If I recall correctly, carry chain propagation was on the order of 700
> ps/2 bits, but don't quote me on that.
>
> Part of the problem is that I don't think your question is answerable
> in the general case of "Spartan 3A". Different device sizes may or may
> not allow you to run 32-bits all on one carry chain. If you have to
> use two columns instead of just one, the additional performance hit of
> that next bit would be substantial.

Well, it took me about 5 minutes to code up a simple project with
a 32-bit counter and enough registers to prevent other logic from
being the worst-case path. In an XC3S50AN-5 there are enough rows
to keep 32 bits in a single carry chain. With no constraints, the
design built and reported 4.402 ns minimum clock period (after
place & route) or about 227 MHz.

YMMV

-- Gabor
 
Gabor wrote:
> Rob Gaddi wrote:
>> On Mon, 26 Mar 2012 14:33:33 -0500
>> Jon Elson <jmelson@wustl.edu> wrote:
>>
>>> Thanks, that's as close to a non-answer as you can get. The tricky
>>> part is the carry chain for long counters, and they really don't
>>> give you much info there, unless there's a secret manual I have
>>> not been able to find.
>>>
>>> Jon
>>
>> If I recall correctly, carry chain propagation was on the order of 700
>> ps/2 bits, but don't quote me on that.
>>
>> Part of the problem is that I don't think your question is answerable
>> in the general case of "Spartan 3A". Different device sizes may or may
>> not allow you to run 32-bits all on one carry chain. If you have to
>> use two columns instead of just one, the additional performance hit of
>> that next bit would be substantial.
>
> Well, it took me about 5 minutes to code up a simple project with
> a 32-bit counter and enough registers to prevent other logic from
> being the worst-case path. In a XC3S50AN-5 there are enough rows
> to keep 32-bits in a single carry chain. With no constraints, the
> design built and reported 4.402 ns minimum clock period (after
> place & route) or about 227 MHz.
>
> YMMV
>
> -- Gabor

Timing constraint: TS_clk = PERIOD TIMEGRP "clk" 4.4 ns HIGH 50%;
For more information, see Period Analysis in the Timing Closure User Guide (UG612).

 574 paths analyzed, 124 endpoints analyzed, 1 failing endpoint
 1 timing error detected. (1 setup error, 0 hold errors, 0 component switching limit errors)
 Minimum period is 4.402ns.
--------------------------------------------------------------------------------

Paths for end point count_31 (SLICE_X11Y23.CIN), 30 paths
--------------------------------------------------------------------------------
Slack (setup path):     -0.002ns (requirement - (data path - clock path skew + uncertainty))
  Source:               count_0 (FF)
  Destination:          count_31 (FF)
  Requirement:          4.400ns
  Data Path Delay:      4.363ns (Levels of Logic = 16)
  Clock Path Skew:      -0.039ns (0.230 - 0.269)
  Source Clock:         clk_BUFGP rising at 0.000ns
  Destination Clock:    clk_BUFGP rising at 4.400ns
  Clock Uncertainty:    0.000ns

  Maximum Data Path: count_0 to count_31
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X11Y8.XQ       Tcko                  0.495   count<0>
                                                       count_0
    SLICE_X11Y8.F3       net (fanout=1)        0.318   count<0>
    SLICE_X11Y8.COUT     Topcyf                1.026   count<0>
                                                       Mcount_count_lut<0>_INV_0
                                                       Mcount_count_cy<0>
                                                       Mcount_count_cy<1>
    SLICE_X11Y9.CIN      net (fanout=1)        0.000   Mcount_count_cy<1>
    SLICE_X11Y9.COUT     Tbyp                  0.130   count<2>
                                                       Mcount_count_cy<2>
                                                       Mcount_count_cy<3>
    SLICE_X11Y10.CIN     net (fanout=1)        0.000   Mcount_count_cy<3>
    SLICE_X11Y10.COUT    Tbyp                  0.130   count<4>
                                                       Mcount_count_cy<4>
                                                       Mcount_count_cy<5>
    SLICE_X11Y11.CIN     net (fanout=1)        0.000   Mcount_count_cy<5>
    SLICE_X11Y11.COUT    Tbyp                  0.130   count<6>
                                                       Mcount_count_cy<6>
                                                       Mcount_count_cy<7>
    SLICE_X11Y12.CIN     net (fanout=1)        0.000   Mcount_count_cy<7>
    SLICE_X11Y12.COUT    Tbyp                  0.130   count<8>
                                                       Mcount_count_cy<8>
                                                       Mcount_count_cy<9>
    SLICE_X11Y13.CIN     net (fanout=1)        0.000   Mcount_count_cy<9>
    SLICE_X11Y13.COUT    Tbyp                  0.130   count<10>
                                                       Mcount_count_cy<10>
                                                       Mcount_count_cy<11>
    SLICE_X11Y14.CIN     net (fanout=1)        0.000   Mcount_count_cy<11>
    SLICE_X11Y14.COUT    Tbyp                  0.130   count<12>
                                                       Mcount_count_cy<12>
                                                       Mcount_count_cy<13>
    SLICE_X11Y15.CIN     net (fanout=1)        0.000   Mcount_count_cy<13>
    SLICE_X11Y15.COUT    Tbyp                  0.130   count<14>
                                                       Mcount_count_cy<14>
                                                       Mcount_count_cy<15>
    SLICE_X11Y16.CIN     net (fanout=1)        0.000   Mcount_count_cy<15>
    SLICE_X11Y16.COUT    Tbyp                  0.130   count<16>
                                                       Mcount_count_cy<16>
                                                       Mcount_count_cy<17>
    SLICE_X11Y17.CIN     net (fanout=1)        0.000   Mcount_count_cy<17>
    SLICE_X11Y17.COUT    Tbyp                  0.130   count<18>
                                                       Mcount_count_cy<18>
                                                       Mcount_count_cy<19>
    SLICE_X11Y18.CIN     net (fanout=1)        0.000   Mcount_count_cy<19>
    SLICE_X11Y18.COUT    Tbyp                  0.130   count<20>
                                                       Mcount_count_cy<20>
                                                       Mcount_count_cy<21>
    SLICE_X11Y19.CIN     net (fanout=1)        0.000   Mcount_count_cy<21>
    SLICE_X11Y19.COUT    Tbyp                  0.130   count<22>
                                                       Mcount_count_cy<22>
                                                       Mcount_count_cy<23>
    SLICE_X11Y20.CIN     net (fanout=1)        0.000   Mcount_count_cy<23>
    SLICE_X11Y20.COUT    Tbyp                  0.130   count<24>
                                                       Mcount_count_cy<24>
                                                       Mcount_count_cy<25>
    SLICE_X11Y21.CIN     net (fanout=1)        0.000   Mcount_count_cy<25>
    SLICE_X11Y21.COUT    Tbyp                  0.130   count<26>
                                                       Mcount_count_cy<26>
                                                       Mcount_count_cy<27>
    SLICE_X11Y22.CIN     net (fanout=1)        0.000   Mcount_count_cy<27>
    SLICE_X11Y22.COUT    Tbyp                  0.130   count<28>
                                                       Mcount_count_cy<28>
                                                       Mcount_count_cy<29>
    SLICE_X11Y23.CIN     net (fanout=1)        0.000   Mcount_count_cy<29>
    SLICE_X11Y23.CLK     Tcinck                0.704   count<30>
                                                       Mcount_count_cy<30>
                                                       Mcount_count_xor<31>
                                                       count_31
    -------------------------------------------------  ---------------------------
    Total                                      4.363ns (4.045ns logic, 0.318ns route)
                                                       (92.7% logic, 7.3% route)

That's the worst-case path in the timing report after adding a period
constraint of 4.4 ns. Same achievable period of 4.402...

-- Gabor
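
[Editorial sketch] The arithmetic behind the report can be reproduced as a quick sanity check. The Python below is only an illustration, not part of Gabor's project. It uses the delay values printed in the report and assumes two counter bits per slice with the whole counter in one carry column. The function names and the extrapolation to other widths are hypothetical; a real build would see different routing and skew.

```python
# Delay values in ns, copied from the timing report for the XC3S50AN-5 build.
T_CKO = 0.495    # clock-to-out of the count_0 flip-flop
T_NET = 0.318    # route from count_0 to the first LUT input
T_OPCYF = 1.026  # LUT input to carry-out of the first slice
T_BYP = 0.130    # carry-in to carry-out of one slice (2 bits)
T_CINCK = 0.704  # carry-in to clock (setup) in the last slice
SKEW = -0.039    # clock path skew reported for this path


def min_period_ns(bits: int) -> float:
    """Rough minimum clock period for a `bits`-wide counter.

    Assumption (not a Xilinx formula): even width of at least 4 bits,
    two bits per slice, single carry column, same routing as the report.
    """
    if bits < 4 or bits % 2:
        raise ValueError("model covers even widths of 4 bits or more")
    bypass_slices = bits // 2 - 2  # every slice except the first and last
    data_path = T_CKO + T_NET + T_OPCYF + bypass_slices * T_BYP + T_CINCK
    # Period = data path - clock path skew (+ uncertainty, 0 in the report).
    return data_path - SKEW


def fmax_mhz(period_ns: float) -> float:
    """Convert a period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    for width in (8, 16, 32):
        p = min_period_ns(width)
        print(f"{width}-bit: {p:.3f} ns, about {fmax_mhz(p):.0f} MHz")
```

For 32 bits this reproduces the 4.402 ns figure from the report. The narrower widths are extrapolations under the stated assumptions only.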
 
Jon Elson <jmelson@wustl.edu> wrote:

(snip)
>> The answer to your question will be had by reading the Xilinx Product
>> Data Sheets for the basic information which comes in a modular format
>> including timing for I/O, logic cells, routing and globally distributed
>> networks. The actual timing achievable will be determined by the
>> specific design of the counter and how that design is instantiated in
>> the part.
>
> Thanks, that's as close to a non-answer as you can get. The tricky
> part is the carry chain for long counters, and they really don't
> give you much info there, unless there's a secret manual I have
> not been able to find.

Well, also it depends on how you use the counter. If you need to be
able to latch the bits from the counter, then the timing might
depend on that, and not the counter itself. (In race terms, to be
able to get lap times while the counter continues to run.)

-- glen
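
[Editorial sketch] The "lap time" idea can be pictured with a small cycle-level model. This is an illustration in Python, not anyone's actual design: a free-running counter plus a capture register clocked from the same clock, so a snapshot is always one whole count value. The class and signal names are hypothetical.

```python
class LapCounter:
    """Free-running counter with a same-clock capture ("lap") register."""

    def __init__(self, width: int = 32) -> None:
        self.mask = (1 << width) - 1
        self.count = 0  # free-running counter register
        self.lap = 0    # capture register

    def tick(self, capture: bool = False) -> None:
        """Model one rising clock edge.

        Both registers sample the pre-edge value of `count`, as flip-flops
        on a common clock would.
        """
        if capture:
            self.lap = self.count
        self.count = (self.count + 1) & self.mask


if __name__ == "__main__":
    c = LapCounter()
    for _ in range(5):
        c.tick()
    c.tick(capture=True)   # take a lap while the counter keeps running
    for _ in range(4):
        c.tick()
    print(c.lap, c.count)
```

In hardware the extra register adds a path from every counter bit to the capture flip-flops, which is the path glen suggests may end up limiting the clock rate.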
 
Gabor wrote:


> Well, it took me about 5 minutes to code up a simple project with
> a 32-bit counter and enough registers to prevent other logic from
> being the worst-case path. In a XC3S50AN-5 there are enough rows
> to keep 32-bits in a single carry chain. With no constraints, the
> design built and reported 4.402 ns minimum clock period (after
> place & route) or about 227 MHz.

OK, thanks very much. The request was to run a counter at 100 MHz,
and it sounds like this is doable. There are some concerns about crossing
clock boundaries that need to be figured out, but it looks like
it can be done. Thanks VERY much for doing the leg work on this!

I've been learning how to set up a GUI for the project, the one area
I really didn't know enough about. The FPGA part seemed simple except
for the performance.

Jon
 
glen herrmannsfeldt wrote:


> Well, also it depends on how you use the counter. If you need to be
> able to latch the bits from the counter, then the timing might
> depend on that, and not the counter itself. (In race terms, to be
> able to get lap times while the counter continues to run.)

Yes, this is part of the design. I guess one needs to make a constraint
so that the counter latches get a coherent sample. I'm thinking I should
synchronize the external clock for each counter to a 150 MHz internal clock,
and use a clock edge detector in the external clock domain to activate the
clock enable of the counter on the internal clock.

Thanks

Jon
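
[Editorial sketch] One possible reading of this plan, modeled at cycle level in Python: the external signal passes through a two-flop synchronizer clocked by the internal clock, a rising-edge detector on the synchronized signal produces a one-cycle clock enable, and the counter increments only when enabled. Placing the edge detector after the synchronizer is an assumption, as are all names below. The scheme only counts every edge if each high and low phase of the external signal lasts longer than one internal clock period.

```python
class SyncEdgeCounter:
    """Two-flop synchronizer, rising-edge detector, counter with clock enable."""

    def __init__(self, width: int = 32) -> None:
        self.mask = (1 << width) - 1
        self.sync1 = 0  # first synchronizer flip-flop
        self.sync2 = 0  # second synchronizer flip-flop
        self.prev = 0   # previous value of sync2, for edge detection
        self.count = 0

    def tick(self, ext_level: int) -> int:
        """Model one rising edge of the internal clock.

        `ext_level` is the external signal level sampled at this edge.
        """
        # Combinational enable, computed from the current register values.
        enable = self.sync2 == 1 and self.prev == 0
        if enable:
            self.count = (self.count + 1) & self.mask
        # All flip-flops update together from their pre-edge inputs.
        self.prev, self.sync2, self.sync1 = self.sync2, self.sync1, ext_level & 1
        return self.count


if __name__ == "__main__":
    # External waveform with three rising edges, sampled once per internal clock.
    wave = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
    c = SyncEdgeCounter()
    for level in wave:
        c.tick(level)
    print(c.count)
```

The model ignores metastability itself; it only shows the two-cycle latency and the one-pulse-per-edge behavior of the enable.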
 
Rob Gaddi <rgaddi@technologyhighland.invalid> wrote:

> On Mon, 26 Mar 2012 14:33:33 -0500
> Jon Elson <jmelson@wustl.edu> wrote:
>
>> Thanks, that's as close to a non-answer as you can get. The tricky
>> part is the carry chain for long counters, and they really don't
>> give you much info there, unless there's a secret manual I have
>> not been able to find.
>>
>> Jon
>
> If I recall correctly, carry chain propagation was on the order of 700
> ps/2 bits, but don't quote me on that.
>
> Part of the problem is that I don't think your question is answerable
> in the general case of "Spartan 3A". Different device sizes may or may
> not allow you to run 32-bits all on one carry chain. If you have to
> use two columns instead of just one, the additional performance hit of
> that next bit would be substantial.

Maybe, but you could divide the counter into two (or more) parts and
have the second part run on a registered (delayed) carry from the
first. By using registers to delay the output of the first counter,
you can align the output results of the counters.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
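
[Editorial sketch] The split-counter idea can be checked with a cycle-level model. The Python below is an illustration under assumptions, not Nico's implementation: the low half counts every clock, its terminal count is registered to form a one-cycle-late carry, the high half increments on that registered carry, and the low half's output is delayed by one register so both halves line up. All names are hypothetical, and the combined value lags the raw low counter by one clock.

```python
class SplitCounter:
    """Counter split into two halves joined by a registered carry."""

    def __init__(self, half_bits: int = 16) -> None:
        self.half_bits = half_bits
        self.mask = (1 << half_bits) - 1
        self.low = 0          # low half, increments every clock
        self.carry = 0        # registered terminal count of the low half
        self.high = 0         # high half, increments on the registered carry
        self.low_delayed = 0  # low half delayed one clock to match the high half

    def tick(self) -> None:
        """Model one rising clock edge; all registers use pre-edge values."""
        next_low = (self.low + 1) & self.mask
        next_carry = 1 if self.low == self.mask else 0
        next_high = (self.high + self.carry) & self.mask
        next_low_delayed = self.low
        self.low, self.carry = next_low, next_carry
        self.high, self.low_delayed = next_high, next_low_delayed

    @property
    def value(self) -> int:
        """Aligned count assembled from the high half and the delayed low half."""
        return (self.high << self.half_bits) | self.low_delayed


if __name__ == "__main__":
    c = SplitCounter(half_bits=4)  # small halves so the rollover is easy to see
    for n in range(1, 20):
        c.tick()
        print(n, c.value)
```

Each half now has a carry chain of only half the length, at the cost of one clock of latency and a few extra registers.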
 
Jon Elson <jmelson@wustl.edu> wrote:

(snip, I wrote)
>> Well, also it depends on how you use the counter. If you need to be
>> able to latch the bits from the counter, then the timing might
>> depend on that, and not the counter itself. (In race terms, to be
>> able to get lap times while the counter continues to run.)
>
> Yes, this is part of the design. I guess one needs to make a
> constraint so that the counter latches get a coherent sample.

Without that constraint, you might get to 300 MHz or so.

If S3A isn't so different from S3E, 100 MHz shouldn't be so
hard with the latch.

> I'm thinking I should synchronize the external clock for each
> counter to a 150 MHz internal clock, and use a clock edge
> detector in the external clock domain to activate the clock
> enable of the counter on the internal clock.

I think that sounds right.

You have to meet the setup and hold times for the latch.

-- glen
 
