New(?) fast binary counter for FPGAs without carry logic (e.

Hello,

On Wednesday, September 12, 2012 5:24:43 PM UTC+2, rickman wrote:
I've seen this used before. They added delay lines after the counter
bits to produce a count output that is simple binary. This was in a
high speed network interface and the front end ran very fast relative to
the now antiquated FPGA technology. The actual circuit may not have
been a counter, it may have been an adder, but it did have a carry chain.

In essence, this circuit is a pipelined, bit serial counter. You still
need to wait for all the bits to be counted or use the conversion formula..
interesting!

1. *please*, could you find the original work?

2.
Actually, my initial design contained a bank of delay lines to recover plain binary counting (maybe unnecessary -- I must check Gabor's post about desynchronizing bits to plain binary counting). The drawback is that it needs O(n^2) DFFs if implemented using shift registers. The k-th bit has to be delayed by (n-k) mod 2^k clocks, which is still O(n^2). In practice, this gives huge DFF counts for 32..64-bit counters.

The delays may also be implemented using embedded counters, since there is no need to delay an arbitrary signal, only a pulse "p" (and then divide by 2). Such a counter may be of ordinary architecture, because it has only log(n) bits. However, all this seems too complicated to me: resource usage may asymptotically drop to O(n log(n)), but in practice it can hardly be less than with shift registers.
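
The shift-register cost from point 2 can be tabulated with a few lines of Python. This is a minimal sketch, assuming bits are numbered k = 1..n and that each delay clock costs one DFF; the function names are hypothetical and only illustrate the (n-k) mod 2^k formula quoted above.

```python
def full_delay_dffs(n):
    """DFFs needed if 1-indexed bit k is delayed by the full n - k clocks."""
    return sum(n - k for k in range(1, n + 1))


def phased_delay_dffs(n):
    """DFFs needed if bit k is only phase-aligned: (n - k) mod 2**k clocks."""
    return sum((n - k) % (2 ** k) for k in range(1, n + 1))


def table(widths=(8, 16, 32, 64)):
    """Return (n, full, phased) rows for a few counter widths."""
    return [(n, full_delay_dffs(n), phased_delay_dffs(n)) for n in widths]


if __name__ == "__main__":
    for n, full, phased in table():
        print(f"n={n:2d}  full={full:5d}  phased={phased:5d}")
```

Under these assumptions a 32-bit counter needs 398 delay DFFs with phase-only alignment versus 496 with full delays. The modulo only helps while 2^k is smaller than n-k, that is for the lowest few bits, so the total remains quadratic in n.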

I must try Gabor's Verilog to see what happens.

Thank you very much,
Marek
 
On Wednesday, September 12, 2012 6:40:30 PM UTC+2, rickman wrote:
(..)
If the counter is free running, you really only need to phase each bit
correctly. The first bit is 0, 1, 0, 1... so there are only two phases,
either one or no FFs to delay it rather than n-1. The second bit has a
pattern of four states so the delay is modulo 4 and can be 0, 1, 2 or 3
rather than n-2. Does that help? It should take some of the sting out
of a long counter.
Yes, but not much. As I wrote, it's (n-k) mod 2^k.
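
The phase-only alignment can be checked with a small simulation. This is a minimal sketch, assuming a one-bit-per-stage pipelined counter in which each stage receives the carry that the previous stage registered one clock earlier. The model and the names `step` and `deskewed_counts` are illustrative assumptions, not the circuit from this thread.

```python
def step(s, c):
    """Advance the assumed pipelined counter by one clock.

    s[i] is the sum bit of stage i, c[i] its registered carry-out.
    Stage 0 counts every clock; stage i > 0 uses the carry that
    stage i - 1 registered on the previous clock.
    """
    n = len(s)
    cin = [1] + c[:n - 1]
    new_s = [s[i] ^ cin[i] for i in range(n)]
    new_c = [s[i] & cin[i] for i in range(n)]
    return new_s, new_c


def deskewed_counts(n, ticks):
    """Return the deskewed counter value for clocks n-1 .. ticks.

    Bit k (1-indexed) is read through a delay of (n - k) mod 2**k clocks,
    modelled here by looking back into a history of sum bits.
    """
    delays = [(n - k) % (2 ** k) for k in range(1, n + 1)]
    s, c = [0] * n, [0] * n
    history = [s]  # history[t] holds the sum bits after t clocks
    out = []
    for t in range(1, ticks + 1):
        s, c = step(s, c)
        history.append(s)
        if t >= n - 1:  # wait until the pipeline has filled
            bits = [history[t - delays[i]][i] for i in range(n)]
            out.append(sum(b << i for i, b in enumerate(bits)))
    return out


if __name__ == "__main__":
    print(deskewed_counts(4, 12))
```

In this model the deskewed output is a plain binary count that lags the clock count by n-1 cycles, which is acceptable only for a free-running counter, as noted above.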

(..)
Have you thought of switching to a device with a built in carry chain?
Yes and no. Currently, the Actel architecture suits our needs for
aerospace, radiation tolerant design quite well. Of course we will
port our design to other families, e.g. (much slower) Atmel and (more
recent) Spartan-6.

I only tried to mention the idea -- because it may be useful somewhere
again, in general.


Marek
 
Am Dienstag, 11. September 2012 20:47:01 UTC+2 schrieb robotron:

- does this design exist, is it being used, and if so, what is its name?
If I understand your description correctly it is a carry save accumulator:
http://en.wikipedia.org/wiki/Carry-save_adder

Quote: "To put it another way, we are taking a carry digit from the position on our right, and passing a carry digit to the left, just as in conventional addition; but the carry digit we pass to the left is the result of the previous calculation and not the current one. In each clock cycle, carries only have to move one step along, and not n steps as in conventional addition."
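
The quoted behaviour can be illustrated with a bit-level model. This is a minimal sketch of a carry-save accumulator, assuming one full adder and one carry register per bit; the names `csa_step` and `csa_value` are hypothetical. A counter is the special case where the input is always 1.

```python
def csa_step(s, c, x):
    """One clock of an n-bit carry-save accumulator adding the integer x.

    s[k] is the sum register of bit k, c[k] the carry register of bit k.
    Each bit adds its own sum bit, input bit k of x, and the carry that
    bit k - 1 registered on the previous clock, so no carry travels
    more than one position per clock.
    """
    n = len(s)
    xin = [(x >> k) & 1 for k in range(n)]
    cin = [0] + c[:n - 1]
    new_s = [s[k] ^ xin[k] ^ cin[k] for k in range(n)]
    new_c = [(s[k] & xin[k]) | (s[k] & cin[k]) | (xin[k] & cin[k])
             for k in range(n)]
    return new_s, new_c


def csa_value(s, c):
    """Resolve sum and pending carries into an ordinary binary value."""
    n = len(s)
    total = sum(b << k for k, b in enumerate(s))
    total += sum(b << (k + 1) for k, b in enumerate(c))
    return total % (1 << n)


if __name__ == "__main__":
    s, c = [0] * 8, [0] * 8
    for _ in range(10):
        s, c = csa_step(s, c, 1)  # x = 1 turns the accumulator into a counter
    print(csa_value(s, c))
```

The sum registers alone do not hold the binary result; it is recovered either by resolving the pending carries, as `csa_value` does, or by deskewing the bits with delay lines as discussed elsewhere in this thread.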

Kolja Sulimma
www.cronologic.de
 
robotron wrote:
I've seen this used before.  They added delay lines after the
counter bits to produce a count output that is simple binary.
snip
1. *please*, could you find the original work?

Similar sorts of carry pipelining were common in the
early Xilinx XC2000/3000 parts; I recall there being
some fast counter techniques in application notes and
Xcell journals of that era.

Pipelined carry chains at one or two bits per carry were
also commonly used for accumulators and counters in the
GHz clock rate GaAs standard cell designs that I
worked on in the early 90's.

The pictures from the following TriQuint patent show a
few variants of the input/output deskew trees that can
be implemented for delay equalization of a loadable
accumulator having carry pipelining:
http://www.google.com/patents/US5140540

( disclaimer : I worked with some of the authors back
when I was doing a foundry design through TriQuint )

- Brian
 
Dear colleagues,

thank you for the pointers to prior art.
I have added a link to this newsgroup thread on the project page.

Best regards,
Marek
 
Here are some more links regarding counter & accumulator
carry techniques.

--------------------
Links to early Xilinx counter app notes:

Ultra-Fast Synchronous Counters in XC3000 & XC4000 FPGAs
http://www.cs.york.ac.uk/rts/docs/Xilinx-datasource-2003-q1/appnotes/xapp014.pdf

Loadable Binary Counters in a XC3000 FPGA
http://www.cs.york.ac.uk/rts/docs/Xilinx-datasource-2003-q1/appnotes/xapp004.pdf

pages 15-18 of Xcell Journal #7
http://www.xilinx.com/publications/archives/xcell/Xcell7.pdf

--------------------
Haven't found a pdf for XAPP 001 yet:
"
" High-Speed Synchronous Prescaler Counter
" (XAPP 001)
"
" This simple design provides a very basic non-loadable,
" up counter with a count-enable control. However, this
" simplicity permits it to be both the densest and the
" second fastest design.
"
" A prescaler (CEP/CET) technique is used to gain speed,
" permitting the ripple-carry portion of the counter
" eight clock periods in which to settle. Without special
" adaptation, however, this technique precludes loading
" the counter. As a non-loadable counter, three bits can
" be implemented in three CLBs (1 CLB/bit), with the least
" significant six bits requiring only four CLBs; this
" explains the compactness. Only one TILO delay is incurred
" in the ripple-carry path for each three bits.
"

This technique of making the low N bits run fast, with
the upper bits running slower by 2^N, should map well
into a compact yet fast implementation of a non-loadable
binary counter for your Actel part.

I.e., use something like the pcounter scheme for the low
few bits, then make the upper bits with a ripple carry,
enabled by the carry out of the low bits.

You probably will need to add special timing constraints
to get the tools to understand the multicycle carry, and
that the ripple chain is a false path after FF reset.

The advantage of this is that you would now only need to
deskew N LSB's of the counter for straight binary output.

--------------------
ORCA-3 FPGAs had an optional register in the dedicated carry chain:

" Fast-carry logic and routing to adjacent PFUs for
" nibble-wide, byte-wide, or longer arithmetic functions,
" with the option to register the PFU carry-out.

--------------------
More carry-pipelined accumulator references:

( I've mentioned accumulators because they are a more
general carry design problem than are counters, and
because I know where to look for literature describing
high speed pipelined versions.)

"Direct Digital Synthesizers: Theory, Design and Applications", Vankka
lib.tkk.fi/Diss/2000/isbn9512253186/isbn9512253186.pdf
See pages 48-49 for accumulator pipelining techniques.


"Single Chip 500 MHz Function Generator"
P.H. Saul, W. Barber, D.G. Taylor, T. Ward
IEE Proceedings, Vol. 138, No. 2, pp 239-243, April 1991

Reprinted in "Direct Digital Frequency Synthesizers", Kroupa (ed), IEEE Press, 1999

Fig. 2 shows the one-bit-per-carry accumulator structure
Fig. 5 shows the accumulator output deskew tree

-Brian
 
