Multiple inputs adder

M

MariuszK

Guest
Hello,

I am looking for effective vhdl multiple input adders.
Inputs count could be up to 1024 inputs!
Requirements:
- adder should be full parameterized (inputs count, adder words width,
etc.)
- adder should be optimized from viewpoint of time (frequency) and
resources
- adder should be pipelined

Currently, I want build next tree structure.
a0+a1+a2+a3+a4+a5+a6+a7+......a1023 =

a0+a1 a2+a3 a4+a5 a6 +a7
a01+a23 a45+a67
a0123+a4567
a01234567 + ...
.........................................................
a0123....1023

This architecture for 1024 input adder will be included
- 1023 two inputs adder
- log2(1024)=10 pipeline stages


Thank you for any answer.

Best Regards
Mariusz
 
It would help to know how you get the input data (all at once, in order
one operand per clock, random order one operand per clock, etc.)

If you can only get one input per clock anyway, a simple accumulator
(sum = sum + input) will be as high performance as you can get (the sum
pops out 1 clock after the last input).

The only time you'd need a binary tree type of implementation is if you
have all inputs available at once (doubtful for 1024 inputs).

Most synthesis tools are pretty good at turning a straight sum into a
tree anyway:

a+b+c+d => (a+b)+(c+d)

But there are certainly limits as to how far they can go (1024
inputs?).

Andy


MariuszK wrote:
Hello,

I am looking for effective vhdl multiple input adders.
Inputs count could be up to 1024 inputs!
Requirements:
- adder should be full parameterized (inputs count, adder words width,
etc.)
- adder should be optimized from viewpoint of time (frequency) and
resources
- adder should be pipelined

Currently, I want build next tree structure.
a0+a1+a2+a3+a4+a5+a6+a7+......a1023 =

a0+a1 a2+a3 a4+a5 a6 +a7
a01+a23 a45+a67
a0123+a4567
a01234567 + ...
........................................................
a0123....1023

This architecture for 1024 input adder will be included
- 1023 two inputs adder
- log2(1024)=10 pipeline stages


Thank you for any answer.

Best Regards
Mariusz
 
Andy napisal(a):
It would help to know how you get the input data (all at once, in order
one operand per clock, random order one operand per clock, etc.)

If you can only get one input per clock anyway, a simple accumulator
(sum = sum + input) will be as high performance as you can get (the sum
pops out 1 clock after the last input).

The only time you'd need a binary tree type of implementation is if you
have all inputs available at once (doubtful for 1024 inputs).

Most synthesis tools are pretty good at turning a straight sum into a
tree anyway:

a+b+c+d => (a+b)+(c+d)

But there are certainly limits as to how far they can go (1024
inputs?).

Andy
I will have all input data per clock - all at once. I want calculate
this sum as fast as possible and with pipeline "possibility".

By the way. How can I force synthesis tools to change straight sum into
a tree calculation.
Is that conversion made automatically? = a+b+c+d => (a+b)+(c+d)
Additionally that "tree calculation" in my opinion should be made
in few clocks tact to calculate i.e. 1024 inputs.
Is pipeline architecture will made automatically by synthesis tools??


Mariusz
 
On 17 Jul 2006 12:50:30 -0700, MariuszK
<mariusz.kwiczala@gmail.com> wrote:

I will have all input data per clock - all at once. I want calculate
this sum as fast as possible and with pipeline "possibility".

By the way. How can I force synthesis tools to change straight sum into
a tree calculation.
Consider a recursive function or a recursive generate.

Basically, to calculate sum(array):
if array has one element ->
result = that element,
else ->
result = sum(left half of array) + sum(right half of array)

Some care is required to get everything exactly right if the
array size is not an exact power of 2. Extreme care is
required to get the bit widths of the sums right.

Is that conversion made automatically? = a+b+c+d => (a+b)+(c+d)
Yes, for small calculations; probably not, for huge ones.

Additionally that "tree calculation" in my opinion should be made
in few clocks tact to calculate i.e. 1024 inputs.
If you use a straightforward tree it will have 1023 adders over
10 levels. So it would be very easy to pipeline it over 10 clocks,
or perhaps over 5 clocks if you can squeeze two adds into
one clock cycle.

Depending on your target technology, other structures
such as a Wallace tree may be more efficient than a
simple tree of adders. It's harder to parameterise, though.

Is pipeline architecture will made automatically by synthesis tools??
Not usually, although the better tools can do some pipelining if
you give them appropriate hints. Note also that, unless you
have a behavioural synth tool, you must describe the delays that
will be introduced by the pipelining - not necessarily in the
right place (the tool will move the delay registers into the
pipeline where they will do the most good) but you must
get the right number of cycles of delay.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
Hello,

Depending on your target technology, other structures
such as a Wallace tree may be more efficient than a
simple tree of adders. It's harder to parameterise, though.
My target platform is Xilinx.

Wallace tree structure could be used for multiple input adder???
I use Wallace tree for multiplication, only.
Could you send me any information (links, sample) how can I built
wallace tree structure for multiple input adder? What others structure
(except simple tree adder) could be used to build multiple input adder?

Thanks
Mariusz
 

Welcome to EDABoard.com

Sponsor

Back
Top