Code for Verilog 8bit * 8bit pipelined multiplier

Guest
I am looking for an 8bit * 8 bit pipelined multiplier.

The algorithm I am trying to implement involves:

1 of the 8bits is used as the multiplier and the other, the
multiplicand.

A clock signal is used to activate this process. After the clock cycle:
the first bit of the multiplier is looked at: if it

is a "1", the multiplicand, ie 8 bits are copied into a buffer for
storage, if the first bit is a "0" all zeroes ie. 00000000

are copied into the awaiting buffer.
Secondly, the second bit of the multiplier is looked at, if this is a
"1" the contents of the multiplicand are copied into

the same buffer, but shifted one place to the left. An extra "0" is
thus inserted in the space left after the shift. If the

multiplier bit is a "0", then all zeroes are copied into the same
buffer and shifted one place to the left, with an extra

zero occupying the space left after the shift ie "000000000"
This procedure is continued for all bits of the multiplier, shifting
the product by 3 or 3 or 4 etc. The product results

stored in the same buffer. Therefore this buffer would be kind of
large.
After this "logic coding" exercise we should have 8 products in the
buffer, ie. P0, P1, P2, P3, P4, P5, P6, P7, P8.

As seen above the clock signal is used for synchronization.

After the buffer, other clock signals, attached to the first clock
signal are used to push these products into several

awaiting carry save adders (CSA). These CSA take 3 inputs, one of
course the clock cycle, the others the products P0, P1 etc.

After much research I think this CSA format is in the shape of what is
called a Wallace tree.

After these several summations, two sums are found, these are then
inserted into a CPA, for the final summation. This result

is the product of the 8bit * 8bit pipelined multiplier.

I hope what i've stated above makes sense.
I would really appreciate your assistance as soon as possible, since I
have never used Verilog before.

Any help will be welcomed and appreciated.

Thank You!

Stephen
 
stephen.horsford@gmail.com schrieb:

I am looking for an 8bit * 8 bit pipelined multiplier.
Not pipelined, but the recommended solution:

reg [7:0] result;
reg [7:0] a;
reg [7:0] b;

assign result = a * b;

This would result AFAIK in an unsigned multiplication. I VHDL (sorry) I
would write:

result_u<=std_ulogic_vector( unsigned(a) * unsigend(b) );
result_s<=std_ulogic_vector( signed(a) * sigend(b) );


Why do you need pipelining for such narrow bitwidths?


A clock signal is used to activate this process. After the clock cycle:
the first bit of the multiplier is looked at: if it

is a "1", the multiplicand, ie 8 bits are copied into a buffer for
storage, if the first bit is a "0" all zeroes ie. 00000000

are copied into the awaiting buffer.
Do you want to build a one addition per clock serial multiplier (in
other words: some logic plus an accumulator)?


After this "logic coding" exercise we should have 8 products in the
buffer, ie. P0, P1, P2, P3, P4, P5, P6, P7, P8.
I guess so many buffers plus (de)mux logic may be almost bigger than a
pure combinational multiplier.
Why do you need all these buffers and don't use an accumulator?


After much research I think this CSA format is in the shape of what is
called a Wallace tree.
The wallace tree is a pure combinational multiplier. There is no
pipelining (except you break the tree into some parts and add a pipeline).

For these narrow bitwidths a tree is not so much faster than an array
multiplier. Much more important is the sign extension. Baugh-Wooley is a
suitable algorithm for this and Booth-Encoding is also very good.

I did power comparisons for several low-bitwidth multipliers.
http://www.ralf-hildebrandt.de/publication/icm2004/IEEE_icm2004_webpaper_hildebrandt.pdf
http://www.ralf-hildebrandt.de/publication/icm2004/icm2004_hildebrandt_lecture.pdf

A Booth/Wallace tree is quite fast, not too expensive in terms of power,
but is is said, that trees are harder to route than arrays.
For low bitwidths the Leapfrog-Carry-Save-Array (especially my modified
verion) is a good chioce in terms of power.


But you can save a lot of time if you just use the multiplier that is
inferred by your synthesis tool using the '*' Operator.


Ralf
 
Thanx so much for the reply Ralf!
The articles are very interesting.
I am doing some research for a course in college on Pipelining
My professor said this looked like a good example algorithm to use.
From one of your papers i see the type of multipler is a carry save
multiplier. The final result is found by using a CPA.
I sent my design block diagram to your email account:
Ralf-Hildebra...@gmx.de I hope that is alright.
How wou code this using Verilog or VHDL? Any one is sufficient.
As u can see each output is stored in some sort of buffer and each
stage is connected to the same clocked signal.

Thank you for all you assistance!
I really do appreciate it!

Stephen.


Ralf Hildebrandt wrote:
stephen.horsford@gmail.com schrieb:

I am looking for an 8bit * 8 bit pipelined multiplier.

Not pipelined, but the recommended solution:

reg [7:0] result;
reg [7:0] a;
reg [7:0] b;

assign result = a * b;

This would result AFAIK in an unsigned multiplication. I VHDL (sorry) I
would write:

result_u<=std_ulogic_vector( unsigned(a) * unsigend(b) );
result_s<=std_ulogic_vector( signed(a) * sigend(b) );


Why do you need pipelining for such narrow bitwidths?


A clock signal is used to activate this process. After the clock cycle:
the first bit of the multiplier is looked at: if it

is a "1", the multiplicand, ie 8 bits are copied into a buffer for
storage, if the first bit is a "0" all zeroes ie. 00000000

are copied into the awaiting buffer.

Do you want to build a one addition per clock serial multiplier (in
other words: some logic plus an accumulator)?


After this "logic coding" exercise we should have 8 products in the
buffer, ie. P0, P1, P2, P3, P4, P5, P6, P7, P8.

I guess so many buffers plus (de)mux logic may be almost bigger than a
pure combinational multiplier.
Why do you need all these buffers and don't use an accumulator?


After much research I think this CSA format is in the shape of what is
called a Wallace tree.

The wallace tree is a pure combinational multiplier. There is no
pipelining (except you break the tree into some parts and add a pipeline).

For these narrow bitwidths a tree is not so much faster than an array
multiplier. Much more important is the sign extension. Baugh-Wooley is a
suitable algorithm for this and Booth-Encoding is also very good.

I did power comparisons for several low-bitwidth multipliers.
http://www.ralf-hildebrandt.de/publication/icm2004/IEEE_icm2004_webpaper_hildebrandt.pdf
http://www.ralf-hildebrandt.de/publication/icm2004/icm2004_hildebrandt_lecture.pdf

A Booth/Wallace tree is quite fast, not too expensive in terms of power,
but is is said, that trees are harder to route than arrays.
For low bitwidths the Leapfrog-Carry-Save-Array (especially my modified
verion) is a good chioce in terms of power.


But you can save a lot of time if you just use the multiplier that is
inferred by your synthesis tool using the '*' Operator.


Ralf
 
stephen.horsford@gmail.com schrieb:


I am doing some research for a course in college on Pipelining
My professor said this looked like a good example algorithm to use.
Yes - for educational purpose it is a suitable practice. For the real
world I would recommend the '*' Operator.

From one of your papers i see the type of multipler is a carry save
multiplier.
Most of them are Carry-Save-Trees (Wallace Tree) or Carry-Save-Arrays.

The final result is found by using a CPA.
Yes - at last you get 2 addends.


How wou code this using Verilog or VHDL? Any one is sufficient.
I would code this as I would code it any other design. You need two
things: Combinational logic and Flipflops. For your design latches are
not needed.

always @(negedge reset OR posedge clock) -- flipflop
begin
if (reset==1'b0) begin
data <= resetvalue;
end else /*if posedge clock*/ begin
data <= data_in;
end; //if
end //always


always @(a OR b OR c) -- combinational logic
begin
if (c==1'b1) begin
result = a + b;
end else begin
result = a;
end; //if
end //always


Ralf
 
Thanx so much for your assistance Ralf!


Ralf Hildebrandt wrote:
stephen.horsford@gmail.com schrieb:


I am doing some research for a course in college on Pipelining
My professor said this looked like a good example algorithm to use.

Yes - for educational purpose it is a suitable practice. For the real
world I would recommend the '*' Operator.

From one of your papers i see the type of multipler is a carry save
multiplier.

Most of them are Carry-Save-Trees (Wallace Tree) or Carry-Save-Arrays.

The final result is found by using a CPA.

Yes - at last you get 2 addends.


How wou code this using Verilog or VHDL? Any one is sufficient.

I would code this as I would code it any other design. You need two
things: Combinational logic and Flipflops. For your design latches are
not needed.

always @(negedge reset OR posedge clock) -- flipflop
begin
if (reset==1'b0) begin
data <= resetvalue;
end else /*if posedge clock*/ begin
data <= data_in;
end; //if
end //always


always @(a OR b OR c) -- combinational logic
begin
if (c==1'b1) begin
result = a + b;
end else begin
result = a;
end; //if
end //always


Ralf
 

Welcome to EDABoard.com

Sponsor

Back
Top