expressing operator * as pipelined

Chris Hinsley

Guest
This might just not be possible, but I'll ask before spending ages
trying to get it working.

I have a multiply I did myself that works fine, but takes a fair
amount of resources, and has a latency of several cycles. I can feed
it new inputs each clock though.

I'd like to use the FPGA's internal multiply blocks to get the mul done
to save resources, but I don't know if you can use the * operator in a
way that will let me put in a simple bit of behavioural code to express
a pipelined version of the mul?

I hope you see what I'm getting at.

I'd like to say 'put me in this mul, but know that it will take 2 cycles
for the result to appear.'

My first thought was maybe do something like this, with the results
getting written every other cycle sort of thing? As usual, advice
gratefully received.

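reg [1:0] cnt;              // declared at module level; an unnamed begin/end cannot hold declarations in Verilog-2001
always @(posedge clk)
begin
    cnt <= cnt + 1;
    if (cnt[1])
        outa <= ina * inb;  // non-blocking assignments in a clocked block
    else
        outb <= inc * ind;
end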

Chris
 
In article <2011050416141597354-chrishinsley@gmailcom>,
Chris Hinsley <chris.hinsley@gmail.com> wrote:
I'd like to say 'put me in this mul, but know that it will take 2 cycles
for the result to appear.'

Check your synthesis guide for your FPGA target. It should give you
a template for how to infer the correct multiplier. For a pipelined
multiplier, it's usually in the form of:

assign mult_out = a * b;            // combinational product
always @( posedge clk )
begin
    pipelined0 <= mult_out;         // first register stage after the multiplier
    pipelined1 <= pipelined0;       // second register stage
end

I.e. just delay the output by the required number of pipeline
stages. During synthesis it'll infer the multiplier, then suck
in the boundary FFs. You may have to register the inputs
too. As I said, check your specific FPGA user guides for
multiplier inference. Heck, read that whole section on
logic inference. It'll be helpful.

--Mark
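
For reference, the template above can be fleshed out into a complete module. This is only a sketch under assumptions: the module name, the WIDTH parameter and the choice of one input stage plus two output stages are made up for illustration, and whether the registers actually get absorbed into the multiplier block depends on the device and the synthesis tool, so the vendor's inference template takes precedence.

```verilog
// Hypothetical example: registered multiplier written for block-multiplier
// inference. WIDTH and the stage count are assumptions, not vendor guidance.
module pipelined_mult #(
    parameter WIDTH = 18
) (
    input  wire                 clk,
    input  wire [WIDTH-1:0]     a,
    input  wire [WIDTH-1:0]     b,
    output reg  [2*WIDTH-1:0]   p
);
    reg [WIDTH-1:0]   a_r, b_r;     // input registers
    reg [2*WIDTH-1:0] m_r;          // register directly after the multiply

    always @(posedge clk) begin
        a_r <= a;                   // stage 1: capture operands
        b_r <= b;
        m_r <= a_r * b_r;           // stage 2: unsigned multiply, full-width product
        p   <= m_r;                 // stage 3: extra output register
    end
endmodule
```

Written this way the module accepts a new operand pair on every clock and the matching product appears on p three clock edges later.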
 
On 2011-05-04 16:14:15 +0100, Chris Hinsley said:

My first thought was maybe do something like this, with the results
getting written every other cycle sort of thing ?

Maybe something more like this..

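module mul(clk, ina, inb, out);
    input clk;
    input [31:0] ina, inb;
    output reg [31:0] out;          // keeps only the low 32 bits of the product
    reg [31:0] o [0:1], i [0:1];    // two-entry arrays in plain Verilog-2001 syntax
    reg chan = 1'b0;                // known start value, otherwise ~chan stays X in simulation
    always @(posedge clk)
    begin
        chan <= ~chan;
        out <= o[chan];
        i[chan] <= ina * inb;       // the product is still registered on every clock
        o[~chan] <= i[~chan];
    end
endmodule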

Chris
 
On 2011-05-04 16:39:38 +0100, Chris Hinsley said:

Maybe something more like this..

Sorry Mark, posting lag, just seen your reply.

Chris
 
On 2011-05-04 16:41:34 +0100, Chris Hinsley said:

Sorry Mark, posting lag, just seen your reply.

I think my idea still might have some merit though. Maybe on an FPGA
without a pipelined mul, you could use up 2 multiply blocks
and roll your own 2 cycle latency, 1 mul per clock multiply?

Chris
 
In article <2011050416460763860-chrishinsley@gmailcom>,
Chris Hinsley <chris.hinsley@gmail.com> wrote:
I think my idea still might have some merit though, maybe on an FPGA
without a pipelined mul, you may be able to use up 2 multiply blocks
and roll your own 2 cycle latency, 1 mul per clock multiply ?

This is kind-of the opposite of pipelined - a multicycle multiply.
Then using two multicycle multipliers to fill in even/odd cycles?

Well, first things first - I don't think you'll find an FPGA family
without a pipelined multiply - and one that operates much faster
than any of the logic around it. (DSP48s in Xilinx families
operate upwards of the 300 MHz domain.) The biggest
problem with achieving these data rates is actually getting
all the data to them at these rates.

Point two - what you're doing is certainly doable. However, we've
found that it's best to avoid multi-cycle paths. You'll need
to properly constrain the tools to identify which
paths are multicycle. And for us, constraint management is becoming
an even bigger problem. It just gets, well, for lack of
a better term, "Icky." So avoiding any timing exceptions like these
is important.

I know others may disagree on the multi-cycle stuff, but
that's our style.

Regards,

Mark
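
To make the even/odd idea concrete, here is one way it could be written. This is a sketch under assumptions, not something posted in the thread: the module name, the phase register and the WIDTH parameter are hypothetical. Each multiplier's operands are held for two clocks and its product is sampled on the same edge that reloads them, so the paths from a0/b0 and a1/b1 into out are two-cycle paths. Those are exactly the paths that would need a multicycle exception in the timing constraints, and a synthesis tool may also need to be told not to share the two multiplies.

```verilog
// Hypothetical example: two multicycle multipliers interleaved on even/odd
// clocks. Latency is 2 clocks; one new operand pair is accepted per clock.
module interleaved_mult #(
    parameter WIDTH = 18
) (
    input  wire                 clk,
    input  wire [WIDTH-1:0]     ina,
    input  wire [WIDTH-1:0]     inb,
    output reg  [2*WIDTH-1:0]   out
);
    reg             phase = 1'b0;       // selects which multiplier is serviced
    reg [WIDTH-1:0] a0, b0, a1, b1;     // operand registers, each held for 2 clocks

    // Two separate combinational multipliers, each given two clock periods.
    wire [2*WIDTH-1:0] p0 = a0 * b0;
    wire [2*WIDTH-1:0] p1 = a1 * b1;

    always @(posedge clk) begin
        phase <= ~phase;
        if (phase) begin
            out <= p1;                  // operands were loaded two edges ago
            a1  <= ina;                 // reload multiplier 1
            b1  <= inb;
        end else begin
            out <= p0;
            a0  <= ina;                 // reload multiplier 0
            b0  <= inb;
        end
    end
endmodule
```

In RTL simulation this behaves like an ordinary two-stage pipeline; the difference only shows up in timing analysis, which is where the constraint bookkeeping comes in.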
 
Coming in rather late here, I think you already got lots of
good advice but I can't resist introducing you to a cute
little feature of Verilog that may be interesting. Sadly
it's not supported by any synthesis tools as far as I know,
but it's a really neat way to *model* pipelined activity
for use in testbenches etc.

To get a register after some complicated combinational
function such as a multiplier, of course you do...

always @(posedge clock)
    out <= some_function(various_inputs);

To get a pipelined version of the same thing, taking
N clock cycles from presentation of input to seeing
the output, you can use an intra-assignment delay:

always @(posedge clock)
    out <= repeat (N-1) @(posedge clock)
               some_function(various_inputs);

In this formulation there's no need to create
any explicit pipeline register storage - the
simulator does it for you! Too bad it's
simulation-only...
--
Jonathan Bromley
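
A small self-contained testbench shows the construct in use. This is a simulation-only sketch: the testbench name, the 8-bit operands, N = 3 and the clock period are assumptions chosen for illustration, and the multiply stands in for some_function.

```verilog
`timescale 1ns/1ps
// Hypothetical example: models an N-cycle pipelined multiply with a
// non-blocking assignment and a repeat event control. Not synthesizable.
module tb_pipe_model;
    localparam N = 3;                   // assumed total latency in clocks

    reg         clock = 1'b0;
    reg  [7:0]  a = 8'd0, b = 8'd0;
    reg  [15:0] out;
    integer     k;

    always #5 clock = ~clock;           // free-running clock, period 10 ns

    // The product is evaluated at this edge; the update of 'out' is
    // deferred until N-1 further rising edges have occurred.
    always @(posedge clock)
        out <= repeat (N-1) @(posedge clock) a * b;

    // Present a new operand pair on every falling edge.
    initial begin
        for (k = 1; k <= 8; k = k + 1) begin
            @(negedge clock);
            a = k;
            b = k + 1;
        end
        repeat (N + 1) @(negedge clock);
        $finish;
    end

    // Print the settled values once per cycle.
    always @(negedge clock)
        $strobe("t=%0t a=%0d b=%0d out=%0d", $time, a, b, out);
endmodule
```

No pipeline registers are declared anywhere; each clock edge schedules its own deferred update, so a new result can emerge on every cycle once the first N have passed.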
 
In article <93pgs6dmbdhvovncq80rbmufkc2sr424er@4ax.com>,
Jonathan Bromley <spam@oxfordbromley.plus.com> wrote:
To get a pipelined version of the same thing, taking
N clock cycles from presentation of input to seeing
the output, you can use an intra-assignment delay:

always @(posedge clock)
out <= repeat (N-1) @(posedge clock)
some_function(various_inputs);

Jonathan,

Now I've learned something new. Didn't know about that event_control
option ( along with the repeat ) in the intra-assignment delay.
That's interesting. Not sure how I'd use it, but good to know.

Was this added during one of the Verilog revisions, or has it always
been there, and I just never noticed?

Thanks,

Mark
 
On 5/9/2011 4:09 PM, Mark Curry wrote:
Was this added during one of the Verilog revisions, or has it always
been there, and I just never noticed?

I believe this has been there since the beginning. It is in 1364-1995.
I've seen this in behavioral models a few times. I think it was always
used to represent a pipelined delay of some kind.

Cary
 

Welcome to EDABoard.com
