Constant Mult: The State of High Level Synth (Part II)

Kevin Neilson

Guest
Here's another example that shows just how far removed we are from high-level synthesis.

I need to multiply a 10-bit number by 956. I could use a DSP48, but it's overkill, and to meet timing I'd probably have to pipeline it fully and have 3 cycles of latency. So I write this code and force Vivado not to use DSP48s:

reg [9:0]  x; // multiplicand
reg [19:0] p; // product
always @(posedge clk)
    p <= x*956;

This has 58 LUTs, 3 separate columns of carry chains, and fails timing at my clock speed.

I'd have guessed that the synthesizer would fully optimize a simple constant multiplier, but just for fun I go down a level of abstraction, look at the binary representation of 956, and write a CSD (canonical signed digit) version:

reg [9:0]  x; // multiplicand
reg [19:0] p; // product
always @(posedge clk)
    p <= (x<<10) - (x<<6) - (x<<2); // same as x*956

Results: 19 LUTs, meets timing with 850ps slack. There is only 1 carry chain (ternary add?).
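
For anyone who wants to confirm that the two forms really are the same function before comparing synthesis results: 956 is 1110111100 in binary (seven ones), and recoding the runs of ones gives 2^10 - 2^6 - 2^2, which has only three nonzero digits. A minimal simulation-only sketch that checks this exhaustively might look like the following. The module name tb_csd956 and the signal names p_mul and p_csd are made up for illustration; they are not from the original design.

```verilog
`timescale 1ns/1ps
// Simulation-only check (not synthesizable): compares x*956 against the
// shift/subtract form for every 10-bit value of x.
module tb_csd956;
    integer     n;       // loop counter covering all 1024 input values
    integer     errors;  // number of mismatches seen
    reg  [9:0]  x;       // multiplicand
    reg  [19:0] p_mul;   // product from the '*' operator
    reg  [19:0] p_csd;   // product from the shift/subtract form

    initial begin
        errors = 0;
        for (n = 0; n < 1024; n = n + 1) begin
            x     = n;   // truncated to the low 10 bits
            p_mul = x * 956;
            p_csd = (x << 10) - (x << 6) - (x << 2);
            if (p_mul !== p_csd) begin
                errors = errors + 1;
                $display("MISMATCH x=%0d: %0d vs %0d", x, p_mul, p_csd);
            end
        end
        if (errors == 0) $display("PASS: all 1024 inputs match");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

This only establishes functional equivalence. It says nothing about LUT count or timing, which still have to be read off the synthesis reports.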

So what I am saying is that I can't even use the '*' sign. I have to go down a level of abstraction in order to make a multiplier that will meet timing. If I can't use the multiply sign, why would I believe I could use C code?
 
I realized that in my application, x probably only needs to be 6 bits wide. So I synthesized that, and got something with carry chains in it. Wait a minute... if each LUT is a 64-bit lookup table, wouldn't you synthesize a constant multiplication by a 6-bit multiplicand as a lookup, with 1 LUT per output bit and 1 level of logic? Yes, you would. No carry chains. But in order to do that, I had to write the logic like this:

reg [5:0] x;
always @(posedge clk)
    case (x)
        0:  p <= 956*0;
        1:  p <= 956*1;
        ....
        63: p <= 956*63;
    endcase

Yes, that's how I had to write it. High-level synthesis!
 
Interestingly, the following loop also implements the multiply as a lookup table with 1 LUT per output bit without having to write out the case statement.

reg [5:0] x;
integer ii;
always @(posedge i_clk)
    for (ii=0; ii<64; ii=ii+1)
        if (x==ii) p <= 956*ii;
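
Another way to express the same table, sketched here as an untested assumption rather than something taken from the synthesis runs above, is a small ROM array filled in an initial block. The module name lut_const_mult and the array name table956 are hypothetical, and the 16-bit product width is chosen because 956*63 = 60228 fits in 16 bits. Whether a given tool maps this to one LUT per output bit, as it did for the case statement and the loop, would have to be checked in the synthesized netlist.

```verilog
// Table-lookup constant multiplier: p = 956 * x for a 6-bit x.
// Hypothetical sketch; names and widths are illustrative assumptions.
module lut_const_mult (
    input  wire        clk,
    input  wire [5:0]  x,   // multiplicand, 64 possible values
    output reg  [15:0] p    // product, registered
);
    reg [15:0] table956 [0:63];  // one precomputed product per input value
    integer    ii;

    // Fill the table once at elaboration/initialization time.
    initial
        for (ii = 0; ii < 64; ii = ii + 1)
            table956[ii] = 956 * ii;

    // Registered lookup: each output bit is a function of the 6 bits of x.
    always @(posedge clk)
        p <= table956[x];
endmodule
```

The appeal of this form is that the constant appears in only one place, so changing it does not mean rewriting 64 case items.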


One thing you have to do in this field is constantly look at the synthesized design to see whether it came out the way you'd planned. That is something high-level users aren't supposed to have to do. Imagine if a JavaScript coder had to look at the compiled machine code to see if the compiler did what he wanted. I don't think they work on that level.
 
On 07/28/2016 03:48 AM, Kevin Neilson wrote:
Interestingly, the following loop also implements the multiply as a
lookup table with 1 LUT per output bit without having to write out
the case statement.

reg [5:0] x;
integer ii;
always @(posedge i_clk)
    for (ii=0; ii<64; ii=ii+1)
        if (x==ii) p <= 956*ii;


One thing you have to do in this field is to constantly look at the
synthesized design to see if it came out how you'd planned. That is
something high-level users aren't supposed to have to do. Imagine if
a JavaScript coder had to look at the compiled machine code to see if
the compiler did what he wanted. I don't think they work on that
level.

Looks like they don't do optimization on that stuff...
Have you tried yosys? I'm pretty sure it'll do that for you.
Or perhaps bare ABC, if you have some other way to convert
to BLIF or AIG or something.

Or perhaps you've missed some synth option?
 
I imagine Synplify would do a better job, but I don't have it. It seems like Vivado just wants to do any multiplication in a DSP48, and if you don't let it, it has poor heuristics. If I were going to need this a lot, I'd just make my own function with some good heuristics (e.g., if it's a constant mult and the constant has a CSD representation with <=3 nonzero digits, then use a ternary adder...). I would've expected these rules to have been built into the synthesizer, though. It's pretty basic.
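
As a rough idea of what such a home-made multiplier could look like, here is a sketch that recodes the constant into signed digits at elaboration time and adds or subtracts one shifted copy of x per nonzero digit. It covers only the recoding half of the idea: it does not apply the "<=3 nonzero digits" rule or instantiate a ternary adder explicitly, and how any particular synthesizer maps the resulting add/subtract chain is not something this sketch can promise. The module name csd_const_mult and its parameters K, XW and PW are invented for illustration, and K is assumed to be a small non-negative constant.

```verilog
// Constant multiplier built from a signed-digit recoding of K.
// Hypothetical sketch: names, parameters and defaults are assumptions.
module csd_const_mult #(
    parameter integer K  = 956,  // constant multiplier, assumed 0 <= K < 2^30
    parameter integer XW = 10,   // multiplicand width
    parameter integer PW = 20    // product width, must hold K * (2^XW - 1)
) (
    input  wire          clk,
    input  wire [XW-1:0] x,
    output reg  [PW-1:0] p
);
    integer          c;    // part of K not yet recoded (constant per iteration)
    integer          i;    // bit position of the current digit
    reg     [PW-1:0] acc;  // running sum of the shifted terms

    always @(posedge clk) begin
        acc = 0;
        c   = K;
        // The loop bounds and c are constants, so the loop unrolls into a
        // fixed set of add/subtract terms.
        for (i = 0; i < 32; i = i + 1) begin
            if (c % 2 != 0) begin
                if (c % 4 == 3) begin
                    // Low bits are ...11: use digit -1 and carry upward.
                    acc = acc - (x << i);
                    c   = c + 1;
                end else begin
                    // Low bits are ...01: use digit +1.
                    acc = acc + (x << i);
                    c   = c - 1;
                end
            end
            c = c / 2;
        end
        p <= acc;
    end
endmodule
```

With K = 956 the loop produces the same three terms as the hand-written version: -(x<<2), -(x<<6) and +(x<<10).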
 
On Sat, 30 Jul 2016 12:11:28 -0700, Kevin Neilson wrote:

I imagine Synplify would do a better job, but I don't have it. It seems
like Vivado just wants to do any multiplication in a DSP48, and if you
don't do that, it has poor heuristics. If I were going to need this a
lot, I'd just make my own function with some good heuristics (e.g., if
it's a constant mult and the constant has a CSD representation with <=3
ones, then use a ternary adder...). I would've expected these rules to
have been built into the synthesizer, though. It's pretty basic.

Speaking as someone who does mostly software and circuit design, with
enough FPGA knowledge to be dangerous -- the optimizer doesn't seem to be
trying very hard.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
 
On Sat, 30 Jul 2016 12:11:28 -0700, Kevin Neilson wrote:

I imagine Synplify would do a better job, but I don't have it. It seems
like Vivado just wants to do any multiplication in a DSP48, and if you
don't do that, it has poor heuristics. If I were going to need this a
lot, I'd just make my own function with some good heuristics (e.g., if
it's a constant mult and the constant has a CSD representation with <=3
ones, then use a ternary adder...). I would've expected these rules to
have been built into the synthesizer, though. It's pretty basic.

Could it be that when you tell the thing to not use a DSP block it's
interpreting that to mean "just don't optimize this"?

--
Tim Wescott
 
