modified booth or mux based (Pekmestzi) multiplier

T

transformer

Guest
I have to build a signed (unsigned) multiplier. IEEE single precision
and 32 bit. I know from the IEEE Trans. Comp. Jan. 1999 p. 15. paper
that the MUX based Pekmestzi multiplier is superior to the commonly
used modified booth multiplier.
However, from 1999 to now no other paper was released about the MUX
based one and no other comparison is available. All papers still use
the modified booth encoding.

Has anybody implemented the MUX based one in FPGA/ASIC? Is there
really an advantage over the modified booth one in terms of delay and
area?

Thanx.
 
On Jul 25, transformer wrote:

I have to build a signed (unsigned) multiplier. IEEE single precision
and 32 bit. I know from the IEEE Trans. Comp. Jan. 1999 p. 15. paper
that the MUX based Pekmestzi multiplier is superior to the commonly
used modified booth multiplier.
However, from 1999 to now no other paper was released about the MUX
based one and no other comparison is available. All papers still use
the modified booth encoding.

Has anybody implemented the MUX based one in FPGA/ASIC? Is there
really an advantage over the modified booth one in terms of delay and
area?
I remember I considered that article but in the end I went for another
solution for one multiplier we built. Honestly I don't remember the
details anymore: it was about 5 years ago!

The greek thing was a nice idea, but the logic depth wasn't less than a
booth encoded one: probably they account for over two gates for one XOR,
which is in most cases far too pessimistic - as I wrote, it was a lot of
time ago and I don't remember exactly.

There was another later article on the Journal of Solid-State Circuits
about a regular multiplier (full-adder based), mabe in 2000, try looking
for it: there might be comparisons.

Some years ago there was also one english startup who claimed they can
outperform current design by a large factor (two). I think it was 2001
and the company was named AutoPD. I'm quite sure they exaggerated their
claim, but that's usual marketing and it's a part of the game.

Nevertheless, their basic idea was to dynamically generate the compressor
blocks, and this is ok! The best previous method I know of was by
allocating skewed full-adders (Oklobdzjia) and this leaves a bit of place
for optimizations, so I'm pretty sure that there is a way to get a bit
more speed (maybe 10-15% on a 32b signed MAC, for sure not 200%, unless
you use *very strange* basic cells ;-))

I think after 2001 there was generally a lack of interest (and money -
good bye new economy ...) to squeeze the last picoseconds out of an adder
tree structure: this also makes sense, there is much more to win in other
parts of the datapath and careful layout and transistor sizing really take
a lot of time - no matter how good you are at doing "black magic" ;-)

The skewed full-adders appoach (TDM + Booth + some refinement) is a good
framework and leads to good results, probably the optimum can be achieved
by tuning the compressor blocks.

So, I wouldn't go for the MUX-based thing. Of course if you work on
"strange" stuff (bad cells, bad XOR implementation, strange process,
strange FPGA switch blocks ...) things might change a bit.

--
Lorenzo Di Gregorio
 

Welcome to EDABoard.com

Sponsor

Back
Top