Xilinx dedicated multipliers vs multipliers in slice fabric

Ken · Oct 8, 2003

Hello folks,

Found a load of archived posts on this topic.

I was wondering what the current thoughts are on what is ultimately faster:

(a) dedicated 18x18 multipliers
(b) 18x18 pipelined multiplier in slice logic

Also, are there any advantages/disadvantages you can think of between the
two options other than the following:

(1) If all dedicated multipliers are used, you have no choice but to use slice
logic.

(2) Power considerations?

Thanks for your time,

Ken

Nicholas C. Weaver · Oct 8, 2003

3: Use of the BlockRAMs. Since the BlockRAMs and multipliers share
interconnect, there are limits on when they can be used
simultaneously.

4: Pipelined, throughput-optimized performance. The fixed multipliers
are unpipelined or single-stage; a LUT multiplier can be much more
finely pipelined (higher throughput).
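
To make point 4 concrete, here is a minimal behavioural sketch in Python. It
is an illustration only, not taken from any Xilinx core: the function names,
the unsigned-only arithmetic and the choice of 3 bits of the second operand
per stage are all assumptions. It models a shift-and-add multiplier cut into
stages with a register after each one, so each stage does only a small amount
of work per clock and a new operand pair can enter every clock.

```python
def stage(state, index, bits_per_stage):
    """One pipeline stage: add the partial product for one slice of b."""
    if state is None:  # bubble: no data in this stage on this clock
        return None
    a, b, acc = state
    lo = index * bits_per_stage
    chunk = (b >> lo) & ((1 << bits_per_stage) - 1)
    return (a, b, acc + ((a * chunk) << lo))


def pipelined_multiply(pairs, width=18, bits_per_stage=3):
    """Cycle-level model of an unsigned multiplier split into pipeline stages.

    Assumes width is a multiple of bits_per_stage and operands fit in width bits.
    """
    stages = width // bits_per_stage
    pipe = [None] * stages  # pipe[i] is the register after stage i
    results = []
    # Feed one operand pair per clock, then bubbles to flush the pipeline.
    for item in list(pairs) + [None] * stages:
        # Whatever sits in the last register is a finished product.
        if pipe[-1] is not None:
            results.append(pipe[-1][2])
        # Update from the last stage backwards so each register is read
        # before it is overwritten, as on a real clock edge.
        for i in range(stages - 1, 0, -1):
            pipe[i] = stage(pipe[i - 1], i, bits_per_stage)
        entering = None if item is None else (item[0], item[1], 0)
        pipe[0] = stage(entering, 0, bits_per_stage)
    return results


products = pipelined_multiply([(3, 5), (262143, 262143), (0, 7)])
```

In this model the first product appears after `stages` clocks and one more
follows on every clock after that. Making `bits_per_stage` smaller means more
stages with less logic in each, which is the trade behind "more finely
pipelined". The model says nothing about actual timing on any device.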
--
Nicholas C. Weaver nweaver@cs.berkeley.edu

Ken · Oct 9, 2003

Ok - it is my understanding that there are registers just before and just
after the dedicated multipliers that can be used to speed them up.

But what you are saying is that the LUT multipliers will have a higher max
MHz when both solutions are as pipelined as they can be?

Thanks for your time,

Ken

Ray Andraka · Oct 9, 2003

Bzzzt. The 'pipeline' register in the multiplier is in the middle, and the
setup and clock-to-Q times of the 'pipelined' multiplier are substantial. In
order to get the data sheet max performance, you need to add CLB registers to
the multiplier I/O AND you need to place them in the slices where there are
direct connects to the multiplier. If you do this, and as long as you don't
have 'stepping 0' parts, the embedded multipliers can be clocked faster than
an 18 bit carry chain.

The advantage of in-fabric multipliers is that you can make them whatever size
you need, and put them where they are convenient rather than being restricted
to the mult/BRAM columns. In the fabric, you can also take advantage of cases
where you have multiple clocks per sample to reduce the size of the
multiplier.

I look at the FPGA sort of like a bin of different Legos (tm). You use what
you have in the box to the best advantage for your particular project.
Sometimes there are more multipliers than you need, so you can use them for
things like shifters or muxes if you get real cute about it. Other times,
there are not enough, so you pick and choose what goes where.
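
Two of the tricks mentioned above can be sketched in a few lines of Python.
This is a behavioural illustration under stated assumptions, not a
description of any particular design: the function names, the unsigned
operands and the split into three clocks are all hypothetical. The first
function uses a multiplier as a shifter by multiplying by a one-hot value
2**n. The second spends several clocks per sample so that a narrower
multiplier (18x6 here) can produce a full 18x18 product.

```python
def shift_left_via_multiplier(x, n, width=18):
    """Left-shift x by n using a multiply by the one-hot value 2**n."""
    if not (0 <= n < width and 0 <= x < (1 << width)):
        raise ValueError("operands must fit the assumed multiplier width")
    one_hot = 1 << n    # decoded shift amount, fed to one multiplier input
    return x * one_hot  # same result as x << n


def multiply_over_several_clocks(a, b, width=18, clocks=3):
    """Build a width x width product from a narrower multiplier reused each clock.

    Assumes unsigned operands and that width is a multiple of clocks.
    """
    chunk_bits = width // clocks
    mask = (1 << chunk_bits) - 1
    acc = 0
    for clk in range(clocks):
        # Each clock handles one slice of b with a width x chunk_bits multiply.
        chunk = (b >> (clk * chunk_bits)) & mask
        acc += (a * chunk) << (clk * chunk_bits)
    return acc


shifted = shift_left_via_multiplier(0b1011, 3)
product = multiply_over_several_clocks(200000, 150000)
```

The second function trades throughput for area: the multiplier is a third of
the size, but a new sample can only be accepted every third clock.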

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Ken · Oct 10, 2003

Great answer Ray - thanks very much.

Ken
