Division by a constant

Rob Gaddi · Apr 9, 2015

So I just had a thought. Most synthesis tools (in VHDL, and I assume in
Verilog) will allow you to use the division operator to perform
truncating division by a constant in synthesizable code, so long as that
constant is a power of 2.

That seems like a reasonable restriction; that you can only divide when
it's just a right shift, right up until you think a bit longer. Because
I do synthesizable division by a constant all the time, actually, as
multiplication by the reciprocal. So I wind up writing things like

y := x * (2**17 / 3) / 2**17.

It obscures the logic a bit, but works. But I was thinking, and not only
does it obscure the logic, but it forces assumptions into my code about
what the underlying multiplier block looks like. Why 2**17? Because I'm
assuming a 18 bit signed multiplier, because that's what happens to be on
some architecture (Altera Cyclone4 if I remember right).

It seems trivial for the synthesizer to do that transformation, division
by a constant => multiplication by the reciprocal, in a way that is
optimized for the underlying hardware. Any non-braindamaged C compiler
will do it without being asked. And maybe the synth tools do, it's just
been forever since I've actually checked.

Has anyone looked at this in a while? Are any of the synth tools smart
enough to handle this on their own these days?

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

Thomas Stanka · Apr 10, 2015

Am Donnerstag, 9. April 2015 18:58:41 UTC+2 schrieb Rob Gaddi:

That seems like a reasonable restriction; that you can only divide when
it's just a right shift, right up until you think a bit longer. Because
I do synthesizable division by a constant all the time, actually, as
multiplication by the reciprocal. So I wind up writing things like

y := x * (2**17 / 3) / 2**17.

There are a few nontrivial problems when using division in digital logic that also apply for constant operation.

Multiplicate with the reziprocal value is a valid function for real number (mathematic term), but not with fixed point (unsigned with shifted decimal).. If a number is 2^^n, its reciprog is well defined for fix point. For 3 you will find no exact reciproc with fixpoint notation, regardless of the number of digits you spend after fixpoint.

If you use floating point your chances of correct calculation result are far better, but you are not safe.
The term "(a-b)*c" is not always equivalent to "a*c - b*c" even in floating point arithmetic with a and b beeing significant different in size.

best regards

Thomas

kaz · Apr 10, 2015

if you multiply by fraction e.g y = x*0.333... then you are just leadin
the compiler to avoid division.

I don't use fractions but new libraries support them.

Ka
--------------------------------------
Posted through http://www.FPGARelated.com

Allan Herriman · Apr 10, 2015

On Thu, 09 Apr 2015 16:57:43 +0000, Rob Gaddi wrote:

So I just had a thought. Most synthesis tools (in VHDL, and I assume in
Verilog) will allow you to use the division operator to perform
truncating division by a constant in synthesizable code, so long as that
constant is a power of 2.

That seems like a reasonable restriction; that you can only divide when
it's just a right shift, right up until you think a bit longer. Because
I do synthesizable division by a constant all the time, actually, as
multiplication by the reciprocal. So I wind up writing things like

y := x * (2**17 / 3) / 2**17.

It obscures the logic a bit, but works. But I was thinking, and not
only does it obscure the logic, but it forces assumptions into my code
about what the underlying multiplier block looks like. Why 2**17?
Because I'm assuming a 18 bit signed multiplier, because that's what
happens to be on some architecture (Altera Cyclone4 if I remember
right).

It seems trivial for the synthesizer to do that transformation, division
by a constant => multiplication by the reciprocal, in a way that is
optimized for the underlying hardware. Any non-braindamaged C compiler
will do it without being asked. And maybe the synth tools do, it's just
been forever since I've actually checked.

Has anyone looked at this in a while? Are any of the synth tools smart
enough to handle this on their own these days?

Hi Rob,

I'm still not sure I'd trust a synthesiser to handle that sort of
thing portably.

I don't think I've ever actually used a multiplier or divider in
a synthesisable design. There always seems to be some way to avoid
them, even for DSP, usually by employing smarter design at the system
level and careful selection of scaling factors, filter coefficients,
etc.
(I don't do the sort of filtering that needs
extremely precise coefficients. YMMV.)

When trying to avoid the use of the hard multipliers,
I would consider employing tricks like Booth recoding:
http://en.wikipedia.org/wiki/Booth%27s_multiplication_algorithm#How_it_works
which can sometimes help with a fixed multiplicand that has
a long string of 1s.

I would also look for repeating patterns in a fixed multiplicand.
Repeating patterns often arise when taking the reciprocal of a constant,
e.g. 1/5 = (binary) 0.00110011001100110011001100 etc.
This is equal to 11 * 10001 * 100000001 * ... (shifted right)
and a small number of adders can produce this result to any
desired precision.

Both techniques ought to be amenable to automation in a
synthesiser. But the synth tool I use doesn't even
support VHDL 2008 yet (thanks Xilinx!) so I won't hold
my breath waiting for comprehensive tool support for
multiplication other than the basic/obvious use of the
built-in hard blocks.

Regards,
Allan

Rob Gaddi · Apr 10, 2015

On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote:

Multiplicate with the reziprocal value is a valid function for real
number (mathematic term), but not with fixed point (unsigned with
shifted decimal). If a number is 2^^n, its reciprog is well defined for
fix point. For 3 you will find no exact reciproc with fixpoint notation,
regardless of the number of digits you spend after fixpoint.

Yes and no. I believe one can prove (for values of one excluding me)
that for a bounded integer numerator, you can always define a reciprocal
multiply that will give the exact same result as "floor division" for all
numerators in those bounds. Those differences you're talking about due
to integers being "unreal" numbers are all pushed down into the
remainder. And the same should therefore hold for fixed_point, since
you're always looking for the quotient to be in some finite format past
which you don't care about the errors.

I did a quick Python script just to test with integers. It tests the
dumb way, through complete exhaustion of the input set, but my arbitrary
poking about sure implies that a) you can always find an answer and b)
that answer will require no more than 1 bit more than the numerator.

#!/usr/bin/env python3

"""
Test the theory that, given a bounded numerator, there is a reciprocal
multiply
that will always give the same result as floor division.
"""

import numpy as np

nbits = 22
divide_by = 65537

# Proof through exhaustion, create all possible numerators
numerators = np.arange(2**nbits, dtype=int)
quotients = numerators // divide_by

class FoundAnswer(Exception): pass

try:
expected_dbits = nbits + divide_by.bit_length()
for dbits in range(expected_dbits-1, expected_dbits+2):
basic_recip = (2**dbits) // divide_by
for recip in range(basic_recip, basic_recip + 2):
approx_quotients = (numerators * recip) >> dbits
if np.all(approx_quotients == quotients):
print(
'For all {nbits} bit numerators N//{div} == N*{recip}>>
{dbits}'.format(
nbits = nbits, div = divide_by, recip=recip, dbits=dbits
))
print('{recip} requires {b} bits.'.format(recip=recip,
b=recip.bit_length()))
raise FoundAnswer()
except FoundAnswer:
pass

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

glen herrmannsfeldt · Apr 17, 2015

Rob Gaddi <rgaddi@technologyhighland.invalid> wrote:

On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote:

Multiplicate with the reziprocal value is a valid function for real
number (mathematic term), but not with fixed point (unsigned with
shifted decimal).

(snip)

Yes and no. I believe one can prove (for values of one excluding me)
that for a bounded integer numerator, you can always define a reciprocal
multiply that will give the exact same result as "floor division" for all
numerators in those bounds. Those differences you're talking about due
to integers being "unreal" numbers are all pushed down into the
remainder. And the same should therefore hold for fixed_point, since
you're always looking for the quotient to be in some finite format past
which you don't care about the errors.

With the multiply instruction on many processors, that generates
a double length signed product, I believe that for many constants,
maybe half of them, there is a multiplier that will generate the
appropriate truncated quotient in the high half of the product.

But in the case the OP asked, it isn't so obvious that it
should do that.

Another choice would be a primitive that would generate the
appropriate multiplier.

Often when you want a divider, you want it pipelined, which
is unlikely to be synthesized.

-- glen

Rob Gaddi · Apr 17, 2015

On Thu, 16 Apr 2015 23:01:16 +0000, glen herrmannsfeldt wrote:

Rob Gaddi <rgaddi@technologyhighland.invalid> wrote:
On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote:

Multiplicate with the reziprocal value is a valid function for real
number (mathematic term), but not with fixed point (unsigned with
shifted decimal).

(snip)
Yes and no. I believe one can prove (for values of one excluding me)
that for a bounded integer numerator, you can always define a
reciprocal multiply that will give the exact same result as "floor
division" for all numerators in those bounds. Those differences you're
talking about due to integers being "unreal" numbers are all pushed
down into the remainder. And the same should therefore hold for
fixed_point, since you're always looking for the quotient to be in some
finite format past which you don't care about the errors.

With the multiply instruction on many processors, that generates a
double length signed product, I believe that for many constants,
maybe half of them, there is a multiplier that will generate the
appropriate truncated quotient in the high half of the product.

Right, and I've never seen a multiplier block in an FPGA that doesn't do
the same. For a while they were 18x18=>36, then I started hitting
18*25=>43, but regardless, the hard multiplier blocks always generate
P'length = A'length + B'length.

But in the case the OP asked, it isn't so obvious that it should do
that.

Another choice would be a primitive that would generate the appropriate
multiplier.

Ugh, but then you'd have to instantiate it as a separate block and wire
it in. That's even uglier than having to put the bit-shifting logic in
manually.

Often when you want a divider, you want it pipelined, which is unlikely
to be synthesized.

Not really. Often when I want a divider I have to pipeline it because
the division algorithm is inherently serial. But I don't _want_ it to be
pipelined, that's just the only choice I've got for a true X/Y divide.

But for X/K with constant K, it can (in every case I've seen) be
implemented with a multiplier block, or simply by wire if K is a power of
2. That gets us done in a single cycle.

Sure that multiplier block may blow up into horrible cross-multiplies
spanning multiple blocks if I ask for a stupidly large K. But the same
can be said for X*K, and the tool lets me request that just fine, and if
I ask for a stupid K I get a mess of logic that either a) only meets
timing with a slow clock or b) requires me to make a few stages of
register pushback available.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

Rob Gaddi · Apr 17, 2015

On Thu, 16 Apr 2015 20:00:45 -0400, rickman wrote:

On 4/16/2015 7:23 PM, Rob Gaddi wrote:

But for X/K with constant K, it can (in every case I've seen) be
implemented with a multiplier block, or simply by wire if K is a power
of 2. That gets us done in a single cycle.

Sure that multiplier block may blow up into horrible cross-multiplies
spanning multiple blocks if I ask for a stupidly large K. But the same
can be said for X*K, and the tool lets me request that just fine, and
if I ask for a stupid K I get a mess of logic that either a) only meets
timing with a slow clock or b) requires me to make a few stages of
register pushback available.

I don't think a large K will create any problems that require more
multiplier blocks. The resolution required only depends on... the
resolution required. If you are working with truncated integers it
doesn't matter if the divisor is large. That just means you get smaller
results, not more math to do. Or are you thinking you can save logic by
using small values of K which can reduce the logic required? If using a
block multiplier you can't use less than one... or can you? Hmmm...

There are probably some trivial cases where the divide by N reduces to a
multiply by something silly like 3 followed by a bit-shift that might get
implemented on fabric, but I'd tend to assume that any time I did a
divide, like anytime I did a multiply, I'm likely to commit a multiplier
to it. If I get lucky, all the better.

If it takes a lot of bits to accurately represent K, it takes a lot of
bits to accurately represent 1/K, subject to many of the same caveats
about factoring out powers of 2.

Likewise, if I tell the tools that I want to use a 32-bit numerator,
that'll take cross-multiplies too. But all that already gets handled
correctly in the multiplication case.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

glen herrmannsfeldt · Apr 17, 2015

Rob Gaddi <rgaddi@technologyhighland.invalid> wrote:

(snip, I wrote)

With the multiply instruction on many processors, that generates a
double length signed product, I believe that for many constants,
maybe half of them, there is a multiplier that will generate the
appropriate truncated quotient in the high half of the product.

Right, and I've never seen a multiplier block in an FPGA that
doesn't do the same. For a while they were 18x18=>36, then
I started hitting 18*25=>43, but regardless, the hard
multiplier blocks always generate
P'length = A'length + B'length.

But in the case the OP asked, it isn't so obvious that it should do
that.

Another choice would be a primitive that would generate the appropriate
multiplier.

Ugh, but then you'd have to instantiate it as a separate block and wire
it in. That's even uglier than having to put the bit-shifting logic in
manually.

Yes it is ugh, but you know that you are asking for one.

Often when you want a divider, you want it pipelined,
which is unlikely to be synthesized.

Not really. Often when I want a divider I have to pipeline it because
the division algorithm is inherently serial. But I don't _want_ it to be
pipelined, that's just the only choice I've got for a true X/Y divide.

Well, I usually go the FPGA route when I want something done fast,
which means pipelined. Maybe not everyone does that.

But for X/K with constant K, it can (in every case I've seen) be
implemented with a multiplier block, or simply by wire if K is
a power of 2. That gets us done in a single cycle.

Sure that multiplier block may blow up into horrible cross-multiplies
spanning multiple blocks if I ask for a stupidly large K. But the same
can be said for X*K, and the tool lets me request that just fine, and if
I ask for a stupid K I get a mess of logic that either a) only meets
timing with a slow clock or b) requires me to make a few stages of
register pushback available.

For software, you usually have (N)*(N)=(2N) and (2N)/(N)=(N)

In the FPGA case, even though the hardware is 18 bits, you can
choose any number of bits for your actual values.

I am pretty sure that if you have one more bit in the constant
than you need in the quotient, that it is enough. I am not sure
that it always is when you have the same number of bits.

That is, I am not sure that you can generate a 32 bit signed
quotient as the high half of a 32 bit multiply for all possible 32
bit signed divisors.

-- glen

rickman · Apr 17, 2015

On 4/16/2015 7:23 PM, Rob Gaddi wrote:

On Thu, 16 Apr 2015 23:01:16 +0000, glen herrmannsfeldt wrote:

Rob Gaddi <rgaddi@technologyhighland.invalid> wrote:
On Fri, 10 Apr 2015 04:26:26 -0700, Thomas Stanka wrote:

Multiplicate with the reziprocal value is a valid function for real
number (mathematic term), but not with fixed point (unsigned with
shifted decimal).

(snip)
Yes and no. I believe one can prove (for values of one excluding me)
that for a bounded integer numerator, you can always define a
reciprocal multiply that will give the exact same result as "floor
division" for all numerators in those bounds. Those differences you're
talking about due to integers being "unreal" numbers are all pushed
down into the remainder. And the same should therefore hold for
fixed_point, since you're always looking for the quotient to be in some
finite format past which you don't care about the errors.

With the multiply instruction on many processors, that generates a
double length signed product, I believe that for many constants,
maybe half of them, there is a multiplier that will generate the
appropriate truncated quotient in the high half of the product.

Right, and I've never seen a multiplier block in an FPGA that doesn't do
the same. For a while they were 18x18=>36, then I started hitting
18*25=>43, but regardless, the hard multiplier blocks always generate
P'length = A'length + B'length.

But in the case the OP asked, it isn't so obvious that it should do
that.

Another choice would be a primitive that would generate the appropriate
multiplier.

Ugh, but then you'd have to instantiate it as a separate block and wire
it in. That's even uglier than having to put the bit-shifting logic in
manually.

Often when you want a divider, you want it pipelined, which is unlikely
to be synthesized.

Not really. Often when I want a divider I have to pipeline it because
the division algorithm is inherently serial. But I don't _want_ it to be
pipelined, that's just the only choice I've got for a true X/Y divide.

Maybe I am missing something, but it doesn't need to be pipelined. Just
take out the registers and it's no longer pipelined.

But for X/K with constant K, it can (in every case I've seen) be
implemented with a multiplier block, or simply by wire if K is a power of
2. That gets us done in a single cycle.

Sure that multiplier block may blow up into horrible cross-multiplies
spanning multiple blocks if I ask for a stupidly large K. But the same
can be said for X*K, and the tool lets me request that just fine, and if
I ask for a stupid K I get a mess of logic that either a) only meets
timing with a slow clock or b) requires me to make a few stages of
register pushback available.

I don't think a large K will create any problems that require more
multiplier blocks. The resolution required only depends on... the
resolution required. If you are working with truncated integers it
doesn't matter if the divisor is large. That just means you get smaller
results, not more math to do. Or are you thinking you can save logic by
using small values of K which can reduce the logic required? If using a
block multiplier you can't use less than one... or can you? Hmmm...

--

Rick

Nikolaos Kavvadias · Apr 17, 2015

Hi Rob and Glen

have you used kdiv, my constant division routine generator?

It produces low-level ("assembly") and C implementations for constant division.

http://github.com/nkkav/kdiv

Many people are happy with it; it is based on Warren's Hackers' Delight.

Best regards,
Nikolaos Kavvadias
http://www.nkavvadias.com

(snip, I wrote)
With the multiply instruction on many processors, that generates a
double length signed product, I believe that for many constants,
maybe half of them, there is a multiplier that will generate the
appropriate truncated quotient in the high half of the product.

Right, and I've never seen a multiplier block in an FPGA that
doesn't do the same. For a while they were 18x18=>36, then
I started hitting 18*25=>43, but regardless, the hard
multiplier blocks always generate
P'length = A'length + B'length.

But in the case the OP asked, it isn't so obvious that it should do
that.

Another choice would be a primitive that would generate the appropriate
multiplier.

Ugh, but then you'd have to instantiate it as a separate block and wire
it in. That's even uglier than having to put the bit-shifting logic in
manually.

Yes it is ugh, but you know that you are asking for one.

Often when you want a divider, you want it pipelined,
which is unlikely to be synthesized.

Not really. Often when I want a divider I have to pipeline it because
the division algorithm is inherently serial. But I don't _want_ it to be
pipelined, that's just the only choice I've got for a true X/Y divide.

Well, I usually go the FPGA route when I want something done fast,
which means pipelined. Maybe not everyone does that.

But for X/K with constant K, it can (in every case I've seen) be
implemented with a multiplier block, or simply by wire if K is
a power of 2. That gets us done in a single cycle.

Sure that multiplier block may blow up into horrible cross-multiplies
spanning multiple blocks if I ask for a stupidly large K. But the same
can be said for X*K, and the tool lets me request that just fine, and if
I ask for a stupid K I get a mess of logic that either a) only meets
timing with a slow clock or b) requires me to make a few stages of
register pushback available.

For software, you usually have (N)*(N)=(2N) and (2N)/(N)=(N)

In the FPGA case, even though the hardware is 18 bits, you can
choose any number of bits for your actual values.

I am pretty sure that if you have one more bit in the constant
than you need in the quotient, that it is enough. I am not sure
that it always is when you have the same number of bits.

That is, I am not sure that you can generate a 32 bit signed
quotient as the high half of a 32 bit multiply for all possible 32
bit signed divisors.

-- glen

Division by a constant

Rob Gaddi

Guest

Thomas Stanka

Guest

kaz

Guest

Allan Herriman

Guest

Rob Gaddi

Guest

glen herrmannsfeldt

Guest

Rob Gaddi

Guest

Rob Gaddi

Guest

glen herrmannsfeldt

Guest

rickman

Guest

Nikolaos Kavvadias

Guest

Welcome to EDABoard.com

Sponsor

Online statistics

Forum statistics

Division by a constant

Rob Gaddi

Guest

Thomas Stanka

Guest

kaz

Guest

Allan Herriman

Guest

Rob Gaddi

Guest

glen herrmannsfeldt

Guest

Rob Gaddi

Guest

Rob Gaddi

Guest

glen herrmannsfeldt

Guest

rickman

Guest

Nikolaos Kavvadias

Guest

Log in

Welcome to EDABoard.com

Sponsor