The best way to implement this non-power-of-two modulo-like

M

Murph

Guest
Hello!
I've got two a subtypes that represents a piece on a gameboard made
from 20 rows.

subtype row is interger range 0 to 19;
subtype boardSize is integer range 0 to 199;

thus, row 0 is boardSize from 0-9, row 1 is boardSize from 10-19, etc.

Is there any efficient way to get the related "row" from a "boardSize"?
I know I can't do non power-of-two modulo operations. I need to do this
a couple dozen times within my program.

I'd love to just write a function getRow that I could call, but I don't
want it to synthesize into something impossibly complex (Latest Xilinx
Webpack --> Spartan 3).

If i wrote the function with < operators, for example, would it have to
synthesize twenty sets of comparators for each time I called it, or
would it reuse them? (example function below)

Or would it be a better idea to create a temporary variable and then
use repeated subtractions in a for loop?

Or just to write a huge if statement that would choose the right return
values based on all 200 possible inputs?

I don't really know which method would be the most "managable" when
synthesized.

Thanks for your time,
--Murph


function getRow(input : boardSize) return row is
variable result : row := 0;
begin
if (input < 10) then
result := 0;
elsif (input <20) then
result := 1;
-- ...
else
result := 19;
end if;
return result;
end function;
 
On 4 Dec 2006 15:06:50 -0800, "Murph" <mattfinn@gmail.com> wrote:

Hello!
I've got two a subtypes that represents a piece on a gameboard made
from 20 rows.

subtype row is interger range 0 to 19;
subtype boardSize is integer range 0 to 199;

thus, row 0 is boardSize from 0-9, row 1 is boardSize from 10-19, etc.

Is there any efficient way to get the related "row" from a "boardSize"?
I know I can't do non power-of-two modulo operations. I need to do this
a couple dozen times within my program.
Is there anything stopping you from implementing the board with rows 32
bits wide but only using the first 20 locations?

That would allow the obvious trivial implementation of "mod" and
simplify addressing considerably.

If only the first 20 locations are accessed, the synthesis tools should
optimise away resources related to the remaining 12. If "board" storage
is in block RAM, the excess capacity is probably going to waste anyway.
Whether it uses less resources than the "straight" implementation can
probably only be determined by experiment.

Alternatively, remember that division can also be accomplished by
multiplication by the reciprocal, or an approximation to the reciprocal.
Given a limited range of inputs, the question is, how coarse an
approximation can you get away with? (Or in FPGA, are the multiplier
blocks good enough?)

Given only 200 inputs, you can simulate this, even in a spreadsheet if
you have to, and verify correct output (or otherwise) for any
approximation you wish to try.

- Brian
 
"Brian Drummond" <brian_drummond@btconnect.com> wrote in message
news:r5pan21inj4natafao96pljscupfqapa4k@4ax.com...
On 4 Dec 2006 15:06:50 -0800, "Murph" <mattfinn@gmail.com> wrote:

Hello!
I've got two a subtypes that represents a piece on a gameboard made
from 20 rows.

subtype row is interger range 0 to 19;
subtype boardSize is integer range 0 to 199;

thus, row 0 is boardSize from 0-9, row 1 is boardSize from 10-19, etc.

Is there any efficient way to get the related "row" from a "boardSize"?
I know I can't do non power-of-two modulo operations. I need to do this
a couple dozen times within my program.

Is there anything stopping you from implementing the board with rows 32
bits wide but only using the first 20 locations?

That would allow the obvious trivial implementation of "mod" and
simplify addressing considerably.
This was my first thought too. Looks like your gameboard is 10x20 for a
total of 200 squares. You can store that in 8 bits, but then you run into
the problem of having to do a modulo-10 or modulo-20 operation.

If instead you use a 4-bit field to hold the column number (0-9) and 5 bits
for the row (0-19), you only have one extra bit to store but the row/column
addressing is simplified. If you subsequently need the "boardsize" value
(0-199) you can simply do 20*C + R or 10*R + C, depending on whether you're
a row-major or column-major person. Those constant-coefficient multiplies
are *really* cheap (20*C == (C + C<<2) << 2) - in fact just a single
addition!

Of course it *can* be done the other way, but it's almost certainly going to
be a bigger and slower circuit. If you need a single-cycle implementation,
my initial idea would be just to work out the equations for each binary
digit of the result. The MSBs are much simpler than the LSBs. For example:

result(4) = (input > 160)
result(3) = (input > 80) && !(input > 160);
result(2) = ((input > 40) && !(input > 80)) || ((input > 120) && !(input >
160));
result(1) = ((input > 20) && !(input > 40)) || ...

Continue that pattern and you've got yourself a circuit. Depending on your
synthesis tool, the original description you gave with an if...elsif...elsif
might produce similar result. Try it and see!

Good luck,

-Ben-
 
Thanks for all the help.

Before I go through and switch to a row/column storage method (which
sounds like a good idea, but would result in some more work for other
things), I was considering just implementing the whole function into a
look-up-table that could be stored in a bram.

I tried to create a constant array that'd represent this look-up-table:

constant getRow : rowArray (boardSize) := (
0,0,0,0,0,0,0,0,0,0,
1,1,1,1,1,1,1,1,1,1,
....
19,19,19,19,19,19,19,19,19,19);

However, I get this info message when it synthesizes:

INFO:Xst:2504 - HDL ADVISOR - Initial contents of this register
prevents it from being combined with the ROM for implementation as
read-only block RAM.
INFO:Xst:1651 - Address input of ROM <Mrom__mux0026> is tied to
register <pieceLocations_0>.

Is there some way I can modify how I am declaring/using the "initial
contents" of the currentPiece register so that it will be implemented
as a bram? or is there some way to really know if that's talking about
the same ROM as I just created? :)

--Murph
 
Hi Murph,

"Murph" <mattfinn@gmail.com> wrote in message
news:1165363905.169965.315670@79g2000cws.googlegroups.com...
Thanks for all the help.

Before I go through and switch to a row/column storage method (which
sounds like a good idea, but would result in some more work for other
things), I was considering just implementing the whole function into a
look-up-table that could be stored in a bram.

I tried to create a constant array that'd represent this look-up-table:

constant getRow : rowArray (boardSize) := (
0,0,0,0,0,0,0,0,0,0,
1,1,1,1,1,1,1,1,1,1,
...
19,19,19,19,19,19,19,19,19,19);

However, I get this info message when it synthesizes:

INFO:Xst:2504 - HDL ADVISOR - Initial contents of this register
prevents it from being combined with the ROM for implementation as
read-only block RAM.
INFO:Xst:1651 - Address input of ROM <Mrom__mux0026> is tied to
register <pieceLocations_0>.

Is there some way I can modify how I am declaring/using the "initial
contents" of the currentPiece register so that it will be implemented
as a bram? or is there some way to really know if that's talking about
the same ROM as I just created? :)
I think it probably *is* talking about that same ROM. As I recall, the
address registers embedded in the BRAM cannot be initialized; well, they can
be initialized with all-zeros, but that's it.

Without seeing the rest of the code it's hard to say, but I expect somewhere
you have a line in a clocked process more or less like

somethingOrOther <= getRow(currentPiece);

If the default value of currentPiece is non-zero, then the rowArray ROM
cannot be packed into a read-only BRAM.

I actually think a distributed ROM implementation is better than a BRAM for
this table. It only requires 5x200 = 1kbit, whereas you are targeting
Spartan-3, which has 18kbit BRAM resources. So rather than waste a BRAM, you
could just leave it to be implemented in the fabric. This also allows XST to
make optimizations based on the (simple) pattern of the contents.

So I just ran this table through XST targeting a LUT ROM, and it comes out
as just 10 slices.... that's probably going to be hard to beat.

Cheers,

-Ben-
 

Welcome to EDABoard.com

Sponsor

Back
Top