Smart coding for big multiplexer

Massi
Hi everyone, I'm working on a Xilinx Virtex 5 FPGA with ISE 10.1. In
my design I have to instantiate 128 ram blocks, each one of them is
1024 bytes wide. The output of my device depends on only one ram
block at a time, therefore I have to multiplex them. Which is the
smartest way to implement such a huge multiplexer?
Thanks a lot for your help.
 
On Fri, 17 Apr 2009 03:29:22 -0700 (PDT), Massi <massi_srb@msn.com>
wrote:

> Hi everyone, I'm working on a Xilinx Virtex 5 FPGA with ISE 10.1. In
> my design I have to instantiate 128 ram blocks, each one of them is
> 1024 bytes wide. The output of my device depends on only one ram
> block at a time, therefore I have to multiplex them. Which is the
> smartest way to implement such a huge multiplexer?
Funny, we just answered a query from a customer about
almost exactly the same topic.

Do you really mean 1024 bytes WIDE? That's way scary -
an 8192-bit data path :) I guess you mean that each
RAM block is in fact 1024 locations, each 8 bits wide.
That's normally known as a "depth" of 1024.

We've found that XST does a better job of optimizing
wide MUXes if you code them as an explicit AND-OR
structure. I don't know why this is, and I don't
know if it will always be true; you could imagine,
for example, that a synthesis tool might be able
to exploit carry chains to build the big OR gates.
Anyway, here's a sketch of the code:

-- useful declarations
subtype byte is std_logic_vector(7 downto 0);
type byte_array is array(natural range <>) of byte;

-- one result from each of your 128 RAM blocks
signal RAM_read_data: byte_array(0 to 127);

-- final output
signal mux_data: byte;

-- memory selector, chooses one from 128
signal which_RAM: integer range RAM_read_data'range;

...
process (RAM_read_data, which_RAM)
  variable mux_result: byte;
begin
  mux_result := (others => '0');
  for i in RAM_read_data'range loop
    if i = which_RAM then
      mux_result := mux_result OR RAM_read_data(which_RAM);
    end if;
  end loop;
  mux_data <= mux_result;
end process;

If this trick doesn't provide the improvement you need,
the next step is to consider pipelining. It won't reduce
the area, but will give you better Fmax.
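
If you do go the pipelining route, one shape it could take (purely an
untested sketch; the clock "clk" and the intermediate signals are my
own placeholders, and it replaces the combinational process above) is a
first rank of sixteen registered 8-to-1 selections followed by a final
16-to-1 stage:

  -- first-rank results and a delayed copy of the selector
  signal stage1      : byte_array(0 to 15);
  signal which_RAM_d : integer range 0 to 15;

  ...
  process (clk)
  begin
    if rising_edge(clk) then
      -- stage 1: sixteen 8-to-1 AND-OR selections, registered
      for j in stage1'range loop
        stage1(j) <= (others => '0');
        for k in 0 to 7 loop
          if which_RAM = j*8 + k then
            stage1(j) <= RAM_read_data(j*8 + k);
          end if;
        end loop;
      end loop;
      which_RAM_d <= which_RAM / 8;     -- keep the selector in step
      -- stage 2: final 16-to-1 selection
      mux_data <= stage1(which_RAM_d);
    end if;
  end process;

Note that mux_data now turns up two clocks after which_RAM changes, so
the rest of the design has to allow for that latency.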

I'm sure other folk will have more, better ideas.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
On Fri, 17 Apr 2009 04:10:17 -0700 (PDT), Massi wrote:

> I really appreciate your help, I'll immediately try to integrate your
> code in my design... thank you!
OOOOOH, don't do that just yet... sorry....

>         mux_result := mux_result OR RAM_read_data(which_RAM);
No, don't do that. My mistake. Instead,

        mux_result := mux_result OR RAM_read_data(i);

The difference is that, when the loop is unrolled, you
are subscripting the array with a CONSTANT (i) rather
than with a variable. It can be important for optimization,
even though the two are functionally identical.
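
To see why, here is roughly what the tool is left with after unrolling
(only the first two of the 128 iterations shown); every subscript is
now a literal:

  -- unrolled view of the loop body (illustration only)
  if 0 = which_RAM then
    mux_result := mux_result OR RAM_read_data(0);
  end if;
  if 1 = which_RAM then
    mux_result := mux_result OR RAM_read_data(1);
  end if;
  -- ... and so on, up to RAM_read_data(127)
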
--
Jonathan Bromley, Consultant

 
> Do you really mean 1024 bytes WIDE? That's way scary -
> an 8192-bit data path :) I guess you mean that each
> RAM block is in fact 1024 locations, each 8 bits wide.
> That's normally known as a "depth" of 1024.
Silly me... of course I meant depth; that's the fault of my bad English.

> We've found that XST does a better job of optimizing
> wide MUXes if you code them as an explicit AND-OR
> structure.
[snip]
> I'm sure other folk will have more, better ideas.
I really appreciate your help, I'll immediately try to integrate your
code in my design... thank you!
 
> The difference is that, when the loop is unrolled, you
> are subscripting the array with a CONSTANT (i) rather
> than with a variable. It can be important for optimization,
> even though the two are functionally identical.
Yes, that's VERY important. I ran into something like this a while ago
with Synplify, where the constant version properly instantiated a mux
and the variable version implemented some sort of variable shift
widget that was about an order of magnitude larger.

Chris
 
On Apr 17, 5:29 am, Massi <massi_...@msn.com> wrote:
> Which is the
> smartest way to implement such a huge multiplexer?
The smartest way is to let the synthesis tool do as much of the work
as possible. Don't try to outsmart it unless you have to. If the
simplest, easiest to read, understand or write description will work
(i.e. meet timing, area, etc. requirements), then use that.

Borrowing Jonathan's definitions:

-- 128-to-1, byte wide multiplexer:
mux_data <= RAM_read_data(which_RAM);

If you don't know your requirements, then you won't know whether the
implementation you used is good enough, no matter how fast/small/cool/
elegant it is.
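
For completeness, here is that one line dropped into a minimal
compilable unit (the package and entity names below are just
placeholders I made up for illustration):

library ieee;
use ieee.std_logic_1164.all;

package mux_pkg is
  subtype byte is std_logic_vector(7 downto 0);
  type byte_array is array(natural range <>) of byte;
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.mux_pkg.all;

entity ram_mux is
  port (
    which_RAM     : in  integer range 0 to 127;
    RAM_read_data : in  byte_array(0 to 127);
    mux_data      : out byte
  );
end entity;

architecture rtl of ram_mux is
begin
  -- the whole 128-to-1, byte-wide multiplexer
  mux_data <= RAM_read_data(which_RAM);
end architecture;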

Andy
 
Massi wrote:
> Hi everyone, I'm working on a Xilinx Virtex 5 FPGA with ISE 10.1. In
> my design I have to instantiate 128 ram blocks, each one of them is
> 1024 bytes wide. The output of my device depends on only one ram
> block at a time, therefore I have to multiplex them. Which is the
> smartest way to implement such a huge multiplexer?
> Thanks a lot for your help.
I agree with Andy.
I don't solve a synthesis problem until I have one.
The cleanest mux description is an array selection.
Give ISE a crack at it and have a look at
the RTL viewer and static timing.

I also agree with Jonathan.
Declare register/port dimensions first.
VHDL gives us an unfair advantage here.

-- Mike
 
If you only need one RAM at a time, could you merge the RAMs into a
smaller number of larger ones? This would require that you only write
to one RAM at a time too.
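
Taken to the extreme of a single memory, something like this is what I
have in mind (just a sketch, not tested; clk, we and wr_data are
assumed to exist in the surrounding design, and the other names are
made up). The block select simply becomes the upper address bits of one
big inferred memory:

  -- needs: use ieee.numeric_std.all;
  type mem_t is array (0 to 128*1024 - 1) of std_logic_vector(7 downto 0);
  signal big_ram : mem_t;
  signal blk_sel : unsigned(6 downto 0);    -- which of the 128 "blocks"
  signal offset  : unsigned(9 downto 0);    -- address within a block
  signal rd_data : std_logic_vector(7 downto 0);

  ...
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        big_ram(to_integer(blk_sel & offset)) <= wr_data;
      end if;
      rd_data <= big_ram(to_integer(blk_sel & offset));
    end if;
  end process;

Whether the tools pack that into block RAMs as neatly as 128 separate
instances is something you would have to check in the reports.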

Also, I have used tbufs in the past to do this; however, it appears
that V5s don't have these.

Darrin
 
On Apr 20, 1:30 am, Dal <darrin.n...@gmail.com> wrote:
> If you only need one RAM at a time, could you merge the RAMs into a
> smaller number of larger ones? This would require that you only write
> to one RAM at a time too.
>
> Also, I have used tbufs in the past to do this; however, it appears
> that V5s don't have these.
>
> Darrin
Tri-state bus code is translated into equivalent multiplexer-type
circuits. The tri-state enables are assumed to be mutually exclusive
for the multiplexer implementation. This actually comes in handy in
some applications where it is difficult to convince the synthesis tool
that separate inputs are mutually exclusive.
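
For example (a sketch only, reusing Jonathan's RAM_read_data, which_RAM
and mux_data declarations in place of his process, so there is only one
set of drivers on mux_data):

  -- 128 tri-state style drivers; the unselected ones drive 'Z' and the
  -- synthesis tool folds the whole thing back into a 128-to-1 mux
  tbuf_style : for i in RAM_read_data'range generate
    mux_data <= RAM_read_data(i) when which_RAM = i else (others => 'Z');
  end generate;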

Andy
 
