Synthesis and mapping of ALU

chthon
Dear all,

I have the following ALU code as part of a data path:

-- purpose: simple ALU
-- type   : combinational
-- inputs : a, b, op_sel
-- outputs: y
alu : PROCESS (a, b, op_sel)
BEGIN  -- PROCESS alu
  CASE op_sel IS
    WHEN "0000" =>  -- increment
      y <= a + 1;
    WHEN "0001" =>  -- decrement
      y <= a - 1;
    WHEN "0010" =>  -- test for zero
      y <= a;
    WHEN "0111" =>  -- addition
      y <= a + b;
    WHEN "1000" =>  -- subtract, compare
      y <= a - b;
    WHEN "1010" =>  -- logical and
      y <= a AND b;
    WHEN "1011" =>  -- logical or
      y <= a OR b;
    WHEN "1100" =>  -- logical xor
      y <= a XOR b;
    WHEN "1101" =>  -- logical not
      y <= NOT a;
    WHEN "1110" =>  -- shift left logical
      y <= a SLL 1;
    WHEN "1111" =>  -- shift right logical
      y <= a SRL 1;
    WHEN OTHERS =>
      y <= a;
  END CASE;
END PROCESS alu;

Is it normal that this is synthesized as 14 separate functions, which are then multiplexed through a 14:1 multiplexer onto the y bus? I am just trying to find out if this is the fastest implementation that can be had, or if it is possible to get a faster implementation by mapping more functions into a single slice, so that the multiplexer becomes smaller.

As an example of something similar, the multiplexers generated by default by ISE tend to cascade, while building them from LUT6 gives the possibility to build wide but not deep multiplexers (XAPP522).

Regards,

Jurgen
 
chthon wrote:

Is it normal that this is synthesized as 14 separate functions, which are
then multiplexed through a 14:1 multiplexer onto the y bus? I am just
trying to find out if this is the fastest implementation that can be had,
or if it is possible to get a faster implementation by mapping more
functions into a single slice, so that the multiplexer becomes smaller.

What you have written is a functional description: it specifies the
correct logical output. It goes into the synthesis program and gets
massively reduced. For instance, the separate increment and decrement
logic will get folded into one adder with a few extra gates to select
between the two functions. So the logic actually implemented in the FPGA
will not resemble what you describe at all, but will be pretty well
optimized for the specific chip architecture.
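
To make the folding idea concrete, here is a minimal sketch of what a shared adder for the four arithmetic opcodes could look like if written out by hand. It is only an illustration of the principle, not what any particular synthesis tool emits. The entity name alu_arith, the signals operand and sub, the 16-bit width, and the use of numeric_std unsigned are all assumptions, since the original post does not show its declarations.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: name, port types and the 16-bit width are assumptions.
entity alu_arith is
  port (
    a, b   : in  unsigned(15 downto 0);
    op_sel : in  std_logic_vector(3 downto 0);
    y      : out unsigned(15 downto 0));
end entity alu_arith;

architecture rtl of alu_arith is
  signal operand : unsigned(15 downto 0);
  signal sub     : std_logic;
begin
  -- Second operand: constant 1 for increment/decrement, b for add/subtract.
  operand <= to_unsigned(1, 16) when op_sel = "0000" or op_sel = "0001"
             else b;

  -- '1' selects subtraction (decrement "0001" or a - b "1000").
  sub <= '1' when op_sel = "0001" or op_sel = "1000" else '0';

  -- One adder for all four operations. Subtraction is done in two's
  -- complement: a - operand = a + (not operand) + 1.
  adder : process (a, operand, sub)
    variable op2 : unsigned(15 downto 0);
    variable sum : unsigned(16 downto 0);
  begin
    if sub = '1' then
      op2 := not operand;
    else
      op2 := operand;
    end if;
    -- Appending '1' and sub as extra LSBs injects sub as the carry-in;
    -- the wanted 16-bit result is then in bits 16 downto 1.
    sum := (a & '1') + (op2 & sub);
    y   <= sum(16 downto 1);
  end process adder;
end architecture rtl;
```

With this structure the four arithmetic cases need one carry chain plus a small operand selector, so the final result multiplexer only has to choose between the arithmetic output and the remaining logic and shift functions.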

Jon
 
chthon wrote:

Is it normal that this is synthesized as 14 separate functions, which are then multiplexed through a 14:1 multiplexer onto the y bus? I am just trying to find out if this is the fastest implementation that can be had, or if it is possible to get a faster implementation by mapping more functions into a single slice, so that the multiplexer becomes smaller.

1 - I only see 12 items in the case. Where did 14 come from?

2 - What synthesis tool are you using, and what part are you targeting?
    A lot of optimization is architecture-specific. If your part is good
    at implementing wide muxes, the solution your synthesis tool found
    may be a good one. If it were me using Xilinx tools, I'd look at
    the worst-case delays after map (route delays not included) to see
    how many levels of logic there are and how fast they go. Note that
    some "logic levels" (e.g. LUT) cause more delay than others (e.g.
    carry chain).
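
One common way to get a clean number for the combinational path is to put registers on both sides of the ALU, so the timing report shows a single register-to-register path through it. The wrapper below is a rough sketch of that idea, not part of the original design: the entity name alu_timing_wrap, the clk port, the internal signal names, the 16-bit width, and the numeric_std unsigned types are all assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical test wrapper: all names and the 16-bit width are assumptions.
entity alu_timing_wrap is
  port (
    clk    : in  std_logic;
    a, b   : in  unsigned(15 downto 0);
    op_sel : in  std_logic_vector(3 downto 0);
    y      : out unsigned(15 downto 0));
end entity alu_timing_wrap;

architecture rtl of alu_timing_wrap is
  signal a_r, b_r : unsigned(15 downto 0);
  signal op_r     : std_logic_vector(3 downto 0);
  signal y_c      : unsigned(15 downto 0);
begin
  -- Input and output registers bound the path that gets timed.
  regs : process (clk)
  begin
    if rising_edge(clk) then
      a_r  <= a;
      b_r  <= b;
      op_r <= op_sel;
      y    <= y_c;
    end if;
  end process regs;

  -- Same case statement as in the original post, on the registered inputs.
  alu : process (a_r, b_r, op_r)
  begin
    case op_r is
      when "0000" => y_c <= a_r + 1;        -- increment
      when "0001" => y_c <= a_r - 1;        -- decrement
      when "0010" => y_c <= a_r;            -- test for zero
      when "0111" => y_c <= a_r + b_r;      -- addition
      when "1000" => y_c <= a_r - b_r;      -- subtract, compare
      when "1010" => y_c <= a_r and b_r;    -- logical and
      when "1011" => y_c <= a_r or b_r;     -- logical or
      when "1100" => y_c <= a_r xor b_r;    -- logical xor
      when "1101" => y_c <= not a_r;        -- logical not
      when "1110" => y_c <= a_r sll 1;      -- shift left logical
      when "1111" => y_c <= a_r srl 1;      -- shift right logical
      when others => y_c <= a_r;
    end case;
  end process alu;
end architecture rtl;
```

Synthesizing this wrapper on its own and comparing the reported logic levels and delay for different codings of the ALU gives a like-for-like comparison, without the rest of the data path getting in the way.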

--
Gabor
 
