chthon
Dear all,
I have the following ALU code as part of a data path:
-- purpose: simple ALU
-- type   : combinational
-- inputs : a, b, op_sel
-- outputs: y
alu : PROCESS (a, b, op_sel)
BEGIN  -- PROCESS alu
  CASE op_sel IS
    WHEN "0000" =>                      -- increment
      y <= a + 1;
    WHEN "0001" =>                      -- decrement
      y <= a - 1;
    WHEN "0010" =>                      -- test for zero
      y <= a;
    WHEN "0111" =>                      -- addition
      y <= a + b;
    WHEN "1000" =>                      -- subtract, compare
      y <= a - b;
    WHEN "1010" =>                      -- logical and
      y <= a AND b;
    WHEN "1011" =>                      -- logical or
      y <= a OR b;
    WHEN "1100" =>                      -- logical xor
      y <= a XOR b;
    WHEN "1101" =>                      -- logical not
      y <= NOT a;
    WHEN "1110" =>                      -- shift left logical
      y <= a SLL 1;
    WHEN "1111" =>                      -- shift right logical
      y <= a SRL 1;
    WHEN OTHERS =>
      y <= a;
  END CASE;
END PROCESS alu;
Is it normal that this is synthesized as 14 separate functions, which are then multiplexed through a 14:1 multiplexer onto the y bus? I am just trying to find out if this is the fastest implementation that can be had, or if it is possible to get a faster implementation by mapping more functions into a single slice, so that the multiplexer becomes smaller.
As an example of something similar, the multiplexers that ISE generates by default tend to cascade, whereas building them from LUT6 primitives makes it possible to get multiplexers that are wide but not deep (XAPP522).
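To make the idea of mapping more functions together concrete, here is a minimal sketch of one possible restructuring: the four arithmetic operations share a single adder, so the output multiplexer sees one arithmetic source instead of four. It rests on several assumptions that are not in the code above: a, b and y are taken to be unsigned from ieee.numeric_std, and the entity name alu_shared, the generic WIDTH, and the helper signals operand, subtract and arith are all made up for illustration. Whether this actually ends up faster than what the synthesizer already produces would have to be checked in the timing report.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical wrapper entity; port types are an assumption.
ENTITY alu_shared IS
  GENERIC (WIDTH : positive := 16);
  PORT (
    a, b   : IN  unsigned(WIDTH-1 DOWNTO 0);
    op_sel : IN  std_logic_vector(3 DOWNTO 0);
    y      : OUT unsigned(WIDTH-1 DOWNTO 0));
END ENTITY alu_shared;

ARCHITECTURE rtl OF alu_shared IS
  SIGNAL operand  : unsigned(WIDTH-1 DOWNTO 0);  -- second adder input
  SIGNAL subtract : std_logic;                   -- '1' for decrement/subtract
  SIGNAL arith    : unsigned(WIDTH-1 DOWNTO 0);  -- shared adder result
BEGIN

  -- Increment and decrement use the constant 1, add and subtract use b.
  WITH op_sel SELECT operand <=
    to_unsigned(1, WIDTH) WHEN "0000" | "0001",
    b                     WHEN OTHERS;

  WITH op_sel SELECT subtract <=
    '1' WHEN "0001" | "1000",
    '0' WHEN OTHERS;

  -- One adder for all four operations: a - x = a + (NOT x) + 1.
  adder : PROCESS (a, operand, subtract)
    VARIABLE x   : unsigned(WIDTH-1 DOWNTO 0);
    VARIABLE cin : unsigned(0 DOWNTO 0);
  BEGIN
    IF subtract = '1' THEN
      x := NOT operand;
    ELSE
      x := operand;
    END IF;
    cin(0) := subtract;
    arith  <= a + x + cin;
  END PROCESS adder;

  -- Same opcodes as the original CASE, but the arithmetic ones share a branch.
  alu : PROCESS (a, b, op_sel, arith)
  BEGIN
    CASE op_sel IS
      WHEN "0000" | "0001" | "0111" | "1000" =>  -- inc, dec, add, sub
        y <= arith;
      WHEN "1010" => y <= a AND b;
      WHEN "1011" => y <= a OR b;
      WHEN "1100" => y <= a XOR b;
      WHEN "1101" => y <= NOT a;
      WHEN "1110" => y <= a SLL 1;
      WHEN "1111" => y <= a SRL 1;
      WHEN OTHERS => y <= a;                     -- includes "0010", test for zero
    END CASE;
  END PROCESS alu;

END ARCHITECTURE rtl;
```

The behaviour per opcode is meant to match the original process. The only change is that the operand selection and the invert-plus-carry sit in front of the adder, where they may pack into the same LUTs as the carry chain, rather than behind it in the result multiplexer.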
Regards,
Jurgen