A faster ALU in Verilog

rekz

Guest
I have the following ALU as part of my datapath, and I want to
optimize it further because I suspect this implementation is slow:

module ALU(Reg1, Reg2, Control, Result);

  input  signed [31:0] Reg1;
  input  signed [31:0] Reg2;
  input         [2:0]  Control;
  output        [31:0] Result;

  reg signed [31:0] Result;
  reg        [63:0] mul;   // full product; only the low 32 bits are used

  always @(Reg1, Reg2, Control) begin
    mul = 64'd0;           // default so mul is assigned on every path
    case (Control)
      3'b001: // ADD
        Result = Reg1 + Reg2;
      3'b010: // SUBTRACT
        Result = Reg1 - Reg2;
      3'b011: // AND
        Result = Reg1 & Reg2;
      3'b100: // MUL
        begin
          mul    = Reg1 * Reg2;
          Result = mul[31:0];
        end
      3'b101: // not implemented yet; assign Result so no latch is inferred
        Result = Reg1;
      default:
        Result = Reg1;
    endcase
  end
endmodule


How can I optimize it further? Should I create a gate-level adder? Would
that improve the speed shown in the synthesis report?
 
On May 1, 12:55 am, rekz <aditya15...@gmail.com> wrote:
> I have the following ALU as a part of my datapath and I want to
> optimize it more as I think that doing the way below is slow:
>
> (snip)
>
> how can I optimize it more? Should I create a gate level adder? Will
> it speed up more in the synthesis report

Is it not meeting timing?
Is the area larger than you like and you're implementing thousands of
these?
Do you think the implementation isn't "pretty?"

Underlying question: How is it not "optimum"?

Your target is ASIC, yes? Very different answer for FPGAs.
 
rekz <aditya15417@gmail.com> wrote:
> I have the following ALU as a part of my datapath and I want to
> optimize it more as I think that doing the way below is slow:
>
> (snip)
>
> case (Control)
>   3'b001: // ADD
>     Result =Reg1+Reg2;
>   3'b010: // SUBTRACT
>     Result =Reg1-Reg2;
>   3'b011: // AND
>     Result=Reg1&Reg2;

Traditionally, one could optimize many operations using
a very small amount of logic. See, for example, the 74181.
(There is even a Wikipedia page for it.) Subtract is done
by complementing the second input and forcing the carry in.
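
As a minimal sketch of that last point (not the 74181 itself, and not the original poster's code), one adder can serve both add and subtract. The module name `addsub32` and its port names are hypothetical, chosen only for illustration, and whether a given synthesis tool actually folds this into a single carry chain is an assumption to check in the synthesis report.

```verilog
// Hypothetical module and port names; a sketch under the assumptions above.
module addsub32 (
    input  wire [31:0] a,
    input  wire [31:0] b,
    input  wire        sub,     // 0: a + b, 1: a - b
    output wire [31:0] result
);
    // Complement the second input when subtracting...
    wire [31:0] b_eff = b ^ {32{sub}};

    // ...and force the carry-in to 1, so that a + ~b + 1 = a - b
    // in two's complement. For sub = 0 this is a plain a + b.
    assign result = a + b_eff + {31'b0, sub};
endmodule
```

With `sub` driven from the ALU control decode, the ADD and SUBTRACT cases can share this one module instead of describing two separate arithmetic operators.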

>   3'b100: // MUL
>     begin
>       mul = Reg1 * Reg2;
>       Result = mul[31:0];

But when you get to multiply, there isn't much point in
optimizing the rest. Also, in FPGAs the problem is very
different, though not gone. FPGAs likely can still optimize
the case of add/subtract, and maybe AND/OR/XOR, but not multiply.

-- glen
 
On May 1, 5:31 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> (snip)
>
> But when you get to multiply, there isn't much point in
> optimizing the rest.  Also, in FPGAs the problem is very
> different, though not gone.  FPGAs likely can still optimize
> the case of add/subtract, and maybe AND/OR/XOR, but not multiply.

How can I optimize the multiply further, then? Why can't Xilinx
optimize the multiply when we synthesize it?
 
rekz <aditya15417@gmail.com> wrote:
> (snip)
>
> how can I optimize the multiply further on then? why can't xilinx
> optimize multiply when we synthesize it?

In the days when gates were expensive, one could combine the
logic for add/subtract/and/or to share gates. Multiply was
done by shift and add (find references to the MQ register).
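
For illustration, a shift-and-add multiplier might be sketched as below. This is a plain variant with separate product and multiplier registers rather than a shared MQ-style register, it assumes unsigned operands, and the module name `shift_add_mul32` and all port names are hypothetical. It takes 32 clock cycles per product, trading time for a single 64-bit adder.

```verilog
// Hypothetical names; unsigned operands assumed for simplicity.
module shift_add_mul32 (
    input  wire        clk,
    input  wire        start,         // pulse high for one cycle to begin
    input  wire [31:0] multiplicand,
    input  wire [31:0] multiplier,
    output reg  [63:0] product,
    output reg         done           // high once product is valid
);
    reg [63:0] mcand;   // multiplicand, shifted left each step
    reg [31:0] mplier;  // multiplier, shifted right each step
    reg [5:0]  count;   // steps remaining

    // Power-up values for simulation; a real design would likely use a reset.
    initial begin
        count = 6'd0;
        done  = 1'b0;
    end

    always @(posedge clk) begin
        if (start) begin
            mcand   <= {32'b0, multiplicand};
            mplier  <= multiplier;
            product <= 64'b0;
            count   <= 6'd32;
            done    <= 1'b0;
        end else if (count != 6'd0) begin
            // Add the shifted multiplicand when the current multiplier bit is 1.
            if (mplier[0])
                product <= product + mcand;
            mcand  <= mcand << 1;
            mplier <= mplier >> 1;
            count  <= count - 6'd1;
            done   <= (count == 6'd1);  // last step: product valid next cycle
        end
    end
endmodule
```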

A combinatorial (not clocked) multiplier is so much bigger
than an adder that it doesn't make much sense to try to
combine them to reduce gate count. (Well, what do you mean
by optimize?) Especially note that many FPGAs have a multiply
block on chip. (Usually more than one.) Note also that
the multiplexer needed to select the different functions
is likely bigger than the adder.
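
If "optimize" means clock speed on an FPGA, one common approach is to keep the multiply in its own module with registers around it, as in the sketch below. The module name `mul32_reg` and its ports are hypothetical, and whether the tool maps this onto an on-chip multiply block, and how many register stages that block can absorb, depends on the device and the synthesis tool, so treat that as an assumption to verify in the synthesis report.

```verilog
// Hypothetical names; a sketch of a registered multiplier, not a vendor primitive.
module mul32_reg (
    input  wire               clk,
    input  wire signed [31:0] a,
    input  wire signed [31:0] b,
    output reg  signed [31:0] p_lo    // low 32 bits of the product, as in the ALU
);
    reg signed [31:0] a_q, b_q;       // input registers
    reg signed [63:0] prod_q;         // registered full product

    always @(posedge clk) begin
        a_q    <= a;
        b_q    <= b;
        prod_q <= a_q * b_q;          // signed 32x32 -> 64-bit multiply
        p_lo   <= prod_q[31:0];
    end
endmodule
```

The result appears three clock cycles after the operands are presented, so the surrounding datapath has to account for that latency; the single-cycle ALU in the original post would need restructuring to use it.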

-- glen
 
