a faster ALU in verilog

rekz · May 1, 2010

I have the following ALU as part of my datapath and I want to
optimize it further, as I think that doing it the way below is slow:

module ALU(Reg1, Reg2, Control, Result);

    input signed [31:0] Reg1;
    input signed [31:0] Reg2;
    input [2:0] Control;
    output [31:0] Result;

    reg signed [31:0] Result;
    reg [63:0] mul;

    always @(Reg1, Reg2, Control) begin
        case (Control)
            3'b001: // ADD
                Result = Reg1 + Reg2;
            3'b010: // SUBTRACT
                Result = Reg1 - Reg2;
            3'b011: // AND
                Result = Reg1 & Reg2;
            3'b100: // MUL
                begin
                    mul = Reg1 * Reg2;
                    Result = mul[31:0]; // keep only the low 32 bits
                end
            3'b101:
                begin
                    // empty (Result is not assigned in this branch)
                end
            default:
                Result = Reg1;
        endcase
    end
endmodule

How can I optimize it more? Should I create a gate-level adder? Will
that speed it up in the synthesis report?

John_H · May 1, 2010

On May 1, 12:55 am, rekz <aditya15...@gmail.com> wrote:

> How can I optimize it more? Should I create a gate-level adder? Will
> that speed it up in the synthesis report?

Is it not meeting timing?
Is the area larger than you like and you're implementing thousands of
these?
Do you think the implementation isn't "pretty?"

Underlying question: How is it not "optimum?"

Your target is ASIC, yes? Very different answer for FPGAs.

glen herrmannsfeldt · May 1, 2010

rekz <aditya15417@gmail.com> wrote:

> I have the following ALU as part of my datapath and I want to
> optimize it further, as I think that doing it the way below is slow:

(snip)

> case (Control)
> 3'b001: // ADD
> Result =Reg1+Reg2;
> 3'b010: // SUBTRACT
> Result =Reg1-Reg2;
> 3'b011: // AND
> Result=Reg1&Reg2;

Traditionally, one could optimize many operations using
a very small amount of logic. See, for example, the 74181.
(There is even a Wikipedia page for it.) Subtract is done
by complementing the second input and forcing the carry in.
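
A minimal sketch of the complement-and-carry-in idea described above, so that ADD and SUBTRACT share a single adder. The module name `addsub32`, the `sub` select input, and the internal `b_eff` wire are hypothetical names chosen for illustration; they are not from the original post, and this is not a model of the 74181 itself.

```verilog
// Illustrative sketch only (hypothetical names, not the poster's code):
// one 32-bit adder shared between ADD and SUBTRACT.
module addsub32 (
    input  wire [31:0] a,
    input  wire [31:0] b,
    input  wire        sub,     // 0: a + b, 1: a - b
    output wire [31:0] result
);
    // Two's complement identity: a - b = a + ~b + 1.
    // XOR with the replicated 'sub' bit complements b only when subtracting.
    wire [31:0] b_eff = b ^ {32{sub}};

    // 'sub' doubles as the forced carry-in that supplies the "+ 1".
    assign result = a + b_eff + {31'b0, sub};
endmodule
```

In the ALU from the question, `sub` could be driven by something like `Control == 3'b010`. A synthesis tool may already perform this kind of sharing on its own, so it is worth comparing reports before restructuring by hand.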

> 3'b100: // MUL
> begin
> mul = Reg1 * Reg2;
> Result = mul[31:0];

But when you get to multiply, there isn't much point in
optimizing the rest. Also, in FPGAs the problem is very
different, though not gone. FPGAs likely can still optimize
the case of add/subtract, and maybe AND/OR/XOR, but not multiply.

-- glen

rekz · May 3, 2010

On May 1, 5:31 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> But when you get to multiply, there isn't much point in
> optimizing the rest. Also, in FPGAs the problem is very
> different, though not gone. FPGAs likely can still optimize
> the case of add/subtract, and maybe AND/OR/XOR, but not multiply.
>
> -- glen

How can I optimize the multiply further, then? Why can't Xilinx
optimize the multiply when we synthesize it?

glen herrmannsfeldt · May 3, 2010

rekz <aditya15417@gmail.com> wrote:
(snip)

> How can I optimize the multiply further, then? Why can't Xilinx
> optimize the multiply when we synthesize it?

In the days when gates were expensive, one could combine the
logic for add/subtract/and/or to share gates. Multiply was
done by shift and add (find references to the MQ register).
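
For readers who have not seen it, shift-and-add multiplication can be sketched roughly as below. This is an illustrative, unsigned, one-bit-per-clock version. The module name `shift_add_mul` and the signals `start`, `done`, `mcand`, and `count` are hypothetical, and the sketch has no reset. The low half of `product` holds the multiplier while it is shifted out, which is approximately the role an MQ register played.

```verilog
// Illustrative sketch only (hypothetical names): unsigned 32x32
// shift-and-add multiplier that reuses one adder over 32 clock cycles.
module shift_add_mul (
    input  wire        clk,
    input  wire        start,    // pulse high for one cycle to begin
    input  wire [31:0] a,        // multiplicand
    input  wire [31:0] b,        // multiplier
    output reg  [63:0] product,  // valid once done is high
    output reg         done
);
    reg [31:0] mcand;  // multiplicand held for the whole operation
    reg [5:0]  count;  // iterations remaining (0 = idle)

    // If the current multiplier bit (product[0]) is set, add the
    // multiplicand into the upper half; bit 32 keeps the carry-out.
    wire [32:0] sum = {1'b0, product[63:32]}
                    + (product[0] ? {1'b0, mcand} : 33'd0);

    always @(posedge clk) begin
        if (start) begin
            mcand   <= a;
            product <= {32'd0, b};            // multiplier starts in the low half
            count   <= 6'd32;
            done    <= 1'b0;
        end else if (count != 6'd0) begin
            product <= {sum, product[31:1]};  // add, then shift right by one
            count   <= count - 6'd1;
            done    <= (count == 6'd1);       // last iteration completes now
        end
    end
endmodule
```

This trades time for area: the result takes 32 clocks instead of one combinational pass. Since the ALU in the question keeps only `mul[31:0]`, the unsigned form suffices there, because the low 32 bits of a two's complement product are the same for signed and unsigned operands.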

A combinatorial (not clocked) multiplier is so much bigger
than an adder that it doesn't make much sense to try to
combine them to reduce gate count. (Well, what do you mean
by optimize?) Especially note that many FPGAs have a multiply
block on chip. (Usually more than one.) Note also that
the multiplexer needed to select the different functions
is likely bigger than the adder.
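
One way to act on the point about on-chip multiply blocks is to give the multiply its own registered stage instead of placing it inside the combinational `case`. The sketch below assumes a clocked datapath and uses hypothetical names (`mul_stage`, `a_q`, `b_q`, `p_q`). Whether a given tool actually maps it onto a hardware multiplier depends on the tool and the device, so treat that as an assumption to check in the synthesis report.

```verilog
// Illustrative sketch only (hypothetical names): signed multiply in its own
// registered stage, written in a plain style that FPGA tools may map onto
// an on-chip multiplier block.
module mul_stage (
    input  wire               clk,
    input  wire signed [31:0] a,
    input  wire signed [31:0] b,
    output reg  signed [31:0] result
);
    reg signed [31:0] a_q, b_q;  // registered operands
    reg signed [63:0] p_q;       // registered full-width product

    always @(posedge clk) begin
        a_q    <= a;
        b_q    <= b;
        p_q    <= a_q * b_q;     // 64-bit signed product
        result <= p_q[31:0];     // low 32 bits, as in the original ALU
    end
endmodule
```

The result appears three clock edges after the operands are presented, so the surrounding datapath would need to account for that latency.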

-- glen
