64-bit comparator.

pallav

Guest
I am designing a simple FPU comparison unit for 32/64 bit floating
point. My RTL code is simulating OK and now I'm trying to improve the
code to make it more efficient. I'm trying to improve a 64-bit
comparison. In my unit, the comparison can be either 31 bits (single-
precision) or 63 bits (double precision), excluding sign bit. Right
now, what I do is put a 32-bit multiplexer in front of the comparator
and zero pad the single-precision value to make it 63 bits. Is there a
way to recode this to get rid of these multiplexers to handle both
cases? I've been able to do it pretty easily for detecting A == B == 0
or A != 0, B != 0.

My code is below. The line in question is "assign altb64 = ...."

Thanks for any ideas.

Kind regards.


////////////////////////////////////////////////////////////////////////////////
// FPU Compare Unit
////////////////////////////////////////////////////////////////////////////////
module fpucmpU(/*AUTOARG*/
// Outputs
aeqb, aleb, altb, invalidexception,
// Inputs
srcA, srcB, sp, eqop
);

input [63:0] srcA, srcB; // operands A/B
input sp; // 1 - float32, 0 - float64
input eqop; // if this is strictly a == b operation
output aeqb, aleb, altb; // a == b, a <= b, a < b
output invalidexception; // if invalid exception is thrown

/*AUTOWIRE*/
// Beginning of automatic wires (for undeclared instantiated-module outputs)
wire aNaN; // From cmpnan of fpuNaNU.v
wire asigNaN; // From cmpnan of fpuNaNU.v
wire bNaN; // From cmpnan of fpuNaNU.v
wire bsigNaN; // From cmpnan of fpuNaNU.v
// End of automatics

wire aorbNaN, aorbsigNaN;
wire equal, less;
wire signmismatch, aandbzero32, aandbzero64;
wire aeqb32, aeqb64, altb64;

fpuNaNU cmpnan(/*AUTOINST*/
// Outputs
.aNaN (aNaN),
.bNaN (bNaN),
.asigNaN (asigNaN),
.bsigNaN (bsigNaN),
// Inputs
.srcA (srcA[63:0]),
.srcB (srcB[63:0]),
.sp (sp));

// if A or B is a NaN / signaling NAN.
assign aorbNaN = aNaN | bNaN;
assign aorbsigNaN = asigNaN | bsigNaN;

// IEEE standard states that equality comparisons don't throw an invalid
// exception unless an operand is a signaling NaN. For <=/<, the exception
// is thrown if at least one of the operands is a NaN.
assign invalidexception = eqop ? aorbsigNaN : aorbNaN;

// determine if A/B have same sign.
assign signmismatch = sp ? srcA[31] ^ srcB[31] : srcA[63] ^ srcB[63];

// determine if A == B == 0 for single/double precision.
assign aandbzero32 = ~(| (srcA[30:0] | srcB[30:0]));
assign aandbzero64 = aandbzero32 & ~(| (srcA[62:31] | srcB[62:31]));

// determine if A == B for single/double precision.
assign aeqb32 = (srcA[30:0] == srcB[30:0]);
assign aeqb64 = aeqb32 & (srcA[62:31] == srcB[62:31]);

// assign aleb32 = (srcA[30:0] < srcB[30:0]);
// assign aeqb64 = aleb32 & (

assign altb64 = {sp ? 32'b0 : srcA[62:31], srcA[30:0]} <
                {sp ? 32'b0 : srcB[62:31], srcB[30:0]};

// check for equality.
// a) if signs of A/B mismatch, A == B == 0.
// b) if signs match, A == B if all the bits are equal.
assign equal = sp ? (signmismatch & aandbzero32) | (~signmismatch & aeqb32)
                  : (signmismatch & aandbzero64) | (~signmismatch & aeqb64);

// check for less than.
// a) if signs of A/B mismatch, A < B if A.sign == 1 and A, B are not
//    both zero.
// b) if signs match and A != B, A < B if (sign ^ (|A| < |B|)).
assign less = sp ? (signmismatch & srcA[31] & ~aandbzero32) |
                   (~signmismatch & ~equal &
                    (srcA[31] ^ /*(srcA[30:0] < srcB[30:0])*/ altb64))
                 : (signmismatch & srcA[63] & ~aandbzero64) |
                   (~signmismatch & ~equal &
                    (srcA[63] ^ /*(srcA[62:0] < srcB[62:0])*/ altb64));

// determine the relations.
assign aeqb = ~(aorbNaN | aorbsigNaN) & equal;
assign altb = ~aorbNaN & less;
assign aleb = ~aorbNaN & (aeqb | altb);
endmodule // fpucmpU
 
Here is the pastebin that might be better for viewing:

http://p.bubash.org/paste/7683.html

Question is regarding line 64.
 
pallav <pallavgupta@gmail.com> wrote:

< I am designing a simple FPU comparison unit for 32/64 bit floating
< point. My RTL code is simulating OK and now I'm trying to improve the
< code to make it more efficient. I'm trying to improve a 64-bit
< comparison. In my unit, the comparison can be either 31 bits (single-
< precision) or 63 bits (double precision), excluding sign bit. Right
< now, what I do is put a 32-bit multiplexer in front of the comparator

Assuming it is built of LUT4s, there is one extra input on the LUTs
that build the comparator. That input could, in no more logic than
the comparator itself, indicate that the low bits should be ignored.

My guess is that the tools will figure that out from a mux in front
of the comparator, but I wouldn't say for sure without seeing the
actual logic.

If not combined, a two input MUX is about as big as the comparator.
It might be faster to generate two comparators and select the output
as appropriate. That would work best if the select logic combined
with later logic.

Most of the logic optimization rules from the TTL gate days don't
apply in LUT logic.

-- glen
 
On Jun 3, 6:05 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> pallav <pallavgu...@gmail.com> wrote:
>
> > I am designing a simple FPU comparison unit for 32/64 bit floating
> > point. My RTL code is simulating OK and now I'm trying to improve the
> > code to make it more efficient. I'm trying to improve a 64-bit
> > comparison. In my unit, the comparison can be either 31 bits (single-
> > precision) or 63 bits (double precision), excluding sign bit. Right
> > now, what I do is put a 32-bit multiplexer in front of the comparator
>
> Assuming it is built of LUT4s, there is one extra input on the LUTs
> that build the comparator. That input could, in no more logic than
> the comparator itself, indicate that the low bits should be ignored.
>
> My guess is that the tools will figure that out from a mux in front
> of the comparator, but I wouldn't say for sure without seeing the
> actual logic.
>
> If not combined, a two input MUX is about as big as the comparator.
> It might be faster to generate two comparators and select the output
> as appropriate. That would work best if the select logic combined
> with later logic.
>
> Most of the logic optimization rules from the TTL gate days don't
> apply in LUT logic.
>
> -- glen

Thanks for the response. I was targeting this more for CMOS logic and
an ASIC design flow. However, I had planned to run it on an FPGA for
verification. I haven't gotten around to writing the synthesis scripts
to see what Synopsys Design Compiler would generate. Perhaps I should
look into that quickly.

Kind regards.
 
I think I got this to work:

Here is a snippet of the modified code:

// determine if A == B for single/double precision.
assign aeqb32 = (srcA[30:0] == srcB[30:0]);
assign aeqbupper32 = (srcA[62:31] == srcB[62:31]);
assign aeqb64 = aeqb32 & aeqbupper32;

// determine if A < B (excluding sign).
assign altb32 = srcA[30:0] < srcB[30:0];
assign altb64 = srcA[62:31] < srcB[62:31] | (aeqbupper32 & altb32);

assign altbf = sp ? altb32 : altb64;


This gets rid of the 32-bit muxes in front of the comparator inputs
and puts a 1-bit 2:1 mux at the output.
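
A minimal, self-checking sketch of the rewrite above. The wrapper module magcmp_split and the testbench magcmp_split_tb are hypothetical names, not part of the original design. The testbench compares the split comparator against the original zero-padded comparison on random vectors. Note that the "less" expression in the full module would then have to use altbf in place of altb64.

```verilog
// Hypothetical wrapper around the split comparator from the snippet above.
module magcmp_split (
    input  [62:0] a, b,   // magnitudes (sign bit excluded)
    input         sp,     // 1 - float32 (bits 30:0 only), 0 - float64
    output        altbf   // a < b for the selected precision
);
    wire altb32      = a[30:0]  <  b[30:0];
    wire aeqbupper32 = a[62:31] == b[62:31];
    wire altbupper32 = a[62:31] <  b[62:31];
    wire altb64      = altbupper32 | (aeqbupper32 & altb32);

    // 1-bit 2:1 mux on the result instead of 32-bit muxes on the inputs.
    assign altbf = sp ? altb32 : altb64;
endmodule

// Hypothetical testbench: checks the split version against the original
// zero-padded comparison on random vectors.
module magcmp_split_tb;
    reg  [62:0] a, b;
    reg         sp;
    wire        altbf;
    wire        ref_lt = {sp ? 32'b0 : a[62:31], a[30:0]} <
                         {sp ? 32'b0 : b[62:31], b[30:0]};
    integer     i;

    magcmp_split dut (.a(a), .b(b), .sp(sp), .altbf(altbf));

    initial begin
        for (i = 0; i < 10000; i = i + 1) begin
            a  = {$random, $random};            // truncated to 63 bits
            b  = (i % 4 == 0) ? a : {$random, $random};
            // Force equal upper halves some of the time so that the
            // lower-half path of the 64-bit compare is exercised.
            if (i % 4 == 1) b[62:31] = a[62:31];
            sp = $random;                       // truncated to 1 bit
            #1;
            if (altbf !== ref_lt) begin
                $display("MISMATCH a=%h b=%h sp=%b", a, b, sp);
                $finish;
            end
        end
        $display("all vectors matched");
        $finish;
    end
endmodule
```

Whether this actually saves area or delay after synthesis depends on the tool and library, so it is worth checking the reports for both versions.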
 
pallav <pallavgupta@gmail.com> wrote:
< I think I got this to work:
(snip on modified comparator)

< This gets rid of the 32-bit muxes in front of the
< comparator inputs and puts a 1-bit 2:1 mux at the output.

So many posts are for FPGA that was what I thought about first.

The delay is presumably not so different, but much less logic.
Since you can't have 64 bit input gates in CMOS, how does the
equality test work? Otherwise, for the less than part you can
modify the carry logic to ignore the low half.

-- glen
 
On Jun 3, 8:28 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> pallav <pallavgu...@gmail.com> wrote:
>
> > I think I got this to work:
> (snip on modified comparator)
>
> > This gets rid of the 32-bit muxes in front of the
> > comparator inputs and puts a 1-bit 2:1 mux at the output.
>
> So many posts are for FPGA that was what I thought about first.
>
> The delay is presumably not so different, but much less logic.
> Since you can't have 64 bit input gates in CMOS, how does the
> equality test work? Otherwise, for the less than part you can
> modify the carry logic to ignore the low half.
>
> -- glen

The logic structure for a 32-bit equality comparator would just be a
bunch of XNOR (equivalence) gates that compute A == B bit by bit. This
32-bit result can then be fed into a binary tree of AND gates
(basically detecting whether all 32 bits are 1s). So that's
1 + 5 (log2 32) = 6 stages of logic if we assume 2-input AND gates.
However, if we have 4-input AND gates, that reduces to 1 + 3 = 4
stages. XNOR can be made fast with mirror logic in static CMOS. Of
course, I'm not counting the inverters needed for the AND gates as a
logic stage.

Many cell libraries have basic gates with fan-ins (inputs) of up to
5-6, and perhaps more, I think.
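
As a rough illustration of the 1 + 3 stage structure described above, here is a sketch with the AND tree written out explicitly. The module name eq32_tree is hypothetical, and a synthesis tool is free to restructure this logic, so it only shows the intended depth rather than what ends up in the netlist.

```verilog
// Hypothetical 32-bit equality comparator: one XNOR stage followed by
// three levels of (up to) 4-input ANDs: 32 -> 8 -> 2 -> 1.
module eq32_tree (
    input  [31:0] a, b,
    output        eq
);
    wire [31:0] x = ~(a ^ b);   // stage 1: per-bit XNOR
    wire [7:0]  l1;             // stage 2: eight 4-input ANDs
    wire [1:0]  l2;             // stage 3: two 4-input ANDs

    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : g_l1
            assign l1[i] = &x[4*i +: 4];
        end
        for (i = 0; i < 2; i = i + 1) begin : g_l2
            assign l2[i] = &l1[4*i +: 4];
        end
    endgenerate

    assign eq = &l2;            // stage 4: final AND (only 2 inputs used)
endmodule
```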
 
pallav <pallavgupta@gmail.com> wrote:
(snip, I wrote)

<> The delay is presumably not so different, but much less logic.
<> Since you can't have 64 bit input gates in CMOS, how does the
<> equality test work? Otherwise, for the less than part you can
<> modify the carry logic to ignore the low half.

< The logic structure for 32 bit equality comparator would just be a
< bunch of XNOR (equivalence) gates that compute A == B. This
< 32-bit result can then be fed into a binary tree of AND gates
< (basically detect if all 32 bits are 1s). So that's 1 + 5
< (log2 32) = 6 stages of logic if we assume 2-input AND gates.

Last I knew, it was four inputs for the widest CMOS gates.
The reason for the question is that you might be able to force
the low half to indicate equality with minimal logic and no
additional gate delay. Then you only need to force the carry
logic in a similar way.

< However, if we have 4 input AND gates, then that reduces
< to 1 + 3 = 4 stages. XNOR can be made fast with mirror logic
< in static CMOS. Of course, I'm not counting the inverters needed
< for the AND gate as a logic stage.

Maybe you aren't worried about delay. The delay model is very
different in an FPGA.

-- glen
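
One possible reading of the "force one half to indicate equality" idea, as a sketch only. It assumes, from the posted module, that the single-precision operand sits in bits 30:0, so the half that gets forced here is the upper one. The module name magcmp_forced and the signals uppereq and upperlt are hypothetical.

```verilog
// Hypothetical variant: fold sp into the combine logic so that neither
// input muxes nor an output mux are needed.
module magcmp_forced (
    input  [62:0] a, b,   // magnitudes (sign bit excluded)
    input         sp,     // 1 - float32 (bits 30:0 only), 0 - float64
    output        aeqbf,  // a == b for the selected precision
    output        altbf   // a <  b for the selected precision
);
    wire aeqb32      = a[30:0]  == b[30:0];
    wire altb32      = a[30:0]  <  b[30:0];
    wire aeqbupper32 = a[62:31] == b[62:31];
    wire altbupper32 = a[62:31] <  b[62:31];

    wire uppereq = sp | aeqbupper32;   // upper half forced "equal" when sp = 1
    wire upperlt = ~sp & altbupper32;  // upper half cannot decide when sp = 1

    assign aeqbf = uppereq & aeqb32;
    assign altbf = upperlt | (uppereq & altb32);
endmodule
```

With sp = 1 this reduces to the 31-bit results, and with sp = 0 it is the same combine as the 63-bit case, so the precision select costs one OR and one AND on the upper-half results rather than a mux.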
 
