Leading Zero count

D

DaveW

Guest
I came up with the following (simplified) code snippet for a leading
zero counter - question is: is this the "best" way to do it? I know it
works but I wonder if there is a more efficient/correct approach.
==============================================
:
input clk;
input [23:0] xn;
:
:
reg [23:0] lz_count_reg;
reg [4:0] count;

always @( posedge clk )
begin
lz_count_reg=24;
for( count=0; count<24; count=count+1 )
if( xn[ count ] )
lz_count_reg=23-count;
end
=================================================

Thanks in anticipation...
 
The counter you defined is very clean and will produce valid results.

The question I have is whether you intend to synthesize this in a very fast
design. Most synthesizers will do a reasonable job for your target device
but if you need high performance, coding the loop in a more
architecture-friendly fashion may produce better results. The code will be
harder to understand at a glance but your performance can change
significantly.


"DaveW" <dave_wooff@hotmail.com> wrote in message
news:fa8fa75b.0402250421.6223129d@posting.google.com...
I came up with the following (simplified) code snippet for a leading
zero counter - question is: is this the "best" way to do it? I know it
works but I wonder if there is a more efficient/correct approach.
==============================================
:
input clk;
input [23:0] xn;
:
:
reg [23:0] lz_count_reg;
reg [4:0] count;

always @( posedge clk )
begin
lz_count_reg=24;
for( count=0; count<24; count=count+1 )
if( xn[ count ] )
lz_count_reg=23-count;
end
=================================================

Thanks in anticipation...
 
Dave Wooff asked:
I came up with the following (simplified) code snippet for a leading
zero counter - question is: is this the "best" way to do it? I know it
works but I wonder if there is a more efficient/correct approach.
I think you can fid a variety of different implementations under the
names "find first 1 bit" and "priority encoder". Different
implementations are likely to cause the synthesizer to generate
different circuits and they are likely have different trade-offs in
terms of speed, size, etc.

Consider, for example:

casez(xn)
24'b0: count = 24;
24'b1: count = 23;
24'b1?: count = 22;
24'b1??: count = 21;
...
24'b1???????????????????????: count = 0;
endcase

or this version

if (xn[23:12] == 0)
begin
count = count + 12;
end
else
begin
xn = xn >> 12;
end;
if (xn[11:6] == 0)
begin
count = count + 6;
end
else
begin
xn = xn >> 6;
end;
if (xn[5:3] == 0)
begin
count = count + 3;
end
else
begin
xn = xn >> 3;
end;
if (xn[2:1] == 0)
begin
count = count + 2;
end
else
begin
xn = xn >> 2;
end;
if (xn[0] == 0)
begin
count = count + 1;
end

There may be other versions that are more friendly to your particular
technology and needs.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 
Dave:

First of all the code doesn't work.
Yes lz_count_reg gets the correct value on the first 1 found in the input,
but you gotta break out of the loop now in case there is another 1 e.g.:

for( count=0; count<24 && !xn[ count ]; count=count+1 )

now this doesn't work either because the loop test comes before your
internal test --- but why not make the loop test key...

i.e. for(count=23; count>0 && !xn[ count ]; count = count-1) ;

now you don't need the lz_count_reg (count gives you the answer) - and
you don't need the subractor you will instantiate with line:
lz_count_reg=23-count.
btw why is lz_count_reg 24 bits -- it need only be 5.

That's the silly stuff you did.
More subtitle is the test of xn[ count ]; This instantiates a 24 to 1
multiplexor -- a fairly large piece logic.
It might be better to shift the input value in a shift register and only
test the top bit ---


And then you didn't mention speed --- but the result can be obtained
immediately -- at some cost in logic one reasonable
way to document:

egg count = !xn[23] ? 0 :
!xn[22] ? 1 :
!xn[21] ? 2 :
!xn[20] ? 3 :



etc -- it wouldn't be a bad idea to make the execution order of above
explicit by using parenthesis.

--
bj Porcella
http://pages.sbcglobal.net/bporcella/
"DaveW" <dave_wooff@hotmail.com> wrote in message
news:fa8fa75b.0402250421.6223129d@posting.google.com...
I came up with the following (simplified) code snippet for a leading
zero counter - question is: is this the "best" way to do it? I know it
works but I wonder if there is a more efficient/correct approach.
==============================================
:
input clk;
input [23:0] xn;
:
:
reg [23:0] lz_count_reg;
reg [4:0] count;

always @( posedge clk )
begin
lz_count_reg=24;
for( count=0; count<24; count=count+1 )
if( xn[ count ] )
lz_count_reg=23-count;
end
=================================================

Thanks in anticipation...
 
John_H <johnhandwork@mail.com> wrote in message
news:f74%b.8$tc4.20291@news-west.eli.net...
The counter you defined is very clean and will produce valid results.

The question I have is whether you intend to synthesize this in a very
fast
design. Most synthesizers will do a reasonable job for your target device
but if you need high performance, coding the loop in a more
architecture-friendly fashion may produce better results. The code will
be
harder to understand at a glance but your performance can change
significantly.
Thanks. I'm using an FPGA. I come from a DSP Software background really so
I'm a little green on this subject. I guessed that this code may be
inefficient so I was interested to know if the approach I had taken had any
merit whatsoever.
 
Chris F Clark <cfc@shell01.TheWorld.com> wrote in message
news:sddfzczf3ha.fsf@shell01.TheWorld.com...
Dave Wooff asked:
I came up with the following (simplified) code snippet for a leading
zero counter - question is: is this the "best" way to do it? I know it
works but I wonder if there is a more efficient/correct approach.

I think you can fid a variety of different implementations under the
names "find first 1 bit" and "priority encoder". Different
implementations are likely to cause the synthesizer to generate
different circuits and they are likely have different trade-offs in
terms of speed, size, etc.

Consider, for example:......
Thanks for this, it is good food for thought. I can see how your alternate
approach tries to reduce the number of steps involved and I guess this would
result in faster logic, although I would not know for sure without
experimenting. I trust your comments in any case.
 
bj Porcella <bporcella@sbcglobal.net> wrote in message
news:IL6%b.30244$ol6.173@newssvr25.news.prodigy.com...
Dave:

First of all the code doesn't work.
Well, it seems to on my simulator, I think you need to re-examine the code
snippet. Yes the result could be 5 bits, that was a mistake in simplifying
the code to put in my original post.


That's the silly stuff you did.
More subtitle is the test of xn[ count ]; This instantiates a 24 to 1
multiplexor -- a fairly large piece logic.
It did seem to use up not an insubstantial amount of logic, but not so much
that I was worried. I take your point though.

Thankyou.
 
I think my other motivation for posting this was that this problem must be
one of the more common ones as it is used in floating point arithmetic, so I
would have thought it had been done to death and that a standard Verilog
algorithm would have resulted by now. I did search for this, however, and
didn't find all that much really, so I was a little surprised.

--
Dave Wooff
dave@dmwooff.freeserve.co.uk
DaveW <dave_wooff@hotmail.com> wrote in message
news:fa8fa75b.0402250421.6223129d@posting.google.com...
I came up with the following (simplified) code snippet for a leading
zero counter - question is: is this the "best" way to do it? I know it
works but I wonder if there is a more efficient/correct approach.
==============================================
:
input clk;
input [23:0] xn;
:
:
reg [23:0] lz_count_reg;
reg [4:0] count;

always @( posedge clk )
begin
lz_count_reg=24;
for( count=0; count<24; count=count+1 )
if( xn[ count ] )
lz_count_reg=23-count;
end
=================================================

Thanks in anticipation...
 
On Wed, 25 Feb 2004 21:52:00 -0000, "David Wooff"
<dave@dmwooff.freeserve.co.uk> wrote:

John_H <johnhandwork@mail.com> wrote in message
news:f74%b.8$tc4.20291@news-west.eli.net...
The counter you defined is very clean and will produce valid results.

The question I have is whether you intend to synthesize this in a very
fast
design. Most synthesizers will do a reasonable job for your target device
but if you need high performance, coding the loop in a more
architecture-friendly fashion may produce better results. The code will
be
harder to understand at a glance but your performance can change
significantly.

Thanks. I'm using an FPGA. I come from a DSP Software background really so
I'm a little green on this subject. I guessed that this code may be
inefficient so I was interested to know if the approach I had taken had any
merit whatsoever.
I did a 200MHz 16 input / five output priority encoder in an FPGA a
few years back. I used a carry chain to find the first bit set, then
some straightforward logic (in a tree structure) to convert the bit
position to binary.

Regards,
Allan.
 
dave_wooff@hotmail.com (DaveW) wrote in message news:<fa8fa75b.0402250421.6223129d@posting.google.com>...
I came up with the following (simplified) code snippet for a leading
zero counter - question is: is this the "best" way to do it? I know it
works but I wonder if there is a more efficient/correct approach.
==============================================
:
input clk;
input [23:0] xn;
:
:
reg [23:0] lz_count_reg;
reg [4:0] count;

always @( posedge clk )
begin
lz_count_reg=24;
for( count=0; count<24; count=count+1 )
if( xn[ count ] )
lz_count_reg=23-count;
end
=================================================

Thanks in anticipation...
You need to think with a HW hat on to get really good performance. Dig
up an old TI TTL book and check out the section on Priority Encoding.
Things have't changed much since then. The natural way to think is
recursive division. Typically for large N say 16 or 64 partition into
4 ways and solve that and recombine results. When you know how to
recombine, you only need to PE 4bits and build result from there.

Should be possible to do arbitrary encoder for N (power of 2 or 4) in
about O logN time where O might be 1 or 2 gate delays. In an ASIC, O
might be closer to transit time so even faster. For an FPGA you will
be better off with adder type circuit and XST doesn't seem to synth
fast adders with ripple adder until the length is long enough say 4b,
so recursion doesn't work here. And doing math with multiple adder
fragments in one pipe will be slow too.

If your input word is all 0's but for a single 1, the solution is much
easier, then a tree decoder works quickly. If you have 1 or more 1s,
1st build a force all right 1s to 0 stage infront. This stage looks
triangular if you do it bitwise, but again recursion can help here
too, break into 4 or 8 bit sub words. The right most 1 or lsb needs to
be killed by all the 1's to the left, and big wide gates in FPGA are
also slow.

Its a solved problem, but not neccesary encoded into any tool I know
of. Also look at the big blue Verilog/VHDL book by Smith IIRC that
describes hundreds of synth code modules in both langs with
schematics. PEs should be in there. its a must have book for any HW
guy.

johnjakson_usa_com
 
Greetings, bj Porcella.

"bj Porcella" <bporcella@sbcglobal.net> wrote in message
news:IL6%b.30244$ol6.173@newssvr25.news.prodigy.com...
Dave:

First of all the code doesn't work.
This form of code has worked fine for my own synthesis and simulation,
relying on the blocking operator (=) to provide a loop-based priority
encoder. The later iterations of the loop are higher in the priority chain.

Yes lz_count_reg gets the correct value on the first 1 found in the input,
but you gotta break out of the loop now in case there is another 1 e.g.:
The idea with the loop isn't to find the first one (note that the counter is
incrementing from 0 to find the MSbit that is one) but to find the *last*
one.

; count=count+1 )

now this doesn't work either because the loop test comes before your
internal test --- but why not make the loop test key...

i.e. for(count=23; count>0 && !xn[ count ]; count = count-1) ;
This form of conditional loop exit will probably wreak havok on synthesis
though simulators would probably be okay with it.

now you don't need the lz_count_reg (count gives you the answer) - and
you don't need the subractor you will instantiate with line:
lz_count_reg=23-count.
The subtraction doesn't get implemented in logic but provides a constant
that feeds the priority encoder.

btw why is lz_count_reg 24 bits -- it need only be 5.

That's the silly stuff you did.
More subtitle is the test of xn[ count ]; This instantiates a 24 to 1
multiplexor -- a fairly large piece logic.
The 24-stage priority encoder could be nasty in general but the case of a 24
stage priority encode of *constants* the situation is much cleaner. I
believe it was Peter Alfke who proposed using BlockRAMs instead of general
logic to do most of the work for a count like this.

It might be better to shift the input value in a shift register and only
test the top bit ---


And then you didn't mention speed --- but the result can be obtained
immediately -- at some cost in logic one reasonable
way to document:
I love immediate logic. Even 10 ps of propagation delay is such a bother.

? 0 :
!xn[22] ? 1 :
!xn[21] ? 2 :
!xn[20] ? 3 :



etc -- it wouldn't be a bad idea to make the execution order of above
explicit by using parenthesis.

--
bj Porcella
http://pages.sbcglobal.net/bporcella/
[ original post omitted ]
 
Dave and John:

mea culpa.... It didn't even cross my mind that you would be checking
from the bottom up. Sorry. The code does work. That's the
good news.

On the other hand that is SO wistful of time....... start checking from
the MSB and stop when you hit a 1. If data is uniformly distributed
the circuit will finish on any given check 50% of the time -- much faster
on average than going through all the bits.

And yes I agree with Dave that my "loop code" is not good for synthesis --
I would never write it that way -- but might implement the
basic idea in a different fashion. ---- Personally I never use for
loops in RTL code --- I like to think more in terms of hardware
structure...

like:

//assume begin_check is a pulse indicating good xn data and I am looking
for the MS 1 )...

always @(posedge clk)
if (begin_check) test_bit <= 23;
else if (xn[test_bit] == 0) test_bit <= (test_bit >0) ? test_bit-1 : 0;

wire done = xn[test_bit]; // when done test_bit points to MSB 1.



--
bj Porcella
http://pages.sbcglobal.net/bporcella/
"John_H" <johnhandwork@mail.com> wrote in message
news:WOp%b.13$tc4.27341@news-west.eli.net...
Greetings, bj Porcella.

"bj Porcella" <bporcella@sbcglobal.net> wrote in message
news:IL6%b.30244$ol6.173@newssvr25.news.prodigy.com...
Dave:

First of all the code doesn't work.

This form of code has worked fine for my own synthesis and simulation,
relying on the blocking operator (=) to provide a loop-based priority
encoder. The later iterations of the loop are higher in the priority
chain.

Yes lz_count_reg gets the correct value on the first 1 found in the
input,
but you gotta break out of the loop now in case there is another 1
e.g.:

The idea with the loop isn't to find the first one (note that the counter
is
incrementing from 0 to find the MSbit that is one) but to find the *last*
one.

for( count=0; count<24 && !xn[ count ]; count=count+1 )

now this doesn't work either because the loop test comes before your
internal test --- but why not make the loop test key...

i.e. for(count=23; count>0 && !xn[ count ]; count = count-1) ;

This form of conditional loop exit will probably wreak havok on synthesis
though simulators would probably be okay with it.

now you don't need the lz_count_reg (count gives you the answer) -
and
you don't need the subractor you will instantiate with line:
lz_count_reg=23-count.

The subtraction doesn't get implemented in logic but provides a constant
that feeds the priority encoder.

btw why is lz_count_reg 24 bits -- it need only be 5.

That's the silly stuff you did.
More subtitle is the test of xn[ count ]; This instantiates a 24 to
1
multiplexor -- a fairly large piece logic.

The 24-stage priority encoder could be nasty in general but the case of a
24
stage priority encode of *constants* the situation is much cleaner. I
believe it was Peter Alfke who proposed using BlockRAMs instead of general
logic to do most of the work for a count like this.

It might be better to shift the input value in a shift register and only
test the top bit ---


And then you didn't mention speed --- but the result can be obtained
immediately -- at some cost in logic one reasonable
way to document:

I love immediate logic. Even 10 ps of propagation delay is such a bother.

egg count = !xn[23] ? 0 :
!xn[22] ? 1 :
!xn[21] ? 2 :
!xn[20] ? 3 :



etc -- it wouldn't be a bad idea to make the execution order of
above
explicit by using parenthesis.

--
bj Porcella
http://pages.sbcglobal.net/bporcella/

[ original post omitted ]
 

Welcome to EDABoard.com

Sponsor

Back
Top