Vry silly noob question

C

Chris Hinsley

Guest
Sorry for asking such a simple question, but it's really anoying me !

I have a pice of code that does:

reg [31:0] cnt = 32'b0;
always @(posedge CLK_FPGA_50M)
begin
cnt += 1;
end

That counter is driving the rest of my design, and after doing timeing
analysis I find the longest path is the ripple carry adder that the +=
is generateing.

I have got my own modules for faster math, one is called add32. But I
can't get this to work:

reg [31:0] cnt = 32'b0;
always @(posedge CLK_FPGA_50M)
begin
add32 u1(cnt, 32'b1, cnt);
end

I keep getting a syntax error. I'm sure I've not understood some simple
Verilog issue here, but why can't I use my own module instead of the +=
?

Thanks in advance.

Chris
 
On 2011-04-04 15:01:48 +0100, Chris Hinsley said:

Sorry for asking such a simple question, but it's really anoying me !

I have a pice of code that does:

reg [31:0] cnt = 32'b0;
always @(posedge CLK_FPGA_50M)
begin
cnt += 1;
end

That counter is driving the rest of my design, and after doing timeing
analysis I find the longest path is the ripple carry adder that the +=
is generateing.

I have got my own modules for faster math, one is called add32. But I
can't get this to work:

reg [31:0] cnt = 32'b0;
always @(posedge CLK_FPGA_50M)
begin
add32 u1(cnt, 32'b1, cnt);
end

I keep getting a syntax error. I'm sure I've not understood some simple
Verilog issue here, but why can't I use my own module instead of the +=
?

Thanks in advance.

Chris
reg [31:0] cnt = 32'b0;
wire [31:0] nxt_cnt;
add32 u1(cnt, 32'b1, nxt_cnt, c, v);
always @(posedge CLK_FPGA_50M)
begin
cnt = nxt_cnt;
end

This compiles, but I didn't realise that you can't use a module inside
an allways. Can somone confirm that ?

Thanks

Chris
 
Chris Hinsley <chris.hinsley@gmail.com> wrote:
On 2011-04-04 15:01:48 +0100, Chris Hinsley said:

Sorry for asking such a simple question, but it's really anoying me !

I have a pice of code that does:

reg [31:0] cnt = 32'b0;
always @(posedge CLK_FPGA_50M)
begin
cnt += 1;
end

That counter is driving the rest of my design, and after doing timeing
analysis I find the longest path is the ripple carry adder that the +=
is generateing.

I have got my own modules for faster math, one is called add32. But I
can't get this to work:

reg [31:0] cnt = 32'b0;
always @(posedge CLK_FPGA_50M)
begin
add32 u1(cnt, 32'b1, cnt);
end
(snip)
cnt = 32'b0;
wire [31:0] nxt_cnt;
add32 u1(cnt, 32'b1, nxt_cnt, c, v);
always @(posedge CLK_FPGA_50M)
begin
cnt = nxt_cnt;
end

This compiles, but I didn't realise that you can't use a module inside
an allways. Can somone confirm that ?
Sounds right to me. One of the first things I remember learning
when actually running verilog.

Since a continuous assignment or module reference is not clocked,
there is no use for it to be inside an always block.

In my verilog, which is structural except for registers, I put
the register in its own module, such that it would come out.

add32 u1(cnt, 32'b1, nxt_cnt, c, v);
reg32 u2(cnt, nxt_cnt, CLK_FPGA_50M);

(with the always block in the reg32 module)

I usually write modules, such as add32 and reg32, with the
outputs first and inputs later, a popular convention.

-- glen
 
cnt = 32'b0;
wire [31:0] nxt_cnt;
add32 u1(cnt, 32'b1, nxt_cnt, c, v);
always @(posedge CLK_FPGA_50M)
begin
cnt = nxt_cnt;
end

This compiles, but I didn't realise that you can't use a module inside
an allways. Can somone confirm that ?

Sounds right to me. One of the first things I remember learning
when actually running verilog.

Since a continuous assignment or module reference is not clocked,
there is no use for it to be inside an always block.

In my verilog, which is structural except for registers, I put
the register in its own module, such that it would come out.

add32 u1(cnt, 32'b1, nxt_cnt, c, v);
reg32 u2(cnt, nxt_cnt, CLK_FPGA_50M);

(with the always block in the reg32 module)

I usually write modules, such as add32 and reg32, with the
outputs first and inputs later, a popular convention.

-- glen
I'm torn between going all structural, so I understand what ever bit is
doing, and going as high level as possible and leaving it to the
compiler. As I'm still a noob to Verilog I think I'll tough it out with
all structural till I really get a good feel for where all the
electrons are going. :)

I had noticed with the ordering I was bucking the trend. Somhow I just
ended up likeing that way better. I might switch things around to 'fit
in'.

Chris
 
On Mon, 4 Apr 2011 15:34:38 +0100, Chris Hinsley
<chris.hinsley@gmail.com> wrote:

cnt = 32'b0;
always @(posedge CLK_FPGA_50M)
begin
cnt += 1;
end

That counter is driving the rest of my design, and after doing timeing
analysis I find the longest path is the ripple carry adder that the +=
is generateing.
[...]
reg [31:0] cnt = 32'b0;
wire [31:0] nxt_cnt;
add32 u1(cnt, 32'b1, nxt_cnt, c, v);
always @(posedge CLK_FPGA_50M)
begin
cnt = nxt_cnt;
end

This compiles, but I didn't realise that
you can't use a module inside
an allways. Can somone confirm that ?
Indeed, it is true: you can't - the body of
an always block is procedural code, and a
module instance definitely isn't procedural.

However....

** there is something HORRIBLY WRONG
** with both your code fragments.

In both cases you're making BLOCKING assignment
(using += or straight =) to a variable that is
an output from a clocked always block - you've
said yourself that the count value is being
sent to other parts of the design. This is
A Very, Very Bad Idea.

As has been recited here and elsewhere,
ad nauseam but completely correctly:

in a clocked always block, when writing
to any variable that is an output from
that always block, you MUST use
nonblocking assignment (<=).

Disobey this rule and you risk faulty designs,
and get a cast-iron guarantee of unreliable
simulation.

In your second example it's easy to fix this:
just write cnt <= next_cnt. In the first
example you would need to write cnt <= cnt+1.
It might be a good idea to try exactly that;
the synthesis tool will certainly recognize
it as a counter (whereas it might not have
done so, in the case of your unusual += code)
and one would hope it might then do a better
job of the optimization. You really should
not need to instantiate adder macros by hand
to get adequate performance from a counter.

But that's another rant, for another day.
Just fix the nonblocking assignments, soon,
before you end up in a lot of trouble.

PS: What are the 'c' and 'v' connections
to your counter macro? Carry in and out,
I would guess. I sincerely hope the
carry-in is tied off to zero? Bear in
mind that if you supply some random name
for the connection to any one-bit port of
a module instance, Verilog will implicitly
declare that name as a one-bit wire and
happily leave it dangling... OK for the
carry-out, but unfortunate for carry-in.

Oh, are they carry-out and overflow-out?
In that case it's OK, although most folk
still regard it as poor style to rely on
implicit wire declarations.
--
Jonathan Bromley
 
cnt = 32'b0;
wire [31:0] nxt_cnt;
add32 u1(cnt, 32'b1, nxt_cnt, c, v);
always @(posedge CLK_FPGA_50M)
begin
cnt = nxt_cnt;
end

This compiles, but I didn't realise that
you can't use a module inside
an allways. Can somone confirm that ?

Indeed, it is true: you can't - the body of
an always block is procedural code, and a
module instance definitely isn't procedural.
OK, thanks.

In both cases you're making BLOCKING assignment
(using += or straight =) to a variable that is
an output from a clocked always block - you've
said yourself that the count value is being
sent to other parts of the design. This is
A Very, Very Bad Idea.
OK, I'm listening.

Does that apply to a genvar and integer as well ? Or does += work fine
because a genvar is just internal to the compiler and dosn't represent
actual hardware ?

For example here's a macro I came up with to generate encoders, feel
free to comment on any problems. (One and only 1 input line should be
one, it's not a priority encoder so the answer is garbage if the input
isn't correct)

`define ENCODER(NAME, BITS) \
module NAME(i, o); \
parameter WIDTH = (2 ** BITS); \
input [(WIDTH - 1):0] i; \
output [(BITS - 1):0] o; \
reg [(BITS - 1):0] r; \
genvar row; \
generate \
begin \
for (row = 0; row < BITS; row = row + 1) \
begin : G \
always \
begin \
integer start; \
integer range; \
r[row] = 0; \
for (start = (2 ** row); start < WIDTH; start = start + (2 ** row)
+ (2 ** row)) \
begin \
for (range = start; range < (start + (2 ** row)); range = range + 1) \
begin \
r[row] = (r[row] | i[range]); \
end \
end \
end \
assign o[row] = r[row]; \
end \
end \
endgenerate \
endmodule

`ENCODER(enc1, 1)
`ENCODER(enc2, 2)
`ENCODER(enc3, 3)
`ENCODER(enc4, 4)
`ENCODER(enc5, 5)

As has been recited here and elsewhere,
ad nauseam but completely correctly:

in a clocked always block, when writing
to any variable that is an output from
that always block, you MUST use
nonblocking assignment (<=).
I'll try to remember that one then.

You really should
not need to instantiate adder macros by hand
to get adequate performance from a counter.
I know, but I'm trying to learn down to the electrons how to build
stuff. I already found out that I can't beat the FPGA compiler at doing
add's '+' or muls '*' despite putting a lot of effort into building my
own Kogge-Stone adders and Booth CS multipliers because the compiler
dosn't see it can map this onto it's internal fast logic blocks for add
and mul.

But it was educational just to build it all from from transistors
upwards. I used Logisim at first, got a 32 bit RISC chip design runing,
and now I'm moving it over to Verilog to learn Verilog and FPGA design
tools.

PS: What are the 'c' and 'v' connections
to your counter macro?
Just carry and overflow out from my 32bit adder.

Oh, are they carry-out and overflow-out?
In that case it's OK, although most folk
still regard it as poor style to rely on
implicit wire declarations.
Again, thanks for the smack with the GURU stick ! :)

Chris
 
On Wed, 6 Apr 2011 21:45:45 +0100, Chris Hinsley wrote:

In both cases you're making BLOCKING assignment
(using += or straight =) to a variable that is
an output from a clocked always block - you've
said yourself that the count value is being
sent to other parts of the design. This is
A Very, Very Bad Idea.

OK, I'm listening.

Does that apply to a genvar and integer as well ? Or does += work fine
because a genvar is just internal to the compiler and dosn't represent
actual hardware ?
Fine with genvar, for exactly the reason you state.

Integers are just 32-bit signed registers in Verilog, and
the same guidance applies: If you're assigning to it in a
clocked always block, and its value will be used outside
that always block, then use <= if you want to keep your
sanity. But I guess you're thinking about integer as a
loop counter, in which case it surely won't leak out
of its always block (and, indeed, probably won't be a
piece of hardware at all) and again the blocking assign
is not only OK but essential. Heh, a funny thought just
occurred to me: for extra credit, explain why this
code is broken:

always @(whatever) begin: Broken
integer i;
for (i <= 0; i < LIMIT; i <= i+1) ...

For example here's a macro I came up with to generate encoders, feel
free to comment on any problems. (One and only 1 input line should be
one, it's not a priority encoder so the answer is garbage if the input
isn't correct)
[snip]

Fascinating - it's a cute macro (it's almost a code-generator
script!) but isn't it kinda obscure? Complex macros are
not the happiest things in Verilog; errors in their code
body, or bad uses of them, can often give rise to completely
incomprehensible compiler diagnostics. Two alternative
formulations of (I hope) the same logic are at the end of
this post.

One very specific criticism: *don't* use 2**N; instead
favour 1<<N. The reason is that 2**N yields a real rather
than integer result, which can have troublesome consequences
when defining array sizes and subscripts.

You really should
not need to instantiate adder macros by hand
to get adequate performance from a counter.

I know, but I'm trying to learn down to the electrons how to build
stuff.
Fair enough. It's a good way to get a grip on how things
hang together. At some stage, though, you'll need to find
ways to be as productive as possible.

Again, thanks for the smack with the GURU stick ! :)
Wasn't meant to be a smack so much as a friendly nudge.

Now for the code suggestion:

module ENCODER (i, o);
parameter OUTPUT_BITS = 2;
localparam INPUT_BITS = 1 << OUTPUT_BITS;
input [INPUT_BITS-1:0] i;
output [OUTPUT_BITS-1:0] o;
reg [OUTPUT_BITS-1:0] o;

always @i begin : EncoderLogic
integer out_bit, in_bit, start;
reg or_acc;
for (out_bit = 0; out_bit < OUTPUT_BITS; out_bit=out_bit+1) begin
or_acc = 0;
for (start = 1<<out_bit;
start < INPUT_BITS;
start = start + (1<<out_bit) ) begin
for (in_bit = start;
in_bit < (start + (1<<out_bit));
in_bit = in_bit + 1 ) begin
or_acc = or_acc | i[in_bit];
end
end
o[out_bit] = or_acc;
end
end
endmodule

Now you can just instantiate it as you need it...

ENCODER #3 enc8to3(.i(vec8), .o(vec3));

But there's also a cute trick for the calculation:

always @i begin: EncoderLogic2
reg or_acc;
integer in_bit, out_bit;
for (out_bit=0; out_bit<OUTPUT_BITS; out_bit=out_bit+1) begin
or_acc = 0;
for (in_bit=0; in_bit<INPUT_BITS; in_bit=in_bit+1)
if (in_bit & (1<<out_bit))
or_acc = or_acc | i[in_bit];
end
end

Generally, generates and (even more strongly) macros
are best thought of as tools of last resort - not something
you grab instinctively at the first opportunity.
 
On 4/6/2011 3:28 PM, Jonathan Bromley wrote:

One very specific criticism: *don't* use 2**N; instead
favour 1<<N. The reason is that 2**N yields a real rather
than integer result, which can have troublesome consequences
when defining array sizes and subscripts.
If that's what you are getting then your vendor is being lazy and not
following the standard. They are likely using the C pow routine instead
taking the time to write the appropriate bit based power routine. The
power operator should only return a real value if either of the operands
are real.

The reason I would favor 1<<N over 2**N is that it is more portable.

I also personally like to use $clog2() where appropriate.

Cary
 
For example here's a macro I came up with to generate encoders, feel
free to comment on any problems. (One and only 1 input line should be
one, it's not a priority encoder so the answer is garbage if the input
isn't correct)

[snip]

Fascinating - it's a cute macro (it's almost a code-generator
script!) but isn't it kinda obscure?
I was trying to save having to type lots of source for each width of
decoder, and avoid finger trouble from the odd mistake in each
different width of decoder. But I wanted to force the compiler to
produce a tree of 'or' gates.

Complex macros are
not the happiest things in Verilog; errors in their code
body, or bad uses of them, can often give rise to completely
incomprehensible compiler diagnostics. Two alternative
formulations of (I hope) the same logic are at the end of
this post.
Thanks, I'm still learning on this stuff. I didn't realise that a
module can get expanded like a macro !

One very specific criticism: *don't* use 2**N; instead
favour 1<<N. The reason is that 2**N yields a real rather
than integer result, which can have troublesome consequences
when defining array sizes and subscripts.
Fair enough. Thanks for the tip.

You really should
not need to instantiate adder macros by hand
to get adequate performance from a counter.

I know, but I'm trying to learn down to the electrons how to build
stuff.

Fair enough. It's a good way to get a grip on how things
hang together. At some stage, though, you'll need to find
ways to be as productive as possible.
I'm allready starting to think I should just 'get on with it', I'm
getting bogged down in re-inventing the wheel due to not trusting the
compiler to create what I want at the hardware level. Comes from years
of assembler codeing and never quite trusting C++ compilers to do what
I want. ;)

Now for the code suggestion:

module ENCODER (i, o);
parameter OUTPUT_BITS = 2;
localparam INPUT_BITS = 1 << OUTPUT_BITS;
input [INPUT_BITS-1:0] i;
output [OUTPUT_BITS-1:0] o;
reg [OUTPUT_BITS-1:0] o;
I'll study this. Many thanks.

Generally, generates and (even more strongly) macros
are best thought of as tools of last resort - not something
you grab instinctively at the first opportunity.
Your not going to like me doing this then ! :)

`define RIPPLE_ADDER(NAME, WIDTH) \
module NAME(i_a, i_b, i_c, o_s, o_c); \
input i_c; \
input [(WIDTH - 1):0] i_a, i_b; \
output [(WIDTH - 1):0] o_s; \
output o_c; \
wire [(WIDTH - 1):0] c; \
genvar n; \
generate \
for (n = 0; n < WIDTH; n += 1) \
begin : G \
if (n == 0) \
fa U(i_a[n], i_b[n], i_c, o_s[n], c[n]); \
else \
fa U(i_a[n], i_b[n], c[n - 1], o_s[n], c[n]); \
end \
endgenerate \
assign o_c = c[(WIDTH - 1)]; \
endmodule

`RIPPLE_ADDER(ripple_add4, 4)
`RIPPLE_ADDER(ripple_add5, 5)
`RIPPLE_ADDER(ripple_add6, 6)
`RIPPLE_ADDER(ripple_add7, 7)
`RIPPLE_ADDER(ripple_add32, 32)

Chris
 
Now for the code suggestion:

module ENCODER (i, o);
parameter OUTPUT_BITS = 2;
localparam INPUT_BITS = 1 << OUTPUT_BITS;
input [INPUT_BITS-1:0] i;
output [OUTPUT_BITS-1:0] o;
reg [OUTPUT_BITS-1:0] o;

I'll study this. Many thanks.
As another example, I'm struggling with expressing this priority
encoder (leading zero counter) form as a macro. So how would I
paramaterise it as a single module for different widths ? Don't worry
I'm not going to post all my design here and get you to do the job ! ;)

module pri_enc5 (i, o);
input [31:0] i;
output [4:0] o;
assign o =
i[0] ? 0 :
i[1] ? 1 :
i[2] ? 2 :
i[3] ? 3 :
i[4] ? 4 :
i[5] ? 5 :
i[6] ? 6 :
i[7] ? 7 :
i[8] ? 8 :
i[9] ? 9 :
i[10] ? 10 :
i[11] ? 11 :
i[12] ? 12 :
i[13] ? 13 :
i[14] ? 14 :
i[15] ? 15 :
i[16] ? 16 :
i[17] ? 17 :
i[18] ? 18 :
i[19] ? 19 :
i[20] ? 20 :
i[21] ? 21 :
i[22] ? 22 :
i[23] ? 23 :
i[24] ? 24 :
i[25] ? 25 :
i[26] ? 26 :
i[27] ? 27 :
i[28] ? 28 :
i[29] ? 29 :
i[30] ? 30 :
31;
endmodule

This is a prime example of why I'm getting bogged down. This compiles
to a ladder of 32 multiplexers. Looking at the RTL produced I was
horrified thinking of the delay through all that lot, put lot's of
effort into studying how to do a fast priority encoder, useing a tree
design again, and when I coded it up it was slower than this one. Damb
and blast it. :(

Still I'm enjoying learning and waching the LED's flash on my FPGA board. :)

Chris
 
On Thu, 7 Apr 2011 02:44:17 +0100, Chris Hinsley
<chris.hinsley@gmail.com> wrote:

As another example, I'm struggling with expressing this priority
encoder (leading zero counter) form as a macro. So how would I
paramaterise it as a single module for different widths ? Don't worry
I'm not going to post all my design here and get you to do the job ! ;)

module pri_enc5 (i, o);
input [31:0] i;
output [4:0] o;
assign o =
How about

module pri_enc(i,o);
parameter len=31;
input [len:p] i;
output [log2(len):0] o;

reg [log2(len):0] o;
always @(*)
o = 0;
while (!in[o] && o <= len) o = o + 1;
endmodule


This is not tested at all so it may not even compile but I think it
should start you with something reasonable.
--
Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services

http://www.dspia.com
 
On 04/07/2011 12:28 AM, Jonathan Bromley wrote:

module ENCODER (i, o);
parameter OUTPUT_BITS = 2;
localparam INPUT_BITS = 1<< OUTPUT_BITS;
input [INPUT_BITS-1:0] i;
output [OUTPUT_BITS-1:0] o;
reg [OUTPUT_BITS-1:0] o;

always @i begin : EncoderLogic
integer out_bit, in_bit, start;
reg or_acc;
for (out_bit = 0; out_bit< OUTPUT_BITS; out_bit=out_bit+1) begin
or_acc = 0;
for (start = 1<<out_bit;
start< INPUT_BITS;
start = start + (1<<out_bit) ) begin
for (in_bit = start;
in_bit< (start + (1<<out_bit));
in_bit = in_bit + 1 ) begin
or_acc = or_acc | i[in_bit];
end
end
o[out_bit] = or_acc;
end
end
endmodule

Now you can just instantiate it as you need it...

ENCODER #3 enc8to3(.i(vec8), .o(vec3));

But there's also a cute trick for the calculation:

always @i begin: EncoderLogic2
reg or_acc;
integer in_bit, out_bit;
for (out_bit=0; out_bit<OUTPUT_BITS; out_bit=out_bit+1) begin
or_acc = 0;
for (in_bit=0; in_bit<INPUT_BITS; in_bit=in_bit+1)
if (in_bit& (1<<out_bit))
or_acc = or_acc | i[in_bit];
end
end
In my mind, an encoder is a module that, assuming an input
with a single bit set, outputs that bit position.

In my quick simulation, EncoderLogic doesn't behave like
that. EncoderLogic2 does not work - it misses the output
assignment. But when I make the "obvious" assignemnt, it
behaves different from EncoderLogic, and also not
as an encoder.

Perhaps something else is meant with "encoder", but if
my definition is correct, the solutions so far are way
too complicated.

Jan

--
Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
Python as a HDL: http://www.myhdl.org
VHDL development, the modern way: http://www.sigasi.com
Analog design automation: http://www.mephisto-da.com
World-class digital design: http://www.easics.com
 
On Wed, 06 Apr 2011 17:18:30 -0700, "Cary R." <no-spam@host.spam>
wrote:

On 4/6/2011 3:28 PM, Jonathan Bromley wrote:

2**N yields a real result

If that's what you are getting then your vendor is being lazy and not
following the standard. They are likely using the C pow routine instead
taking the time to write the appropriate bit based power routine. The
power operator should only return a real value if either of the operands
are real.
Not true in Verilog, I'm afraid. The standard says you get
an integer result only when both operands are UNSIGNED
integers, and of course the undecorated integer literal 2
is SIGNED (and the parameter N is likely an integer, and
therefore signed, too). You could use
'd2**$unsigned(N)
and get an integer result, but that's a tad indigestible!

The reason I would favor 1<<N over 2**N is that it is more portable.
I think ** is widely supported now. But 1<<N has a certain
elegance to it, and relates well to its design intent in most
cases, so I still think it's preferable.

I also personally like to use $clog2() where appropriate.
Agreed, and again that's widely supported too. Usual gotchas
about the corner cases when you're using it to mean "how
many bits do I need to represent this integer?".
--
Jonathan Bromley
 
Muzaffer Kal <kal@dspia.com> wrote:
On Thu, 7 Apr 2011 02:44:17 +0100, Chris Hinsley
chris.hinsley@gmail.com> wrote:

As another example, I'm struggling with expressing this priority
encoder (leading zero counter) form as a macro. So how would I
paramaterise it as a single module for different widths ? Don't worry
I'm not going to post all my design here and get you to do the job ! ;)

module pri_enc(i,o);
parameter len=31;
input [len:p] i;
output [log2(len):0] o;

reg [log2(len):0] o;
always @(*)
o = 0;
while (!in[o] && o <= len) o = o + 1;
endmodule
It looks like it would generate a ripple down encoder, but
I believe the tools can figure it out and optimize it.

Last time I designed one for a real project, it had to be
pipelined to be fast enough. I believe it was four inputs
for the first level, and then nine of those, followed by
the appropriate second level encoder. (I had 36 inputs.)

-- glen
 
On Thu, 07 Apr 2011 09:37:39 +0200, Jan Decaluwe wrote:

In my mind, an encoder is a module that, assuming an input
with a single bit set, outputs that bit position.
Moi aussi.

In my quick simulation, EncoderLogic doesn't behave like
that.
That'll be because I got it wrong, then :) Actually
I copied the OP's error: the step size of his for(start)
loop should be twice as large as we wrote. With that
small fix, it works correctly.

EncoderLogic2 does not work - it misses the output
assignment.
oops.

But when I make the "obvious" assignemnt, it
behaves different from EncoderLogic, and also not
as an encoder.
hmmm, it works for me, with this correction
(and some added explanation):

for (out_bit=0; out_bit<OUTPUT_BITS; out_bit=out_bit+1) begin
// this variable will accumulate the OR result
// fur the output bit chosen by this trip of the loop
or_acc = 0;
for (in_bit=0; in_bit<INPUT_BITS; in_bit=in_bit+1)
// If bit #(out_bit) is set in the binary
// representation of "in_bit", we want this
// input bit included in the OR calculation.
if (in_bit & (1<<out_bit))
or_acc = or_acc | i[in_bit];
// deliver this bit's final result to output word
o[out_bit] = or_acc; // ** this line was missing
end

if my definition is correct, the solutions
so far are way too complicated.
Sure, but I was trying (unsuccessfully!!!) to get
across the idea that you could rather explicitly
represent an OR tree encoder without resorting
to generates. Looks as though I should have
been quite a lot more careful, though.

I don't really think my second solution is "too
complicated". There's a much more elegant
formulation using either recursive instancing
of modules, or recursive calls to a function,
but neither of those works well in Verilog :-(
--
Jonathan Bromley
 
On Thu, 7 Apr 2011 02:44:17 +0100, Chris Hinsley wrote:

As another example, I'm struggling with expressing this priority
encoder (leading zero counter) form as a macro. So how would I
paramaterise it as a single module for different widths ? Don't worry
I'm not going to post all my design here and get you to do the job ! ;)

module pri_enc5 (i, o);
input [31:0] i;
output [4:0] o;
assign o =
i[0] ? 0 :
snip, snip, snip
i[29] ? 29 :
i[30] ? 30 :
31;
endmodule
If you're OK with the long mux chain, then this is rather easy
to code:

module pri_enc (i, o);
parameter OUT_BITS = 2;
localparam IN_BITS = 1<<OUT_BITS;
input [IN_BITS-1:0] i;
output [OUT_BITS-1:0] o;
reg [OUT_BITS-1:0] o;

always @i begin: ripple_encoder
integer j;
o = 0; // the result if no input bits are set
for (j=IN_BITS-1; j>=0; j=j-1)
if (i[j])
o = j;
end
endmodule

Note the loop scans left-to-right (MSB down to LSB)
so that each 1 bit further to the right will cause 'o'
to be rewritten, and therefore it's the rightmost 1 bit
whose position gets reported on the output.

As usual, errors may have been introduced by my
carelessness and stupidity - sorry Jan ;)

This is a prime example of why I'm getting bogged down. This compiles
to a ladder of 32 multiplexers. Looking at the RTL produced I was
horrified thinking of the delay through all that lot, put lot's of
effort into studying how to do a fast priority encoder, useing a tree
design again, and when I coded it up it was slower than this one. Damb
and blast it. :(
Right, but the mux chain perhaps got synthesised into the
ripple-carry chain of your FPGA, which is likely the fastest
implementation for anything up to about 64 bits. However,
I *have* seen significant improvements in both area and speed
when writing very big priority encoders as a tree structure,
possibly with pipeline registers to get even more throughput.
So you're absolutely right to worry about these things - but
a sense of perspective is important too. In almost every
realistic case, as Jan Decaluwe will doubtless agree, the
best thing to do is to trust your synthesis tool, write the
code in the way that seems clearest and most maintainable
to you, and let the optimisations get on with their job.
But it's also important to be alive to the possibility that
you will, rarely, find important situations where heavy-duty
manual intervention is needed. You'll find other folk here
who use that as justification for describing everything in
structural detail; that's an alternative point of view that
doesn't work for me.

Still I'm enjoying learning and waching the LED's flash on my FPGA board. :)
That's the most important bit :)
--
Jonathan Bromley
 
In my mind, an encoder is a module that, assuming an input
with a single bit set, outputs that bit position.
Exactly what I'm after.

In my quick simulation, EncoderLogic doesn't behave like
that. EncoderLogic2 does not work - it misses the output
assignment. But when I make the "obvious" assignemnt, it
behaves different from EncoderLogic, and also not
as an encoder.

Perhaps something else is meant with "encoder", but if
my definition is correct, the solutions so far are way
too complicated.

Jan
And I'm having problems getting a paramatized form that produces what I
want, ie somthing that dosn't suck. :)

My macro did produce the correct result, and did it in what I thought
was an efficent way useing 'or' gates. If you have a simpler way to do
it I'd be grateful to see it.

Chris
 
In my quick simulation, EncoderLogic doesn't behave like
that.

That'll be because I got it wrong, then :) Actually
I copied the OP's error: the step size of his for(start)
loop should be twice as large as we wrote. With that
small fix, it works correctly.
Errm, I didn't think my original macro was wrong, just not desireable. :)

Here's the latest version just as a reminder.

`define ENCODER(NAME, BITS) \
module NAME(i, o); \
parameter WIDTH = (1 << BITS); \
input [(WIDTH - 1):0] i; \
output [(BITS - 1):0] o; \
reg [(BITS - 1):0] r; \
genvar row; \
generate \
begin \
for (row = 0; row < BITS; row += 1) \
begin : G \
always @(*) \
begin \
integer start; \
integer range; \
r[row] = 0; \
for (start = (1 << row); start < WIDTH; start += (2 << row) \
begin \
for (range = start; range < (start + (1 << row)); range += 1) \
begin \
r[row] |= i[range]; \
end \
end \
end \
assign o[row] = r[row]; \
end \
end \
endgenerate \
endmodule

`ENCODER(enc1, 1)
`ENCODER(enc2, 2)
`ENCODER(enc3, 3)
`ENCODER(enc4, 4)
`ENCODER(enc5, 5)

Best regards

Chris
 
On 04/07/2011 10:54 AM, Jonathan Bromley wrote:
On Thu, 07 Apr 2011 09:37:39 +0200, Jan Decaluwe wrote:

In my mind, an encoder is a module that, assuming an input
with a single bit set, outputs that bit position.

Moi aussi.

In my quick simulation, EncoderLogic doesn't behave like
that.

That'll be because I got it wrong, then :) Actually
I copied the OP's error: the step size of his for(start)
loop should be twice as large as we wrote. With that
small fix, it works correctly.

EncoderLogic2 does not work - it misses the output
assignment.

oops.

But when I make the "obvious" assignemnt, it
behaves different from EncoderLogic, and also not
as an encoder.

hmmm, it works for me, with this correction
(and some added explanation):

for (out_bit=0; out_bit<OUTPUT_BITS; out_bit=out_bit+1) begin
// this variable will accumulate the OR result
// fur the output bit chosen by this trip of the loop
or_acc = 0;
for (in_bit=0; in_bit<INPUT_BITS; in_bit=in_bit+1)
// If bit #(out_bit) is set in the binary
// representation of "in_bit", we want this
// input bit included in the OR calculation.
if (in_bit& (1<<out_bit))
or_acc = or_acc | i[in_bit];
// deliver this bit's final result to output word
o[out_bit] = or_acc; // ** this line was missing
end
Agreed.

if my definition is correct, the solutions
so far are way too complicated.

Sure, but I was trying (unsuccessfully!!!) to get
across the idea that you could rather explicitly
represent an OR tree encoder without resorting
to generates. Looks as though I should have
been quite a lot more careful, though.

I don't really think my second solution is "too
complicated". There's a much more elegant
formulation using either recursive instancing
of modules, or recursive calls to a function,
but neither of those works well in Verilog :-(
Recently there was a similar discussion on the
MyHDL newsgroup, and also there it seemed like
a complexity contest of bit fiddling :)

That may be instructive, but I believe it is
meaningfull to stress that in many cases,
simpler HDL patterns can be used without
loss of implementation efficiency.

This was my proposed solution including MyHDL source,
testbench and converted Verilog and VHDL.

Design
------
from myhdl import *

def encodeHotBit(iv, ov, Width=4):

@always_comb
def logic():
ov.next = 0 # avoid state
for i in range(len(iv)):
if iv == 1:
ov.next = i

return logic

Conversion code & Test bench
----------------------------
Width=4
iv = Signal(intbv(0)[Width**2:])
ov = Signal(intbv(0)[Width:])

toVerilog(encodeHotBit, iv, ov, Width)
toVHDL(encodeHotBit, iv, ov, Width)

def tb():
dut = encodeHotBit(iv, ov, Width)
@instance
def check():
yield delay(10)
for i in range(16):
iv.next = 2**i
yield delay(10)
print i, iv, ov
assert 2**ov == iv
return dut, check

# the entry point for the py.test unit test framework
def test_ehb():
sim = Simulation(tb())
sim.run()

test_ehb()

Converted Verilog
-----------------
// File: encodeHotBit.v
// Generated by MyHDL 0.7
// Date: Thu Apr 7 11:17:45 2011

`timescale 1ns/10ps

module encodeHotBit (
iv,
ov
);

input [15:0] iv;
output [3:0] ov;
reg [3:0] ov;

always @(iv) begin: ENCODEHOTBIT_LOGIC
integer i;
ov = 0;
for (i=0; i<16; i=i+1) begin
if ((iv == 1)) begin
ov = i;
end
end
end

endmodule

Converted VHDL
--------------
-- File: encodeHotBit.vhd
-- Generated by MyHDL 0.7
-- Date: Thu Apr 7 11:17:45 2011

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

use work.pck_myhdl_07.all;

entity encodeHotBit is
port (
iv: in unsigned(15 downto 0);
ov: out unsigned(3 downto 0)
);
end entity encodeHotBit;

architecture MyHDL of encodeHotBit is

begin

ENCODEHOTBIT_LOGIC: process (iv) is
begin
ov <= "0000";
for i in 0 to 16-1 loop
if (iv(i) = '1') then
ov <= to_unsigned(i, 4);
end if;
end loop;
end process ENCODEHOTBIT_LOGIC;

end architecture MyHDL;




--
Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
Python as a HDL: http://www.myhdl.org
VHDL development, the modern way: http://www.sigasi.com
Analog design automation: http://www.mephisto-da.com
World-class digital design: http://www.easics.com
 
On Thu, 7 Apr 2011 10:56:33 +0100, Chris Hinsley
<chris.hinsley@gmail.com> wrote:

In my quick simulation, EncoderLogic doesn't behave like
that.

That'll be because I got it wrong, then :) Actually
I copied the OP's error: the step size of his for(start)
loop should be twice as large as we wrote. With that
small fix, it works correctly.

Errm, I didn't think my original macro was wrong, just not desireable. :)
Oh, I looked again and you were indeed correct. I had
mis-transcribed this line....
for (start = (2 ** row); start < WIDTH; start = start + (2 ** row)
+ (2 ** row))
because of the line-wrap (newsreader artefact). My bad.

Here's the latest version just as a reminder.
[...]
for (start = (1 << row); start < WIDTH; start += (2 << row) \
yeah, that's where I got to as well.

Anyway, back to the key point - which I don't think I ever
stated clearly: In design code, if you have a procedural for-loop
whose loop bounds are constant expressions, the synth tool can
unroll the loop - giving you the same effect as a generate loop,
but in a way that's usually far more convenient. There are
times when generate is exactly what you need and there is no
reasonable alternative, but those times are quite rare.
Procedural loops are almost always easier to understand,
and they are usually faster to simulate as well (although
that's irrelevant for small examples like the ones we've
been discussing).
--
Jonathan Bromley
 

Welcome to EDABoard.com

Sponsor

Back
Top