How to write a good verilog code that can be synthesized as

Lee
Dear all,

When I write Verilog code, do I need to consider the hardware
structure?

For example,
an inverter can be written in two ways:
1. if (a == 1)
     output <= 0;
   else
     output <= 1;

2. assign output = ~a;

There is no difference between them at the behavioral level.

In case 2, the synthesis tool can synthesize the code into an
inverter. What happens in case 1? Can the synthesis tool generate the
same circuit?

The above is just an example. In digital design, some situations can
be more complicated. If I don't consider the hardware structure much, I
can write the code in less time. If I consider the hardware and
simplify it, it takes me longer to write shorter code, but I may get a
better circuit.

Which one is better? Can I just depend on synthesis tools to get a good
design?

Thanks,

Adrian
 
If your design pushes performance or you need to conserve area, coding
with the hardware architecture in mind is critical. You also, however, need
to know what the tools will do with your code. I would expect both your
examples to be synthesized into inverters without issue.
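For reference, here is a minimal sketch of both styles written as legal Verilog. Note that `output` is a reserved word in Verilog, so the sketch uses the hypothetical names `y_if` and `y_assign`, and the `if` version is placed in a combinational `always @*` block with blocking assignments, which is the usual way to describe a gate with an `if`.

```verilog
// Hypothetical module showing both inverter styles from the question.
module inv_styles (
    input  wire a,
    output reg  y_if,      // style 1: if/else in a combinational block
    output wire y_assign   // style 2: continuous assignment
);
    // Style 1: always @* re-evaluates whenever a changes; blocking
    // assignments are the usual convention for combinational logic.
    always @* begin
        if (a == 1'b1)
            y_if = 1'b0;
        else
            y_if = 1'b1;
    end

    // Style 2: a plain bitwise NOT.
    assign y_assign = ~a;
endmodule
```

Both should reduce to a single inverter (or be absorbed into a lookup table on an FPGA); inspecting the synthesized netlist is the way to confirm what a particular tool does.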

Knowing where your silicon's features can help you (dedicated carry
chains, muxes, memories, multipliers) will help you write more efficient
code when you're targeting a specific device. Usually simpler is better,
but you may find some paths in your design that seem arbitrarily longer
than needed, requiring a little "nudge" to get the synthesizer to go down
the right path, either by changing your code or constraints, or by adding
synthesis directives.
 
Asking how a specific statement is synthesized into gates is a
lot like asking how a compiler generates machine code for a
given statement. You can make some blanket statements, so long
as you realize that the compiler can implement your desired
behavior any d[arn] way it pleases. Synthesis tool writers
reserve the right to try to generate logic that is better than
the obvious translation.

Lee wrote:

For example,
an inverter can be written in two ways:
1. if (a == 1)
     output <= 0;
   else
     output <= 1;
Assuming you meant an always @* in front of that, the simplistic
synthesis of this is to make a MUX with "a" connected to the
select and the data inputs connected to 0 and 1. Constant propagation
should then notice that all of the mux's data inputs are constants
and do something better, for example a NOT gate. This should
be well within the state of the art.

2. assign output = ~a;
Even the dumbest synthesizer will make a NOT gate for this.
Obviously.

But then, what actually becomes of your NOT gate may depend on
the context in which it is placed. Optimization may elide it
or merge it into the surrounding logic.
--
Steve Williams "The woods are lovely, dark and deep.
steve at icarus.com But I have promises to keep,
http://www.icarus.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep."
 
When I write verilog code, do I need to consider the hardware
structure?
*Need* is a strong word, but you *should* almost always consider it,
and in some cases you *must* consider it. Moreover, when writing
Verilog, one also needs to be a little careful about *simulation*.
Your inverter circuit is a good example.

if(a == 1)
output <= 0;
else
output <= 1;
Depending on context, this may not generate an inverter. For example:

always@(posedge clock) // in this context probably generates a flip-flop
if(a == 1)
output <= 0;
else
output <= 1;
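To make the flip-flop case concrete, the sketch below (hypothetical names `reg_inv`, `clock`, `a`, `q`) puts the same `if` inside a clocked block. `q` changes only on a rising clock edge, so what is described is an inverter feeding a D flip-flop, not a bare inverter.

```verilog
module reg_inv (
    input  wire clock,
    input  wire a,
    output reg  q
);
    // Sampled on the rising edge: a D flip-flop whose D input is ~a.
    always @(posedge clock)
        if (a == 1'b1)
            q <= 1'b0;
        else
            q <= 1'b1;
endmodule
```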

The following is a case where the code doesn't simulate as you would
expect if you intend a simple inverter but the driver is a tri-state bus
that may go undriven or have contention.

assign a = b ? 1'b1 : 1'bz; // tri-state bus desired
always @(a)
if (a == 1)
  output <= 0;
else // reached when a is z (undriven bus) or x (conflicting drivers),
     // because a == 1 is then x, which the if treats as false
  output <= 1;
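The difference is easy to see in simulation. In the sketch below (hypothetical names; `b` enables a single tri-state driver onto `a`), an undriven bus leaves `a` at `z`. The test `a == 1'b1` then evaluates to `x`, which an `if` treats as false, so the `if` version quietly outputs 1, while `~a` evaluates to `x` and makes the problem visible.

```verilog
module tristate_inv (
    input  wire b,
    output reg  y_if,
    output wire y_not
);
    wire a;
    assign a = b ? 1'b1 : 1'bz;   // tri-state driver: floats when b = 0

    // if (x) takes the else branch, so z or x on a yields y_if = 1
    always @(a)
        if (a == 1'b1)
            y_if = 1'b0;
        else
            y_if = 1'b1;

    // ~z and ~x both evaluate to x, so the bad value is not hidden
    assign y_not = ~a;
endmodule
```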

Note that you have also used non-blocking assignments to drive the
output. In simulation, this schedules the update of output after the
other events in the current time step: a small delta-cycle delay (zero
simulated time) between the time "a" changes and the time output
changes. That is what prompted me to write the flip-flop case, as local
coding conventions drive only flip-flops with non-blocking assignments
and combinatorial code with blocking assignments.
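A small sketch of why that convention matters (hypothetical module and signal names): with non-blocking assignments, every register samples the value its source held before the clock edge, so two registers in sequence form a two-stage pipeline. Written with blocking assignments in the same order, `q2` would pick up the new `q1` immediately, equal `d` after a single edge, and the circuit would collapse to one stage.

```verilog
module two_stage (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    // Non-blocking: both right-hand sides are read before either
    // register updates, so q2 gets the old q1 (two flip-flops in series).
    always @(posedge clk) begin
        q1 <= d;
        q2 <= q1;
    end
endmodule
```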

However, these are just minor glitches in Verilog. You are not likely
to write an "if" when you mean a "not". That isn't what is likely to
get you in trouble. The real place where "thinking" in terms of
hardware applies is in the use of "arrays" (or memories). This is the
place where most people start writing "programs" that have no
connection to real hardware. If you find yourself writing a loop over
an "array", then you probably have designed a non-hardware solution.

// what hardware will implement this?
always @(posedge clk)
begin
  j <= max;                      // j looks like a flip-flop output
  for (i = 0; i < max; i = i + 1) // i is an integer loop variable
  begin
    if (a[i] == value) j <= i;   // a had better be a set of flip-flops
                                 // and not a 1-port RAM
  end
end

My off-hand analysis suggests that the code could be implemented by a
set of flip-flops tied to a set of comparators (plus AND gates that
mask off the undesired outputs at indices greater than or equal to max,
which can be calculated by a decoder) that drive a priority encoder
whose output is fed into a flip-flop. However, I doubt that there is a
synthesizer that could find that solution, nor is one likely to see
such a synthesizer soon.

If that is the hardware desired, one needs to write it out
explicitly, something like the following:

wire [3:0] max;
reg [3:0] a[15:0];     // flip-flop array (hopefully!)
wire [3:0] value;
reg [15:0] max_decode; // internal signal
reg [15:0] comp;       // internal signal
reg [3:0] encode;      // internal signal
reg [3:0] j;           // flip-flop output

// decoder

always @(max)
case (max)
0: max_decode = 16'b0000000000000000;
1: max_decode = 16'b0000000000000001;
2: max_decode = 16'b0000000000000011;
3: max_decode = 16'b0000000000000111;
4: max_decode = 16'b0000000000001111;
// ... (each row sets bits 0 through max-1)
15: max_decode = 16'b0111111111111111;
endcase


// comparators and associated logic

always@(value or a[0] or max_decode[0])
comp[0] = max_decode[0] && (value == a[0]);

always@(value or a[1] or max_decode[1])
comp[1] = max_decode[1] && (value == a[1]);

always@(value or a[2] or max_decode[2])
comp[2] = max_decode[2] && (value == a[2]);

// ...

always@(value or a[15] or max_decode[15])
comp[15] = max_decode[15] && (value == a[15]);

// priority encoder: the highest matching index wins, as in the loop,
// where the last non-blocking assignment to j takes effect

always @(comp or max)
begin
casez (comp)
16'b1zzzzzzzzzzzzzzz: encode = 15;  // last match at loop iter 16
16'b01zzzzzzzzzzzzzz: encode = 14;  // last match at loop iter 15
16'b001zzzzzzzzzzzzz: encode = 13;  // last match at loop iter 14
// ...
16'b0000000000000001: encode = 0;   // only match at loop iter 1
16'b0000000000000000: encode = max; // value not found
endcase
end

// flip-flop

always@(posedge clk)
j <= encode;
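For a version that can be simulated end to end, here is a compact sketch of the same structure (thermometer decoder, masked comparators, priority encoder, output register) written with fixed loop bounds, which synthesizers generally unroll without trouble. The assumptions are mine: the 16 entries arrive packed on one bus `a_flat` because Verilog-2001 ports cannot be arrays; `max` is widened to 5 bits so that all 16 entries can be searched; and, as in the original loop, the highest matching index below `max` wins. The module name `find_match` is hypothetical.

```verilog
// Hypothetical sketch: registered search of N entries, W bits each.
module find_match #(parameter N = 16, W = 4, MW = 5) (
    input  wire           clk,
    input  wire [N*W-1:0] a_flat,   // entry k is a_flat[k*W +: W]
    input  wire [W-1:0]   value,
    input  wire [MW-1:0]  max,      // search entries 0 .. max-1 (max <= N)
    output reg  [MW-1:0]  j         // flip-flop output
);
    wire [N-1:0]  max_decode;       // thermometer code: bit k = (k < max)
    wire [N-1:0]  comp;             // masked comparator outputs
    reg  [MW-1:0] encode;           // priority-encoder output
    integer m;

    // Decoder and comparators: one copy per entry, unrolled at elaboration.
    genvar g;
    generate
        for (g = 0; g < N; g = g + 1) begin : per_entry
            assign max_decode[g] = (g < max);
            assign comp[g] = max_decode[g] && (a_flat[g*W +: W] == value);
        end
    endgenerate

    // Priority encoder: a later (higher) match overwrites an earlier one,
    // mirroring the loop, where the last non-blocking assignment to j wins.
    always @* begin
        encode = max;               // default: value not found
        for (m = 0; m < N; m = m + 1)
            if (comp[m])
                encode = m;
    end

    always @(posedge clk)
        j <= encode;
endmodule
```

Every loop here has a constant bound, so the amount of hardware is fixed at N comparators regardless of the run-time value of `max`.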

Now, why is this better (besides the fact that it might synthesize)?
Because it makes one realize how expensive the resulting hardware is,
given how much work it takes to get this code right (and it may still
not be right). A simple array lookup is not cheap in hardware; it
takes a lot of circuitry. If one realizes that, one might pick a
different overall solution to the problem that doesn't involve the
array lookup. To me, that's what "thinking in hardware" means.

Synthesizers are impressive pieces of software. They can find
interesting ways of implementing bits of Verilog that one might
miss, especially, if I understand correctly, when it comes to
implementing state machines and finding clever pieces of
combinatorial logic that do what you mean.

However, they are not general-purpose compilers and can't turn
a C-style algorithm into a piece of hardware except in limited cases.
The underlying models are just too different. C gets much of its
functionality from the sequence in which operations are performed;
hardware gets its functionality from parallelism. It is just hard to
go efficiently from sequential to parallel or vice versa, and there is
no general solution to that problem.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 
Lee wrote:

When I write verilog code, do I need to consider the hardware
structure?

For example,
an inverter can be written in two ways:
1. if (a == 1)
     output <= 0;
   else
     output <= 1;

2. assign output = ~a;

There is no difference between them at the behavioral level.
Well, consider that for an FPGA (you don't say what the target
architecture is) all logic is implemented in lookup tables, so they
probably synthesize the same. FPGA compilers have extra logic to
remove unneeded inverters from the logic.

Consider also that 2) takes much less writing on your part;
I would choose it for that reason.

-- glen
 
Tommy corrected my post with a simpler version that assumed that max
was a constant (in particular 16), which is not an unreasonable
assumption, at least in most cases. He then asked what kind of
advice I was offering.

It's the kind of advice that takes into account the value of "max",
which is also an input to the C-style algorithm (and a variable one,
not a constant), and which is what makes the result complicated.
The original algorithm didn't search all of "a", but only the
elements before index "max", and if one wants to preserve the
correctness of the output, then one must include all the factors.

However, Tommy's point is valid and there might be a simpler and
better "one-line" solution. One can eliminate the decoder stage,
because it isn't necessary. It was merely a figment of my translation
of the software solution into a hardware one.

j <= ((value == a[15]) && (15 < max)) ? 15 :
((value == a[14]) && (14 < max)) ? 14 :
...
((value == a[ 0]) && (0 < max)) ? 0 :
max;
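The same chain written out in full for a hypothetical 4-entry case (the names `find4` and `a0` to `a3` are mine) shows the structure: each stage compares `value` with one entry and a constant index with `max`, and the conditions are tested from the highest index down, so the highest match below `max` selects the result.

```verilog
module find4 (
    input  wire       clk,
    input  wire [3:0] a0, a1, a2, a3,
    input  wire [3:0] value,
    input  wire [2:0] max,   // search entries 0 .. max-1 (max <= 4)
    output reg  [2:0] j
);
    // Highest index is tested first, so the highest match below max wins.
    always @(posedge clk)
        j <= ((value == a3) && (3 < max)) ? 3'd3 :
             ((value == a2) && (2 < max)) ? 3'd2 :
             ((value == a1) && (1 < max)) ? 3'd1 :
             ((value == a0) && (0 < max)) ? 3'd0 :
             max;                          // value not found
endmodule
```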

However, note that this version has 32 comparators, rather than a
decoder plus 16 comparators. If one is thinking in hardware, one
probably knows which one will take up less space, run faster, or
meet whatever criterion is important in one's circuit. I implicitly
assumed that a decoder would use fewer resources than the additional
comparators.

And that is still my point. One might be able to simply look at the
loop and generate the "one-line" version. I was almost embarrassed
when I saw it, before realizing that it just doubled the number of
comparators on the chip. So, if the decoder version is the better
hardware version, that isn't apparent from the loop. And, in either
case, one wants to understand the approximate hardware that one's
constructs will generate.

Now, in the case where max is a constant, the simple loop and its
simple translation work, because the extra comparators go away, and
synthesizers can handle constant loop bounds.

It is the case where variables are used as loop bounds, and those
variables can get updated in the code, that leads to non-hardware
solutions. Moreover, if one is thinking in hardware, one will know
which inputs are variable and which are constants. In this example:
how many comparators are expected? If the number is a constant, then
the bounds on the loop are constant. If the number of comparisons
varies depending on the input, then the algorithm is probably a
software one rather than one which can be realized directly in
hardware.

Note that in both "hardware" versions the loop was unrolled a specific
number of times. If the value of max could be up to 32768, then
you probably wouldn't want 32768 comparators (or, worse, 65536 for
the doubled case) on your circuit. And that's the kind of analysis
that only "thinking in hardware" will resolve. I picked the value of
max to be 16 specifically because I was certain that the resulting
circuit would be realizable. However, I have no obvious direct
hardware solution for the 32768 case. Yet both would use the same
C-style loop.

Hope this helps,
-Chris

 
