A couple of controversial(?) improvement proposals for Veril

Andreas Ehliar · Dec 28, 2008

I have a few ideas for features which would be nice to have in Verilog
(or SystemVerilog). Not that I really expect anything to happen from
posting the ideas here, but you never know.

I expect that the first proposal is not going to be very controversial
and would immediately be useful in many cases. However, the remaining
proposals are fairly "far out". Feel free to post upset comments if you
think that they would be of use to you or if you are offended by merely
reading about such unholy ideas.

My main reason for thinking about proposals 2-4 is a feeling that you
are often constrained by the details when using an HDL. In this
case the detail of how many clock cycles a certain task will take. While
this is of course a strength of the HDL language in many cases it is also
a drawback. In many cases it doesn't matter if it takes 150 or 180 cycles
to do a certain task as long as it gets done. In many cases it doesn't matter
if a pipeline contains 50 or 80 stages as long as the throughput is there.
But if we write code in HDL we do have to specify these numbers. In many
cases we also have to iterate over the design to find the value for these
numbers that will achieve a minimal area while barely meeting the
required timing constraints.

1. Automatic pipelining
-----------------------

reg [31:0] a;
pipeline(clk,3) begin
    a <= b*c;
end

This would be identical to the following piece of code:

reg [31:0] pipeline_reg1,pipeline_reg2,a;
always @(posedge clk) begin
    pipeline_reg1 <= b*c;
    pipeline_reg2 <= pipeline_reg1;
    a <= pipeline_reg2;
end

Rationale: Creating extra regs for pipelining is just tedious
work. Changing the length of the pipeline is also just tedious work.
It is surprising that SystemVerilog doesn't support anything like
this even though pipelining is arguably one of the most important
tools in a designer's toolbox.

Disadvantages: You have more control over what the tool does if you
specify the pipeline yourself instead of trusting the tool to do
retiming properly.

2. Automatic pipelining with optimum number of pipeline stages
--------------------------------------------------------------

reg [31:0] a;
// Haven't really thought through the syntax of the max_regs(3) here
pipeline(clk,max_regs(3)) begin : TEST
    a <= b*c;
end

This would generate a pipeline with a maximum of three pipeline
registers. The number of registers would be decided by the synthesis
tool when evaluating the area constraints and timing constraints of
the design.

The example above would probably make little sense in such a
situation however since the rest of the design would have no idea
about the length of the pipeline. It would therefore be nice if
it was also possible to figure out the pipeline length either by
using some sort of enable signal which is propagated through the
pipeline or by asking the block about its pipeline length. The latter
could be implemented by having the synthesis tool implement a
constant named TEST.PIPELINE_LENGTH (or some similar name) which
could be accessed in the rest of the module.
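
The enable-signal variant can be approximated in today's synthesizable
Verilog by shifting a valid bit alongside the data, so that downstream
logic never needs to know the pipeline length. A minimal sketch, assuming
16-bit unsigned operands; the module name mult_pipe_valid, the STAGES
parameter and the port names are hypothetical and not part of the proposal:

```systemverilog
// Hypothetical sketch: a multiplier pipeline whose consumers rely on
// out_valid instead of a hard-coded latency. STAGES must be >= 1.
module mult_pipe_valid #(
    parameter STAGES = 3
) (
    input  wire        clk,
    input  wire        in_valid,   // "enable": operands are valid this cycle
    input  wire [15:0] b,
    input  wire [15:0] c,
    output wire        out_valid,  // in_valid delayed by STAGES registers
    output wire [31:0] a           // b*c delayed by STAGES registers
);
    reg [31:0] data_q  [0:STAGES-1];
    reg        valid_q [0:STAGES-1];
    integer i, k;

    // Assumption: registers may be given a power-up value (simulation, FPGA).
    initial
        for (k = 0; k < STAGES; k = k + 1)
            valid_q[k] = 1'b0;

    always @(posedge clk) begin
        data_q[0]  <= b * c;
        valid_q[0] <= in_valid;
        for (i = 1; i < STAGES; i = i + 1) begin
            data_q[i]  <= data_q[i-1];
            valid_q[i] <= valid_q[i-1];
        end
    end

    assign a         = data_q[STAGES-1];
    assign out_valid = valid_q[STAGES-1];
endmodule
```

Because the consumer only looks at out_valid, STAGES can be changed without
touching the surrounding logic, which is the property the proposal is after.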

Rationale: Figuring out the optimal length of a pipeline for
a certain timing constraint can be very tedious. In many situations
the exact length of a pipeline is not going to be very important as
long as the throughput is high enough.

Disadvantage: Verification will be hard since you would have to
figure out a way to verify all different pipeline lengths.

3. Automatic FSM creation
-------------------------

serial(clk,16,enable) begin
a <= b*c;
end

This will create a multiplier which will output a result
16 clock cycles after enable is asserted. (The synthesis tool
could now create a bit serial multiplier here.) It is assumed that
b and c will not change during the calculation.
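
For comparison, here is roughly what one would write by hand today to get
a 16-cycle multiplier. This is only a sketch under assumptions: it uses a
shift-and-add scheme (one partial product per cycle) rather than a true
bit-serial datapath, 16-bit unsigned operands, and hypothetical names
(serial_mult16, finished). It also captures b and c at the start, so they
do not actually have to be held stable.

```systemverilog
// Hypothetical hand-written equivalent of serial(clk,16,enable):
// one 32-bit adder is reused for 16 cycles.
module serial_mult16 (
    input  wire        clk,
    input  wire        enable,    // sampled when idle; starts a calculation
    input  wire [15:0] b,
    input  wire [15:0] c,
    output reg  [31:0] a,
    output reg         finished   // high for one cycle when a is updated
);
    reg [31:0] acc;       // running sum of partial products
    reg [31:0] b_shift;   // multiplicand, shifted left each cycle
    reg [15:0] c_shift;   // multiplier, shifted right each cycle
    reg [4:0]  count;     // remaining iterations, 0 means idle

    // Add the current partial product if the multiplier LSB is set.
    wire [31:0] next_acc = acc + (c_shift[0] ? b_shift : 32'd0);

    initial begin
        count    = 5'd0;
        finished = 1'b0;
    end

    always @(posedge clk) begin
        finished <= 1'b0;
        if (count != 5'd0) begin
            acc     <= next_acc;
            b_shift <= b_shift << 1;
            c_shift <= c_shift >> 1;
            count   <= count - 5'd1;
            if (count == 5'd1) begin   // last iteration: publish the result
                a        <= next_acc;
                finished <= 1'b1;
            end
        end else if (enable) begin
            acc     <= 32'd0;
            b_shift <= {16'd0, b};
            c_shift <= c;
            count   <= 5'd16;
        end
    end
endmodule
```

The result register is written on the 16th clock edge after the edge that
samples enable. An enable that arrives while a calculation is in progress
is ignored in this sketch.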

Multiplication is perhaps not the most interesting example here though.
I'm instead thinking of a case where a rather complicated algorithm
like a 2D DCT could be written as sequential code and the synthesis
tool would do something clever with it like create an FSM with resource
sharing of multipliers/adders/memories/etc.

Rationale: Enables us to create logic using sequential programming
which will simplify things a lot in many cases.

Disadvantage: This is probably a really hard problem to solve
in an optimal way for the synthesis tool...

4. Automatic FSM creation with optimum delay
--------------------------------------------

serial(clk,max_cycles(16),enable) begin
a <= b*c;
finished <= enable;
end

Similar to the previous proposal except that this one will take a
maximum of 16 cycles. If timing constraints and area constraints can
be fulfilled using fewer cycles it will be.

Rationale: Same as for 3. Plus the fact that figuring out the optimal
number of cycles for different algorithms will not be easy.

Disadvantage: Makes verification hard. Will be even harder than 3
on the synthesis tool.

/Andreas

Jonathan Bromley · Dec 28, 2008

On Sun, 28 Dec 2008 05:06:10 +0000 (UTC), Andreas Ehliar wrote:

I have a few ideas for features which would be nice to have in Verilog
(or SystemVerilog). Not that I really expect anything to happen from
posting the ideas here, but you never know.

Go on, sometimes New Year wishes come true

Feel free to post upset comments if you
think that they would be of use to you or if you are offended by merely
reading about such unholy ideas

What, on Usenet? Who would ever consider
such rude behaviour as to post an upset comment?.....
So, here's a promise: only one "upset comment" in what follows.

My main reason for thinking about proposals 2-4 is a feeling that you
are often constrained by the details when using an HDL.

No dispute; that's the nature of the beast. HDLs specify in
painstaking detail exactly what happens at each nanosecond and
on each clock edge.

In this
case the detail of how many clock cycles a certain task will take. While
this is of course a strength of the HDL language in many cases it is also
a drawback. In many cases it doesn't matter if it takes 150 or 180 cycles
to do a certain task as long as it gets done. In many cases it doesn't matter
if a pipeline contains 50 or 80 stages as long as the throughput is there.
But if we write code in HDL we do have to specify these numbers. In many
cases we also have to iterate over the design to find the value for these
numbers that will achieve a minimal area while barely meeting the
required timing constraints.

Before looking at the details I think it's worth pointing out
that your proposals are very design-centric. By contrast,
Verilog is really a _simulation_ language and it makes no
sense at all to specify a language feature that cannot be
simulated in a deterministic manner (although, to be fair,
there have been some proposed enhancements to SystemVerilog
assertions that come close to breaching that barrier). So
I, and probably the vendor community, would baulk at any
construct that delivered a non-deterministic delay whose
value is to be established by a design-synthesis tool.

Anyway, now I've got that off my chest, let's look at
your proposals...

1. Automatic pipelining
-----------------------

reg [31:0] a;
pipeline(clk,3) begin
a <= b*c;
end

There is a perfectly good Verilog construct that
already implements exactly what you ask for:

always @(posedge clk)
a <= repeat(2) @(posedge clk) b*c;

The "repeat(2) @(posedge clk)" intra-assignment delay
postpones the update of 'a' for two additional clock
cycles of transport delay. Bingo, a pipeline.

Admittedly I know of no synthesis tool that understands
this construct, but that's something to take up with
the synthesis vendors, not something to add to Verilog.
The nifty thing about it is that no additional registers
are created (the in-flight pipeline stage values are
held internally in the simulator's future-event queue)
and therefore a synthesis tool could organize the pipeline
in any way it chose, and still meet the requirements
imposed by the simulation model. Many synth tools already
do this if you specify a bunch of delay stages before or
after the arithmetic operation (register retiming), so
I think it is probably implementable. I would vigorously
support you in efforts to bring this feature into the
synthesisable subset.
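
A small simulation-only testbench can be used to check the claimed
equivalence between the intra-assignment repeat form and the explicit
three-register pipeline from proposal 1. This is a sketch: the testbench
name, signal names and 16-bit operand width are assumptions, and simulator
support for repeat event control on nonblocking assignments may vary.

```systemverilog
`timescale 1ns/1ps
// Simulation-only sketch comparing the two descriptions of the pipeline.
module tb_repeat_pipe;
    reg         clk = 1'b0;
    reg  [15:0] b = 16'd0, c = 16'd0;
    reg  [31:0] a_repeat;               // written by the repeat form
    reg  [31:0] pipe1, pipe2, a_regs;   // explicit pipeline registers
    integer     errors = 0;

    always #5 clk = ~clk;

    // Form from the thread: evaluate b*c now, update two clock edges later.
    always @(posedge clk)
        a_repeat <= repeat (2) @(posedge clk) b * c;

    // Reference: the hand-written pipeline.
    always @(posedge clk) begin
        pipe1  <= b * c;
        pipe2  <= pipe1;
        a_regs <= pipe2;
    end

    // Compare outputs and change inputs away from the active clock edge.
    // Both outputs are X until the pipeline has filled; !== treats X as equal to X.
    always @(negedge clk) begin
        if (a_repeat !== a_regs) begin
            errors = errors + 1;
            $display("mismatch at %0t: repeat=%h regs=%h",
                     $time, a_repeat, a_regs);
        end
        b <= $random;
        c <= $random;
    end

    initial begin
        repeat (20) @(posedge clk);
        $display("done, %0d mismatches", errors);
        $finish;
    end
endmodule
```

If the two forms behave as described above, the run should report no
mismatches.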

2. Automatic pipelining with optimum number of pipeline stages
--------------------------------------------------------------

reg [31:0] a;
pipeline(clk,max_regs(3)) begin : TEST // Haven't really thought through
// the syntax of the max_regs(3) here
a <= b*c;
end

This would generate a pipeline with a maximum of three pipeline
registers. The number of registers would be decided by the synthesis
tool when evaluating the area constraints and timing constraints of
the design.

But what should a SIMULATOR do with this? That's the key
question, and it is always a problem for behavioural
synthesis tools of any kind that are able to make
tradeoffs of latency against area or clock frequency.
There's nothing to simulate until synthesis has created
a netlist.

You could consider parameterising the pipeline delay
in my previous example:

parameter N;
always @(posedge clk) a <= repeat(N) @(posedge clk) b*c;

and then find some way to (1) simulate with a range of
different values of N, and (2) encourage synthesis to
choose an appropriate value somewhere in that range.
SystemVerilog assertions provide syntax for delay ranges,
and you could imagine asserting this property:

property pipe_mult(min, max);
    logic [31:0] bb, cc;
    @(posedge clk)
    (1'b1, bb = b, cc = c)       // capture operands on every clk
    |->
    ##[min:max] a == bb * cc;    // expect appropriate output
endproperty

and expecting a sufficiently smart synthesis tool to infer
from the assertion that the design must be a pipelined
multiplier whose pipeline delay is somewhere in the range
min to max. I believe there has been some work on this
idea of synthesis from assertions that specify design
intent, but I'm not aware of any details.
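
For simulation, the property can already be used as a latency-tolerant
checker. The wrapper below is a sketch only: the module name
pipe_mult_checker, the MIN_LAT and MAX_LAT parameters and the 16-bit
operand widths are assumptions added for illustration.

```systemverilog
// Hypothetical checker module wrapping the property from the thread.
module pipe_mult_checker #(
    parameter int MIN_LAT = 1,
    parameter int MAX_LAT = 3
) (
    input logic        clk,
    input logic [15:0] b,
    input logic [15:0] c,
    input logic [31:0] a
);
    property pipe_mult(min, max);
        logic [31:0] bb, cc;
        @(posedge clk)
        (1'b1, bb = b, cc = c)          // capture operands on every clk
        |->
        ##[min:max] (a == bb * cc);     // some cycle in the range must match
    endproperty

    // One attempt starts on every clock; each passes if the product of
    // its captured operands appears on a within MIN_LAT..MAX_LAT cycles.
    assert property (pipe_mult(MIN_LAT, MAX_LAT))
        else $error("no matching product within %0d..%0d cycles",
                    MIN_LAT, MAX_LAT);
endmodule
```

Note that the property only requires a match somewhere in the window, so it
accepts any fixed latency in the range but does not by itself prove that
the latency is constant.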

[exposing synthesis results to simulation]

could be implemented by having the synthesis tool implement a
constant named TEST.PIPELINE_LENGTH (or some similar name) which
could be accessed in the rest of the module.

Oooooh, I really really really don't like that idea.
It stinks of kludge. Surely it would make more sense
to have the pipe length as a parameter, and get the
synth tool to write out a defparam or somesuch to
back-annotate that parameter value into the original
code. But I still deeply dislike the idea that I
can't simulate at all until synthesis is done.

Rationale: Figuring out the optimal length of a pipeline for
a certain timing constraint can be very tedious. In many situations
the exact length of a pipeline is not going to be very important as
long as the throughput is high enough.

Absolutely agreed, and that's precisely the kind of thing
that C-to-gates and behavioural synthesis tools try to do.

Disadvantage: Verification will be hard since you would have to
figure out a way to verify all different pipeline lengths.

You said it

See the assertion idea.

3. Automatic FSM creation
-------------------------

serial(clk,16,enable) begin
a <= b*c;
end

No, sorry, you're losing me now. This sounds like
you really want some kind of design creation script.
I think you could easily get that kind of effect with
smartly-written Python code for MyHDL.

We already have more than enough special-purpose kruft
in SystemVerilog (always_latch, anyone? unique if?).
What you're suggesting takes it several stages too far
for my taste.

Multiplication is perhaps not the most interesting example here though.
I'm instead thinking of a case where a rather complicated algorithm
like a 2D DCT could be written as sequential code and the synthesis
tool would do something clever with it like create an FSM with resource
sharing of multipliers/adders/memories/etc.

Again, look at what the C-to-gates and behavioural synth
tools have been doing for years. See also Bluespec.

Rationale: Enables us to create logic using sequential programming
which will simplify things a lot in many cases.

Yes, that's what the vendors of all those tools say.

Disadvantage: This is probably a really hard problem to solve
in an optimal way for the synthesis tool...

But, for many realistic cases, tools exist that do a useful
job that is "good enough". Right now they either use
bastardised languages (Bluespec) or pragmas/directives
layered on top of Verilog, or else they abandon cycle-
accurate simulation altogether and just go all-out for
behavioural synthesis from sequential algorithms
(C-to-gates tools). All those things have their place,
but none makes sense within a language whose prime reason
for existence is as an event-driven simulator.

4. Automatic FSM creation with optimum delay

Same responses, _a fortiori_.

Happy New Year!
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Andreas Ehliar · Dec 28, 2008

On 2008-12-28, Jonathan Bromley <jonathan.bromley@MYCOMPANY.com> wrote:

There is a perfectly good Verilog construct that
already implements exactly what you ask for:

always @(posedge clk)
a <= repeat(2) @(posedge clk) b*c;

The "repeat(2) @(posedge clk)" intra-assignment delay
postpones the update of 'a' for two additional clock
cycles of transport delay. Bingo, a pipeline.

Interesting, I was trying to think of a way to do this with the repeat
statement but never realized that it could be used inside an
intra-assignment delay. I'll add this trick to my toolbox, at least
for simulation.

It will look quite messy when you are assigning lots of signals in an
always block however. But I could live with that for the ease of
pipeline parameterization that this would bring.

Another solution to this would be to modify verilog-mode for emacs to
support this with some sort of /*AUTO*/ mode. By rewriting something
like this:

reg [31:0] a;
always @(posedge clk) begin
a <= /*AUTOREPEAT(clk,2) b*c */;
// assignment inside AUTOREPEAT comment so that it will fail to compile if
// AUTOs are not expanded by emacs
end

Into this:

reg [31:0] a,_pipe1,_pipe2;
always @(posedge clk) begin
a <= /*AUTOREPEAT(clk,2) b*c */
_pipe2;
_pipe2 <= _pipe1;
_pipe1 <= b*c;
end

I'll have to look at verilog-mode some more to see if this is
feasible. Of course, this won't be parameterizable unfortunately :/

<snip interesting assertion based idea for variable pipelines>

could be implemented by having the synthesis tool implement a
constant named TEST.PIPELINE_LENGTH (or some similar name) which
could be accessed in the rest of the module.

Oooooh, I really really really don't like that idea.
It stinks of kludge. Surely it would make more sense
to have the pipe length as a parameter, and get the
synth tool to write out a defparam or somesuch to
back-annotate that parameter value into the original
code. But I still deeply dislike the idea that I
can't simulate at all until synthesis is done.

This might make sense, and might even be possible to implement today
if you create some nice synthesis scripts that examine your source
code for interesting parameters and change them accordingly. (Even
though repeat(N) intra-assignment delays are not synthesizable, it
should be possible to create parameterizable pipelines using
generate.) I really don't have time to look at this at the moment,
which makes it even more tempting to write a quick and dirty script to
do this.
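
A generate-based parameterizable pipeline could look roughly like the
following. This is a sketch under assumptions: the module name mult_pipe,
the STAGES parameter and the 16-bit operand widths are made up for
illustration, and whether a given tool retimes the registers into the
multiplier depends on the tool and its settings.

```systemverilog
// Hypothetical sketch: STAGES registers after the multiplier, built with
// a generate loop so that the depth is a single parameter. STAGES >= 1.
module mult_pipe #(
    parameter STAGES = 3
) (
    input  wire        clk,
    input  wire [15:0] b,
    input  wire [15:0] c,
    output wire [31:0] a
);
    reg [31:0] stage [0:STAGES-1];

    // First register captures the product.
    always @(posedge clk)
        stage[0] <= b * c;

    // Remaining registers form a plain delay line.
    genvar g;
    generate
        for (g = 1; g < STAGES; g = g + 1) begin : pipe
            always @(posedge clk)
                stage[g] <= stage[g-1];
        end
    endgenerate

    assign a = stage[STAGES-1];
endmodule
```

An instance such as mult_pipe #(.STAGES(3)) then has the same latency as
the three-register example in proposal 1, and a script (or a defparam
written out by a tool) only has to change one number.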

On the other hand, I really don't like the idea of the synthesis tool
messing around with my source code, but just like you I don't have a
better idea of how to simulate it reliably and have confidence in
the simulation result.

We already have more than enough special-purpose kruft
in SystemVerilog (always_latch, anyone? unique if?).
What you're suggesting takes it several stages too far
for my taste.

Well, if always_latch is the price I have to pay for having always_ff
and always_comb I'm prepared to pay that price... But I understand
what you mean.

<snip>

Rationale: Enables us to create logic using sequential programming
which will simplify things a lot in many cases.

Yes, that's what the vendors of all those tools say.

Well, as you said, one can always wish

Disadvantage: This is probably a really hard problem to solve
in an optimal way for the synthesis tool...

But, for many realistic cases, tools exist that do a useful
job that is "good enough". Right now they either use
bastardised languages (Bluespec) or pragmas/directives
layered on top of Verilog, or else they abandon cycle-
accurate simulation altogether and just go all-out for
behavioural synthesis from sequential algorithms
(C-to-gates tools). All those things have their place,
but none makes sense within a language whose prime reason
for existence is as an event-driven simulator.

Personally I'm thinking (or rather wishing) that none of these
bastardised languages makes sense when we have a perfectly good HDL
language which we could extend so that we don't have to deal with yet
another not-quite-C-to-gates language. I really don't like the idea of
having to debug my HDL design when it consists of many HDL files that
have been automatically generated...

But I fear that you are right in being skeptical of this
proposal. Perhaps in the future it will be just as rare to write code
in Verilog/VHDL as it is to write inline assembler in the software
world today. And of course, even in the software world you sometimes
have to use code generators like for example lex and yacc. (No, I
don't like writing parsers by hand...)

/Andreas

Jonathan Bromley · Dec 28, 2008

On Sun, 28 Dec 2008 15:42:30 +0000 (UTC), Andreas Ehliar wrote:

always @(posedge clk)
a <= repeat(2) @(posedge clk) b*c;
[...]
It will look quite messy when you are assigning lots of signals in an
always block however.

Might not be such a big deal. Two possibilities spring
to mind:

1) a macro:

`define pipe(N) repeat(N) @(posedge clk)
...
always @(posedge clk)
a <= `pipe(2) b*c;

2) if you have lots of different outputs, compute them all
into temporary variables and then do a single pipelined
copy to the output registers:

always @(posedge clk) begin : poly_pipe
logic [7:0] octet;
logic [15:0] word;
word = (horrible function of inputs);
octet = (another H.F.O.I.);
{out_word, out_octet} <= repeat(N) @(posedge clk) {word, octet};
end

On the other hand, I really don't like the idea of the synthesis tool
messing around with my source code, but just like you I don't have a
better idea of how to simulate it reliably and have confidence in
the simulation result.

Untimed simulation models are probably the only way forward.
But they always leave a nasty gap between the verified
untimed model and the actual RTL design.

Personally I'm thinking (or rather wishing) that none of these
bastardised languages makes sense when we have a perfectly good HDL
language which we could extend so that we don't have to deal with yet
another not-quite-C-to-gates language. I really don't like the idea of
having to debug my HDL design when it consists of many HDL files that
have been automatically generated...

That was one of the motivations for my attempt to reproduce
some (System)C-style untimed modelling techniques in
SystemVerilog [in a paper at SNUG Munich 2008] - it would
be nice to have a test environment in which untimed models
could easily be swapped with RTL or even synthesised netlists
so that you could confirm the correctness of your
untimed-to-RTL refinement flow, some of which might be quite
highly automated if you're using behavioural synthesis.

Perhaps in the future it will be just as rare to write code
in Verilog/VHDL as it is to write inline assembler

Possibly. As Makimoto's Wave famously hints, the balance
between hardware-like and software-like descriptions
shifts from time to time as a function of the available
platforms. Right now, for most consumer and industrial
applications, general-purpose compute platforms are so
fast and cheap that the balance is swinging strongly in
favour of software-like descriptions and hardware design
is largely restricted to system integration tasks and
a few very specialised applications. Not many people
get to work on the interesting IP blocks where core
digital design skills are paramount. And there's little
doubt that traditional HDLs are a poor tool for the
kind of design work that's needed for integrating large
IP blocks into a system.

Anyway, it's good to challenge the status quo.

Thanks
--
Jonathan Bromley, Consultant


Andreas Ehliar · Dec 29, 2008

On 2008-12-28, Jonathan Bromley <jonathan.bromley@MYCOMPANY.com> wrote:

There is a perfectly good Verilog construct that
already implements exactly what you ask for:

always @(posedge clk)
a <= repeat(2) @(posedge clk) b*c;

Just for fun I tried to synthesize the following module
using ISE, Quartus and Precision:

module test(
    input clk_i,
    input rx_i,
    output reg tx_o
);

    always @(posedge clk_i) begin
        tx_o <= repeat(2) @(posedge clk_i) rx_i;
    end

endmodule

Precision and ISE washed their hands of the entire affair whereas
Quartus surprisingly accepted this without any complaints!

However, when looking at the report files it seems like I only got
one register here (even when increasing the repeat(2) to repeat(2000)).

But at least it didn't complain that it was unsynthesizable.

/Andreas

sharp@cadence.com · Jan 6, 2009

On Dec 29 2008, 5:59 am, Andreas Ehliar <ehliar-nos...@isy.liu.se>
wrote:

Precision and ISE washed their hands of the entire affair whereas
Quartus surprisingly accepted this without any complaints!

However, when looking at the report files it seems like I only got
one register here (even when increasing the repeat(2) to repeat(2000)).

It presumably just assumes intra-assignment delays are time delays,
and ignores them like any other time delay.

On the messiness of multiple pipelined assignments, here is another
approach besides the two that Jonathan suggested. In SV, you could
use a fork..join_none whose statement contains a single repeat and
event control, then all the assignments. This is probably closer to
what you were trying to do with repeat. It seems even less likely
that synthesis will support this approach though.
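
One possible reading of the fork..join_none suggestion is sketched below.
It is simulation-only, and the module name fork_pipe, the parameter N, the
port names and the two example expressions are hypothetical. Capturing the
values in automatic variables is an added assumption: without it, the
right-hand sides would be evaluated after the delay rather than on the
sampling edge.

```systemverilog
// Simulation-only sketch: one shared repeat/event control per clock edge,
// followed by all the pipelined assignments.
module fork_pipe #(
    parameter int N = 2
) (
    input  logic        clk,
    input  logic [15:0] b,
    input  logic [15:0] c,
    output logic [31:0] out_word,
    output logic [15:0] out_sum
);
    always @(posedge clk) begin
        fork
            // Automatic copies are initialized when the fork is entered,
            // so each spawned thread keeps the values sampled on its edge.
            automatic logic [31:0] word = b * c;
            automatic logic [15:0] sum  = b + c;
            begin
                repeat (N) @(posedge clk);   // single shared delay
                out_word <= word;            // then all the assignments
                out_sum  <= sum;
            end
        join_none
    end
endmodule
```

Each clock edge spawns a new thread, so up to N values are in flight at any
time, much like the pending events created by the intra-assignment repeat
form.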

On the other suggestions, Jonathan has done an excellent job of
expressing the issues I would have with them.

Andreas Ehliar · Jan 7, 2009

On 2009-01-06, sharp@cadence.com <sharp@cadence.com> wrote:

On Dec 29 2008, 5:59 am, Andreas Ehliar <ehliar-nos...@isy.liu.se>
wrote:
However, when looking at the report files it seems like I only got
one register here (even when increasing the repeat(2) to repeat(2000)).

It presumably just assumes intra-assignment delays are time delays,
and ignores them like any other time delay.

Probably. Although I should also point out that Altera was quick to respond
to a service request about this and confirm that it is an error. I think
they will fix this for the next release (so that they give an error on this
kind of construct).

/Andreas
