Andreas Ehliar
Guest
I have a few ideas for features which would be nice to have in Verilog
(or SystemVerilog). Not that I really expect anything to happen from
posting the ideas here, but you never know.
I expect that the first proposal is not going to be very controversial
and would immediately be useful in many cases. However, the remaining
proposals are fairly "far out". Feel free to post comments if you think
that they would be of use to you, or upset comments if you are offended
by merely reading about such unholy ideas.
My main reason for thinking about proposals 2-4 is a feeling that you
are often constrained by the details when using an HDL, in this
case the detail of how many clock cycles a certain task will take. While
this is of course a strength of an HDL in many cases, it is also
a drawback. In many cases it doesn't matter if it takes 150 or 180 cycles
to do a certain task as long as it gets done. In many cases it doesn't matter
if a pipeline contains 50 or 80 stages as long as the throughput is there.
But if we write code in HDL we do have to specify these numbers. In many
cases we also have to iterate over the design to find the value for these
numbers that will achieve a minimal area while barely meeting the
required timing constraints.
1. Automatic pipelining
-----------------------
reg [31:0] a;
pipeline(clk,3) begin
    a <= b*c;
end
This would be identical to the following piece of code:
reg [31:0] pipeline_reg1, pipeline_reg2, a;
always @(posedge clk) begin
    pipeline_reg1 <= b*c;
    pipeline_reg2 <= pipeline_reg1;
    a <= pipeline_reg2;
end
Rationale: Creating extra regs for pipelining is just tedious
work. Changing the length of the pipeline is also just tedious work.
It is surprising that SystemVerilog doesn't support anything like
this even though pipelining is arguably one of the most important
tools in a designer's toolbox.
Disadvantages: You have more control over what the tool does if you
specify the pipeline yourself instead of trusting the tool to do
retiming properly.
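For comparison, part of the tedium can already be avoided in plain Verilog-2001 with a parameterized register chain. The following is only a minimal sketch under my own assumptions: the module name mul_pipeline and the parameters WIDTH and STAGES are made up for illustration, and whether the registers are actually moved into the multiplier depends on the synthesis tool and its retiming settings.

```verilog
// Hypothetical helper module; names and parameters are illustrative only.
module mul_pipeline #(
    parameter WIDTH  = 32,
    parameter STAGES = 3          // registers between b*c and a, must be >= 1
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] b,
    input  wire [WIDTH-1:0] c,
    output wire [WIDTH-1:0] a
);
    // One register per pipeline stage.
    reg [WIDTH-1:0] stage [0:STAGES-1];
    integer i;

    always @(posedge clk) begin
        stage[0] <= b * c;        // product truncated to WIDTH bits
        for (i = 1; i < STAGES; i = i + 1)
            stage[i] <= stage[i-1];
    end

    // The last stage plays the role of "a" in the hand-written version.
    assign a = stage[STAGES-1];
endmodule
```

With STAGES = 3 this has the same three registers as the hand-written code above. Changing the length is then a one-parameter change, but it still has to be wrapped in a module for every expression, which is what the proposed pipeline block would avoid.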
2. Automatic pipelining with optimum number of pipeline stages
--------------------------------------------------------------
reg [31:0] a;
pipeline(clk,max_regs(3)) begin : TEST // Haven't really thought through
                                       // the syntax of the max_regs(3) here
    a <= b*c;
end
This would generate a pipeline with a maximum of three pipeline
registers. The number of registers would be decided by the synthesis
tool when evaluating the area constraints and timing constraints of
the design.
The example above would probably make little sense in such a
situation however since the rest of the design would have no idea
about the length of the pipeline. It would therefore be nice if
it was also possible to figure out the pipeline length either by
using some sort of enable signal which is propagated through the
pipeline or by asking the block about its pipeline length. The latter
could be implemented by having the synthesis tool implement a
constant named TEST.PIPELINE_LENGTH (or some similar name) which
could be accessed in the rest of the module.
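The enable-signal variant can be sketched in today's Verilog by shifting a valid bit through the same number of stages as the data. This is only an illustration under my own assumptions: the module name, the in_valid/out_valid ports and the synchronous reset are hypothetical, and STAGES is still fixed by hand here rather than chosen by the tool.

```verilog
// Hypothetical module: names, widths and the reset are assumptions.
module mul_pipeline_valid #(
    parameter WIDTH  = 32,
    parameter STAGES = 3          // pipeline length, must be >= 1
) (
    input  wire             clk,
    input  wire             rst,        // synchronous reset for the valid chain
    input  wire             in_valid,   // high when b and c carry new operands
    input  wire [WIDTH-1:0] b,
    input  wire [WIDTH-1:0] c,
    output wire [WIDTH-1:0] a,
    output wire             out_valid   // high when a carries the matching result
);
    reg [WIDTH-1:0]  data  [0:STAGES-1];
    reg [STAGES-1:0] valid;
    integer i;

    always @(posedge clk) begin
        // Data path: same register chain as a plain pipeline.
        data[0] <= b * c;
        for (i = 1; i < STAGES; i = i + 1)
            data[i] <= data[i-1];

        // Valid path: one bit per stage, shifted in step with the data.
        if (rst)
            valid <= {STAGES{1'b0}};
        else
            valid <= (valid << 1) | in_valid;
    end

    assign a         = data[STAGES-1];
    assign out_valid = valid[STAGES-1];
endmodule
```

Logic downstream that only looks at out_valid does not need to know STAGES at all, which is the property that would make a tool-chosen pipeline length usable by the rest of the design.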
Rationale: Figuring out the optimal length of a pipeline for
a certain timing constraint can be very tedious. In many situations
the exact length of a pipeline is not going to be very important as
long as the throughput is high enough.
Disadvantage: Verification will be hard since you would have to
figure out a way to verify all different pipeline lengths.
3. Automatic FSM creation
-------------------------
serial(clk,16,enable) begin
    a <= b*c;
end
This will create a multiplier which will output a result
16 clock cycles after enable is asserted. (The synthesis tool
could now create a bit-serial multiplier here.) It is assumed that
b and c will not change during the calculation.
Multiplication is perhaps not the most interesting example here though.
I'm instead thinking of a case where a rather complicated algorithm
like a 2D DCT could be written as sequential code and the synthesis
tool would do something clever with it like create an FSM with resource
sharing of multipliers/adders/memories/etc.
Rationale: Enables us to create logic using sequential programming
which will simplify things a lot in many cases.
Disadvantage: This is probably a really hard problem to solve
in an optimal way for the synthesis tool...
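To make the multiplier case concrete, here is roughly what one would have to write by hand today. It is a sketch under my own assumptions, not what a tool would necessarily generate: it assumes 16-bit operands, uses a shift-and-add scheme with one shared adder rather than a truly bit-serial datapath, and the module name, reset and busy flag are hypothetical.

```verilog
// Hypothetical module: 16-bit operands, the reset and the busy flag are assumptions.
module serial_mul16 (
    input  wire        clk,
    input  wire        rst,      // synchronous reset
    input  wire        enable,   // start pulse, ignored while busy
    input  wire [15:0] b,
    input  wire [15:0] c,
    output reg  [31:0] a,        // updated 16 clock edges after enable is sampled
    output reg         busy
);
    reg [31:0] multiplicand;     // b, shifted left once per cycle
    reg [15:0] multiplier;       // c, shifted right once per cycle
    reg [31:0] acc;              // running sum of partial products
    reg [3:0]  count;            // counts the 16 iterations

    // The single adder that is shared by all 16 iterations.
    wire [31:0] sum = acc + (multiplier[0] ? multiplicand : 32'd0);

    always @(posedge clk) begin
        if (rst) begin
            busy <= 1'b0;
        end else if (!busy) begin
            if (enable) begin
                // Capture the operands so later changes on b and c are harmless.
                multiplicand <= {16'd0, b};
                multiplier   <= c;
                acc          <= 32'd0;
                count        <= 4'd0;
                busy         <= 1'b1;
            end
        end else begin
            acc          <= sum;
            multiplicand <= multiplicand << 1;
            multiplier   <= multiplier >> 1;
            count        <= count + 4'd1;
            if (count == 4'd15) begin
                a    <= sum;     // final product
                busy <= 1'b0;
            end
        end
    end
endmodule
```

That is about 40 lines of bookkeeping for what the proposed serial block expresses in three. This version copies b and c into internal registers, so it does not rely on them staying stable; a tool could skip those registers if the stability assumption holds.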
4. Automatic FSM creation with optimum delay
--------------------------------------------
serial(clk,max_cycles(16),enable) begin
    a <= b*c;
    finished <= enable;
end
Similar to the previous proposal except that this one will take a
maximum of 16 cycles. If the timing and area constraints can be
met using fewer cycles, the tool will use fewer.
Rationale: Same as for 3. Plus the fact that figuring out the optimal
number of cycles for different algorithms will not be easy.
Disadvantage: Makes verification hard. Will be even harder than 3
on the synthesis tool.
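Since the cycle count would no longer be known when writing the surrounding logic, that logic would presumably have to react to finished instead of counting cycles. A minimal sketch of such a consumer, assuming finished is high for one cycle while a holds the result; the module name and the result/result_valid ports are hypothetical:

```verilog
// Hypothetical consumer: captures the result whenever finished is seen,
// without knowing how many cycles the serial block needed.
module result_capture (
    input  wire        clk,
    input  wire        finished,      // assumed to be a one-cycle pulse
    input  wire [31:0] a,             // result from the serial block
    output reg  [31:0] result,
    output reg         result_valid   // high for one cycle after each capture
);
    always @(posedge clk) begin
        result_valid <= finished;
        if (finished)
            result <= a;
    end
endmodule
```

Written this way, the same consumer works whether the tool ends up using 4 cycles or 16.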
/Andreas