A
Aleksandar Kuktin
Guest
Hi all!
I'm having a problem with the synthesis and P&R tools introducing a
unnecessary gate in a critical path. Consider the following verilog:
reg [31:0] mem_dataintomem = 32'd0;
always @(posedge CLK)
begin
if (mcu_active && (w_we_recv || w_tlb_recv))
mem_dataintomem <= w_data_recv;
if (mem_ack && !mcu_active)
mem_dataintomem <= 0;
end
Practically, the synthesis correctly implements mem_dataintomem as a
register whose data input is ungated w_data_recv, and whose clock enable
and synchronous reset control the contents of the register.
In this setup, for speed considerations, the best thing to do is create a
single gate signal to the clock enable input of a register, and another
separate single gate signal to drive the synchronous reset of the same
register.
But instead, the tools combine the signals, so that there is one single-
gate signal that feeds the synchronous reset, but is also piped into the
other signal (which therefore becomes a two-gate signal). Sort of like
this:
assign ctrl_syn_reset = mem_ack && !mcu_active;
assign ctrl_clock_enable = mcu_active &&
(w_we_recv || w_tlb_recv) &&
!ctrl_syn_reset; // this is superfluous
register_module mem_dataintomem_somebit(
.CLOCK(CLK),
.DATAIN(w_data_recv),
.CLOCK_ENABLE(ctrl_clock_enable),
.RESET(ctrl_syn_reset), // ignore the drive-high-to-reset quirk
[...]
);
My problem is that in this setup ctrl_clock_enable becomes too slow on
account of the extra gate and timing analysis fails.
What I would like is help figuring out some incantation that would make
the machine see the light and not chain the signals. OTOH, is there a
real need for this? It seems obvious to me that only one of CLOCK_ENABLE
or RESET will ever be asserted, and that you don't have to mix them.
I know for a fact I can sidestep the problem if I remove the clock enable
and instead gate the input to the register, but that is *very*
problematic because that area of the die is already tightly packed and it
can't support the extra logic (that is, all the wires are already taken).
The chip used is Lattice iCE40 HX8K, and the software used is iCEcube2
version 2017.01.27914 . Synthesis is performed by the bundled Synplify
Pro L-2016.09.L+ice40, build 077R.
The code is part of a FOSH project and can be inspected on
https://github.com/akuktin/special_snowflake . The code of interest for
this question is in file cache/cpu_mcu2.v, lines approximately 303 to
333. The code at the start of the post (without else) is one of the
experiments I tried, and failed with.
I'm having a problem with the synthesis and P&R tools introducing a
unnecessary gate in a critical path. Consider the following verilog:
reg [31:0] mem_dataintomem = 32'd0;
always @(posedge CLK)
begin
if (mcu_active && (w_we_recv || w_tlb_recv))
mem_dataintomem <= w_data_recv;
if (mem_ack && !mcu_active)
mem_dataintomem <= 0;
end
Practically, the synthesis correctly implements mem_dataintomem as a
register whose data input is ungated w_data_recv, and whose clock enable
and synchronous reset control the contents of the register.
In this setup, for speed considerations, the best thing to do is create a
single gate signal to the clock enable input of a register, and another
separate single gate signal to drive the synchronous reset of the same
register.
But instead, the tools combine the signals, so that there is one single-
gate signal that feeds the synchronous reset, but is also piped into the
other signal (which therefore becomes a two-gate signal). Sort of like
this:
assign ctrl_syn_reset = mem_ack && !mcu_active;
assign ctrl_clock_enable = mcu_active &&
(w_we_recv || w_tlb_recv) &&
!ctrl_syn_reset; // this is superfluous
register_module mem_dataintomem_somebit(
.CLOCK(CLK),
.DATAIN(w_data_recv),
.CLOCK_ENABLE(ctrl_clock_enable),
.RESET(ctrl_syn_reset), // ignore the drive-high-to-reset quirk
[...]
);
My problem is that in this setup ctrl_clock_enable becomes too slow on
account of the extra gate and timing analysis fails.
What I would like is help figuring out some incantation that would make
the machine see the light and not chain the signals. OTOH, is there a
real need for this? It seems obvious to me that only one of CLOCK_ENABLE
or RESET will ever be asserted, and that you don't have to mix them.
I know for a fact I can sidestep the problem if I remove the clock enable
and instead gate the input to the register, but that is *very*
problematic because that area of the die is already tightly packed and it
can't support the extra logic (that is, all the wires are already taken).
The chip used is Lattice iCE40 HX8K, and the software used is iCEcube2
version 2017.01.27914 . Synthesis is performed by the bundled Synplify
Pro L-2016.09.L+ice40, build 077R.
The code is part of a FOSH project and can be inspected on
https://github.com/akuktin/special_snowflake . The code of interest for
this question is in file cache/cpu_mcu2.v, lines approximately 303 to
333. The code at the start of the post (without else) is one of the
experiments I tried, and failed with.