Jonathan Bromley
Every cloud has a silver lining, but it seems
every rose has its thorns too.
PLLs/DCMs/DLLs (or whatever your favourite FPGA
happens to offer) provide a wonderful way to create
multiplied-up clocks within the device. What's more,
you can line up the active clock edges so closely
that you can treat the x1 and xN clock domains as
if they were one single clock domain; hold-time
violations can be avoided when crossing the boundary
in either direction.
Until recently I've always avoided taking advantage
of this, and have treated the x1 and xN clock domains
as if they were asynchronous, using FIFOs or whatever
to convey things across the boundary. But in a recent
client engagement I was faced with a design in which
a x2 and x4 clock, from the same PLL, were used in
a completely sensible way as if they were in the same
clock domain as the original x1 clock. The TimeQuest
timing analyzer (for it was Brand A that was in use
on this occasion) was quite happy to deal with these
crossings, giving clear-headed and (as it turned out)
accurate reports of what was going on. There is no
doubt that this is cool.
However, it's not so cool in RTL simulation. The
PLL simulation models, not too surprisingly,
introduce some delta delays between the
nominally coincident clock edges. Consequently
I get everything working when going in one direction
(from fast clock to slow clock, as it turns out)
but I get shoot-through behaviour, the RTL equivalent
of a hold time violation, when crossing from slow to
fast clock; data is arriving one or more delta cycles
*before* the clock.
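
To make the race concrete, here is a minimal VHDL sketch. It is not
taken from the design or the vendor models discussed here; every name
in it (delta_race_demo, clk_x2_raw and so on) is made up, and the extra
concurrent assignment on the fast clock is only an assumed stand-in for
whatever a PLL model does internally to skew its outputs by a delta.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: shows a slow-to-fast crossing failing in RTL
-- simulation because the fast clock is one delta later than the slow one.
entity delta_race_demo is
end entity;

architecture sim of delta_race_demo is
  signal clk_x1     : std_logic := '0';  -- slow clock, 20 ns period
  signal clk_x2_raw : std_logic := '1';  -- fast clock, 10 ns period
  signal clk_x2     : std_logic := '1';  -- fast clock plus one delta
  signal d_slow     : std_logic := '0';  -- register in the x1 domain
  signal q_fast     : std_logic := '0';  -- register in the x2 domain
begin
  -- Ideal generators. Rising edges of clk_x1 (10, 30, 50 ns ...) coincide
  -- in time and delta with rising edges of clk_x2_raw (10, 20, 30 ns ...).
  clk_x1     <= not clk_x1 after 10 ns;
  clk_x2_raw <= not clk_x2_raw after 5 ns;

  -- Assumed stand-in for the PLL model: a zero-delay signal assignment
  -- costs one delta cycle, so clk_x2 changes one delta after clk_x1.
  clk_x2 <= clk_x2_raw;

  slow_reg : process (clk_x1)
  begin
    if rising_edge(clk_x1) then
      d_slow <= not d_slow;          -- new value appears one delta later
    end if;
  end process;

  fast_reg : process (clk_x2)
  begin
    if rising_edge(clk_x2) then
      q_fast <= d_slow;              -- samples d_slow AFTER it has updated
    end if;
  end process;

  check : process
  begin
    wait for 12 ns;
    -- Hardware would still hold the old d_slow in q_fast at this point;
    -- here both signals already show the new value (shoot-through).
    report "d_slow=" & std_logic'image(d_slow) &
           " q_fast=" & std_logic'image(q_fast);
    wait;
  end process;
end architecture;
```

The clocks free-run, so simulate for a bounded time (100 ns is plenty).
Replacing clk_x2 with clk_x2_raw in fast_reg removes the extra delta and
restores the one-cycle latency you would expect from real flip-flops.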
We've easily enough got around this for the present
design, but I'd love to know what all you seasoned
PLL/DCM users out there do about it. Do you
introduce small non-zero time delays in all the
signals crossing the clock domains, so that it all
works in simulation? Do you treat the various
clock domains as if they were asynchronous, thereby
losing one of the nicest benefits of the PLLs? Or
do you simply accept that it's necessary to do timing
simulation in order to see what will really happen?
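
For reference, the first of those options (a small non-zero delay on
each crossing signal) might look like the sketch below. The entity name,
the SIM_DELAY generic and its 100 ps default are all assumptions made up
for illustration. Synthesis tools generally ignore the "after" clause,
but check what your own tool does with it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical launch register for a slow-to-fast crossing.
entity crossing_reg is
  generic (
    WIDTH     : positive := 8;
    SIM_DELAY : time     := 100 ps   -- assumed value; must be greater than
                                     -- zero and shorter than the fast period
  );
  port (
    clk_slow : in  std_logic;
    d        : in  std_logic_vector(WIDTH - 1 downto 0);
    q        : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity;

architecture rtl of crossing_reg is
begin
  process (clk_slow)
  begin
    if rising_edge(clk_slow) then
      -- The real-time delay pushes the update past every delta cycle of
      -- the launching edge, so a fast-clock register whose clock is a few
      -- deltas late still samples the old value, as hardware would.
      q <= d after SIM_DELAY;
    end if;
  end process;
end architecture;
```

The simulator's time resolution has to be fine enough to represent
SIM_DELAY, otherwise the delay may round down to zero and the race
comes back.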
This is partly a plague of VHDL RTL sim (hence the
posting to c.l.vhdl as well); in Verilog you can
model clock gating and PLL-ish behaviour with "less"
zero delay than the nonblocking assignments to your
flip-flops, by taking care to use blocking assignment
in all your clock paths. I have not yet tried the
Verilog simulation models for the PLLs to see whether
that makes any difference.
One further whinge: I haven't tried this in Brand X
recently, but the Altera PLL models are spectacularly
inefficient for RTL simulation. In our modest-size
project - think SDRAM controller, a few FIFOs occupying
most of the blockRAM, and a fairly small bunch of
additional logic - the two PLLs are responsible for
at least 90% of the simulation time - OUCH. I swapped in
much simpler, but perfectly adequate, in-house models and
got a x10 simulation speedup.
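
As an illustration of how little a replacement model needs for RTL
simulation, here is a minimal sketch. It is not the in-house model
mentioned above; the entity name, ports and X1_PERIOD generic are
invented, and it assumes you can live without a reference clock input,
lock behaviour, phase shift and jitter. All three outputs are driven
from one process, so coincident edges land in the same delta cycle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical free-running stand-in for a x1/x2/x4 PLL (simulation only).
entity simple_pll_model is
  generic (
    X1_PERIOD : time := 20 ns   -- should be a multiple of 8 simulator
                                -- time units so that STEP is exact
  );
  port (
    clk_x1 : out std_logic;
    clk_x2 : out std_logic;
    clk_x4 : out std_logic
  );
end entity;

architecture sim of simple_pll_model is
begin
  gen : process
    constant STEP : time := X1_PERIOD / 8;  -- half period of clk_x4

    -- Convert a boolean condition to a logic level.
    function lvl (b : boolean) return std_logic is
    begin
      if b then
        return '1';
      else
        return '0';
      end if;
    end function;
  begin
    -- Eight steps per x1 period. Every output is assigned in the same
    -- process at the same instant, so edges that coincide in time also
    -- coincide in delta.
    for i in 0 to 7 loop
      clk_x4 <= lvl(i mod 2 = 0);   -- rises at i = 0, 2, 4, 6
      clk_x2 <= lvl(i mod 4 < 2);   -- rises at i = 0, 4
      clk_x1 <= lvl(i < 4);         -- rises at i = 0
      wait for STEP;
    end loop;
  end process;
end architecture;
```

Connecting these ports straight to the clock nets adds no further
delta; any intermediate signal assignment on one clock but not the
others would reintroduce the skew.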
Opinions/rants/insults welcomed. Thanks in advance.
--
Jonathan Bromley, Consultant
DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services
Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com
The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.