Simulation deltas

Hi,

This question deals both with an actual problem, and with some more conceptual thoughts on simulation deltas and how an RTL entity should behave with regard to this.

This post regards the case of a simulation with ideal time - that is, no delays (in time) are modelled; instead, only simulation deltas are trusted for the ordering of events.


*Conceptual*

I would argue that for a well-behaved synchronous RTL entity, the following must be true:

*All readings of the input ports must be made *on* the delta of the rising edge of the clock - not one or any other number of deltas after that.*

Would people agree on that?

It follows from the possibility of other logic, hierarchically above the entity in question, altering the input ports as little as one delta after the rising edge. That must be allowed.
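
(To illustrate with a minimal, hedged sketch - names are made up: a plain registered process reads its input on exactly the delta of the rising edge, so a driver is free to change that input one delta later without affecting what was sampled.)

  process (clk)
  begin
    if rising_edge(clk) then
      -- 'd' is read on the very delta of the clock edge; a change to
      -- 'd' one delta after the edge does not affect this sample.
      q <= d;
    end if;
  end process;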


*My actual problem*

After a lot of debugging of one of my simulations, I found a Xilinx simulation primitive (IDELAYE2 in Unisim) *not* adhering to the statement in the previous section, which had caused all the problems.

See the signals plotted here:
http://www.fpga-dev.com/misc/deltaDelayProblem.png

It's enough to focus on the "ports" section. The ports are:
- c: in, the clock
- cntValueIn: in
- ld: in, writeEnable for writing cntValueIn to an internal register
- cntValueOut: out, giving the contents of that register

As can be seen, my 'ld' operation is de-asserted one delta after the rising edge. I argue this should be OK, but it is obvious that the data is never written (cntValueOut remains 0). If I delay the de-assertion of 'ld' just one more delta, the write *does* take effect as desired.

I would argue this is a (serious) flaw of the Xilinx primitive. Would people agree on that as well?


(The following is not central to the above discussion and may be skipped.)

I have checked the actual reason for the problem. See the "internals" section of the signals. First, Xilinx delays both the clock and the ports to the *_dly signals. Fully OK, provided it operates on the delayed signals from then on. The problem is that the process writing to the internal register is not clocked by c_dly, but by another signal, c_in, which is delayed *one more* delta. This causes my requested 'ld' to be missed. (c_in is driven from c_dly in another process, inverting the clock input if the user has requested that.)

I argue that synchronous entities must be modelled in such a way that all processes reading input ports *must* be clocked directly by the input clock port - not by some derived signal that is lagging (if only by one delta). If this is not possible, the input ports being read must be delayed accordingly. In this case, if Xilinx wishes to conditionally invert the clock like this, causing another delta of delay, the input ports must also be delayed the corresponding number of deltas.
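
(For concreteness, a hedged reconstruction of the structure I believe the model has - c, c_dly, c_in, ld and cntValueIn are from the waveform; the rest, including invert_clock, is my guess, not Xilinx source:)

  c_dly          <= c;           -- +1 delta relative to the port
  ld_dly         <= ld;          -- +1 delta
  cntValueIn_dly <= cntValueIn;  -- +1 delta
  c_in           <= c_dly when invert_clock = '0' else not c_dly;  -- +2 deltas in total

  process (c_in)
  begin
    if rising_edge(c_in) then    -- fires two deltas after the edge of 'c'
      -- An 'ld' de-asserted one delta after the edge of 'c' makes ld_dly
      -- de-assert two deltas after it, i.e. on this very delta, so the
      -- process already sees it low and the write is missed.
      if ld_dly = '1' then
        cntReg <= cntValueIn_dly;
      end if;
    end if;
  end process;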


Cheers,
Carl
 
On Thursday, April 3, 2014 9:01:34 AM UTC-4, Carl wrote:
*Conceptual*

I would argue that for a well-behaved synchronous RTL entity, the following
must be true:

*All readings of the input ports must be made *on* the delta of the rising
edge of the clock - not one or any other number of deltas after that.*

Would people agree on that?

I would not agree, conceptual reasoning is as follows:
- The clock causes something to happen
- Something that causes 'something else' to happen must precede 'something else' because this is a causal world we live in.

It follows from the possibility of other logic, hierarchically above the entity
in question, altering the input ports as little as one delta after the rising
edge. That must be allowed.

Hierarchy does not alter signals. You can go through as many levels of hierarchy as you want and it will not change the time (including simulation delta time) that a signal changes. What *will* change that time are statements such as 'clk_out <= clk_in', but that is because a new signal called 'clk_out' has been created, and that is *not* the same thing as 'clk_in'...since we live in a causal world, 'clk_out' must occur after 'clk_in'. Granted, a synthesizer will optimize away the statement and just use 'clk_in' wherever 'clk_out' goes, but that is a different tangent than what you're asking.

Kevin Jennings
 
On Thursday, April 3, 2014 at 3:24:49 PM UTC+2, KJ wrote:
On Thursday, April 3, 2014 9:01:34 AM UTC-4, Carl wrote:

I would argue that for a well-behaved synchronous RTL entity, the following
must be true:

*All readings of the input ports must be made *on* the delta of the rising
edge of the clock - not one or any other number of deltas after that.*

Would people agree on that?

I would not agree, conceptual reasoning is as follows:
- The clock causes something to happen
- Something that causes 'something else' to happen must precede 'something else' because this is a causal world we live in.

I don't really get what your two points mean in this context. I do understand and agree with the literal meaning of them.

I don't think those points necessarily address my issue. My issue doesn't only relate to causality. The main problem is to determine *exactly when something is sampled*.

Since you don't agree with the statement, however: how then should synchronous elements communicate with each other? If I clock a unit with 'clk', and I can't expect that unit to sample the input ports (which I drive) on (exactly on, without any delta delays) the rising edge of 'clk', then how long after the edge must I hold the input data stable? One delta? Two, ten? One ps, one ns?

(If the answer is anything more than deltas, i.e. involving time, we are no longer in functional modelling, which was an assumption for this question.)

Or how would you suggest the problem I illustrated should be avoided?


It follows from the possibility of other logic, hierarchically above the entity

in question, altering the input ports as little as one delta after the rising

edge. That must be allowed.

Hierarchy does not alter signals. You can go through as many levels of hierarchy as you want and it will not change the time (including simulation delta time) that a signal changes. What *will* change that time are statements such as 'clk_out <= clk_in' but that is because a new signal called 'clk_out' has been created and that is *not* the same thing as 'clk_in'...since we live in a causal world, 'clk_out' must occur after 'clk_in'.

Well of course I agree on all that. This is not about hierarchy. Maybe that was bad wording by me. This is about how you should expect functional, synchronous elements (possibly developed by others) to behave.

> Granted, a synthesizer will optimize away the statement and just use 'clk_in' wherever 'clk_out' goes, but that is a different tangent than what you're asking.

Yes, that's something else. A synthesis tool knows about the clocks and signals and warns about any setup/hold time violations. My question regards ideal functional models. Ideal and functional in the sense that delays are not modelled (rather trusting the deltas to keep track of event ordering). If delays were modelled as well, these problems would not arise.
 
On Thursday, April 3, 2014 10:42:56 AM UTC-4, Carl wrote:
I don't really get what your two points mean in this context. I do understand
and agree with the literal meaning of them.

I don't think those points necessarily address my issue. My issue doesn't only
relate to causality. The main problem is to determine *exactly when something
is sampled*.

Since you don't agree with the statement, however: how then should synchronous
elements communicate with each other? If I clock a unit with 'clk', and I can't
expect that unit to sample the input ports (which I drive) on (exactly on,
without any delta delays) the rising edge of 'clk', then how long after the
edge must I hold the input data stable? One delta? Two, ten? One ps, one ns?

Actually, I misread your question a bit; I do agree that inputs should get sampled on only one simulation delta cycle...and they do. For some reason, I thought you were talking about outputs being generated.

In any case, your conceptual question doesn't relate to the problem that you are seeing with the Xilinx primitive. I have no idea whether it correctly models the primitive or not, but let's assume for a moment that it is correct. Since that primitive is attempting to model reality, there very well would be a delay between the input clock to that primitive and when that primitive actually samples input signals. If that is the situation, then inputs must also model reality in that they cannot be changing instantaneously either. Inputs to such a model must meet the setup/hold constraints of the design.

When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues. This is an invalid assumption if you include parts into your model that model reality where delays do occur. The model is not wrong in that case, it is your usage of that model.

Just like on a physical board, on the input side to such a model, you need to ensure that you do not violate setup or hold constraints. If you do, then a physical board will not always work; in a simulation environment your simulation will fail (which is what you're experiencing). On the output side of a model, you need to make sure that you're not sampling too early (i.e. sooner than the Tco min).

Kevin Jennings
 
KJ wrote:
[snip: KJ's reply of April 3, quoted in full above]

Then perhaps the error in the Xilinx case is that they are applying a
physical model when you call up a behavioral simulation. I remember
that the BRAM models (at least for VHDL) had a similar issue causing
the behavioral simulation to look as if the readout was not registered
unless you had some delay on the address inputs.

--
Gabor
 
On Thursday, April 3, 2014 6:01:34 AM UTC-7, Carl wrote:
[snip: Carl's original post, quoted in full above]

I would agree with Kevin's assessment and offer an easy solution. As soon as you involve vendor-supplied models, you might as well just assume that they are not purely behavioral in the sense you are describing. The easy way to deal with this is to move the edges of stimulus signals in test benches to the falling edge of the clock, and to ensure your clock is running in simulation at an appropriate period, as it would in the real hardware.
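
(Something like this hedged sketch, with made-up names:)

  stimulus : process
  begin
    -- Drive the inputs on the falling edge so they are stable a full
    -- half-period before the rising edge that samples them.
    wait until falling_edge(clk);
    cntValueIn <= "00101";
    ld         <= '1';
    wait until falling_edge(clk);
    ld         <= '0';
    wait;
  end process;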
 
matt.lettau@gmail.com wrote:
[snip: Carl's original post, quoted in full above]

I would agree with Kevin's assessment and offer an easy solution. As soon as you involve vendor-supplied models, you might as well just assume that they are not purely behavioral in the sense you are describing. The easy way to deal with this is to move the edges of stimulus signals in test benches to the falling edge of the clock, and to ensure your clock is running in simulation at an appropriate period, as it would in the real hardware.

The problem with that approach is that the vendor IP is driven by user
IP and not the test bench directly. You certainly don't want the
user IP (for synthesis) working on the opposite clock edge. In the
past I have worked around the Xilinx model issues by adding unit delays
in the code that instantiates it, but even that leaves a bad taste in
my mouth, as it shouldn't be necessary for behavioral simulation.

--
Gabor
 
On Friday, April 4, 2014 12:01:33 PM UTC-4, Gabor wrote:
The problem with that approach is that the vendor IP is driven by user
IP and not the test bench directly.

I didn't see anything in the OP indicating whether the driving signals were testbench or design...but you could be right.

You certainly don't want the
user IP (for synthesis) working on the opposite clock edge. In the
past I have worked around the Xilinx model issues by adding unit delays
in the code that instantiates it, but even that leaves a bad taste in
my mouth, as it shouldn't be necessary for behavioral simulation.

Again, the way to fight a model that tries to model reality is with more 'reality' of your own. Make the assignments to the signals that connect to the primitive be delayed by 1 ns (i.e. "a <= b after 1 ns;"). Synthesis tools ignore the 'after' clause; sim does not.
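
(For example - a sketch with hypothetical names, assuming 'ld' and 'cntValueIn' are the signals feeding the primitive:)

  -- At the source registers that drive the primitive. Synthesis ignores
  -- the 'after' clause; simulation honours it.
  process (clk)
  begin
    if rising_edge(clk) then
      ld         <= ld_next  after 1 ns;
      cntValueIn <= cnt_next after 1 ns;
    end if;
  end process;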

I agree that you shouldn't have to do this when you're simulating the original design sources (but I thought he was simulating a post-route design being driven by a testbench). It's ugly, but I guess that is part of the baggage that comes with Brand X...maybe switch to Brand A and see if the laundry comes out cleaner.

Kevin Jennings
 
On Saturday, April 5, 2014 1:02:16 PM UTC-4, rickman wrote:
In any case, your conceptual question doesn't relate to the problem that you are seeing with the Xilinx primitive. I have no idea whether it correctly models the primitive or not, but let's assume for a moment that it is correct. Since that primitive is attempting to model reality, there very well would be a delay between the input clock to that primitive and when that primitive actually samples input signals. If that is the situation, then inputs must also model reality in that they cannot be changing instantaneously either. Inputs to such a model must meet the setup/hold constraints of the design.

This is a specious argument. Delta delays are not in any way related to
physical delays and are intended to deal with issues in the logic of
simulation, not real world physics.

Nothing at all specious, it is correct. If you're connecting to a block that models delays (and the OP's does), then the solution is to model reality as well on the inputs in order to meet setup/hold time as well as to not sample outputs before Tco max. Whether those delays are caused by the model using delta delays or real time delays does not change the fact that the solution I provided is correct. It will be correct if the offending model uses delta delays or actual post-route delays.

When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues. This is an invalid assumption if you include parts into your model that model reality where delays do occur. The model is not wrong in that case, it is your usage of that model.

This model is clearly *not* modeling timing delays. Just read his
description of the problem and you will see that.

I did read the post, and there are timing delays. Just because the delays are simulation deltas does not make them 'not a delay'. Since the model he is using implements these delays, the user needs to account for that. If you don't want to account for it, then you should use a different model.

Just like on a physical board, on the input side to such a model, you need to ensure that you do not violate setup or hold constraints. If you do, then a physical board will not always work; in a simulation environment your simulation will fail (which is what you're experiencing). On the output side of a model, you need to make sure that you're not sampling too early (i.e. sooner than the Tco min).

This discussion is not at all about setup or hold times. The OP is
performing functional simulation which is very much like unit delay
simulation.

I agree that the OP's problem is not about setup or hold times. The work around/solution I suggested was to add delays in order to conform with setup or hold times, "Just like on a physical board...". My solution has a direct connection with reality (i.e. a physical board with the design programmed in), other solutions might not.

If you're adding something to work around some problem, you're on much firmer ground if there is an actual basis that can be traced back to specifications. On the assumption that the external thing connected to the part being worked around is a physical part, ask yourself if adding Tpd and Tco delays to that model makes it closer or farther away from a 'true' model of that part.

Someone else posted that they typically worked around this by changing the inputs to be driven by the opposite edge of the clock. That probably works also, but again ask yourself does that make the simulation model closer to reality? Don't think so.

Of course, there is also the possibility that the stuff connecting to the Xilinx primitive is itself internal to the device in which case I suggested adding a 1 ns (or really whatever small non-zero time delay you want). Again, inside a real device, the output of a flop will not change in zero time so adding a small nominal delay as a work around can be justified as modeling reality.

In any case, the work around you use should have a rational basis for being the way it is. If the only justification is that 'it was the only way I could get the sim to run' then there is probably a design error that is being covered up, rather than a model limitation that is being worked around.

Kevin Jennings
 
On 4/3/2014 1:17 PM, KJ wrote:
[snip: Carl's question and the start of KJ's reply, quoted above]

In any case, your conceptual question doesn't relate to the problem that you are seeing with the Xilinx primitive. I have no idea whether it correctly models the primitive or not, but let's assume for a moment that it is correct. Since that primitive is attempting to model reality, there very well would be a delay between the input clock to that primitive and when that primitive actually samples input signals. If that is the situation, then inputs must also model reality in that they cannot be changing instantaneously either. Inputs to such a model must meet the setup/hold constraints of the design.

This is a specious argument. Delta delays are not in any way related to
physical delays and are intended to deal with issues in the logic of
simulation, not real world physics. If the Xilinx primitive is trying
to model timing delays, it has done a pretty durn poor job of it, since a
delta delay is zero simulation time.


> When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues. This is an invalid assumption if you include parts into your model that model reality where delays do occur. The model is not wrong in that case, it is your usage of that model.

This model is clearly *not* modeling timing delays. Just read his
description of the problem and you will see that.


> Just like on a physical board, on the input side to such a model, you need to ensure that you do not violate setup or hold constraints. If you do, then a physical board will not always work; in a simulation environment your simulation will fail (which is what you're experiencing). On the output side of a model, you need to make sure that you're not sampling too early (i.e. sooner than the Tco min).

This discussion is not at all about setup or hold times. The OP is
performing functional simulation which is very much like unit delay
simulation. The purpose of delta delays is to prevent the order of
evaluating sequential logic from affecting the outcome. So the output
of all logic gets a delta delay (zero simulation time, but logically
delayed only) so that the output change is indeed causal and cannot
affect other sequential elements on that same clock edge.

In fact, this is the classic problem where a logic element is inserted
into the clock path for some sequential elements and not others, creating
the exact problem the OP is observing. Normally, designers know not to
do this. I guess someone at Xilinx was out that day in the training class.

--

Rick
 
On 4/5/2014 3:21 PM, KJ wrote:
[snip: the earlier exchange, quoted in full above]

I did read the post, and there are timing delays. Just because the delays are simulation deltas does not make them 'not a delay'. Since the model he is using implements these delays, the user needs to account for that. If you don't want to account for it, then you should use a different model.

I'm not going to argue with you about this. The models are wrong by
the conventions of VHDL. I have seen no evidence that the models are trying
to simulate timing delays. A delta delay is *zero* time in the
simulation. If they wanted to model timing delays they would use a time
delay, not delta delays. The problem with using delta delays is that
they don't even approximate timing values, and they corrupt functional
simulation as the OP is seeing. It is a bit absurd to expect users to
insert delta delays in their code to fake out imagined timing delays of
0 ns. There is no utility to this concept.


[snip: quoted in full above]

I agree that the OP's problem is not about setup or hold times. The work around/solution I suggested was to add delays in order to conform with setup or hold times, "Just like on a physical board...". My solution has a direct connection with reality (i.e. a physical board with the design programmed in), other solutions might not.

If you're adding something to work around some problem, you're on much firmer ground if there is an actual basis that can be traced back to specifications. On the assumption that the external thing connected to the part being worked around is a physical part, ask yourself if adding Tpd and Tco delays to that model makes it closer or farther away from a 'true' model of that part.

But this is not relevant. I would prefer to add the delta delays where
needed and to document them as being required to deal with the errors in
the Xilinx models, which is why they are there, not to add timing
information to a functional simulation, which is a bit absurd.


> Someone else posted that they typically worked around this by changing the inputs to be driven by the opposite edge of the clock. That probably works also, but again ask yourself does that make the simulation model closer to reality? Don't think so.

I would consider this to be adding an error to work around the Xilinx
error.


> Of course, there is also the possibility that the stuff connecting to the Xilinx primitive is itself internal to the device in which case I suggested adding a 1 ns (or really whatever small non-zero time delay you want). Again, inside a real device, the output of a flop will not change in zero time so adding a small nominal delay as a work around can be justified as modeling reality.

Now you are starting to understand delta delays. That is what VHDL does
in the simulation. The output of a sequential element changes 1 delta
delay after the clock edge. You are proposing that additional delta
delays be added by the user to compensate for the delta delays being
introduced in the clock path by the corrupt Xilinx model. This is in
conflict with best design practices.

I feel that Xilinx should have added those delays to the input data path
so that the rest of the simulation can be written like a standard VHDL
design.


> In any case, the work around you use should have a rational basis for being the way it is. If the only justification is that 'it was the only way I could get the sim to run' then there is probably a design error that is being covered up, rather than a model limitation that is being worked around.

The rational basis is not "it was the only way I could get the sim to
run", it is "this is the best way to work around the Xilinx model
problems". Ideally the fixes would be added to a wrapper around the
offending Xilinx code if possible.

--

Rick
 
On Sunday, April 6, 2014 11:42:34 AM UTC-4, rickman wrote:
[snip: quoted in full above]

But this is not relevant. I would prefer to add the delta delays where
needed and to document them as being required to deal with the errors in
the Xilinx models, which is why they are there, not to add timing
information to a functional simulation, which is a bit absurd.

Uh huh...when I say to add delays as a work around, you see it as 'not relevant' and 'absurd', but when you suggest adding delta delays you think you're relevant...OK...gotcha.

If you had actually put *any* thought into the problem, you would see that all of the 'as being required' places where one would need to add delays would be the inputs (as I suggested) and the delays...well, you never suggested any amount for a delay (where I did). Good tip!

[snip: quoted in full above]

You are proposing that additional delta delays be added by the user to
compensate for the delta delays being introduced in the clock path by the
corrupt Xilinx model. This is in conflict with best design practices.

Ah yes, the 'conflict with best design practice' canard. I suggested using a different model if available, and if you're stuck with the model, then here is the way to work around it. What I suggested can be traced back to specifications; what you suggest...well, not so much. Just how many 'delta delays' do you think you can add and trace that code back to a specification?

I feel that Xilinx should have added those delays to the input data path
so that the rest of the simulation can be written like a standard VHDL
design.

Is what you 'feel' supposed to be relevant?

So are you suggesting that one should do nothing until the Xilinx model is fixed? When I encounter a bug, I submit it to the vendor and work on a work around since I can't depend on them to field a fix in a time frame usable by me. I guess you live in a different world where it is OK to say development has stopped while you wait for a supplier to fix something.

[snip: quoted in full above]

The rational basis is not "it was the only way I could get the sim to
run", it is "this is the best way to work around the Xilinx model
problems". Ideally the fixes would be added to a wrapper around the
offending Xilinx code if possible.

So rather than accepting a solution that I suggested, that has a basis that can be traced back to specification and can be reused regardless of how many delta delays get added sometime in the future (seems that you forgot about that possibility), you're into:
- Railing on Xilinx
- Waiting for them to fix the model
- Or adding a magic wrapper, apparently clueless that no wrapper will fix the problem, and dismissing my work around as 'not relevant', 'absurd', etc.

Gotcha.

I'm done with this thread, catch you in the future in some other thread.

Kevin Jennings
 
I once got bitten by this sort of thing.
Turned out that the default ModelSim timing resolution was too coarse and the simulation rounded delays down to zero.

Colin
 
On 03/04/2014 14:01, Carl wrote:
[snip: Carl's original post, quoted above]

You might extract some useful info from this discussion:

http://verificationguild.com/modules.php?name=Forums&file=viewtopic&t=537

Delta delays avoid a lot of simulation nasties like race conditions but
still suffer from some real-world implementation issues, as you have
discovered.

Good luck,
Hans
www.ht-lab.com
 
Just to clarify, this is not a post-route simulation. This is a simulation of a larger custom RTL design. In various parts of it, some primitives from the Xilinx Unisim library are used.

There are numerous workarounds, of course; they are obvious to all of us, and which one someone would choose is largely a matter of taste - for me, that is not the central discussion here. I rather seek the lesson to be learned (if any) after having spent half a day debugging, finally finding the behaviour of this primitive to be the cause of the problem.

What the discussion boils down to is whether functional models may behave like this. If the answer is yes, there should be a general design practice that is always used when interfacing to RTL logic or functional models you haven't developed yourself.

I see from the discussion that opinions on this differ. My original post suggested I was leaning towards the Xilinx primitive being flawed, and after having taken in the arguments above, this is still my opinion. The Unisim library contains simulation primitives. For functional simulation (there's the Simprim library for timing simulations) they should follow the design practice that interfacing logic is only required to hold the input signals valid *on* the active edge of the input clock. Not longer (also not in terms of deltas).

One effect of the user being required to hold inputs valid any longer (say, adding 'after 1 ns' to any interfacing logic signals) would be a (sometimes) significant increase in simulation time. One of the strengths of functional simulation is that all changes happen only on the clock edges, with changes around an edge separated only by deltas, not by time. (Remember, VHDL signals are expensive in this regard. Reducing signal changes means everything for efficient simulation.)


There is another side to this discussion, which is not about how to interface to models/logic by others, but rather how to choose your own design rules to avoid these problems within the code you develop yourself. However, I believe a designer seldom has a legitimate reason to mess with the clock path in RTL code. Typically, vendor primitives are instantiated for any such functionality (clock muxing etc.). There might be situations where you would rather _infer_ than instantiate, though, and then this *does* become a problem. However, I have never come across such a situation.
 
On Monday, April 7, 2014 at 10:45:27 AM UTC+2, HT-Lab wrote:
You might extract some useful info from this discussion:

http://verificationguild.com/modules.php?name=Forums&file=viewtopic&t=537

A closely related topic, yes.

If you have several clocks in your design, you must make sure any edges that are supposed to occur simultaneously *do* occur simultaneously, also in terms of deltas. This requires some care when generating your test bench clocks. I make sure to generate them from within one and the same process, keeping the desired phase relationship, as in the sketch below. Generating one, and dividing the second from the first, is doomed to fail: logic interfacing to both clocks then runs a big risk of missing signal transitions.
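
(A hedged sketch with made-up frequencies - both clocks are driven from the same process, so their coincident edges land on the same simulation delta:)

  clkgen : process
  begin
    clk100 <= '0';  clk50 <= '0';
    loop
      wait for 5 ns;  clk100 <= '1';  clk50 <= '1';  -- coincident rising edges, same delta
      wait for 5 ns;  clk100 <= '0';
      wait for 5 ns;  clk100 <= '1';                 -- clk50 stays high
      wait for 5 ns;  clk100 <= '0';  clk50 <= '0';
    end loop;
  end process;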
 
On Monday, April 7, 2014 at 11:43:43 AM UTC+2, colin wrote:
> Turned out that the default modelsim timing granularity was too big and the simulation rounded delays down to zero.

That, though, must have been due to time delays rather than delta delays. (I know Xilinx states its primitives require 1 ps resolution, whereas the default resolution for ModelSim is 1 ns.)
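
(For reference: the resolution can be forced when loading the simulation, e.g. "vsim -t 1ps work.tb_top" - test bench name made up - or via the Resolution setting in modelsim.ini.)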
 
On Monday, April 7, 2014 1:34:07 PM UTC-4, Carl wrote:
What the discussion boils down to is whether functional models may behave like
this. If the answer is yes, there should be a general design practice that
is always used when interfacing to RTL logic or functional models you
haven't developed yourself.

- May they, 'yes'
- Should they, 'no'
- Design practice to add delays should be applied when using models that happen to need it. Since most do not, applying it to all models that you haven't developed is a waste.

I see from the discussion that opinions on this differ. My
original post suggested I was leaning towards the Xilinx primitive being flawed,
and after having taken in the arguments above, this is still my opinion.

Submit a bug report.

One effect of the user being required to hold inputs valid any longer (say,
adding 'after 1 ns' to any interfacing logic signals) would be a (sometimes)
significant increase in simulation time.

Whether this is true or not depends on how you work around the problem. A simulator will schedule a signal change to occur at a certain time. Whether that time is on the next simulation delta or 1 ns into the future doesn't change the fact that a new event will need to be evaluated at *some* time in the future. Now it could come down to how you implement that delay:
1. x <= y after 1 ns; -- Should not produce extra signal activity
2. x <= y0; y0 <= y after 1 ns; -- Will produce extra signal activity

#1 is how you would likely tend to implement the delay if you put it at the source, where I suggested it belongs, which has traceability back to specifications.
#2 is how you would likely tend to implement the delay if you see the source as being pristine and you just want to add a delay line.

One of the strengths of functional simulation is that all changes happen only
on the clock edges, with changes around an edge separated only
by deltas, not by time. (Remember, VHDL signals are expensive in this regard.
Reducing signal changes means everything for efficient simulation.)

Not sure what functional simulators you're talking about here. ModelSim certainly doesn't work this way (i.e. changes only happen on the clock edges). Changes on any signal cause events to be scheduled on others.


Kevin Jennings
 
On 4/7/2014 4:45 AM, HT-Lab wrote:
[snip: Carl's post and HT-Lab's link, quoted above]

Delta delays avoid a lot of simulation nasties like race conditions but
still suffer from some real-world implementation issues, as you have
discovered.

I would not class this as a problem with delta delays. The problem is
the design of a module which fails in ordinary usage. I remember
learning a long time ago that you *never* run the clock through anything
that will delay it more than the data, including delta delays.
Obviously the designer of the Xilinx module forgot that rule and added
logic to the clock path that needs to be compensated for in the data
paths to any sequential elements on that delayed clock.

Just as adding delay to a clock in a real world design can cause the
design to fail, adding delta delays to a clock in a functional
simulation can cause the design to fail in simulation.
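
(The rule in two lines - a hypothetical sketch, not the Xilinx code:)

  clk2 <= clk;  -- clk2 lags clk by one delta on every edge

  process (clk2)
  begin
    if rising_edge(clk2) then
      -- 'd' coming from a register clocked by 'clk' changes on this
      -- same delta, so this register samples the *new* value - a
      -- zero-time race created purely by the delta on the clock.
      q <= d;
    end if;
  end process;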

--

Rick
 
What the discussion boils down to is whether functional models may behave like
this. If the answer is yes, there should be a general design practice that
is always used when interfacing to RTL logic or functional models you
haven't developed yourself.

- May they, 'yes'
- Should they, 'no'
- Design practice to add delays should be applied when using models that happen to need it. Since most do not, applying it to all models that you haven't developed is a waste.

I intended "should" rather than "may", so I'm with you here.

One effect of the user being required to hold inputs valid any longer (say,
adding 'after 1 ns' to any interfacing logic signals) would be a (sometimes)
significant increase in simulation time.

Whether this is true or not depends on how you work around the problem. A simulator will schedule a signal change to occur at a certain time. Whether that time is on the next simulation delta or 1 ns into the future doesn't change the fact that a new event will need to be evaluated at *some* time in the future. Now it could come down to how you implement that delay:

1. x <= y after 1 ns; -- Should not produce extra signal activity
2. x <= y0; y0 <= y after 1 ns; -- Will produce extra signal activity

#1 is how you would likely tend to implement the delay if you put it at the source, where I suggested it belongs, which has traceability back to specifications.

#2 is how you would likely tend to implement the delay if you see the source as being pristine and you just want to add a delay line.

My mental picture was that there is a significant difference between signal changes separated by deltas and changes separated by time, in terms of simulation performance. However, I was probably wrong here, and after having considered it I no longer see a good reason why there should be such a difference.

One of the strengths of functional simulation is that all changes happen only
on the clock edges, with changes around an edge separated only
by deltas, not by time. (Remember, VHDL signals are expensive in this regard.
Reducing signal changes means everything for efficient simulation.)


Not sure what functional simulators you're talking about here. ModelSim certainly doesn't work this way (i.e. changes only happen on the clock edges). Changes on any signal cause events to be scheduled on others.

Yes, of course; I don't mean the _simulator_ has any influence on this. But the user can, for a given simulation run, make sure changes only appear on clock edges (within deltas). That of course depends on the design simulated and the stimulus.
 
