Is it possible that a Virtex II device performs below its sp



I have a design, which is supposed to work in XC2V2000-5 at 50 MHz. The
timing analyzer reports the clock period to be below 19ns. However, in
practice, only one device out of 3 works at this speed. Two others were
happy when I slowed the clock to 45 MHz (I didn't try any intermediate
frequencies). The design basically consists of a 3rd party IP core, for
which I don't have a source (I believe it was designed in schematic), some
state machines, a bus interface and some Coregen memories. The bus runs at
slower clock, but it is fully decoupled from the IP core (through the
memories). The IP core is a fully synchronous design according to its
author. The clock comes directly from an external crystal oscillator. I
tried looking at unconstrained paths in the timing analyzer, but couldn't
see anything suspicious...

Any ideas to where to look?


To reply directly:
matusov at square peg ca
(join the domain name in one word and add a dot before "ca")

"Brannon King" <> wrote in message
I would suggest running your internal clock signal out to a pad and probing
it. That was how I determined I needed to run DCI on some of my input pads
instead of the default TTL. Sometimes the DCM would lock to the wrong
frequency, but it would always get it correct when I slowed the frequency
down.

I am not using the DCM at all... The internal clock is what comes in and it
is only 50 MHz.

It ended up being reflective noise on the line which the DCI cleared
up. As for running below specs, as far as I understand that should only
happen when the incoming power or temperature are out of spec.

That doesn't seem to be the case. The temperature is room or slightly
higher, and the core voltage I measured at 1.506V...


I would suggest running your internal clock signal out to a pad and probing
it. That was how I determined I needed to run DCI on some of my input pads
instead of the default TTL. Sometimes the DCM would lock to the wrong
frequency, but it would always get it correct when I slowed the frequency
down. It ended up being reflective noise on the line which the DCI cleared
up. As for running below specs, as far as I understand that should only
happen when the incoming power or temperature are out of spec.

The most likely cause is that your design does not, in fact, meet
timing. This comes about by many ways, most usually a bug in the
constraints, or in the design itself.

Sorry that it isn't some bizarre unheard of problem, but I can only
guess based on the thousands of cases that come through.

Common problems: use of the wrong clock edge; a design that did not specify
global resources, so clocks are being routed using general interconnect;
unconstrained paths leading to inefficient placement by the tools; and
multi-cycle constraints confusing the tool and leading to no constraints
at all.


Most likely, you failed to properly constrain the timing on some path in the
design. It is also possible that there is a problem in your clock domain
crossings (you usually need something along with the memory used for the data
path to tell the other side there is valid data). If you have multi-cycle
constraints in the design, that would be the first place I looked, good chance
that something is getting interpreted for multi-cycle when it shouldn't. You
shouldn't need multi-cycle constraints on a 50 MHz design though. Also, check
with your IP provider to make sure his design is really meeting timing and is
properly constrained.

--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

"Austin Lesea" <> wrote in message

The most likely cause is that your design does not, in fact, meet
timing. This comes about by many ways, most usually a bug in the
constraints, or in the design itself.
I agree. However, it is not one of the basic things such as clock not using
global resource. Here are the relevant parts of the PAR report:

| Clock Net   | Resource | Fanout | Max Skew(ns) | Max
| clk50_bufgp | Global   | 2655   | 0.314        | 1.330
| lclk_bufgp  | Global   | 600    | 0.300        | 1.316

  Constraint                                               | Requested | Actual   |
  TS_clk50 = PERIOD TIMEGRP "clk50" 19 nS HIGH 50.000000 % | 19.000ns  | 18.977ns | 16
  TS_lclk = PERIOD TIMEGRP "lclk" 26 nS HIGH 50.000000 %   | 26.000ns  | 13.491ns | 6
* TIMEGRP "LAD" OFFSET = OUT 13 nS AFTER COMP "lclk"       | 13.000ns  | 22.516ns | 7
  TIMEGRP "LAD" OFFSET = IN 16 nS BEFORE COMP "lclk"       | 16.000ns  | 11.499ns | 4
  TIMEGRP "LBUS_CTRL" OFFSET = OUT 13 nS AFTER COMP "lclk" | 13.000ns  | 12.808ns | 1
  TIMEGRP "LBUS_CTRL" OFFSET = IN 16 nS BEFORE COMP "lclk" | 16.000ns  | 10.185ns | 3

1 constraint not met.

All signals are completely routed.

Total REAL time to par completion: 11 mins 17 secs
Total CPU time to par completion: 11 mins 12 secs

Placement: Completed - No errors found.
Routing: Completed - No errors found.
Timing: Completed - 32 errors found.

The one constraint that has not been met is not relevant to the problem.
There are no multi-cycle constraints. I realize that from the information I
am giving it is almost impossible to conclude anything. I guess what I am
looking for is any information on what might be uncovered by the clock
constraint in a supposedly synchronous design. I believe that the problem
happens in the 3rd party IP core, which uses only 50 MHz clock, but without
source code I don't know how to find where. So instead of finding the
problem in the core I would like to be able to constrain it somehow that it
will work...
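
As a quick sanity check on the numbers in the report, here is a minimal Python sketch of the margin arithmetic. It only uses the 18.977 ns "Actual" figure for TS_clk50 and the two clock rates mentioned in the thread; the function names are made up for illustration.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(clock_mhz: float, reported_path_ns: float) -> float:
    """Margin between the real clock period and the worst reported path."""
    return period_ns(clock_mhz) - reported_path_ns


reported = 18.977  # "Actual" value for TS_clk50 in the PAR report
for f in (50.0, 45.0):
    print(f"{f:.0f} MHz: period {period_ns(f):.3f} ns, "
          f"slack {slack_ns(f, reported):+.3f} ns")
```

If the report covered every path, the design would already have about 1 ns of margin at 50 MHz. A failure that disappears at 45 MHz (a period of about 22.2 ns) would then suggest a path that is at least 1 ns slower than anything the constraint covers, or a cause that is not a synchronous timing path at all.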

BTW, I am using ISE5.2.03i...


Here are a couple of things to try -
[1] Take a look at the -u report ... keep adding constraints until
unconstrained paths drop to zero.
[2] Take gate level netlist into simulator, and see if there are problems
with simulation.
[3] Make an effort to ensure that all I/O connecting to core are

Observations -
[1] Trace does a terrific job with synchronous paths, and (answering
your original question), part problems are typically more design problems
(but it sounds like you already accept this .... just looking for some
[2] On the marginal boards, hit with shot of cold spray to see if chips
start to operate at 50 MHz.
[3] The symptom that one board works, but two don't is a little of a
puzzler. That indicates problem may not be in time domain crossings
but rather in synchronous paths which do not meet timing, where
device-specific process variations have an effect. Or it could
also mean there is something marginal at the PWB level .... GND
scheme, decoupling, marginal voltages that push two units under
threshold. (Check VCC levels ... sorry to state the obvious)
[4] Key might be to isolate the block that is really failing. Is it really
the core? Something like a "signature" analysis on outputs of a block
for periods that result in identical processing is helpful, i.e.,
do outputs of block 1 across an identical data set differ between
the "good chip" and the "bad" devices?
[5] If you can over-constrain your clock frequency for the entire
design, or just the core, then try place-and-route with a modular
approach; that might give you margin on your synchronous paths.
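
The "signature" comparison in observation [4] could be done on the host along these lines. This is only a sketch: it assumes the block outputs can be captured as lists of 32-bit words, the sample data is made up, and CRC-32 is just one convenient choice of signature.

```python
import zlib
from typing import List, Tuple


def signature(words: List[int]) -> int:
    """CRC-32 over the words, each packed as 4 little-endian bytes."""
    data = b"".join((w & 0xFFFFFFFF).to_bytes(4, "little") for w in words)
    return zlib.crc32(data)


def diff_words(good: List[int], bad: List[int]) -> List[Tuple[int, int]]:
    """Return (word index, XOR of the two words) for every mismatch."""
    return [(i, g ^ b) for i, (g, b) in enumerate(zip(good, bad)) if g != b]


# Made-up captures of the same data set from a working and a failing board.
good = [0x1234, 0xABCD, 0x0F0F, 0x8001]
bad = [0x1234, 0xABCC, 0x0F0F, 0x8001]

if signature(good) != signature(bad):
    for index, xor in diff_words(good, bad):
        print(f"word {index}: differing bits 0x{xor:08X}")
```

Comparing signatures first keeps the check cheap for long runs; the word-by-word diff is only needed once a mismatch shows up, and the XOR mask shows which bits are affected.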

Anyway .... good luck.
John Retta
Owner and Designer
Retta Technical Consulting Inc.


Just a thought: did you check the decoupling of the 1.5V supply? Those
MLCCs all look the same; if the board got fitted with 10pF rather than
10nF, it might affect the performance!
You could also try reducing the 19ns constraint, but I'd still be
worried about the thing failing when the timing says it should pass.
Good luck, please let us know how you get on!
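
To put a rough number on the 10pF versus 10nF point, here is a small Python sketch. It treats the capacitor as ideal (no ESR or ESL, which is a simplifying assumption) and evaluates it at the 50 MHz clock rate only, although real supply noise covers a wider band.

```python
import math


def cap_impedance_ohms(capacitance_f: float, freq_hz: float) -> float:
    """Magnitude of an ideal capacitor's impedance, |Z| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * capacitance_f)


clock_hz = 50e6
for label, cap in (("10 nF", 10e-9), ("10 pF", 10e-12)):
    z = cap_impedance_ohms(cap, clock_hz)
    print(f"{label} at 50 MHz: about {z:.3g} ohms")
```

Under these assumptions the wrong part presents roughly a thousand times the impedance, so it would do almost nothing as a decoupling capacitor.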

On Mon, 02 Feb 2004 15:15:06 -0500, MM wrote:

I have a design, which is supposed to work in XC2V2000-5 at 50 MHz. The
timing analyzer reports the clock period to be below 19ns. However, in
practice, only one device out of 3 works at this speed. Two others were
happy when I slowed the clock to 45 MHz (I didn't try any intermediate
frequencies). The design basically consists of a 3rd party IP core, for
which I don't have a source (I believe it was designed in schematic), some
state machines, a bus interface and some Coregen memories. The bus runs at
slower clock, but it is fully decoupled from the IP core (through the
memories). The IP core is a fully synchronous design according to its
author. The clock comes directly from an external crystal oscillator. I
tried looking at unconstrained paths in the timing analyzer, but couldn't
see anything suspicious...

Any ideas to where to look?

It could be a reset path, the timing analyzer doesn't check them unless
you add the following to your UCF file

ENABLE= reg_sr_q;
"B. Joshua Rosen" <> wrote in message
On Mon, 02 Feb 2004 15:15:06 -0500, MM wrote:

It could be a reset path, the timing analyzer doesn't check them unless
you add the following to your UCF file

ENABLE= reg_sr_q;

I don't think there are any async resets in the core but I will try it...


Thanks to everyone who replied. I think John's comments cover everyone
else's ideas, so I will answer here..

"John Retta" <> wrote in message
Here are a couple of things to try -
[1] Take a look at the -u report ... keep adding constraints until
unconstrained paths drop to zero.
I will try this.

Take gate level netlist into simulator, and see if there are problems
with simulation.
I can't see anything wrong in the simulator...

Make an effort to ensure that all I/O connecting to core are
They are.

Observations -
[1] Trace does a terrific job with synchronous paths, and (answering
your original question), part problems are typically more design problems
(but it sounds like you already accept this .... just looking for some

On the marginal boards, hit with shot of cold spray to see if chips
start to operate at 50 MHz.
This is problematic as the thing doesn't fail completely when it fails, it
rather generates erroneous data once in a while...

The symptom that one board works, but two don't is a little of a
puzzler. That indicates problem may not be in time domain crossings
but rather in synchronous paths which do not meet timing, where
device-specific process variations have an effect. Or it could
also mean there is something marginal at the PWB level .... GND
scheme, decoupling, marginal voltages that push two units under
threshold. (Check VCC levels ... sorry to state the obvious)
Electrical problems are not likely. I have designed quite a few FPGA boards,
many in production. This one is not much different from what I did before
and it is well decoupled, etc. The voltages are all fine...

Key might be to isolate block that is really failing. Is it really
core? Something like a "signature" analysis on outputs of a block
for periods that result in identical processing are helpful. ie...
Do outputs of block 1 across an identical data set differ among
the "good chip" vs the "bad" devices.
I think it is the core, however I can't say that for sure. The board is a
decoder of some sort. It acts as PCI bus master and takes data from the host
memory and puts into an onboard buffer. Then the core takes it from that
memory, decodes and puts into the output buffer memory. Finally, data from
the output buffer is DMA'ed into the host memory. What I see is that
sometimes data in the board output buffer is slightly corrupted (usually in
the LSB of one of a 1000 words). If I simply read my buffer in a loop, the
data is always the same, it fails only when run through the decoder. It
doesn't fail every time, it can go fine for over 100 cycles sometimes...

If you can over-constrain your clock frequency for the entire
design, or just the core, then try place-and-route with a modular
approach; that might give you margin on your synchronous paths.
Well, I have several versions, one of them constrained to below 18 ns and
still failing. It must be some other unconstrained path or perhaps a
different kind of error, but then why does it work at 45 MHz?...


Sounds like you need to narrow the problem a bit more. Apparently, writing into
the buffer is OK if you can read back data all day without errors. What happens
if you DMA a known test pattern into the host memory? Do you get errors there?
How about if you bypass the 3rd party core, does data transfer OK then?

If the buffer memory is external to the FPGA, I'd look really closely at the
signal integrity and timing at the RAM interface. Also verify with FPGA editor
that the I/O, particularly all of the I/O to the RAM, are in fact registered in
the IOB. Check that you have the appropriate pin slew rates, delays, drive
strength, etc. on all the pins connecting to the RAM as well as your host. You
need to somehow verify that the problem is occurring in the FPGA and not in the
DMA transfer. I'd suggest putting in a test pattern generator or reading from
internal memory and checking the DMA'd data to make sure it isn't getting
garbled in the process due to either bus timing or bus collisions.
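
A host-side checker for such a test pattern might look like the following Python sketch. It assumes the FPGA side would emit a 16-bit LFSR sequence (taps 16, 14, 13, 11 with seed 0xACE1 are arbitrary choices, not something from the design), and the buffer contents here are simulated rather than read from hardware.

```python
from typing import Iterator, List


def lfsr16(seed: int = 0xACE1) -> Iterator[int]:
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11); yields one word per step."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        yield state
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)


def expected_pattern(n_words: int, seed: int = 0xACE1) -> List[int]:
    """First n_words of the reference sequence."""
    gen = lfsr16(seed)
    return [next(gen) for _ in range(n_words)]


def check_buffer(received: List[int], seed: int = 0xACE1) -> List[int]:
    """Indices where the received words differ from the expected pattern."""
    expected = expected_pattern(len(received), seed)
    return [i for i, (e, r) in enumerate(zip(expected, received)) if e != r]


buf = expected_pattern(1000)
buf[417] ^= 0x0001  # inject a single LSB error, like the reported symptom
print(check_buffer(buf))
```

Because the pattern is fully determined by the seed, the host needs no copy of the data; any mismatch index points straight at the corrupted word.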

--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
MM wrote:

Thanks to everyone who replied. I think John's comments cover everyone
else's ideas, so I will answer here..
[2] On the marginal boards, hit with shot of cold spray to see if chips
start to operate at 50 MHz.

This is problematic as the thing doesn't fail completely when it fails, it
rather generates erroneous data once in a while...
Sounds like a good idea; you just need to keep the device cold,
e.g. ice in an aluminium cup, or a Peltier cooler, or the whole shebang in the
freezer... (though local cooling is better, as it focuses on the device).
You can also heat it, and check whether the error rate degrades further.


"Ray Andraka" <> wrote in message
Sounds like you need to narrow the problem a bit more. Apparently, writing into
the buffer is OK if you can read back data all day without errors. What happens
if you DMA a known test pattern into the host memory? Do you get errors there?
How about if you bypass the 3rd party core, does data transfer OK then?
I have done a lot of tests with regards to the memories and I am pretty sure
that part works. However, not everything is easy to try. Bypassing the core
sounds like a good idea, but I can't do it in the existing design. Input
and output data formats are different and that would require quite a bit of
redesign. What I verified was access to the input/output buffers from the
host side. DMA is irrelevant because when an error happens the content of
the board and host buffers is always the same. It doesn't matter whether the
buffer is read with single PCI target reads or if the DMA is used. And, yes,
you can read this buffer all day long with the same result. Everything
seems to point towards the block which actually puts data in the buffer,
i.e. the core...

If the buffer memory is external to the FPGA, I'd look really closely at the
signal integrity and timing at the RAM interface.
The memories are internal. The only external part is a PCI bridge.


MM wrote:

I think it is the core, however I can't say that for sure.

Synthesize the core all by itself, and see if that works.
Next time, get source code, or write your own.

memory and puts into an onboard buffer. Then the core takes it from that
memory, decodes and puts into the output buffer memory.

How is the data synchronized from buffer to core?

Finally, data from
the output buffer is DMA'ed into the host memory.

How is the output buffer synchronized to the cpu?

What I see is that
sometimes data in the board output buffer is slightly corrupted (usually in
the LSB of one of a 1000 words). If I simply read my buffer in a loop, the
data is always the same, it fails only when run through the decoder. It
doesn't fail every time, it can go fine for over 100 cycles sometimes...

Smells like a synchronization problem.

-- Mike Treseler
"Mike Treseler" <> wrote in message
MM wrote:
Synthesize the core all by itself, and see if that works.
I can't really test it without the rest of the design. In simulation all
seems fine.

Next time, get source code, or write your own.
Not always our choice... Besides, it is truly a big and complex core...

memory and puts into an onboard buffer. Then the core takes it from that
memory, decodes and puts into the output buffer memory.

How is the data synchronized from buffer to core?
The core is designed to work with BRAM. It puts out read enable when it
needs data. The clock is common for the core and the read side of the

Finally, data from
the output buffer is DMA'ed into the host memory.

How is the output buffer synchronized to the cpu?
It sits on the PCI controller's local bus. A state machine in the FPGA
programs a DMA channel in the PCI controller and it starts reading the

Smells like a synchronization problem.
It sure does...


"MM" <> wrote in message news:<bvmaua$u760t$>...
I have a design, which is supposed to work in XC2V2000-5 at 50 MHz. The
timing analyzer reports the clock period to be below 19ns. However, in
practice, only one device out of 3 works at this speed. Two others were
happy when I slowed the clock to 45 MHz (I didn't try any intermediate
frequencies). The design basically consists of a 3rd party IP core, for
which I don't have a source (I believe it was designed in schematic),
Do a post PAR simulation. Note that Xilinx uses the same times for
min/typ/max in their SDF files. (Do a search in their web database
for ways to get min SDF timing). Do min and max timing simulation.

state machines, a bus interface and some Coregen memories. The bus runs at
slower clock, but it is fully decoupled from the IP core (through the
What are the two clock rates? Are you using dual ported FIFOs? Are
they getting full?

The IP core is a fully synchronous design according to its
Trust but verify (do post PAR timing sim).

The clock comes directly from an external crystal oscillator. I
tried looking at unconstrained paths in the timing analyzer, but couldn't
see anything suspicious...

Any ideas to where to look?
Get a scope out and look for reflections on your signals, especially

Take the device that works and use a hair dryer to warm it, see if it

Look at using a DCM.

Are you gating any clocks?

"William Wallace" <> wrote in message
"MM" <> wrote in message

Do a post PAR simulation. Note that Xilinx uses the same times for
min/typ/max in their SDF files. (Do a search in their web database
for ways to get min SDF timing). Do min and max timing simulation.
I did, although not the min/max... The problem with this is that I can only
simulate a few data frame cycles as it takes very long. Surely the problem
would have to manifest itself during the first frame, but I didn't see it...
Perhaps I need to repeat the whole thing...

state machines, a bus interface and some Coregen memories. The bus runs at
slower clock, but it is fully decoupled from the IP core (through the

What are the two clock rates? Are you using dual ported FIFOs? Are
they getting full?
The clock rates are 38 MHz for the local bus and 50 MHz for the core and the
state machine that controls it.

Take the device that works and use a hair dryer to warm it, see if it
As soon as I get my hands on the hardware I will. At the moment my
management is satisfied with the thing working reliably at 45 MHz and all
the hardware went to software guys...

Look at using a DCM.

Are you gating any clocks?


You know, we are all groping in the dark without having access to your
design specifications, board design, and test benches.

1. DCMs. Without seeing your design, I can only speculate how or if
a DCM will help in your design. If you've used DCMs before, you
probably considered it already. If you haven't, browse application
notes from Xilinx and see if it will help in your situation. One
possible reason is to simply generate a PLL.

2. Your original question (is it possible...): Where did you get
your parts? Disty? Gray Market (dumpster diver)?

3. Try to meet 100 MHz timing. Look at the long paths. Fix these.
E.g., move combinatorial logic from the Q side of a flip flop to the D
side of the flip flop. If you have a good version of Synplify, it can
do some of this for you.

4. Do you have a clue where the failure is occurring? Off chip
interface? Boundary between clock domains? Recommendation: Divide
and conquer. E.g., run the dual port FIFO flags to pins and monitor
those. If it repeatedly fails without any anomalies there, you know
that is not your problem. I am groping in the dark not having your
specification or implementation. But divide and conquer works best.

5. If the simulations are lengthy, by the time you read this, you
could have run one long simulation.

6. If the problem is the different clock domains, it will be hard to
find these problems. Xilinx parts have very small setup/hold times, and it is
actually hard to hit them in simulation, even if you try to get a
setup and hold violation. Try modeling some random jitter on one of
your clock sources during the simulation, or sliding the frequency.

7. Do you have a self-checking test bench?

8. If you think it is between the clock domains, study your
implementation of any status signals you are passing between the two

9. Work with the software guys to develop test cases to narrow down a
scenario that makes the failure occur more often.

10. Have you put offset specifications in your UCF file?

11. Are you doing any fixed point multiplication? Do you have
multi-cycle paths? Are you sure all of your signals are synchronous
to the clocks they are sampled on?

Anyway, these are all pretty generic obvious things to look at. Only
you and your software guys can divide and conquer.
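
The jitter and sliding-frequency idea in item 6 can be explored outside the HDL simulator with a rough Python model like the one below. It uses the 38 MHz and 50 MHz rates from the thread; the 0.2 ns capture window, the 0.1 ns jitter and the 100 ppm frequency offset are placeholder assumptions, not datasheet values.

```python
import random


def count_window_hits(f_src_mhz: float, f_dst_mhz: float, window_ns: float,
                      jitter_ns: float, n_edges: int, seed: int = 1) -> int:
    """Count source edges that land inside a window centred on a destination
    clock edge, with uniform random jitter added to every source edge."""
    rng = random.Random(seed)
    t_src = 1000.0 / f_src_mhz
    t_dst = 1000.0 / f_dst_mhz
    hits = 0
    for k in range(n_edges):
        edge = k * t_src + rng.uniform(-jitter_ns, jitter_ns)
        # Distance from this edge to the nearest destination clock edge.
        phase = edge % t_dst
        distance = min(phase, t_dst - phase)
        if distance < window_ns / 2.0:
            hits += 1
    return hits


# Slide the bus clock by an assumed 100 ppm so the two clocks drift past
# each other instead of repeating the same few phase relationships.
f_bus = 38.0 * (1.0 + 100e-6)
n = 200_000
hits = count_window_hits(f_bus, 50.0, window_ns=0.2, jitter_ns=0.1, n_edges=n)
print(f"{hits} of {n} edges ({100.0 * hits / n:.2f}%) fell inside the window")
```

With drifting clocks the hit rate comes out near the ratio of window width to clock period, which shows why a short timing simulation with fixed, ideal clocks can easily miss a crossing problem that real hardware hits within seconds.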


