FPGAs starting with incorrect bitstream !?

Antti Lukats
Hi,

Until recently I believed in good faith that all decent FPGAs have
bitstream integrity checks and do not start up if there are errors while
loading the configuration.

This seems not to be the case, at least for Xilinx Virtex2 FPGAs.

I have a design and an FPGA evaluation system where I constantly see
bitstreams that start but behave erratically. The only explanation I can see
is that errors occurred during download, but impact (JTAG download) does
not report an error and the FPGA starts as if everything were OK. After
power-off and reconfiguration the error is gone.

1) From Xilinx answers: if the prog_b pin is pulsed during JTAG download,
the FPGA loses configuration sync, which results in garbage being loaded into
the FPGA, and the FPGA starts with that garbage without any error being
reported during configuration. My system has a button and a pullup resistor
on the prog pin - nobody is pushing it during download.

2) Xilinx Virtex2 FPGAs have a new feature called AutoCRC, which is more
reliable than the CRC used in older FPGAs. The normal CRC check (RCRC command
and write to the CRC register) is still used unless it is a debug
bitstream! -- Good god, but why does impact generate bitstreams with the CRC
value fixed at 0x5F57 for all Virtex2/p/s3 devices?? Isn't the whole point of
a CRC that it is not constant but calculated?
OK, the AutoCRC is written, but the AutoCRC should only operate on frame
data? How are the other config writes protected if the normal CRC check seems
to be bypassed???

Antti
PS 0x0000DEFC !!!

For those who do not know the meaning of 0xDEFC: it is the DEFault Crc value
written to the CRC register when the CRC check is disabled.
When the CRC check is enabled the CRC is 0x5F57, but the meaning of that -
sorry, I cannot decode it! It must be a magical value that matches any good
CRC value (a calculated value!).

PPS Xilinx: where is the algorithm for AutoCRC ???
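
One way to check which CRC word a particular bitstream carries is to scan it for writes to the CRC register. The Python sketch below is only that, a sketch. It assumes a raw, word-aligned configuration stream (not a .bit file with its header still attached), and it assumes that 0x30000001 is the Type 1 packet header for a one-word write to the CRC register; both should be checked against the Virtex-II configuration documentation. The helper name `find_crc_writes` is made up for illustration, and the two values it labels are the ones reported in this thread.

```python
import struct

# Assumption: Type 1 header for "write 1 word to the CRC register".
CRC_WRITE_HEADER = 0x30000001

# Values reported in this thread, not taken from a data sheet.
REPORTED_VALUES = {
    0x0000DEFC: "default value, CRC check disabled",
    0x00005F57: "fixed value seen when CRC check is enabled",
}


def find_crc_writes(bitstream: bytes):
    """Return (byte_offset, value) for each word that follows a CRC write header."""
    n_words = len(bitstream) // 4
    words = struct.unpack(f">{n_words}I", bitstream[: n_words * 4])
    hits = []
    for i in range(n_words - 1):
        if words[i] == CRC_WRITE_HEADER:
            # The word after the header is the value written to the register.
            hits.append((4 * (i + 1), words[i + 1]))
    return hits


def describe(value: int) -> str:
    """Label a CRC word using the values reported in the thread."""
    return REPORTED_VALUES.get(value, "calculated or unknown value")
```

Frame data can contain the same 32-bit pattern by coincidence, so each hit is a candidate to inspect by hand rather than a definite CRC write.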
 
Antti Lukats wrote:

I have a design and an FPGA evaluation system where I constantly see
bitstreams that start but behave erratically.
Check your clock and reset.
Consider simulation before synthesis.
There are many possible sources of erratic
behavior after download.

2) Xilinx Virtex2 FPGAs have a new feature called AutoCRC, which is more
reliable than the CRC used in older FPGAs. The normal CRC check (RCRC
command and write to the CRC register) is still used unless it is a debug
bitstream! -- Good god, but why does impact generate bitstreams with the CRC
value fixed at 0x5F57 for all Virtex2/p/s3 devices??
I would expect a fixed crc sum for a good packet.
The packet generator should add the proper suffix word (FCS)
to make this happen.

Good luck.

-- Mike Treseler
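
The "fixed sum for a good packet" idea can be shown with a toy CRC. The sketch below uses a plain bitwise CRC-16 with polynomial 0x8005, zero initial value, and no final XOR. That choice is arbitrary and purely illustrative; it is not the Xilinx CRC or AutoCRC algorithm, which the thread says is undocumented. The function names are hypothetical. With this construction, running the CRC over the payload plus its appended check word always leaves the same residue (zero here), whatever the payload was.

```python
def crc16(data: bytes, poly: int = 0x8005) -> int:
    """Bitwise CRC-16, MSB first, zero initial value, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_fcs(payload: bytes) -> bytes:
    """Append the check word (FCS) that the packet generator would calculate."""
    return payload + crc16(payload).to_bytes(2, "big")


def residue(packet: bytes) -> int:
    """CRC over payload plus FCS; a constant for every undamaged packet."""
    return crc16(packet)
```

In such a scheme the calculated part is the appended check word, and the constant is only what the receiver's CRC register holds afterwards. Whether the 0x5F57 word plays that role in a Virtex2 bitstream is not settled by anything in this thread.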
 
On Sat, 3 Jul 2004 18:54:16 -0700, "Antti Lukats" <antti@case2000.com>
wrote:

<stuff snipped>
It's no fun to simulate a full 1M-gate design with MicroBlaze!
Does this mean it wasn't simulated?

and you can not simulate badly configured FPGA anyway, can you?
No, but it remains to be seen whether that's the problem. If you
haven't simulated, start there.


Bob Perlman
Cambrian Design Works
 
"Mike Treseler" <mike_treseler@comcast.net> wrote in message
news:-c2dnd1xN7olRnvd4p2dnA@comcast.com...
Antti Lukats wrote:

I have a design and an FPGA evaluation system where I constantly see
bitstreams that start but behave erratically.

Check your clock and reset.
Consider simulation before synthesis.
There are many possible sources of erratic
behavior after download.
This erratic behaviour only happens with a known-good, working bitstream, and
only on some downloads.
The whole system (a 1M-gate system with MicroBlaze) is working, but the
soft-core microcontroller sees some hard-wired registers return random data
(not the pre-programmed constant). The bad register value is consistent for one
download attempt and persists after a hardware reset as well.

For example, you have a hard-wired register that should always read 0xAA, but
after some download attempts it reads, let's say, 0xE1 every time you do a
hardware reset. The next download is OK again.

It's no fun to simulate a full 1M-gate design with MicroBlaze! And you cannot
simulate a badly configured FPGA anyway, can you?

Hm, but checking clock and reset is maybe a good thing to do. The
system has 2 clock domains running from 2 different external clock inputs,
and 3 DCMs, so the reset of the system is not simple. And yes, the register
that returns bad data is in a different clock domain than the SoC system.

2) Xilinx Virtex2 FPGAs have a new feature called AutoCRC, which is more
reliable than the CRC used in older FPGAs. The normal CRC check (RCRC
command and write to the CRC register) is still used unless it is a debug
bitstream! -- Good god, but why does impact generate bitstreams with the CRC
value fixed at 0x5F57 for all Virtex2/p/s3 devices??

I would expect a fixed crc sum for a good packet.
The packet generator should add the proper suffix word (FCS)
to make this happen.
But a fixed checksum doesn't seem possible; there are 2 checksum locations:
1) AutoCRC after the frame data: this is calculated and OK
2) normal CRC: this is fixed to 0x5F57

There is no way the AutoCRC can be the correct CRC for the preceding data and
also force the next CRC to a constant value!!

Good luck.

-- Mike Treseler
thanks, Mike
Antti
 
On Mon, 5 Jul 2004 23:09:27 -0700, "Antti Lukats" <antti@case2000.com>
wrote:

"Bob Perlman" <bobsrefusebin@hotmail.com> wrote in message
news:3vpde0p9u98hlop2aucsd27b11b919h8i8@4ax.com...
On Sat, 3 Jul 2004 18:54:16 -0700, "Antti Lukats" <antti@case2000.com>
wrote:

<stuff snipped>
It's no fun to simulate a full 1M-gate design with MicroBlaze!

Does this mean it wasn't simulated?

Yes, it means that the 1M-gate design with 32K of application code for
MicroBlaze has not been simulated. All the custom IP cores connected to
MicroBlaze have of course been simulated.

and you can not simulate badly configured FPGA anyway, can you?

No, but it remains to be seen whether that's the problem. If you
haven't simulated, start there.

Dear Bob,

I have a bitstream that always starts OK when loaded from configuration
memory, and starts with erratic behaviour 1 from 100 JTAG configuration
attempts (even though JTAG configuration did not show any error during
download).
I don't know what this means. Are you getting erratic behavior in 1
out of 100 JTAG downloads? Or 100% of JTAG downloads?

When the bitstream starts badly it also behaves badly after reset;
only a full new reconfiguration makes the system work again. So I
assume it is possible that the CRC check is not sufficient in Virtex2
devices and that they sometimes do start even after a bad download.
Resets do not reset everything. They do not, for example,
re-initialize block RAM. If you are depending on the initial contents
of a block RAM for proper operation, and your circuit occasionally
stomps on block RAM shortly after start-up, your circuit may not work
until you reconfigure.

You suggested that this erratic behaviour, a bad start when loading from JTAG,
could be found by running simulations?! Well, I really can't see how any
simulation model could take into account errors that happened during
download. Or what was it that I could possibly find in simulation?
As I said in my previous post, you haven't proved that configuration
is the problem. And I'm not, repeat, NOT, suggesting that you somehow
simulate the configuration process. But it would be interesting to
know if there's a way resources like block RAMs could be corrupted
shortly after you come out of reset, perhaps due to problems with
interfaces between mutually asynchronous clock domains.

I can't rule out the possibility that you are occasionally loading a
corrupted bitstream, but it seems very unlikely. Doctors have a
saying: when you hear hoofbeats, think horses, not zebras. If I had a
design that I didn't simulate, and configuration seemed to complete
successfully, I'd start looking somewhere other than configuration for
my problem.

Good luck,
Bob Perlman
Cambrian Design Works
 
Antti Lukats wrote:
<snip>
Dear Bob,

I have a bitstream that always starts OK when loaded from configuration
memory, and starts with erratic behaviour 1 from 100 JTAG configuration
attempts (even though JTAG configuration did not show any error during
download). When the bitstream starts badly it also behaves badly after reset;
only a full new reconfiguration makes the system work again. So I
assume it is possible that the CRC check is not sufficient in Virtex2
devices and that they sometimes do start even after a bad download.
Have you tried read-back of the FPGA in these cases ?
Could be a candidate for an overnight run of continual
download/readback's - is this single device specific, or
common to multiple FPGAs ?
-jg
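
An overnight download/readback run like the one suggested above could be driven by a small loop. The Python sketch below is hedged throughout: `configure` and `readback` are hypothetical callables standing in for whatever the JTAG tooling provides (the thread names no scriptable API), `golden` is the expected readback data, and `care_mask` is a made-up convention in which a 1 bit means "compare this bit", so that bits which legitimately change at run time can be ignored.

```python
from typing import Callable, List


def soak_test(configure: Callable[[bytes], None],
              readback: Callable[[], bytes],
              bitstream: bytes,
              golden: bytes,
              care_mask: bytes,
              runs: int) -> List[dict]:
    """Configure and read back `runs` times; record every mismatching run."""
    failures = []
    for run in range(runs):
        configure(bitstream)
        observed = readback()
        if len(observed) != len(golden):
            # A length mismatch cannot be compared byte by byte.
            failures.append({"run": run, "offsets": None})
            continue
        bad = [i for i, (o, g, m) in enumerate(zip(observed, golden, care_mask))
               if (o ^ g) & m]
        if bad:
            failures.append({"run": run, "offsets": bad})
    return failures
```

Logging the failing offsets, not just a pass/fail flag, would show whether the same configuration bits are hit each time and whether the result differs between devices.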
 
I'm coming a little late to this conversation, but perhaps this has not been
considered. I sincerely doubt it is a configuration problem. Much more likely,
you are not coming out of reset at the end of configuration cleanly. The global
reset must be considered asynchronous to the clock. Most likely, you are
occasionally getting a situation where one or more flip flops are seeing the
end of the configuration reset a clock cycle before or after other flip flops
in a critical area of your design. Simulation usually won't catch this, so you
need to do a careful examination of the start up of your design. I can't tell
you the number of designs I've seen that make this common mistake, even from
FPGA board vendors with much experience that really should know better.

Check the state machines in your design. The resets for them should come from a
flip-flop in the design that feeds all the reset inputs to the state machine.
You can't depend on global reset going away on all flip-flops during the same
clock cycle.
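
The failure mode described above can be illustrated with a toy model. The Python sketch below simulates a one-hot ring counter in which each flip-flop is released from reset on its own clock cycle. `run_ring` and `release_cycle` are hypothetical names; the model is a cycle-level abstraction, not a timing simulation of any real device.

```python
def run_ring(release_cycle, cycles=12):
    """Simulate a one-hot ring counter whose flops leave reset on different edges.

    release_cycle[i] is the first clock edge on which flop i is out of reset.
    """
    n = len(release_cycle)
    reset_value = [1] + [0] * (n - 1)   # flop 0 is preset, the rest are cleared
    state = list(reset_value)
    for cycle in range(cycles):
        nxt = []
        for i in range(n):
            if cycle < release_cycle[i]:
                nxt.append(reset_value[i])   # still held in reset
            else:
                nxt.append(state[i - 1])     # normal shift from the previous flop
        state = nxt
    return state


# Clean release: exactly one flop is hot.
clean = run_ring([0, 0, 0, 0])
# Flop 0 released one cycle late: two flops are hot from then on.
two_hot = run_ring([1, 0, 0, 0])
# Flop 1 released one cycle late: the single hot bit is lost.
dead = run_ring([0, 1, 0, 0])
```

A one-cycle skew is enough to leave the counter in an illegal state that it never leaves by itself, which is why the advice is to release such logic from a reset that is registered in the design.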




--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
If it is a V2, and you only experience problems when downloading from
JTAG...Keep in mind that the V2 supports partial reconfiguration. As such,
the JTAG bit-banger from IMPACT doesn't invoke the global clear when
re-configuring, so BRAMs will not initialize, and FFs _may_ not. I think I
saw a Xilinx solution record with information on how to invoke the global
initialization through JTAG manually...the Chipscope tool does this
currently. Or, you can manually short the PROG pin to ground.

By the way, this is documented in the V2 design guide.

-S


 
Antti Lukats wrote:
"Ray Andraka" <ray@andraka.com> wrote in message
news:40F5B093.70E01445@andraka.com...

I'm coming a little late to this conversation, but perhaps this has not been
considered. I sincerely doubt it is a configuration problem. Much more
likely, you are not coming out of reset at the end of configuration cleanly.
The global reset must be

Hi Ray,

I didn't notice there were more replies to my post, thanks!

Well, let me explain the situation again:

It's a Virtex2 with a MicroBlaze and 32k of BRAM, and I am using both impact
and Chipscope to download the bitstreams. The bitstream is known good, but in
some cases after download one hard-coded register is read by the MicroBlaze as
giving a wrong readback. The readback is constant for a given configuration
attempt, and the wrong read value persists after any number of hardware resets.
It only goes away after a new reconfiguration. The wrong read value comes from
a Verilog wire (one with an assigned constant value). I still do not see how a
clocking or reset problem could do that. If the bitstream is loaded again the
problem disappears. If the same bitstream is loaded from configuration memory
there is never a problem.

The BRAMs are initialized and the flip flops are initialized OK, or they are
not relevant to the current problem. If the FPGA were not able to start after
errors during the actual configuration download, I would say this problem
should never have occurred.

Ray - as you may have noticed, my plea for information about the Xilinx AutoCRC
has gone unanswered. The Virtex 2 bitstream does not include the normal
CRC as used in Spartan II/E. It is replaced with AutoCRC. But there
is no information on how it is calculated anywhere in any public document!

Xilinx says that the old CRC was not good enough and did not catch all
errors during configuration!! But I bet the new one is not much better!

Antti
If I read this right, you are saying that read-back does show the
error, and that the error persists over many read-backs until re-config?
That does sound like a config-write error.
Have you tried multiple devices (ideally with differing date codes)?
If this persists across device/date-code boundaries, I would say it
shows a serious blind spot.
In general, any device programming includes a verify step, but on
FPGAs skipping verify has probably become the norm, for
'saving time' reasons.
If the CRC is not sufficiently reliable, then
that would make config something of a lottery.
[Just maybe they do not CRC the whole bitstream?]

Perhaps someone from Xilinx could clarify more what AutoCRC is, and does ?

-jg
 
On Mon, 19 Jul 2004 10:00:43 +1200, Jim Granville
<no.spam@designtools.co.nz> wrote:


If I read this right, you are saying that read-back does show the
error, and that error persists on many read-backs until re-config ?
Good question. Antti, when you say that "readback" is consistent, are
you referring to the MicroBlaze's readback of that one register, or
are you saying that you are seeing an error when you perform a
bitstream readback?

Bob Perlman
Cambrian Design Works
 
[snip]
Xilinx says that the old CRC was not good enough and did not catch all
errors during configuration!! But I bet the new one is not much better!

Antti

If I read this right, you are saying that read-back does show the
error, and that error persists on many read-backs until re-config ?

Good question. Antti, when you say that "readback" is consistent, are
you referring to the MicroBlaze's readback of that one register, or
are you saying that you are seeing an error when you perform a
bitstream readback?

Bob Perlman
Cambrian Design Works
MicroBlaze starts, i.e. the DCM works, the BRAMs init OK, etc...
I press HW reset and the RTL revision register (hard-wired)
reads, for example, 23.27, not 1.21 as it is wired to return.
This wrong readback of 23.27 persists after any number of
hardware resets (reset to MicroBlaze and all registered logic).
After reconfiguring, the problem is gone. Some other time
the wrong readback may be wrong in a different way, but again
it remains constant until reconfig.

And yes, it looks like there is a chance that a V2 bitstream
can start even if it had errors during download.
And yes, I would like Xilinx to document the AutoCRC function ;)

Antti
 
Antti,

In a recent design we came across a "configuration initialization problem"
that sounds similar to what you are noticing. But in our case it is
a Spartan-IIE device, and the failure is incorrect initialization of
the SRL.

In our design we used an SRL16E to create a divide-by-16 counter: essentially
a 16-bit circular shift register loaded with 0x0001.
We have a chain of these to create a 250ms tick from a 66MHz free-running
clock. After successful configuration we expect the 250ms tick
to be free-running. The 250ms tick works most of the time, but it fails
once in a while. We couldn't explain the failure and couldn't solve the
problem.
Fortunately we could work around the problem by replacing the SRL-based
counters with FF-based counters. Note that SRLs do not have a reset pin,
hence no reset dependency.

The question I couldn't answer was: if there was corruption in the bitstream
and the FPGA CRC logic didn't catch it, why did the failure only affect the
SRL INIT value? Why didn't it affect, say, a SLICE configuration? Why didn't
some other logic in the FPGA misbehave?

More details:
- Target FPGA device is XC2S50E-6FT256C
- FPGA configuration mode is Master Serial (M2, M1, M0 == 0,0,0). The
CCLK is driven by FPGA only.
- Serial PROM used is XCF01SVO20
- RTL : Verilog
- Synthesis tool : SynplifyPro 7.5.0
- Xilinx Tool : ISE6.1i, SP3

Typical SRL usage code
------------------------------------------------------------------------------
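// 16-bit circular shift register preloaded with a single 1 (divide by 16)
defparam u_free_tmr0.INIT = 16'h1;
SRL16E u_free_tmr0 (
  .CLK (core_clk_out),                             // shift clock
  .A0 (1'b1), .A1 (1'b1), .A2 (1'b1), .A3 (1'b1),  // address 15: tap the last stage
  .CE (1'b1),                                      // always enabled
  .D (free_out15),                                 // output fed back to the input
  .Q (free_out15));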
------------------------------------------------------------------------------
In our design we have multiple instantiations of the SRL similar to what is
shown above. We also use the UART macro from Xilinx listed in XAPP223. The
macro also uses an SRL to implement a divide by 16.
The source clock to the SRL is from a free-running oscillator (66MHz)
present on the board.

I confirmed that the code was correctly implemented by checking the init
value of the SRL in fpga_editor.
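
To make the dependence on the INIT value concrete, here is a small behavioural model of the circular shift register described above, written in Python. It is a sketch under assumptions: the class models only the shift and the addressed tap, the bit ordering (INIT bit 0 is the first stage, address 15 taps the last) is my reading of the primitive, and `ring_output` is a hypothetical helper.

```python
class SRL16E:
    """Behavioural model of a 16-bit shift register with an addressable tap."""

    def __init__(self, init: int = 0x0000):
        self.bits = init & 0xFFFF          # contents loaded at configuration

    def q(self, addr: int = 15) -> int:
        """Output selected by the 4-bit address (15 = last stage)."""
        return (self.bits >> addr) & 1

    def clock(self, d: int, ce: int = 1) -> None:
        """One rising clock edge: shift D in when clock enable is high."""
        if ce:
            self.bits = ((self.bits << 1) | (d & 1)) & 0xFFFF


def ring_output(init: int, cycles: int):
    """Q sampled each cycle with Q fed back to D, as in the posted instance."""
    srl = SRL16E(init)
    out = []
    for _ in range(cycles):
        out.append(srl.q(15))
        srl.clock(d=srl.q(15))
    return out


good = ring_output(0x0001, 64)    # one pulse every 16 clocks
stuck = ring_output(0x0000, 64)   # no 1 in the ring, so it never pulses
```

With no reset pin and pure feedback, the ring can only ever circulate what configuration loaded into it. An all-zero load would give exactly the "didn't oscillate" symptom, although this model cannot say whether that is what happened on the board. As a side calculation, six such stages would divide 66 MHz by 16^6 = 16,777,216, about 254 ms; the stage count is my inference, not something the post states.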

Experiments we carried out:

Ex1 : Turn on the power. Ensure the FPGA is configured (DONE=1, INIT#=1).
Check whether the free-running clock from the SRL is running. If the clock is
running, power cycle; else stop.
Result : We saw failures where the SRL in our part of the design didn't
oscillate. We also saw instances where the SRL in the UART macro didn't
oscillate. Sometimes it took 30 tries, sometimes 500 tries.

Ex2 : Turn on the power. Pull the PROG# pin of the FPGA low. Pull the PROG#
pin of the FPGA high. Ensure the FPGA is configured (DONE=1, INIT#=1). Check
whether the free-running clock from the SRL is running. If the clock is
running, reconfigure the FPGA by toggling PROG#.
Result : We saw failures where the SRL in our part of the design didn't
oscillate. We also saw instances where the SRL in the UART macro didn't
oscillate. Sometimes it took 30 tries, sometimes 250 tries.

Ex3 : We replaced the SRL-based "divide by 16" counters with FF-based
counters in our part of the design. The UART macro still contained the
SRL-based divide by 16. Repeat "Ex2".
Result : We saw failures only in the SRL macro of the UART. The divide-by-16
counters implemented using FFs never had any failures.
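
With failures this rare, a handful of clean runs after a change proves little. A rough way to size such an experiment, assuming each configuration attempt fails independently with the same probability (an assumption the thread does not establish), is sketched below in Python; `clean_runs_needed` is a hypothetical helper.

```python
import math


def clean_runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Consecutive clean runs needed before a bug that failed at
    `failure_rate` per attempt would have shown up with probability
    `confidence`, assuming independent attempts."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # P(no failure in n runs) = (1 - p)^n <= 1 - confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Rates taken from the thread: about 1 in 100 JTAG downloads,
# and between 1 in 30 and 1 in 500 power cycles.
runs_for_1_in_100 = clean_runs_needed(1 / 100)
runs_for_1_in_500 = clean_runs_needed(1 / 500)
```

Under these assumptions a 1-in-100 failure needs roughly three hundred clean attempts before a fix looks convincing, and a 1-in-500 failure needs several times more.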

The problem is seen on multiple boards.

I'm glad I found someone seeing symptoms similar to those we were
struggling with for a couple of weeks.

- praveen

 
