EDK : FSL macros defined by Xilinx are wrong

rickman wrote:
On Feb 2, 3:56 am, Emanuele Carraro
<emanuele83katam...@googlemail.com> wrote:
1_ since I have constrained the clock period and offset in and out,
with only one clock domain in the entire FPGA( in other words
everything is clocked by the same source) and all the constraints
are met must I trust the clock report? Even if the design does not
behave as expected? I mean, with the same code and constraints, only
changing a synthesis parameters (from AUTO to one-hot encoding, for
example) the constraints are always met and the timing report does
not show any setup/hold violation, how is it possible that the FPGA
behaves differently, if not for a synthesis error/mistake or an HW
failure?

You are barking up the wrong tree. Clearly you have not done much HDL
debug. The tools are the very last thing you should suspect. In my
15 odd years of programming in HDL I have only once seen a problem
with the tools relating to timing and that was very clearly a tool
problem once we tested on the hardware.

You need to pay attention to what people are telling you instead of
focusing on the idea that the problem has to have something to do with
timing constraints.

I have already suggested that you need to look at external interfaces
to see if they could be causing a problem. If you have asynchronous
signals entering your design and they are not properly handled, they
can cause intermittent and unpredictable problems. You could easily
have problems with the hardware design. Ground bounce is always a
concern when you have many outputs switching at the same time.

I suggest that you simplify your code to isolate the problem. Cut out
sections of logic until the design pieces work reliably. If that is
not practical, write some new code to test the various external
interfaces. Do something other than just testing with your entire
body of code. Or insert debug signals that come out to pins you can
probe. This is the tried and true method in place of chipscope.

Obviously what you are doing now is not getting you very far. You
need to change tactics. You need to take some good advice.

Rick
Good advice from Rick. As another answer to your question #1, I had a design where some of the LOC
constraints from the UCF file were commented out as part of an unrelated experiment. They were not
put back and the result was that after every map run those top level nets moved to different pins,
depending on the placement of the related logic. I was new to FPGA at the time and also blamed
timing problems, and it took quite a while to track down the problem. So there is more to it than
just functional simulation and passing timing constraints.

Regarding timing constraints, don't rely on the timing score reported by par. A design that gets a
timing score of 0 in par can have a negative score from the timing analyzer. I think this is because
par doesn't look at unconstrained paths. Run a timing report, and be sure to run it with -u (I
believe) to report unconstrained paths. Your goal should be no unconstrained paths, although that
can be hard on more complicated designs.

Using FPGA editor you can insert probes that are routed around your existing logic. The online help
has specific directions for doing this. Bring out some internal nets and look at them on the scope.
This is my first choice method of debug, after functional simulation, because the turn time is much
faster than rerunning map/par.

--steve
 
On Feb 2, 3:56 am, Emanuele Carraro
<emanuele83katam...@googlemail.com> wrote:
Hello everybody,
I got on Friday a new board, with brand new FPGAs soldered correctly (it seems).
Now I have tested my non-working FW and the 3 boards that I have behave the same, so I have only two possibilities:

a_ the synthesis tool is failing, but I tried the same FW on a different machine with a brand new copy of ISE 12.4 installed and it behaves the same on each and every board.

b_ my coding style is somehow wrong. I checked the Xilinx seminars on the web at this address and I watched all the basic HDL and Spartan 3 FPGA specific seminars. OK, my code is not perfectly written, but it has no huge errors: for example, everything is synchronous and the reset path is done correctly, but I use nested if and case statements, which are not the right thing (this is what Xilinx says).
For example, the reset code pointed out by Mike here https://groups.google.com/d/msg/comp.arch.fpga/eQ5EeHECOQw/rO5YroyQhaUJ
is, in Xilinx's opinion, not the best choice.

Now I have decided to rewrite the whole project following Xilinx's advice, but first, please, focus on these two questions and only on these two:

1_ since I have constrained the clock period and offset in and out, with only one clock domain in the entire FPGA( in other words everything is clocked by the same source) and all the constraints are met must I trust the clock report? Even if the design does not behave as expected? I mean, with the same code and constraints, only changing a synthesis parameters (from AUTO to one-hot encoding, for example) the constraints are always met and the timing report does not show any setup/hold violation, how is it possible that the FPGA behaves differently, if not for a synthesis error/mistake or an HW failure?

2_ If this is normal: if, changing the code style, or the used resources, the behaviour changes even with all the constraints met, how am I supposed to use a debugger like CHIPSCOPE, which uses the internal FPGA resources? In other words, if introducing a CHIPSCOPE debugger the used resources change and subsequently is expected that the behaviour changes, how is possible to debug correctly?

Only by making errors is it possible to learn, but without understanding them it is not possible to learn at all.

Many thanks,
Emanuele

P.S. I am using the ISE tool for my projects. Do you have advice on a different (surely better) tool for synthesis, translate, map and PAR that also has a wide availability of IP?
You are barking up the wrong tree. Clearly you have not done much HDL
debug. The tools are the very last thing you should suspect. In my
15 odd years of programming in HDL I have only once seen a problem
with the tools relating to timing and that was very clearly a tool
problem once we tested on the hardware.

You need to pay attention to what people are telling you instead of
focusing on the idea that the problem has to have something to do with
timing constraints.

I have already suggested that you need to look at external interfaces
to see if they could be causing a problem. If you have asynchronous
signals entering your design and they are not properly handled, they
can cause intermittent and unpredictable problems. You could easily
have problems with the hardware design. Ground bounce is always a
concern when you have many outputs switching at the same time.
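
As a minimal sketch of what "properly handled" can mean for a single-bit asynchronous input, the usual approach is to pass it through two flip-flops in the receiving clock domain before any other logic sees it. The entity name sync_2ff and its port names are made up for illustration and do not come from the original poster's design. Note that this only covers single-bit signals; a multi-bit bus that changes asynchronously needs a handshake or a FIFO instead.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Two-flip-flop synchronizer for one asynchronous input bit.
-- Entity and port names are hypothetical.
entity sync_2ff is
  port (
    clk      : in  std_logic;  -- clock of the receiving logic
    async_in : in  std_logic;  -- external signal, asynchronous to clk
    sync_out : out std_logic   -- copy that is safe to use in the clk domain
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta_ff : std_logic := '0';  -- first stage, may go metastable
  signal sync_ff : std_logic := '0';  -- second stage, gives it a cycle to settle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta_ff <= async_in;
      sync_ff <= meta_ff;
    end if;
  end process;

  -- Only this output should feed state machines and other logic.
  sync_out <= sync_ff;
end architecture rtl;
```

The design choice here is that nothing except the first flip-flop ever samples the raw pin, so a state machine cannot see two different values of the same input in one clock cycle.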

I suggest that you simplify your code to isolate the problem. Cut out
sections of logic until the design pieces work reliably. If that is
not practical, write some new code to test the various external
interfaces. Do something other than just testing with your entire
body of code. Or insert debug signals that come out to pins you can
probe. This is the tried and true method in place of chipscope.

Obviously what you are doing now is not getting you very far. You
need to change tactics. You need to take some good advice.

Rick
 
On Feb 2, 6:33 am, comp arch <comparchf...@gmail.com> wrote:
2) Chipscope has helped me countless times, it really is a great tool.
I do see the dilemma here where you are concerned that adding it will
change the utilization, and as a result change the behavior again. So
the only suggestions i have here are:
- turn on design preservation. This should retain the synthesis and
placement/routing of unchanged parts of the design, and should fit
chipscope blocks into 'free' space. Of course some of the design may
have to be moved, but the tools do all they can to prevent that. If
the place and route are preserved, then so should design behaviour
- if inserting chipscope does cause the design to start functioning -
then I would keep iterating and making small tweaks until the design
is failing once more, but this time with chipscope in.
Bingo! This is not uncommon in software development. Turn on debug
output and the problem goes away. Solution, ship with debug on if
your memory allows it. In the case of FPGAs chipscope is much less
likely to cause fit problems.

Work with chipscope turned on and see what you can get done. Even if
the symptoms change, at some point the probes and the problem should
converge.

Rick
 
now I am rewriting the whole code in a clearer way, as I learnt from Xilinx's web seminars. Just a question:

when I have a state machine and want to remain in a fixed state, what is better to write in VHDL?
this

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      null; -- remain here until BLAST
   end if;

or this?

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      state_m_plx <= current_state; -- remain here until BLAST
   end if;

I am wondering what the synthesis tool implements when I write NULL; or state_m_plx <= current_state;

does it matter??
 
does it matter??
It depends on the code that you didn't post.

My preferred (and working) way to write a semi complex state machine,
is a big but lean combinatorial process.

The external inputs are pre-processed in separate processes, as much
as possible. For example, a memory interface can pre-process CS/RD
and ADDR to produce a qualified "read area x" signal, which then is
used by the state machine.

At the top of the process, all combinatorial outputs are assigned
either a default value or the previous value (from a register). After
that, a big CASE statement follows, which handles the various states
and their effect on the combinatorial outputs.

A separate process registers the combinatorial output of the state
machine (and honors reset to provide initial state).

This approach is very simple, produces no latches and other pitfalls,
and reduces the size of the "big process" to the very minimum
necessary. It can be used for purely combinatorial outputs (like same-
cycle ack) as well as for pipelined uses, and mixes thereof. It's all
determined by the various pre-/postprocessing steps, not inside the
state machine.

You only have to look at the synthesis output to make sure there are
no latches, and no "signal x is not on the sensitivity list"
warnings. Those indicate that you made a mistake somewhere.
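
The style described above can be sketched roughly as follows. This is only an illustration under assumptions: the entity name, the state names IDLE, WAIT_BLAST and DONE, the synchronous active-high reset and the nready_next signal are all invented for the example. Only nBLAST and nREADY are borrowed from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of a two-process state machine: one combinatorial process with
-- defaults at the top, one process that registers its results.
entity fsm_sketch is
  port (
    clk    : in  std_logic;
    reset  : in  std_logic;  -- synchronous, active high (assumption)
    nBLAST : in  std_logic;  -- assumed to be synchronous to clk already
    nREADY : out std_logic
  );
end entity fsm_sketch;

architecture rtl of fsm_sketch is
  type state_t is (IDLE, WAIT_BLAST, DONE);  -- hypothetical states
  signal state_reg   : state_t;
  signal state_next  : state_t;
  signal nready_next : std_logic;
begin
  -- Combinatorial process. Every signal read here is in the sensitivity
  -- list, and every output gets a default before the CASE, so no latch
  -- can be inferred.
  comb : process (state_reg, nBLAST)
  begin
    state_next  <= state_reg;  -- default: previous value from the register
    nready_next <= '1';        -- default: inactive

    case state_reg is
      when IDLE =>
        state_next <= WAIT_BLAST;
      when WAIT_BLAST =>
        nready_next <= '0';
        if nBLAST = '0' then
          state_next <= DONE;
        end if;                -- no else branch needed, the default holds
      when DONE =>
        state_next <= IDLE;
    end case;
  end process comb;

  -- Separate process that registers the combinatorial results and
  -- honors reset to provide the initial state.
  regs : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state_reg <= IDLE;
        nREADY    <= '1';
      else
        state_reg <= state_next;
        nREADY    <= nready_next;
      end if;
    end if;
  end process regs;
end architecture rtl;
```

With the defaults at the top, the "remain here" question mostly disappears: leaving state_next untouched in a branch means staying in the current state.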

Best regards
 
now I am rewriting the whole code in a clearer way, as I learnt from
Xilinx's web seminars. Just a question:

when I have a state machine and want to remain in a fixed state, what is
better to write in VHDL?
this

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      null; -- remain here until BLAST
   end if;

or this?

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      state_m_plx <= current_state; -- remain here until BLAST
   end if;

I am wondering what the synthesis tool implements when I write
NULL; or state_m_plx <= current_state;

does it matter??
As per previous reply, it depends on the rest of your code, for instance
what your coding style is. 1/2/3-process FSM?

BUT, the 'null' style is more likely to infer a latch. Latch inference is
bad, and XST will issue warnings about them. If you get them, fix them!


---------------------------------------
Posted through http://www.FPGARelated.com
 
On Wed, 2 Feb 2011 00:56:32 -0800 (PST), Emanuele Carraro
<emanuele83katamail@googlemail.com> wrote:

Hello everybody,
I got on Friday a new board, with brand new FPGAs soldered correctly (it seems).
Now I have tested my non-working FW and the 3 boards that I have behave the same, so I have only two possibilities:

a_ the synthesis tool is failing, but I tried the same FW on a different machine with a brand new copy of ISE 12.4 installed and it behaves the same on each and every board.
Almost certainly not.

b_ my coding style is somehow wrong. I checked the Xilinx seminars on the web at this address and I watched all the basic HDL and Spartan 3 FPGA specific seminars. OK, my code is not perfectly written, but it has no huge errors: for example, everything is synchronous and the reset path is done correctly, but I use nested if and case statements, which are not the right thing (this is what Xilinx says).
For example, the reset code pointed out by Mike here https://groups.google.com/d/msg/comp.arch.fpga/eQ5EeHECOQw/rO5YroyQhaUJ
is, in Xilinx's opinion, not the best choice.
Unlikely - Mike Treseler's style is not quite my choice either, but it is
definitely solid and acceptable to XST.

1_ since I have constrained the clock period and offset in and out, with only one clock domain in the entire FPGA( in other words everything is clocked by the same source) and all the constraints are met must I trust the clock report? Even if the design does not behave as expected? I mean, with the same code and constraints, only changing a synthesis parameters (from AUTO to one-hot encoding, for example) the constraints are always met and the timing report does not show any setup/hold violation, how is it possible that the FPGA behaves differently, if not for a synthesis error/mistake or an HW failure?
Within its own limitations, the timing report is reliable.

2_ If this is normal: if, changing the code style, or the used resources, the behaviour changes even with all the constraints met, how am I supposed to use a debugger like CHIPSCOPE, which uses the internal FPGA resources? In other words, if introducing a CHIPSCOPE debugger the used resources change and subsequently is expected that the behaviour changes, how is possible to debug correctly?
All this is pointing to something else - such as power problems or a noisy (or
poorly terminated) clock signal, or some mistake on the I/O pin constraints.

When you DO solve the problem and get it stable, you will find that minor
changes to the code and settings (adding/removing Chipscope cores etc) will make
no appreciable difference.

One point that I haven't seen mentioned, is to check that your input and output
registers are where you expect them - typically, in the IOBs (I/O blocks). To
find out, look at the relevant section near the end of the MAP report (.mrp
file).
If you want decent predictable I/O signal timing, push the IO registers into the
IOBs, and you will see IFF, OFF, ENBFF listed in this section. (There are
constraints and design choices which can prevent this happening by default, and
it can take some time to find out why and how to overcome them).

- Brian
 
On Feb 9, 5:31 am, Emanuele C <emanuele83katam...@googlemail.com>
wrote:
now I am rewriting the whole code in a clearer way, as I learnt from Xilinx's web seminars. Just a question:

when I have a state machine and want to remain in a fixed state, what is better to write in VHDL?
this

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      null; -- remain here until BLAST
   end if;

or this?

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      state_m_plx <= current_state; -- remain here until BLAST
   end if;

I am wondering what the synthesis tool implements when I write NULL; or state_m_plx <= current_state;

does it matter??
I don't think all synthesis tools are the same in all regards. If
your code is part of a clocked process, the assignment of the current
state to the next state can either combine the current state in the
logic for the next state, or it can generate a clock enable. The null
assignment likely will generate a clock enable on the FF. I have
never seen the null assignment generate logic combined with the logic
for the next state.

If the above code is in a combinatorial process, the null assignment
will generate a latch if no other assignment is made to the next state
signal.
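
To make the clocked case concrete, here is a small sketch, assuming a single clocked process. The entity name, the state names WAIT_BLAST and FINISH and the busy output are invented for the example. Inside rising_edge(clk) a signal that is not assigned simply keeps its registered value, so the null branch and an explicit "stay here" assignment describe the same behaviour and neither can create a latch. Whether the tool builds that hold with a clock enable or with feedback logic can only be seen in the synthesis result.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Single clocked process: "null" in the else branch just holds the state.
-- Names other than nBLAST and state_m_plx are hypothetical.
entity hold_state_sketch is
  port (
    clk    : in  std_logic;
    nBLAST : in  std_logic;
    busy   : out std_logic
  );
end entity hold_state_sketch;

architecture rtl of hold_state_sketch is
  type state_t is (WAIT_BLAST, FINISH);
  signal state_m_plx : state_t := WAIT_BLAST;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case state_m_plx is
        when WAIT_BLAST =>
          if nBLAST = '0' then
            state_m_plx <= FINISH;
          else
            -- Same simulated behaviour as: state_m_plx <= WAIT_BLAST;
            null;
          end if;
        when FINISH =>
          state_m_plx <= WAIT_BLAST;
      end case;
    end if;
  end process;

  busy <= '1' when state_m_plx = WAIT_BLAST else '0';
end architecture rtl;
```

The same null branch in a combinatorial process, with no default assignment before the case statement, is the situation where a latch would be inferred.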

Rick
 
I didn't think about the generation of a clock enable. I would like to avoid it: as Xilinx pointed out in its web seminar, it is always a good choice to use a pipeline when a clock enable is used. But in my case, since the code implements a synchronous state machine, it is maybe better that a one-hot encoding recognizes the "loop" condition. I mean, I think it is better to avoid a clock enable when a state machine waits for another signal to switch to the following state...

so..

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      state_m_plx <= current_state; -- remain here until BLAST
   end if;
 
On Feb 10, 3:05 am, Emanuele C <emanuele83katam...@googlemail.com>
wrote:
I didn't think about the generation of a clock enable. I would like to avoid it: as Xilinx pointed out in its web seminar, it is always a good choice to use a pipeline when a clock enable is used. But in my case, since the code implements a synchronous state machine, it is maybe better that a one-hot encoding recognizes the "loop" condition. I mean, I think it is better to avoid a clock enable when a state machine waits for another signal to switch to the following state...

so..

when current_state =>
   if nBLAST = '0' then
      state_m_plx <= next_state; -- conclude the nREADY sequence
   else
      state_m_plx <= current_state; -- remain here until BLAST
   end if;
Like I said, you don't know that this code won't produce a clock
enable. The only way to tell is to look at what the synthesis tool
produces.

I don't know what Xilinx said that made you think a clock enable is a
bad thing. I expect they said you will need clock enables when
designing a pipeline. That doesn't mean they are bad any other time.
They can actually help speed up an implementation.

Can you explain what you think will happen if you allow a clock enable
on the FSM?

Rick
 
I was finally able to rewrite half of the code in a better way. Now the synthesis report gains more than 6 MHz of speed, and I am confident that I can maybe reach my aim of 80 MHz. But now I am wondering about the OFFSET IN/OUT constraints.
I have a 40 MHz clock input, then a DCM creates an 80 MHz clock for the logic. The problem is that I have an interface which works at 40 MHz, while the interface state machine works at 80 MHz.
So, how can I set the right OFFSET IN/OUT constraints? I can set only the constraints relative to the 40 MHz clock, even if the logic works at 80 MHz.
In the same way, I also have a RAM interface which runs at 80 MHz. How can I set the constraints for this interface if I have only the reference clock, that is, the 40 MHz input?
This is the constraint for the 40 MHz interface, supposing an offset in of 10 ns:
TIMEGRP "PLX_communication" OFFSET = IN 10 ns VALID 12.5 ns BEFORE "XCLK_40MHZ" RISING;
Which one is correct, the one above or this one?
TIMEGRP "PLX_communication" OFFSET = IN 10 ns VALID 25 ns BEFORE "XCLK_40MHZ" RISING;
From my state machine's point of view the data are valid for 12.5 ns, but the bus keeps them valid for 25 ns because it works at 40 MHz.
And if everything is related to the 40 MHz clock, and the derived constraints are automatically declared by the software, how can I set the constraints for the RAM interface at 80 MHz?
 
On Wednesday, March 2, 2011 2:33:34 AM UTC-5, a s wrote:
Johnp, Brian, thank you too for your input! Much appreciated.

I have run your code through 2 synthesisers and have updated the table
of required resources.

-------------- 32-bit input data --------------
version    synthesiser   LUTs   slices
unrolled   XST             74       41
unrolled   SynplifyPro     57       34
loop       XST            100       54
loop       SynplifyPro     57       34
funct      XST            317      161
funct      SynplifyPro     58       34
JohnpV1    XST             62       35
JohnpV1    SynplifyPro     57       33
JohnpV2    XST             78       43
JohnpV2    SynplifyPro     54       32
Brian      XST             57       39
Brian      SynplifyPro     57       41


The last 3 pairs of results are interesting because even
XST produces good results, especially in Brian's version
where XST is surprisingly even slightly better. But anyway,
it's not that XST is so clever, it is a clever coding of the design.

Regards,
Peter
I didn't catch which device you are targeting, but I
decided to try this myself with XST and Spartan 3A,
using Verilog to see if there are any significant
differences in synthesis performance.

Here's the code:
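module count_bits
#(
  parameter IN_WIDTH  = 32,
  parameter OUT_WIDTH = 6
)
(
  input  wire [IN_WIDTH-1:0]  data_in,
  output reg  [OUT_WIDTH-1:0] data_out
);

// Combinational population count: add up the individual bits of data_in.
always @*
begin : proc
  integer i;
  integer sum;
  sum = 0;
  // data_in[i] selects one bit per iteration
  for (i = 0; i < IN_WIDTH; i = i + 1) sum = sum + data_in[i];
  data_out = sum;  // truncated to OUT_WIDTH bits
end

endmodule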

And the results for the 32-bit case (XST)

Number of Slices: 41 out of 1792 2%
Number of 4 input LUTs: 73 out of 3584 2%

which is very close to your original unrolled result.

-- Gabor
 
On Wednesday, March 2, 2011 3:38:09 PM UTC-5, a s wrote:
On Mar 2, 5:52 pm, Gabor <ga...@alacron.com> wrote:
I didn't catch which device you are targeting, but I
decided to try this myself with XST and Spartan 3A,
using Verilog to see if there are any significant
differences in synthesis performance.

I am targeting Virtex4FX.


I get the same results with XST targeting V4.

But that's really interesting how XST produces better results
with Verilog than with VHDL for basically exactly the same input.

Running your module through Synopsys results again
in seemingly "optimum" 57LUTs and 34 slices.

I find it pretty amusing how many options we have already come up
with for such a "basic" problem as counting ones in a word. ;-)

Regards
I thought I should try this with Virtex 5, since it
has larger LUT's and should therefore greatly reduce
the required logic. The results were less than
dramatic. XST still ends up with 65 LUT's for V5.

So I tried again with V6. As far as I know the V6
has a similar LUT to the V5, but suddenly XST
gives me only 35 LUT's (I checked other resources
to be sure it didn't also use DSP blocks). So
either V6 has more flexible carry logic, or (more
likely) XST has been tuned up a bit to get better
results with V6 and the new optimization is not
applied to the older technology. Yet another
reason to use the latest chips if you want to
use the chip vendors tools.

-- Gabor
 
Gabor <gabor@alacron.com> wrote:
(snip regarding bit counting)

I thought I should try this with Virtex 5, since it
has larger LUT's and should therefore greatly reduce
the required logic. The results were less than
dramatic. XST still ends up with 65 LUT's for V5.
The CSA solution uses the fact that three bits can have
four possible counts of bits that are one, and that
(zero to three) fits in two bits. The extension to that
would allow for seven bits with eight possible counts
(zero to seven) in three bits. The six input LUT
solution, with six bits going to three, would be slightly
less efficient. Also, that solution would be found by
putting two levels of the usual CSA tree into six input LUTs.
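
As a rough sketch of the carry-save idea, a 7-bit count can be built from four full adders, each acting as a 3:2 compressor, giving the three-bit result mentioned above. This is only an illustration: the entity name count7 and the helper functions fa_sum and fa_carry are invented here, and it is not the code that was posted in the thread or what any particular synthesizer produces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- 7-bit ones counter built as a small carry-save tree of full adders.
-- All names are hypothetical.
entity count7 is
  port (
    data_in  : in  std_logic_vector(6 downto 0);
    data_out : out std_logic_vector(2 downto 0)  -- number of '1' bits, 0 to 7
  );
end entity count7;

architecture csa of count7 is
  -- Sum output of a full adder: weight 1.
  function fa_sum (a, b, c : std_logic) return std_logic is
  begin
    return a xor b xor c;
  end function;

  -- Carry output of a full adder: weight 2.
  function fa_carry (a, b, c : std_logic) return std_logic is
  begin
    return (a and b) or (a and c) or (b and c);
  end function;

  signal s1, c1, s2, c2, s3, c3 : std_logic;
begin
  -- First level: two groups of three input bits, each reduced to two bits.
  s1 <= fa_sum  (data_in(0), data_in(1), data_in(2));
  c1 <= fa_carry(data_in(0), data_in(1), data_in(2));
  s2 <= fa_sum  (data_in(3), data_in(4), data_in(5));
  c2 <= fa_carry(data_in(3), data_in(4), data_in(5));

  -- Second level: the two weight-1 sums plus the remaining input bit.
  s3 <= fa_sum  (s1, s2, data_in(6));
  c3 <= fa_carry(s1, s2, data_in(6));

  -- Final level: the three weight-2 carries.
  data_out(0) <= s3;
  data_out(1) <= fa_sum  (c1, c2, c3);
  data_out(2) <= fa_carry(c1, c2, c3);
end architecture csa;
```

The count equals s3 + 2*(c1 + c2 + c3), and the last full adder turns the three carries into the two upper result bits.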

-- glen
 
Gabor <gabor@alacron.com> writes:

I thought I should try this with Virtex 5, since it
has larger LUT's and should therefore greatly reduce
the required logic. The results were less than
dramatic. XST still ends up with 65 LUT's for V5.

So I tried again with V6. As far as I know the V6
has a similar LUT to the V5, but suddenly XST
gives me only 35 LUT's (I checked other resources
to be sure it didn't also use DSP blocks). So
either V6 has more flexible carry logic, or (more
likely) XST has been tuned up a bit to get better
results with V6 and the new optimization is not
applied to the older technology.
XST for the "6" families uses a whole new parser. IIRC you can
enable it for older families (as an unsupported option) with a command
line switch.

Although I'm not entirely sure why a new *parser* would help the actual
synthesis, as far as I know it just enables XST to parse a larger
subset of the VHDL language.

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.co.uk/capabilities/39-electronic-hardware
 
On Tue, 8 Mar 2011 18:22:22 -0800 (PST), Brian Davis <brimdavis@aol.com> wrote:

Philippe posted:

1. There is no clear definition of what a "benchmark result" is,
you don't know when you are breaching the contract.

I agree that those clauses are a bit much.
Maybe an answer is for someone to set up a site to post benchmark results anonymously.
 
On Tue, 8 Mar 2011 18:22:22 -0800 (PST), Brian Davis <brimdavis@aol.com>
wrote:

Philippe posted:

1. There is no clear definition of what a "benchmark result" is,
you don't know when you are breaching the contract.

I agree that those clauses are a bit much.

Maybe an answer is for someone to set up a site to post benchmark results
anonymously.
Somebody already did!
http://www.deepchip.com/


 
"Symon" <symon_brewer@hotmail.com> wrote in message
news:ioalgd$vho$1@dont-email.me...
On 4/15/2011 9:36 PM, Morten Leikvoll wrote:
Im looking for an analog oscilloscope in the 2Ghz+ analog bw range and
wonder if you have any experience to share. Im used to the infiniium
54825,
but want to go faster (but not spend a fortune on a new one). I've seen a
couple of "old" 54846 on ebay, and one recently went for $2800 wich is a
price I can handle, but the next price on the list is not that nice.
I want to probe LVDS@1-2GHz signals, DVI and ddr3 memory buses at 533Mhz.


Hi Morten,
How much is Hyperlynx?
HTH, Syms.
More than the cost of a decent scope - and it's only a simulation so garbage
in -> garbage out.

HTH

Phil
 
On 4/16/2011 10:37 AM, Phil Jessop wrote:
"Symon"<symon_brewer@hotmail.com> wrote in message
news:ioalgd$vho$1@dont-email.me...
On 4/15/2011 9:36 PM, Morten Leikvoll wrote:
Im looking for an analog oscilloscope in the 2Ghz+ analog bw range and
wonder if you have any experience to share. Im used to the infiniium
54825,
but want to go faster (but not spend a fortune on a new one). I've seen a
couple of "old" 54846 on ebay, and one recently went for $2800 wich is a
price I can handle, but the next price on the list is not that nice.
I want to probe LVDS@1-2GHz signals, DVI and ddr3 memory buses at 533Mhz.


Hi Morten,
How much is Hyperlynx?
HTH, Syms.

More than the cost of a decent scope - and it's only a simulation so garbage
in -> garbage out.

HTH

Phil


Hi Phil,
Perhaps you can explain how you would use a 'scope to measure the OP's
"LVDS@1-2GHz signals"?
Thanks, Symon.
 
"Symon" <symon_brewer@hotmail.com> wrote in message
news:ioc0t1$8h2$1@dont-email.me...
On 4/16/2011 10:37 AM, Phil Jessop wrote:
"Symon"<symon_brewer@hotmail.com> wrote in message
news:ioalgd$vho$1@dont-email.me...
On 4/15/2011 9:36 PM, Morten Leikvoll wrote:
Im looking for an analog oscilloscope in the 2Ghz+ analog bw range and
wonder if you have any experience to share. Im used to the infiniium 54825,
but want to go faster (but not spend a fortune on a new one). I've seen a
couple of "old" 54846 on ebay, and one recently went for $2800 wich is a
price I can handle, but the next price on the list is not that nice.
I want to probe LVDS@1-2GHz signals, DVI and ddr3 memory buses at 533Mhz.


Hi Morten,
How much is Hyperlynx?
HTH, Syms.

More than the cost of a decent scope - and it's only a simulation so garbage
in -> garbage out.

HTH

Phil


Hi Phil,
Perhaps you can explain how you would use a 'scope to measure the OP's
"LVDS@1-2GHz signals"?
Thanks, Symon.
Hi Symon,

???

Use a 2GHz scope with a differential probe. (Tek P7500 series or similar)

Are you new to this game?

Thanks

Phil
 
