EDK : FSL macros defined by Xilinx are wrong

On 11/30/2012 3:36 PM, kaz wrote:
On 11/30/2012 12:55 AM, Bart Fox wrote:
rickman wrote:

I have no idea why you say an async reset won't work in a sync design.
Do I misunderstand your statement? I am talking about FPGAs where every
chip has an async reset during configuration. You can choose to use
this in your design or not, but it is there and it works no matter what
you do. I suppose I should have qualified my statement to FPGAs that
use RAM configuration and have to be configured. There aren't a lot of
true flash-based devices that come up instantly without a configuration
process.

Rick


Indeed. Altera recommends using the async reset port, but driven by a
reset signal that has been pre-synchronised to the clock.
That only makes sense if the delay in the async reset path is short
enough and properly analyzed by the tools. There have been any number
of discussions in these groups about how to properly reset a design and
there is no consensus on the best way to do it.


Xilinx, I believe, recommends using synchronous reset. But
again we need to be careful about our wording. A synchronous reset is actually
applied to the D input through logic, and that does not necessarily mean it is
pre-synchronised. Whether you name it async or sync, the signal by default
is not pre-synchronised, and for the sake of timing at release it has to be
generated from the flip-flop's clock domain before it is applied.
The best way to do a reset is design specific. As I said above, there
are many ways and no agreement on which is "best".


With today's large designs I prefer not to apply reset unless absolutely
needed. I know many of us apply it as a matter of routine to masses of buses
at every node, but I believe it puts a massive burden on the fitter to meet
removal/recovery timing when that effort would be better directed somewhere
more critical.
That depends on why you are using the reset. If you are only using it
to establish the configuration values you can apply an async reset and
not be concerned with routing since it will use only the dedicated async
reset network.

Rick
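
For readers following the reset discussion above, the "async reset port driven by a pre-synchronised signal" arrangement is often sketched as a two-flop reset synchronizer: reset asserts asynchronously and is released synchronously to the destination clock. The following is a minimal sketch only. The entity and signal names (reset_sync, arst_n_in, rst_n_out) are made up for illustration and do not come from any vendor document or from the posters.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: asynchronous assert, synchronous release.
entity reset_sync is
  port (
    clk       : in  std_logic;
    arst_n_in : in  std_logic;  -- raw asynchronous reset, active low
    rst_n_out : out std_logic   -- feeds the async reset ports in the clk domain
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal sync : std_logic_vector(1 downto 0) := (others => '0');
begin
  process (clk, arst_n_in)
  begin
    if arst_n_in = '0' then
      -- Assertion takes effect immediately, with or without a clock.
      sync <= (others => '0');
    elsif rising_edge(clk) then
      -- Release ripples through two flops, so it leaves in step with clk.
      sync <= sync(0) & '1';
    end if;
  end process;

  rst_n_out <= sync(1);
end architecture rtl;
```

One such block per clock domain is the usual assumption. Whether the released reset then meets removal/recovery timing at every flip-flop still has to be checked by the timing tools, which is the concern raised above.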
 
On 11/30/2012 3:49 PM, kaz wrote:
Note also the resource difference between the case of using async port (just
routing) and sync reset (logic).
I'm not clear on this. I believe some architectures provide sync reset
inputs separate from the D input and so use no logic.

Rick
 
On 11/30/2012 9:25 PM, KJ wrote:
On Friday, November 30, 2012 3:07:17 PM UTC-5, rickman wrote:
My understanding is that logic delays in FPGAs are always longer
than clock delays on the clock trees so that you can't have a
hold time violation. If the clock is routed on the signal routing
then all bets are off. I don't know how well the timing analysis
does with verifying clock delays on the signal routing because I
have never needed to use it that I can remember

It doesn't need to depend on routing. Thomas' example introduced a logic delay between the two clocks. The implementation of that logic will create a race condition.
I didn't see anything in Thomas' example that required "logic". Here is
what I read. Did I read the wrong post?

To demonstrate your point though you simply need to generate the new clock as this:

clk1 <= clk;
This is not logic. In VHDL it inserts a delta delay which is a zero
amount of time but treated as a delay in the simulator. By adding a
delta delay it will disrupt signals from the earlier clk domain that
drive FFs in the later clk1 domain. In a real chip there will be no delay.

Rick
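
The delta-delay effect described above can be reproduced in any VHDL simulator with a small testbench. This is a sketch written for illustration; the names (delta_race_tb, a, b_same, b_late) are invented here and are not from the blog or from Thomas' post. Both capture registers sample the same signal, one on clk and one on clk1, where clk1 is just an assignment from clk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_race_tb is
end entity delta_race_tb;

architecture sim of delta_race_tb is
  signal clk    : std_logic := '0';
  signal clk1   : std_logic := '0';
  signal a      : std_logic := '0';
  signal b_same : std_logic := '0';  -- captures a on clk
  signal b_late : std_logic := '0';  -- captures a on clk1
begin
  -- Eight clock cycles, then the simulation runs out of events and stops.
  clk_gen : process
  begin
    for i in 1 to 8 loop
      clk <= '0';
      wait for 5 ns;
      clk <= '1';
      wait for 5 ns;
    end loop;
    wait;
  end process clk_gen;

  -- Not logic, but in simulation clk1 changes one delta cycle after clk.
  clk1 <= clk;

  -- Source register: toggles on every rising edge of clk.
  src : process (clk)
  begin
    if rising_edge(clk) then
      a <= not a;
    end if;
  end process src;

  -- Same clock: sees the value a had before the edge (normal pipeline).
  same_domain : process (clk)
  begin
    if rising_edge(clk) then
      b_same <= a;
    end if;
  end process same_domain;

  -- Delayed clock: a has already been updated when clk1 rises.
  late_domain : process (clk1)
  begin
    if rising_edge(clk1) then
      b_late <= a;
    end if;
  end process late_domain;

  -- Print the three values shortly after each rising edge of clk.
  check : process
  begin
    wait until rising_edge(clk);
    wait for 1 ns;
    report "a=" & std_logic'image(a) &
           " b_same=" & std_logic'image(b_same) &
           " b_late=" & std_logic'image(b_late);
  end process check;
end architecture sim;
```

In simulation b_same lags a by one clock, as a register stage should, while b_late simply follows a, because a is updated in the same delta cycle in which clk1 rises. In hardware the two clocks would be the same net, so the simulation and the chip disagree.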
 
rickman <gnuarm@gmail.com> wrote:
On 11/30/2012 3:49 PM, kaz wrote:

Note also the resource difference between the case of using async
port (just routing) and sync reset (logic).

I'm not clear on this. I believe some architectures provide
sync reset inputs separate from the D input and so use no logic.

Well, it will still take some resources, but maybe not much.

-- glen
I am not aware of this type of architecture. I assume it synchronises the
reset per flip-flop, which seems a waste of silicon compared to the case of
the user pre-syncing it once for all relevant flip-flops.

Kaz

---------------------------------------
Posted through http://www.FPGARelated.com
 
rickman <gnuarm@gmail.com> wrote:
On 11/30/2012 3:49 PM, kaz wrote:

Note also the resource difference between the case of using async port(just
routing) and sync reset(logic).

I'm not clear on this. I believe some architectures provide
sync reset inputs separate from the D input and so use no logic.
Well, it will still take some resources, but maybe not much.

-- glen
 
On 12/1/2012 2:05 AM, kaz wrote:
rickman<gnuarm@gmail.com> wrote:
On 11/30/2012 3:49 PM, kaz wrote:

Note also the resource difference between the case of using async
port (just routing) and sync reset (logic).

I'm not clear on this. I believe some architectures provide
sync reset inputs separate from the D input and so use no logic.

Well, it will still take some resources, but maybe not much.

-- glen


I am not aware of this type of architecture. I assume it synchronises the
reset per flip-flop, which seems a waste of silicon compared to the case of
the user pre-syncing it once for all relevant flip-flops.
I don't know the Altera devices as intimately as others, but I learned
the Xilinx stuff pretty well once. They have (had) two inputs to each
FF, one for reset and one for set, which were configurable between sync
and async.

My point is that if this is built in, which is not uncommon I think,
there are no "used" resources other than the dedicated reset logic
in the case of sync reset, and nothing at all for async.

I prefer to design my resets in a customized way where each section of
logic is reset and released from reset asynchronously and each logic
section separately takes care of the problems of cleanly starting up.
That way there is no global reset competing for either routing or logic.
Often nothing special needs to be done for a finite state machine
(FSM) because it starts by waiting for some trigger signal anyway.
Counters often use an enable which is disabled by default, etc. I pay
attention to how my circuits operate from reset and so far this has not
bitten me.

Rick
 
On Saturday, December 1, 2012 1:47:06 AM UTC-5, rickman wrote:
On 11/30/2012 9:25 PM, KJ wrote:
It doesn't need to depend on routing. Thomas' example introduced a
logic delay between the two clocks. The implementation of that logic
will create a race condition.
I didn't see anything in Thomas' example that required "logic". Here is
what I read. Did I read the wrong post?
The post that I was referring to has the following...
Clk1 <= Clk when Selected else Other_Clk;
[..]
Clk2 <= Clk1 when Enabled else '0';
[..]
process (Clk)
if rising_edge(Clk) then
A <= B;
[..]
process (Clk2)
if rising_edge(Clk2) then
B <= A;

I believe the point he was trying to make was that because of the simulation delta delay between Clk1 and Clk2 the two processes would not be clocked at the same time. While it is true that the processes would clock at different times, it's not really because of the simulation delta delay. An actual implementation of the above would have the same problem because it would need to synthesize the logic to create the gated clock. The propagation delay and additional routing delay in creating the additional clock would create a race condition for signals generated in the 'clk' domain and captured in the 'clk1' or 'clk2' domains.

clk1 <= clk;

This is not logic. In VHDL it inserts a delta delay which is a zero
amount of time but treated as a delay in the simulator. By adding a
delta delay it will disrupt signals from the earlier clk domain that
drive FFs in the later clk1 domain. In a real chip there will be no
delay.
And that was my point.

Kevin Jennings
 
On 12/1/2012 7:33 PM, KJ wrote:
On Saturday, December 1, 2012 1:47:06 AM UTC-5, rickman wrote:
On 11/30/2012 9:25 PM, KJ wrote:
It doesn't need to depend on routing. Thomas' example introduced a
logic delay between the two clocks. The implementation of that logic
will create a race condition.
I didn't see anything in Thomas' example that required "logic". Here is
what I read. Did I read the wrong post?

The post that I was referring to has the following...
Clk1 <= Clk when Selected else Other_Clk;
[..]
Clk2 <= Clk1 when Enabled else '0';
[..]
process (Clk)
if rising_edge(Clk) then
A <= B;
[..]
process (Clk2)
if rising_edge(Clk2) then
B <= A;

I believe the point he was trying to make was that because of the simulation delta delay between Clk1 and Clk2 the two processes would not be clocked at the same time. While it is true that the processes would clock at different times, it's not really because of the simulation delta delay. An actual implementation of the above would have the same problem because it would need to synthesize the logic to create the gated clock. The propagation delay and additional routing delay in creating the additional clock would create a race condition for signals generated in the 'clk' domain and captured in the 'clk1' or 'clk2' domains.

clk1<= clk;

This is not logic. In VHDL it inserts a delta delay which is a zero
amount of time but treated as a delay in the simulator. By adding a
delta delay it will disrupt signals from the earlier clk domain that
drive FFs in the later clk1 domain. In a real chip there will be no
delay.

And that was my point.

Kevin Jennings
I don't know where that code came from, but yes, I think you are
accurately analyzing it. Your first post was replying to the OP's post
containing the link to the blog. I didn't see anything like this in the
blog code, there was no muxing of the clock. Where did you get the code
shown above?

Rick
 
On Sunday, December 2, 2012 10:43:04 PM UTC-5, rickman wrote:
I didn't see anything like this in the blog code, there was no muxing of the clock. Where did you get the code shown above?
I was replying to Thomas' post on Nov 28. I must've clicked the wrong 'Post Reply' button or something. Link is
https://groups.google.com/forum/#!search/In$20DSP$20I$20guess$20you$20have$20in$20general$20only$20one$20clk$20and$20all$20modules$20/comp.arch.fpga/zec7-hbtrJ8/TJOleMXV490J

Kevin
 
On 12/3/2012 9:22 PM, KJ wrote:
On Sunday, December 2, 2012 10:43:04 PM UTC-5, rickman wrote:
I didn't see anything like this in the blog code, there was no muxing of the clock. Where did you get the code shown above?

I was replying to Thomas' post on Nov 28. I must've clicked the wrong 'Post Reply' button or something. Link is
https://groups.google.com/forum/#!search/In$20DSP$20I$20guess$20you$20have$20in$20general$20only$20one$20clk$20and$20all$20modules$20/comp.arch.fpga/zec7-hbtrJ8/TJOleMXV490J

Kevin
Ok, I finally found it, no thanks to Google groups. Somehow the post
didn't link to the right place in my reader. I've seen it screw up before.

I've kinda lost track of your point. But Thomas seems to be correct in
what he said. But your point is correct, that adding logic delays to
the clock distribution makes life difficult. I believe the tools
typically handle that. If they didn't you would be limited to the
number of global clock routes on a chip. In the real world you can have
local clock distribution and the timing tools should verify and report
setup and hold timing violations.

Certainly adding logic to clock distribution makes things much more
complex.

Rick
 
On Wednesday, January 9, 2013 3:43:54 PM UTC+13, rickman wrote:
I'm wondering if there is something about the architecture of these
parts that precludes an up/down counter in one LUT/bit.
I thought the tile in an iCE40 was not as smart as the one in the MachXO2, and that this was why any iCE40 design uses more cells?
Did you try re-targeting to the MachXO2?

-jg
 
On 1/10/2013 3:50 PM, jg wrote:
On Wednesday, January 9, 2013 3:43:54 PM UTC+13, rickman wrote:
I'm wondering if there is something about the architecture of these
parts that precludes an up/down counter in one LUT/bit.

I thought the tile in an iCE40 was not as smart as the one in the MachXO2, and that this was why any iCE40 design uses more cells?
Did you try re-targeting to the MachXO2?

-jg
I looked at the documentation and the only restriction I could see is
that all the clock enables within a block share the same input pin. So
there is only one clock enable for the entire block. I wouldn't think
this is a real problem, at least not one that required using extra LUTs.
Even if you are using the extra input, each LUT has four and there are
only four inputs, counter value, carry in, up/down flag and enable.
Also, the extra LUT is combining the up/down flag and the enable flag.
Why would that have to be done more than once? So this is messed up in
three different ways...

We'll see what Lattice says.

Rick
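
For reference, the kind of counter under discussion can be written as below. Rick's actual source is not shown in this thread, so this is only a generic sketch with invented names (updown_counter, WIDTH, up, enable). Each next-state bit depends on the four signals listed above: the current bit, the carry or borrow from the lower bits, the up/down flag and the enable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example of an up/down counter with a clock enable.
entity updown_counter is
  generic (WIDTH : positive := 8);
  port (
    clk    : in  std_logic;
    enable : in  std_logic;  -- count only when '1'
    up     : in  std_logic;  -- '1' counts up, '0' counts down
    count  : out unsigned(WIDTH - 1 downto 0)
  );
end entity updown_counter;

architecture rtl of updown_counter is
  signal cnt : unsigned(WIDTH - 1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if enable = '1' then
        if up = '1' then
          cnt <= cnt + 1;
        else
          cnt <= cnt - 1;
        end if;
      end if;
    end if;
  end process;

  count <= cnt;
end architecture rtl;
```

How many LUTs per bit this takes depends on how a given tool maps the enable and the up/down flag onto the carry chain and the flip-flop's clock-enable pin, which is exactly the open question here.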
 
I believe the "preserve" option was not even available after they first introduced the Advanced waveform viewer. I am going to guess that the reason that signals were not preserved was related to a software implementation difficulty, and had little to do with the "user experience"...
 
That is really useful. Are you powering the EPC2 from the JTAG, or does it use a separate power supply?
 
On 6/3/2013 10:53 PM, Nikolaos Kavvadias wrote:
Nikolaos Kavvadias
Thank you, Mr. Nikolaos Kavvadias, for the SPAM!

I'll remember that next time my company evaluates IP vendors.

JJS
 
Hi John,

Thank you, Mr. Nikolaos Kavvadias, for the SPAM!

I'll remember that next time my company evaluates IP vendors.
I think that announcements of this kind (straight, informative and to
the point) are far from spam.

Personally, I find myself reading such "ANN"s, most of the time.

I didn't intend to make you feel offended, just to let people know
what kind of IP is available as a standalone offering.

If you have any technical questions, observations or suggestions I
will be glad to answer them.

Kind regards,
Nikolaos Kavvadias
http://www.nkavvadias.com
http://www.perfeda.gr


 
Thank you, Mr. Nikolaos Kavvadias, for the SPAM!
Nowadays, I think we should be happy for every bit of traffic we get here at comp.arch.fpga ;-)

On-topic:
I am not sure if your IP allows for arbitrary Zs (e.g. x MOD 253) or just special ones.

About a year ago I had to do mod 3 and mod 7 operations on about 12b wide operands. After thinking almost a complete day, I came up with a solution that was extremely fast and small (I think just about 20 LEs for each case). But my approach would not work for every modulo.

Regards,

Thomas
www.entner-electronics.com
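
Thomas does not describe his circuit, so the following is not his solution. It is a sketch of one well-known trick that works for moduli of the form 2^k - 1, such as 3 and 7, and not for arbitrary moduli: because 4 is congruent to 1 modulo 3, a number is congruent modulo 3 to the sum of its 2-bit digits, and that sum can be folded again until it fits in two bits. The entity name mod3_12b and its ports are invented for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: x mod 3 for a 12-bit operand by digit folding.
entity mod3_12b is
  port (
    x : in  unsigned(11 downto 0);
    r : out unsigned(1 downto 0)  -- x mod 3
  );
end entity mod3_12b;

architecture rtl of mod3_12b is
begin
  process (x)
    variable s : natural;
  begin
    -- 4 = 1 (mod 3), so x is congruent to the sum of its six base-4 digits.
    s := 0;
    for i in 0 to 5 loop
      s := s + to_integer(x(2*i + 1 downto 2*i));  -- s ends up in 0..18
    end loop;

    -- Fold the sum again: 16 = 4 = 1 (mod 3).
    s := (s / 16) + ((s / 4) mod 4) + (s mod 4);   -- now in 0..6
    s := (s / 4) + (s mod 4);                      -- now in 0..3

    -- 3 is congruent to 0.
    if s = 3 then
      s := 0;
    end if;
    r <= to_unsigned(s, 2);
  end process;
end architecture rtl;
```

The same idea applies to mod 7 with 3-bit digits, since 8 is congruent to 1 modulo 7. Only divisions and remainders by powers of two appear, so they reduce to bit selection. No resource count is claimed for this sketch.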
 
Hi Thomas,

Nowadays, I think we should be happy for every bit of traffic we get here at comp.arch.fpga ;-)
I feel this way too ^_^

On-topic:
I am not sure if your IP allows for arbitrary Zs (e.g. x MOD 253) or just special ones.
The modv IP (x mod by an independent variable, z) allows you to use
any z.

The modk IP has to be fixed at compile/elaboration time to a specific
constant K.
However, you can change the value of K (it is a generic) and therefore
modk can be configured to support any positive integer constant.

About a year ago I had to do mod 3 and mod 7 operations on about 12b wide operands. After thinking almost a complete day, I came up with a solution that was extremely fast and small (I think just about 20 LEs for each case). But my approach would not work for every modulo.
Yes, you are right. mod 3, 7, 10, 12 are some of the most popular
constants.
A general circuit description has the benefit of removing all this
redesigning burden from you.

Thomas, have you read the product brief and the documentation?
Here are corresponding links to both:

http://perfeda.gr/data/documents/xmodz-pb.pdf
http://perfeda.gr/data/documents/xmodz-README.pdf

I also think that it is possible to provide Modelsim (or GHDL on Linux)
compiled files (not the HDL itself) for evaluation (for free). And/or
to supply a synthesis report for your specific cases, just for
reference purposes.

Best regards
Nikolaos Kavvadias
 
Hi Thomas,

this is a summary of synthesis reports for XMODZ on your suggested
configurations.

The results have been obtained with Xilinx XST/ISE 12.3 for a small
Virtex-6 device (XC6VLX75T).

"REG" designs have pipeline registers at each stage, "COMB" designs
only have registers at the output.

(This table is best viewed with a monospace font).

+--------+------+---------+------------+-----------------+------+------+
| Design | Mode | Latency | Throughput | Min. clk period | LUTs | Regs |
+--------+------+---------+------------+-----------------+------+------+
| modv   | REG  |      14 |          1 | 3.5 ns          |  174 |  235 |
| modv   | COMB |       1 |          1 | 32 ns           |  154 |   13 |
| modk=3 | REG  |      12 |          1 | 2.53 ns         |  111 |   98 |
| modk=3 | COMB |       1 |          1 | 5.76 ns         |   36 |   13 |
| modk=7 | REG  |      11 |          1 | 2.53 ns         |  112 |   94 |
| modk=7 | COMB |       1 |          1 | 21.86 ns        |   88 |   13 |
+--------+------+---------+------------+-----------------+------+------+

Best regards
Nikolaos Kavvadias
 
So I finally got around to adding some debug signals which I would
monitor on an analyzer and guess what, the bug is gone! I *hate* when
that happens. I can change the code so the debug signals only appear
when a control register is set to enable them, but still, I don't like
this. I want to know what is causing this DURN THING!

Anyone see this happen to them before?

--

Rick
Yes, this is called a "Heisenbug". It usually involves a clock domain
crossing mistake.


John Eaton

