EDK : FSL macros defined by Xilinx are wrong

Jim Granville · Apr 21, 2006

fpga_toys@yahoo.com wrote:

http://www.fpgajournal.com/articles_2006/20060321_bell.htm

and also this one :
http://www.eet.com/news/design/showArticle.jhtml?articleID=183702216

Here, AMIS claim 3000+ (rolling sum) FPGA->ASIC conversions.

Atmel also have a ULC conversion program.

-jg

Ray Andraka · Apr 21, 2006

Austin Lesea wrote:

Bob,

We designed the PMCD so that the outputs are all matched within tens of
picoseconds across P-V-T. From there, you get onto the BUFGs, and you
end up with the usual +/-100ps rule for match between BUFGs.

Austin, That is basically what I was told when Virtex I came out too,
but it turned out that jitter on the DLL input could cause a spread
between the 1x and 2x clocks. (IIRC, I had specifically asked both here
on the NG and through the hotline about the alignment of the 1x and 2x
clocks and whether it was safe to cross clock bounds without extra logic,
and was told it was fine.) I don't think anyone realized then what
effect jitter would have on the relative phases of the 1x and 2x outputs.

So my question is, I realize you've designed the PMCDs for tight
tolerances, but how well does that stand up to jitter on the clock input?

Austin Lesea · Apr 21, 2006

Johnp,

The simple answer is, no, we don't publish the information you are
asking for, as we have practically no reason to support 'hand crafted'
designs (results in too many unhappy people -- been there, done that).

Does the path in question span a BRAM column? That would be one reason
for the difference.

Generally, differences are real, and we know they are there, and they
are usually there for a very good reason (that is the way it was in
layout).

The "accepted" way of doing this is to create a macro or block with its
own constraints, hard fixed or relatively fixed, and let the tools
place it properly...but I admit that it is tough to fight the tools this
way to squeeze ps out of a design. Resorting to FPGA_Editor, and just
placing it exactly where it belongs and works is easier. It is just
hell to support, and maintain.

There are many on this forum who know how to squeeze and navigate, and
do what you need done, but I suspect they get paid for that knowledge...

Austin

johnp wrote:

I'm working on a V2Pro design that needs to have a small
portion operate at over 400 MHz. As I've looked into the
timing, I've noticed that similar routing between slices
seems to have different timing delays. For example:

Location             Delay type         Delay(ns)
-------------------------------------------------
SLICE_X34Y1.YQ       Tcko               0.374
SLICE_X34Y3.BY       net (fanout=1)     0.614
SLICE_X34Y3.CLK      Tdick              0.202
-------------------------------------------------
Total                                   1.190ns

**************************************************

Location             Delay type         Delay(ns)
-------------------------------------------------
SLICE_X66Y42.YQ      Tcko               0.374
SLICE_X66Y44.BY      net (fanout=1)     0.407
SLICE_X66Y44.CLK     Tdick              0.202
-------------------------------------------------
Total                                   0.983ns

Note that both circuits route from a YQ output, jump two slices,
then go to a BY input. Yet, the net delays vary by 200 psec.

Ideally, I'd pack the 2 flip-flops in one slice, but in my design they
are clocked by opposite clock edges as I convert a DDR signal from
the negedge into the posedge domain.

Can anyone explain the difference in interconnect delay? Does
Xilinx publish anything that really explains how to get the
shortest routing delay?

Thanks!

John Providenza
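
The two report excerpts above can be checked with a little arithmetic. The sketch below sums each path and compares it with a half-period budget, since the two flip-flops are clocked on opposite edges. It assumes exactly 400 MHz, a 50% duty cycle, and no clock skew or jitter; the post only says "over 400 MHz", so treat the slack figures as optimistic. The dictionary keys and function names are illustrative.

```python
# Path delays in ns, copied from the two timing report excerpts above.
PATHS = {
    "SLICE_X34Y1 -> SLICE_X34Y3": {"Tcko": 0.374, "net": 0.614, "Tdick": 0.202},
    "SLICE_X66Y42 -> SLICE_X66Y44": {"Tcko": 0.374, "net": 0.407, "Tdick": 0.202},
}

# Assumption: 400 MHz clock, 50% duty cycle, no skew or jitter.
CLOCK_MHZ = 400.0
HALF_PERIOD_NS = 1000.0 / CLOCK_MHZ / 2.0  # negedge-to-posedge budget


def total_delay(path):
    """Sum clock-to-out, net, and setup delays for one path."""
    return sum(path.values())


def slack(path, budget_ns=HALF_PERIOD_NS):
    """Budget minus total path delay; negative means the path fails."""
    return budget_ns - total_delay(path)


for name, path in PATHS.items():
    print(f"{name}: total {total_delay(path):.3f} ns, "
          f"slack {slack(path):+.3f} ns")

slow, fast = PATHS.values()
print(f"net delay difference: {slow['net'] - fast['net']:.3f} ns")
```

Under these assumptions the slower placement leaves roughly 60 ps of slack and the faster one roughly 267 ps, which is why a 207 ps difference in net delay matters so much here.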

johnp · Apr 21, 2006

Austin -

Thanks for the comments. I'm just frustrated because if I run
multi-pass P&R, the tools can find a solution that meets timing,
but I don't see any hints as to how to use RLOCs to force the
critical cells into magical alignments that produce the smaller
interconnect delay.

John Providenza

Jim Granville · Apr 21, 2006

Austin Lesea wrote:
<snip>

AMIS:

FY    2001    2002    2003     2004    2005
$$    326M    345M    454M     517M    504M
SA    0       0       96.7M    119M    110.4M

Hmmm....

<paste> Austin's earlier claims...

155M$ is the whole MARKET (IN 2005, ISuppli). That was spread over as many as 12 companies in that year.
LSI had the largest share of that, at 35M$. Everyone else had a
smaller chunk than 35M$.

If I read your table above correctly, you have AMIS at $110.4M SA in
2005, but just a few "Austin-Arm-Waves" ago, you had LSI as the Largest
player, at $35M ?!

Still, I guess it makes for rather less dramatic arm-waving, if it is
not _actually_ the largest player that has just exited... ?

Shame to let the numbers get in the way of a good spin

-jg

Jim Granville · Apr 21, 2006

johnp wrote:

Austin -

Thanks for the comments. I'm just frustrated because if I run
multi-pass P&R, the tools can find a solution that meets timing,
but I don't see any hints as to how to use RLOCs to force the
critical cells into magical alignments that produce the smaller
interconnect delay.

Since Austin has revealed these are real numbers, could you
somehow fine tune the constraints, so the longer paths
do not (quite) make the cut ?
-jg

Ray Andraka · Apr 21, 2006

johnp wrote:

Austin -

Thanks for the comments. I'm just frustrated because if I run
multi-pass P&R, the tools can find a solution that meets timing,
but I don't see any hints as to how to use RLOCs to force the
critical cells into magical alignments that produce the smaller
interconnect delay.

John Providenza

Ideal placement no longer guarantees an ideal route, sorry to say. In
releases before the 5.1 "escape", there used to be a delay based
clean-up so that if you gave it a perfect placement, the router did a
darned good job of getting the routing correct. Since then however, the
router only works as hard as it has to for the whole design to meet
timing. The thing is, it no longer picks the low hanging fruit (ie the
direct connects) consistently, which in turn congests the other routing
resources. As a result, the router winds up stepping all over itself
trying to get something that meets timing; if you are pushing the
performance hard, the router will often not be able to find a solution
that meets timing in a dense design unless you happened to have the
right cost table (MPPR iterates using different cost tables to affect
the routing order). In those cases, about your only option, assuming
you've already tried setting the router to maximum effort with "continue
on impossible", is to use directed routing...basically hand routing it with
the FPGA editor and then exporting the routing info into a ucf. I don't
wish that on my worst enemy, as it is a gruelling task if more than a
small design or macro.
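
The multi-pass idea described above (rerun place and route with a different cost table until one run meets timing) can be sketched as a simple search loop. This is only an outline: `run_par` is a hypothetical callback standing in for a real place-and-route run, the 1..100 cost table range is an assumption, and the slack values in `fake_run_par` are made up purely so the example executes.

```python
from typing import Callable, Optional, Tuple


def best_cost_table(
    run_par: Callable[[int], float],
    cost_tables: range = range(1, 101),  # assumed range of cost tables
) -> Tuple[Optional[int], float]:
    """Try each cost table in turn; stop at the first run that meets timing.

    run_par(table) is a hypothetical callback that places and routes the
    design with the given cost table and returns the worst slack in ns.
    Returns the best (table, slack) pair seen.
    """
    best_table, best_slack = None, float("-inf")
    for table in cost_tables:
        worst_slack = run_par(table)
        if worst_slack > best_slack:
            best_table, best_slack = table, worst_slack
        if worst_slack >= 0.0:  # timing met, no need to keep iterating
            break
    return best_table, best_slack


def fake_run_par(table: int) -> float:
    """Stand-in for a real run: made-up slack values, not measured data."""
    made_up = {1: -0.207, 2: -0.050, 3: 0.012}
    return made_up.get(table, -0.300)


print(best_cost_table(fake_run_par, range(1, 6)))
```

The loop only automates the search; it does nothing about the underlying problem that a good placement may still get a poor route.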

Ben Jackson · Apr 21, 2006

On 2006-03-19, adventleaf@gmail.com <adventleaf@gmail.com> wrote:

when FRAME# is asserted target is expected to latch the Address/Command
respectively.
because after that FRAME# will be de-asserted and DATA/Byte_Enable
follows.

For a config cycle you have to look at IDSEL instead. The config
read/write commands are different from mem read/write. IDSEL will
often be the same as one of the address lines, so you must ignore it
when the command is not a config cycle.

For an example try:

http://www.ben.com/minipci/

Specifically you can see on:

http://www.ben.com/minipci/verilog.php

the difference between cfg_hit and addr_hit.

--
Ben Jackson
<ben@ben.com>
http://www.ben.com/
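
The distinction Ben describes can be modelled in a few lines. The sketch below is a behavioural model in Python, not the Verilog from the linked page; it only borrows the names cfg_hit and addr_hit. The command encodings are the standard PCI ones for C/BE[3:0]# during the address phase. The BAR window, the idea that IDSEL is wired to AD[16], and the function bodies are assumptions for illustration.

```python
# PCI bus commands, as driven on C/BE[3:0]# during the address phase.
MEM_READ, MEM_WRITE = 0b0110, 0b0111
CFG_READ, CFG_WRITE = 0b1010, 0b1011


def cfg_hit(cbe: int, idsel: bool, ad: int) -> bool:
    """Type 0 configuration access aimed at this device.

    IDSEL only counts when the command is a config read or write.
    """
    is_cfg = cbe in (CFG_READ, CFG_WRITE)
    return bool(is_cfg and idsel and (ad & 0b11) == 0b00)


def addr_hit(cbe: int, ad: int, bar_base: int, bar_size: int) -> bool:
    """Memory access that falls inside a (hypothetical) BAR window."""
    is_mem = cbe in (MEM_READ, MEM_WRITE)
    return bool(is_mem and bar_base <= ad < bar_base + bar_size)


# Assumption for the demo: IDSEL is tied to AD[16] on this board.
ad = 1 << 16
idsel = bool(ad & (1 << 16))

# A memory read to 0x00010000 raises IDSEL as a side effect, but it is
# not a config cycle, so cfg_hit must stay false.
print(cfg_hit(MEM_READ, idsel, ad))
print(cfg_hit(CFG_READ, idsel, ad))
print(addr_hit(MEM_READ, ad, bar_base=0x10000, bar_size=0x1000))
```

Latching both the address and the command while FRAME# is asserted, then qualifying IDSEL with the latched command, is what keeps a memory cycle from being mistaken for a configuration cycle.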

Austin Lesea · Apr 21, 2006

JG,

Who ya going to believe?

AMIS financial post on their webpage (to the federal government)?

Or some silly market research company's report that they try to get people to buy?

I noticed this too.

So, is AMIS overstating their Structured ASIC wins to their stockholders
and the government? Shades of Enron?

Or is Isuppli using a different definition? Or just doesn't know anything?

If this is about < 35M$ vs ~104M$, then I rest my case: structured ASIC
is dying...or already dead.

What the investors thought they would be seeing is another multi billion
dollar market by now. And growth.

These numbers (pick any) are just pitiful.

Austin

Jim Granville · Apr 21, 2006

Ray Andraka wrote:

johnp wrote:

Austin -

Thanks for the comments. I'm just frustrated because if I run
multi-pass P&R, the tools can find a solution that meets timing,
but I don't see any hints as to how to use RLOCs to force the
critical cells into magical alignments that produce the smaller
interconnect delay.

John Providenza

Ideal placement no longer guarantees an ideal route, sorry to say. In
releases before the 5.1 "escape", there used to be a delay based
clean-up so that if you gave it a perfect placement, the router did a
darned good job of getting the routing correct. Since then however, the
router only works as hard as it has to for the whole design to meet
timing. The thing is, it no longer picks the low hanging fruit (ie the
direct connects) consistently, which in turn congests the other routing
resources.

If a PCB router did this, it would be laughed out of the market...

-jg

John McGrath · Apr 21, 2006

Hi,
Can you do this:
Take the tricky part of the design into a new blank design on its own.
Get the P&R to churn away on that until it gets something that works
(should be easier, as there is nothing from a huge design to get in its
way).
Then export that routing to a ucf, as suggested by Ray, and finally
rerun with this ucf for the full design.
I could be completely wrong - but it might save you the hassle of hand
placing.

Maybe something like PlanAhead also - that seems to be incredible for
constraining the P&R tools in a graphical way - check out the Demos on
Demand for it on the Xilinx website.

Brian Davis · Apr 21, 2006

Ray wrote:

The thing is, it no longer picks the low hanging fruit (ie the direct connects)
consistently, which in turn congests the other routing resources.

One thing I've done in 5.x and 6.x ( but haven't tried in 7.x or 8.x )
that seems to work OK is to go into FPGA editor with a simple test
design, find the delays for the direct connect paths I want it to use,
then stick a MAXDELAY on those nets to force the router to use
those connections.

This has worked well in conjunction with placed logic without
resorting to the directed routing constraints ( at least for the small
sections of critical logic that I've used it for so far, I'm not sure
if a horde of MAXDELAYs would blow up P&R for a big RPM ).

Brian
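
Brian's tip amounts to turning a handful of measured delays into per-net constraints. A minimal sketch of that bookkeeping follows, emitting UCF-style `NET ... MAXDELAY` lines. The net names and the 50 ps margin are hypothetical, and 0.407 ns is simply the faster net delay from johnp's report used as a stand-in for a delay read out of FPGA Editor.

```python
def maxdelay_constraints(measured_ns, margin_ns=0.05):
    """Return UCF lines capping each net slightly above its measured delay.

    measured_ns maps a net name to the delay (in ns) observed for the
    routing you want the router to reproduce.
    """
    lines = []
    for net, delay in sorted(measured_ns.items()):
        limit = delay + margin_ns
        lines.append(f'NET "{net}" MAXDELAY = {limit:.3f} ns;')
    return lines


# Hypothetical net names; delays stand in for FPGA Editor readings.
measured = {"ddr_q_neg": 0.407, "ddr_q_pos": 0.407}
print("\n".join(maxdelay_constraints(measured)))
```

Keeping the list small matters: as noted elsewhere in this thread, a large number of MAXDELAY constraints tends to slow PAR down without guaranteeing a solution.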

rickman · Apr 21, 2006

Austin Lesea wrote:

Back to Kevin:
"Every customer that doesn't use a particular function
that is hard-wired on the chip is essentially paying for wasted silicon. "
Austin

"Paying for wasted silicon"... isn't that what FPGAs are all about,
wasting a bunch of silicon so that you get the benefit of the small
portion that you use? In fact, isn't that the crux of the philosophy
of Easypath, not testing all the unused, wasted silicon?

Once a Xilinx FAE was trying to explain the economics of FPGAs (logic
vs. routing) and expressed the fact that the routing dominates the die
area by saying, "We sell you the routing and give you the logic for
free". That was supposed to mean that I should not be concerned that I
could not use all the logic (wasted silicon) because of routing
congestion.

Ironic that Xilinx would hold Kevin's feet to the fire for
understanding wasted silicon when Xilinx's entire business model is
founded on "wasted silicon".

;^)

Austin Lesea · Apr 21, 2006

Rick,

I don't understand your comment.

Of course we have "wasted" silicon. That is the basis of the whole FPGA
architecture. It is also the basis for our success:

Knowing what to put in the chip, and what not to put in the chip.

Knowing what to harden, and what not to harden.

Making bizillions of the same chip to get economies of scale so the
"wasted" silicon doesn't cost the customer what it would if it was an ASIC.

It is more of an issue if you are a structured ASIC (back to Kevin... he
gets it). If you are a structured ASIC, you are attempting to be a
platform for a wide range of designs, with two or three masks for all
customization.

You need: MGTs, BRAMs, DSPs, PLLs/DLLs, EMACs, PCIe, (and the list goes
on, and on, and on).

As an ASIC this is really expensive, if no one uses even one of these
cells. That is serious area we are talking about here. Doubling the
area with wasted stuff.

For an FPGA, the routing alone used up area (along with the memory cells
to control it) so adding a whole bunch of hardened IP just made the FPGA
more of a bargain, not less. Adding less than 5% area with stuff that
may, or may not get used, but is used far more often than interconnect!

Austin


johnp · Apr 21, 2006

Thanks for all the thoughts about this issue.

With my design, I put tight (but achievable) constraints on my critical
signal and used RLOCs to lock the flip-flops in reasonable positions.

If I use the multi-pass P&R, sometimes the tool makes timing on the
critical net, sometimes it just misses. So having a 'good' constraint
won't force P&R to perform correctly.

Given that the identical verilog is used for my tests, Sean's comments
about local clock inversion probably don't apply. It purely appears to
depend on P&R.

I'll give Brian's MAXDELAY tip a try next.

I'll keep you posted on results.

John Providenza

Ray Andraka · Apr 21, 2006

Brian Davis wrote:

Ray wrote:

The thing is, it no longer picks the low hanging fruit (ie the direct connects)
consistently, which in turn congests the other routing resources.

One thing I've done in 5.x and 6.x ( but haven't tried in 7.x or 8.x )
that seems to work OK is to go into FPGA editor with a simple test
design, find the delays for the direct connect paths I want it to use,
then stick a MAXDELAY on those nets to force the router to use
those connections.

This has worked well in conjunction with placed logic without
resorting to the directed routing constraints ( at least for the small
sections of critical logic that I've used it for so far, I'm not sure
if a horde of MAXDELAYs would blow up P&R for a big RPM ).

Brian

Brian, I've tried that. For one or two it seems to work. For many, it
slows PAR way down and usually won't find a solution where all the
maxdelays get met, not to mention making the UCF a nightmare.

Brian Davis · Apr 21, 2006

Ray Andraka wrote:

For one or two it seems to work. For many, it slows PAR way down
and usually won't find a solution where all the maxdelays get met

Ok, that sounds (dare I say) about par for the course...

Have you ever gotten anywhere in convincing Xilinx to add a flag
to the router to restore the old, more consistent, behavior?

not to mention making the UCF a nightmare.

I just stuck the MAXDELAYs in the source near the net keep directives.

I have enough UCF nightmares already; on the bright side, at least
they haven't made the UCF a binary file yet

Brian

JustJohn · Apr 21, 2006

dotnetters@gmail.com wrote:

Hello All!
(virtex board)... we had to write a hardware stack. After having made it work,
we thought of optimizing the design and hence removed a few states and
reduced the no. of states from 8 to 4. The older code was getting
synthesized in around 20 mins, but the new code takes hours together to
get synthesized, and so does the PAR. How can we reduce the synthesis
time? Why is it that the code which took less time to get synthesized is
now taking longer?

I'm going to assume here that by hardware stack, you mean a pushdown
stack involving RAM and pointers and all that. I ran into a problem
yesterday (with ROM, not RAM, but may be the same root cause). The ROM
was inferred by VHDL code, not generated with Coregen. The description,
although it was perfectly valid, was not one that XST recognized as
being synthesizable into BlockRAM, and the RAM (ROM) was being put into
distributed memory instead. This absolutely kills performance of both
XST and PAR. What XST could not handle was the addition of a reset
(init value) to the code description. Although the part (Spartan 3) has
this functionality, XST can't handle it. So check your synthesis
report, see whether your RAM is going into BRAM or distRAM.

Side note: I see that XST now does support inference of Dual(write)Port
BRAMs as of 8.1i (I just loaded SP 3, out today). Kudos to Xilinx, keep
it coming, please add the Set/Reset function to XST's BRAM inference
repertoire soonest!

HTH,
John

Jim Granville · Apr 21, 2006

Brian Davis wrote:

Ray Andraka wrote:

For one or two it seems to work. For many, it slows PAR way down
and usually won't find a solution where all the maxdelays get met

Ok, that sounds (dare I say) about par for the course...

Have you ever gotten anywhere in convincing Xilinx to add a flag
to the router to restore the old, more consistent, behavior?

From a Software Architecture viewpoint, the _sensible_ thing would
be to allow a selection of router by resource, and also an order.

That way, you could tell it to use a simple, direct path router
on the fast nets first, and then the speed-driven router on the
other nets..

I fear the 'old router' code is long lost, in the mists of time...

not to mention making the UCF a nightmare.

I just stuck the MAXDELAYs in the source near the net keep directives.

I have enough UCF nightmares already; on the bright side, at least
they haven't made the UCF a binary file yet

ssssshhh ! - someone in Xilinx might think that's a good idea!

-jg

Apr 21, 2006

<v_mirgorodsky@yahoo.com> wrote in message
news:1143294891.415443.17250@v46g2000cwv.googlegroups.com...
Hello, ALL!

In our design we are planning to use a Spartan-3E in a PCI 33/66
environment. We have developed our own PCI core. Since the code is
completely RTL and does not have any platform-specific features, we were
able to test it with an existing Altera ACEX-1K PCI33 board. Running on a
speed grade 2 Altera ACEX-1K device, our core has about 1.5-2ns out of
7ns Tsu margin and even more for Tout. Now for the production design we are
planning to move to a Xilinx Spartan-3E 500 FPGA. During detailed
investigation of the FT256 package we found several strange pins, marked as
IRDY1, TRDY1, IRDY2 and TRDY2. Do these pins have any significant
meaning for PCI designs? Unfortunately, I did not find any explanation
in the Spartan-3E datasheet or in the accompanying application notes.

With best regards,
Vladimir S. Mirgorodsky

Antti Lukats wrote:

There is an undocumented PCI_LOGIC primitive in the S3E, so the special pins
should be used with that special primitive. There is no official info, but
several people/companies have 'reverse engineered' PCI_LOGIC and are actually
using it (or are able to use it).

Antti

Hello Antti,

and thanks for your response. So it seems that I will not have any
advantage using those pins in my design. I was hoping to avoid
any floorplanning during the migration to PCI66, as I did for
PCI33. Is there any possibility that Xilinx will disclose the
functionality behind those pins in the future? Are there any recommended
layouts available for PCI33/66?

With best regards,
Vladimir S. Mirgorodsky
