EDK : FSL macros defined by Xilinx are wrong

"Ziggy" <Ziggy@TheCentre.com> wrote in message news:XLMbe.25668$NU4.19867@attbi_s22...
David wrote:
On Wed, 27 Apr 2005 11:42:58 +0000, Uwe Bonnes wrote:


In comp.arch.fpga license_rant_master <none@nowhere.net> wrote:
: I am an ASIC engineer who frequently 'takes work home' with me.
: Recently, I began using ssh to remotely login to our company's
: servers to run some Verilog/VHDL simulations. Launching
: sims (from the UNIX command line) is fairly easy and painless,
: but any kind of interactive (GUI) operations are pitifully
: slow over a WAN/internet connection. In the past, I
: haven't needed to do much more than check on running jobs,
: restart them, then logout. Now, I find the need to do some
: interactive debugging work (waveform viewing, code editing,
: etc.)

Look at NX. It delivers what LBX (Low Bandwidth X) promised. Probably
not too easy to set up yet, but worth a try.



It's easy enough to set up the server (either look at the commercial
version from www.nomachine.com, or google for "freenx" or "nxserver") on
linux, and clients are even easier (download free from nomachine). It is
said to be usable over a modem connection - I have certainly found it
works well over ADSL for most work. It's definitely faster than TightVNC
(which is also okay for many things - and works well for pretending you
are sitting at your office Windows desktop).



Bye




Too bad it's not easy to set up a server in FreeBSD (I know, totally OT)

Like others will most likely point out, something like TightVNC would
support compression, and is even easier to set up than NX.
With SSH, he is actually halfway there: just forward port 5900 and install VNC.
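For what it's worth, the SSH-plus-VNC route suggested here is just a local port forward (the hostname and display number below are placeholders):

```shell
# Forward local port 5900 to the VNC server on the remote host
# (VNC display :0 corresponds to TCP port 5900).
ssh -L 5900:localhost:5900 user@work-server

# Then, in another local terminal, point the viewer at the tunnel:
vncviewer localhost:0
```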
 
Antti Lukats wrote:

Hi Ray,

yes sure, I forgot to mention those details.. it's a totally different thing
internally.
I was just looking at the AT94 and AT40 in order to see if I could maybe have
an application for them. The AT94S10 is $19, it's a true single chip, has an
on-chip 25 MIPS RISC and can do dynamic reconfiguration. It could be used as a
replacement (way more flexible) for SystemACE; that's where my interest was.
I still have the very secret document about all the bitstream cell bit info
of the AT40K, so I'm still having ideas about doing something that really
benefits from dynamic reconfiguration.

Antti






I got one of those around here somewhere too. The AT40K is getting
kinda long in the tooth though. The only advantage it has over the
Virtex parts is the fact that you can partially reconfigure down to the
cell level, where Virtex requires you to reconfig a whole column (or a
whole column segment for V4). Regardless, the tools for partial config
have never really been developed far enough to make it much more than a
lab curiosity. I did make some forays into partial configuration years
ago, and it was painful. As far as I've been able to determine, the
design described in my dynamic video pipeline processor paper (on the
website) is the first application that attempted to do partial
configuration with the clock running. All the designs described in the
prior literature suspended the clock while reconfiguring. Running the
clock opened a whole new can of worms, and the Atmel architecture was
not well suited for it because you had to be careful what order you
removed and replaced wires to avoid damaging conflicts (shorts). In any
event, the place and route tools are far from what is really necessary
to reasonably handle dynamic partial reconfiguration.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
I remember that we shipped XC3000 and XC4000-type devices for down-hole
(oil exploratory drilling) applications, where they were used for
week-long operation at 175 deg C.
We never guaranteed operation, and parameters do of course become much
slower and leakier, but the customer had no functional problems. They
ran into problems at 200 deg C.
I doubt that any manufacturer will go to the trouble of qualifying
parts at that temperature, for a very limited market. Well-designed Si
should not have a serious short-time problem, but some plastic packages
behave strangely above 140 degrees.

Peter Alfke
====================
info_ wrote:
Thanks for the follow-up.

One of the "valid" reasons I didn't give details about was that these
chips are running at an ambient temperature of 175 degrees Celsius
:)
This customer won't qualify a new family every now and then
as you can surely guess ;-)

Any semi willing to qualify the latest chips at this temp?
 
Hi,

my problem is that behavioral simulation somehow doesn't work. Therefore I
have to do post-translate simulation. However, during translation XST
doesn't keep the net names, so it's a little bit hard to find the right
net in ModelSim; I don't even know which net to take. Do I have to pick
the right net using FPGA Editor? Or how do I get an overview of the
new nets?

regards,
Benjamin
 
Antti Lukats wrote:
"Simon" <news@gornall.net> schrieb im Newsbeitrag
news:I4OdnQ3Pm-Iy6enfRVn-rw@comcast.com...

Just trying to figure out what the rough price of V4 FX12/FF668
part
is... AVNet and NuHorizons aren't showing any prices or stock atm,
and
digikey don't do anything even vaguely recent :-(

Anyone bought any (quantity 1-10) recently and want to give me a
ball-park figure ?

Cheers,
Simon.

the ball-park figure is $100 if you get a very very very good deal,
if not
then multiply by the tambov's constant* to get your price.

Antti

*tambov's constant: a constant multiplier that when applied always
gives the
correct result for any equation.
Pretty funny, Antti! And true. I've found that in the correct volume,
nearly everything is about $100 (except the stuff that is already under
$100 :).

But to directly answer the OP's question:

http://groups-beta.google.com/groups?hl=en&q=XC4VFX12

Marc
 
Take a look at XAPP802 (overview), XAPP702 (memory interfaces) and
XAPP700 (network interfaces). Just enter these names in the search window
in the upper right-hand corner of the Xilinx website.
Here is a (not so short) description of a powerful approach:

Capturing the Input Data Valid Window.

Let's assume a continuously running clock and a 16-wide data input
bus.
Let's assume the clock is source-synchronous, i.e. its rising
transition is aligned with the data transitions, and all these
transitions have little skew.

The user faces the problem of aligning the clock with respect to the
data in such a way that set-up- and hold-time specs are obeyed and
(hopefully) data is captured close to the center of the data valid
window.
Given the fairly wide spread between worst-case set-up- and hold-time
as specified by the IC manufacturer, a carefully worst-cased design
will achieve only modest performance, since the designer is forced to
accommodate the specified extreme set-up and hold time values of the
input capture flip-flops. Typical values are positive 300 ps set-up
time, negative 100 ps hold time, which implies a 200 ps window. The
actual capture window is only a small fraction of a picosecond, but,
depending on temperature, supply voltage or device processing, it might
be positioned anywhere inside the specified wide window.

Here is a self-calibrating design approach that achieves much better
performance by largely eliminating the uncertainty of the flip-flop
characteristics.

This approach assumes reasonable tracking of the input flip-flops
driven by the data and clock inputs, and assumes programmable delay
elements at each input buffer.

The incoming clock is buffered and used to clock all data input
flip-flops. The incoming clock is also used as if it were data, run
through its own delay element X, then driving the D input of a clocked
flip-flop. Its output is then used to control a state machine that
manipulates X to find the two edges of the valid window, where the
flip-flop output changes. Note that changing X has no impact on the bus
data capture operation, it only affects the control flip-flop. Once
both edges are found, the state machine calculates the center value,
and applies this in common to all data input delays.

This auto-calibration circuit can run continuously (or
non-continuously), since it does not interfere with normal operation.
It means that the user can completely ignore the flip-flop set-up and
hold time specifications, the spread between set-up and hold-times, and
their possible variation with temperature and Vcc.
This circuit does not compensate for skew between data lines, or any
skew between data and clock, and it assumes good tracking between all
input flip-flops, and relies on a reasonably fine granularity in the
delay adjustments.
Fundamentally, this auto-calibration reduces the data capture
uncertainty from a first-order problem, to a second order issue, thus
permitting substantially higher data rates and/or higher reliability of
operation.
Virtex-4 programmable input delays have 75 picosecond granularity. A
low-skew data bus can thus be captured at bus data rates in excess of
1Gbps, even when the data valid window is smaller than 200 ps.
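The edge-search-and-center procedure described above can be sketched in software (a toy model of the state machine's logic; `find_window_center`, the `sample()` probe and the tap numbers are illustrative stand-ins, not Virtex-4 specifics):

```python
def find_window_center(sample, max_tap):
    """Sweep the programmable delay tap and record what the clock-as-data
    flip-flop captures, locate the two edges where the captured value
    changes, and return the midpoint tap to apply to all data delays."""
    values = [sample(t) for t in range(max_tap + 1)]
    # An edge is any tap at which the captured value differs from the
    # previous tap's value.
    edges = [t for t in range(1, max_tap + 1) if values[t] != values[t - 1]]
    if len(edges) < 2:
        raise ValueError("could not find both edges of the valid window")
    return (edges[0] + edges[-1]) // 2

# Toy probe: pretend the control flip-flop captures '1' only for taps 10..20.
center = find_window_center(lambda t: 1 if 10 <= t <= 20 else 0, 30)
```

In hardware this would be a small state machine stepping only the control flip-flop's delay, which is why the sweep never disturbs live data capture.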
Peter Alfke 3-31-05
 
The input flip-flop is different, since it gets its data from the
outside, not created by the internal clock.
And this is the only difference?
So it gets instantiated automatically when writing a process and this
process is taking input from the "outside" of the FPGA?

The delay from the chip's clock input to the clock arriving at the
input flip-flop (or any other flip-flop) is much longer than the delay
between Data input to arriving at the D input of the input flip-flop.
The clock must drive thousands of destinations, the data only one.
But can a "minimum" delay of the clock be guaranteed? All these
parameters are based on the maximum delay of the clock, the maximum delay
of the input buffer and so on. But in my case I find the minimum a rather
useful value, since the data is valid after 7 ns (3 ns before the rising edge,
at 100 MHz), and delaying this by 5 ns total is bad for me if the internal
clock isn't delayed more than 2 ns (the data will not be ready at the
rising edge of the clock). So a minimum 2 ns delay is a very useful value
for deciding whether or not to use the delay element (it works fine in my
test application, but only tested at 50 MHz and supposed to run at 100 MHz in
the end).
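The concern above can be put into a back-of-envelope setup-margin check (the numbers are the example values from this post plus an assumed 0.3 ns setup time; none of them are device specs):

```python
# All values in ns; the 100 MHz example from the post above.
period = 10.0                 # 100 MHz clock period (context only)
data_valid_before_edge = 3.0  # data settles 3 ns before the external edge
data_delay = 5.0              # assumed total input-path (buffer + delay) delay
clk_delay = 2.0               # assumed *minimum* internal clock delay
setup_time = 0.3              # assumed flip-flop setup requirement

# Margin at the capture flip-flop: positive means the data settles early
# enough before the delayed internal clock edge to meet setup.
margin = data_valid_before_edge - data_delay + clk_delay - setup_time
```

With these numbers the margin comes out slightly negative, which is exactly why a guaranteed minimum clock delay, not just a maximum, would matter here.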


Whenever the clock arrives later than the data, data must be held (hold
time), which is an ugly parameter that most flip-flops do not specify.

That's why it helps to delay the data input, and Xilinx has done that
for the past 18 years on all our parts, but in Virtex-4 it has become
much more sophisticated.
Okay, I'm very new to FPGAs and am still learning - I just
learned VHDL a few years ago, and now I'm really trying to learn
about FPGAs..

Xilinx has very nice XAPPs for many things, but this is one place where
the explanations are missing.


Thanks
Preben Holm
 
Do you have the OPB Intc as a standalone core, or do you have it within
the IPIF as the Device Interrupt Controller?
 
Hello Joseph

When you have only ModelSim XE (Xilinx Edition) there is no simple
way to simulate bus signals. You must create your testbench and
manually drive the PLB signals to stimulate your core. Or you could
simulate the IPIC interface, because it is quite a bit simpler to simulate.

When you have ModelSim PE or SE you can build a simple environment
which simulates the bus signals (BFM simulation)
 
That's right.

For the simulation, your input signals must have a defined state!
In real use they can have one, but don't have to.
 
Hi Peter,

Remember, any circuit that does not work close to its speed limit
represents waste.
Well, I've seen a fair share of 15-25ns CPLD designs, filled 60% and running
at 4 or 8MHz. Sometimes applications can simply be slow. And developed,
debugged and programmed in under an hour and a half. And, especially
nowadays, without a smaller or slower part that is any cheaper.

But, that's good, isn't it? It would be horrible if the lower end of the
market couldn't take advantage of modern technology.

Best regards,


Ben
 
Hello,

first, whatever you do, avoid gated clocks at all costs, especially in cases
like yours.
I am not sure that I understand exactly what you would like to do, but what
you should probably do is:

(1) Either increase the synchronizing clock (125 MHz will run on a VirtexE,
but this is close to the maximum (LUT-level),
i.e. timing constraints must be very thorough). Personally, I would not go
for this one.
(2) Just create an asynchronous interface, i.e. your "write" strobe is your
clock and the "read" strobe is the read clock.
But this is more application-specific; it depends what exactly you'd like
to do.

If you could give more details...
Hope this helps.

regards,
Vladislav


"Wenju Fu" <fwj@nmrs.ac.cn> wrote in message
news:ee8dffe.-1@webx.sUN8CHnE...
I posted the following message, but nobody responded (I don't know the reason,
maybe it is too naive). I post it here again; I wish someone could help me.

I am using a VirtexE to communicate with an ADI chip. The interface
includes write, read, data, and address. I wish the FPGA to communicate with
the chip on the FPGA main clock, which is up to 65 MHz. I used a synchronized
signal gated with the clock to generate the write and read signals. The data
and address signals are synchronized. The problems are: 1) the write/read
signal often generates one more period than what I need. Although I could
overcome this by adjusting the control signal's edge sensitivity, it may
reappear when I resynthesize the design. The reason is that the delays of the
2 inputs (one the clock, one the control signal) of the LUT4 vary greatly.
Can I limit the delay difference of the 2 inputs to an acceptable level? If
so, how could I do it? 2) the timing of address, data and write/read is
inconsistent with the timing required by ADI. I could delay some signals by
adding buffers or inverters. But I am afraid it may not work well when I add
this module to the top design. Is there any better way?

If I generate the w/r signal synchronized with the clock, the problem may
not exist. But I would have to drive the clock twice as fast, and I don't
know if the VirtexE can work well at 125 MHz.

Thank you for your advice.
 
Ben,
that's the problem with glaring generalizations: there always are
exceptions.
Peter Alfke
 
Hi Paul:

As you pointed out, I forgot to include style option as "MIX" in the
mpd file. After updating the .mpd file, it works fine.

Thanks for your help,
Aroul
 
"Vladislav Muravin" <muravinv@advantech.ca> wrote in message
news:8Taee.12828$3U.745079@news20.bellglobal.com...
(2) Just create an asynchronous interface, i.e. your "write" strobe is
your
clock and the "read" strobe is the read clock.
But this is more application-specific, depends what exactly you'd
like
to do.

You bad man! ;-) Personally, I'm against adding clocks wherever possible.
I'd much rather retime the data strobes into enables in a master clock
domain if it's at all possible. It's more work up front, but a lot easier
when you include the time taken to build your timing constraints in the UCF
file and debug the unsimulatable (!) timing errors that occur one in a
[m|b|tr]illion operations!
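The retiming described here (strobe -> two-flop synchronizer -> edge detect -> single-cycle enable) can be modeled cycle by cycle; this is a software sketch, and the names and sample pattern are illustrative, not from any particular design:

```python
def retime_strobe(strobe_samples):
    """Given the asynchronous strobe value seen at each master-clock edge,
    return the one-cycle-wide enable pulses that a two-stage synchronizer
    followed by a rising-edge detector would emit."""
    ff1 = ff2 = ff3 = 0   # two sync stages plus one edge-detect register
    enables = []
    for s in strobe_samples:
        # Enable fires for one cycle when the synchronized strobe rises.
        enables.append(1 if (ff2 == 1 and ff3 == 0) else 0)
        ff3, ff2, ff1 = ff2, ff1, s   # registers shift on the clock edge
    return enables

pulses = retime_strobe([0, 1, 1, 1, 0, 0, 1, 1])
```

Note the three cycles of latency before the enable appears; that latency is the price paid for keeping everything in one clock domain with ordinary synchronous timing constraints.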
YMMV, Syms.
 
Hi Peter,

that's the problem with glaring generalizations: there always are
exceptions.
We in apps do tend to mostly see the corner cases - extreme speed, extreme
size, trying to shoehorn that last MHz out of the silicon while trying to
shoehorn a few hundred extra lines of code into the silicon, etc. I'm
getting the feeling that we tend to see the exceptions more than the rule.

In the last two years I have seen, with the introduction of Cyclone and
(slightly less so) Spartan 3, the performance bar at the lower end of the
spectrum has been raised considerably. The amount of performance and
capacity that is available for under $10 nowadays is just amazing compared
to three years ago.

It's a fun field we're working in.

Best regards,


Ben
 
Johnsons. Joe wrote:

Hello

I am using a Virtex2Pro board and lately I was trying to use the PowerPC
at the highest speed (300MHz) on my board. I have a function which uses a
lot of floating point instructions
The 405 used does not have floating point in hardware! (unless the Xilinx
405 is something extra and does...)
But the PPC instruction set always supports the instructions, in this case
by taking an exception (or by never compiling to them and using library
routines instead)

for calculating the log, sine, cosine
and such stuff. When I ran this program on the PowerPC it took almost 2
minutes to perform 1000 iterations at 100MHz.
Each log, sine and cosine takes lots of floating point operations...

Then we wanted the code to
run a little faster, so we implemented the same design at 300 MHz.
Even if we didn't expect a three-fold increase in speed, there was only an
improvement of a couple of seconds. Can somebody tell me the reason?
Would you like to do better than that?

If your input has a limited range (integers) you can
1) precompute lookup tables for each possible input value; the next step
could be to move the lookup tables to the FPGA...
2) do your math in fixed point
/RogerL
 
Correct,

The 405 core we use does not have a FPU.

There are FPU cores available that can be used with the new V4 APU, or
with the older V2 Pro 405 PPC through the bus.

The new FPU in hardware + APU in V4 offers a roughly 80X improvement
over the software FPU alone.

Something to seriously consider if you have FPU intensive work to do.

The new APU interface allows for single cycle multiple word transfers
to/from the CPU.

Otherwise, you may use the soft FPU that replaces FPU instructions with
subroutine calls to code.

Austin



 
