EDK : FSL macros defined by Xilinx are wrong

Thanks Peter. This suggestion worked. For those who are interested, I had
to add my own signal, IP2Bus_RdWdAddr coming from the "user_logic" module
in the slave pcore that intercepted the PLB IPIF slave signal,
Sl_rdwdaddr[0:3]. I had to set the bits Sl_rdwdaddr[1:3] =
IP2Bus_RdWdAddr. Additionally, I had to delay the signal
IP2Bus_RdWdAddr by one clock cycle because the PLB IPIF module asserts
the Read Ack on the PLB bus and Sl_rdwaddr one cycle after the user_logic asserts
IP2Bus_RdAck.
On Thu, 17 Feb 2005, Peter Ryser wrote:

Nju,

PLBC405DCURDWDADDR[1:3] must be driven to the PPC in the order in which
you deliver the data.

See the PPC processor block manual for more detail.

- Peter


Nju Njoroge wrote:
Hello,

I'm trying to disable "Critical-word first" loads for cache loads. That
is, when the cache is performing a cache refill, it first loads the target
data from memory, then loads the remaining words in the cacheline from
memory--all as part of a burst transaction. I'm looking for a way to
disable this type of cache fill. Instead, I would like the cache to load
the cacheline starting from the base address of the cacheline. Anyone
tried this before? The reference guide claims that the PLB memory
controller can send back the data in the order it desires
(http://www.xilinx.com/bvdocs/userguides/ppc_ref_guide.pdf, page 146).
However, in reality, when my PLB slave pcore sends back the data in order
of ascending addresses, the processor assumes that I sent it back the
target data first, so it uses the wrong word.

Thanks,

NN
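For reference, the two fill orders under discussion can be sketched in a few lines of Python (word indices only; the helper name and the wrap-around sequence are illustrative -- check the PPC405 manual for the exact ordering your core produces):

```python
def burst_order(target_word, line_words=8, critical_first=True):
    """Return the order in which the words of a cacheline are delivered.

    critical_first=True models a target-word-first fill: the requested
    word comes back first, then the rest of the line in wrapping order.
    critical_first=False models a plain linear fill from the line base,
    which is what the slave above was sending.
    """
    if not critical_first:
        return list(range(line_words))
    return [(target_word + i) % line_words for i in range(line_words)]

# If the PPC expects target word 5 first but receives words 0..7 in
# ascending order, it latches word 0 as the target -- the wrong word.
print(burst_order(5))                        # target-first, wrapped
print(burst_order(5, critical_first=False))  # linear from line base
```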
 
austin wrote:
FIFOs,

Are useful as a means to synchronize the passing of data between
asynchronous clock domains.
And even with only a single clock domain, you use a FIFO in hardware for
all the applications where you would use a FIFO in software, although in
software the name "queue" is more often used.

bobrics@gmail.com wrote
If it's more like temporary memory, then why not just use intermediate
signals to store input signals and delay them (use shift register) a
specific number of clocks if needed?
In a shift register you delay data for N cycles.
In a FIFO you delay up to N data words as long as necessary.

Kolja Sulimma
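Kolja's distinction can be sketched in a few lines of Python (a behavioral model only, names made up -- a shift register imposes a fixed delay, a FIFO holds data for as long as the consumer needs):

```python
from collections import deque

def shift_reg_delay(samples, n):
    """Fixed delay: every input emerges exactly n cycles later.
    The initial zeros model the register's reset contents."""
    regs = deque([0] * n)
    out = []
    for s in samples:
        regs.append(s)
        out.append(regs.popleft())
    return out

# A FIFO, by contrast, decouples write and read: data waits until the
# reader is ready, whether that is one cycle or one thousand.
fifo = deque()
fifo.append('a')
fifo.append('b')          # producer writes in its own time ...
first = fifo.popleft()    # ... consumer reads in its own time
```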
 
Here are my 2 cents:
I call a FIFO an ideal "Black Box", where you clock data in on one
side, and pull it out with a different clock on the other side. And you
need not worry about frequency or phase differences between the clocks,
and how much or how little data is in the FIFO (until you reach full or
empty.)
What makes a FIFO with independent clocks so attractive to the user
(its simplicity), makes it really tough on the black box designer, for
whom all problems stem from the unknown phase relationship between the
two clocks. That's why there are Gray-coded counters and sneaky
resynchronization circuits that are (almost) immune to metastable
problems. Metastability is covered up by extra latency that releases
the empty flag a clock tick later than theoretically possible...

Long answers to a short question.
Each BlockRAM in Virtex-4 has a FIFO controller buried in the BlockRAM.
Not very big since it is done in dedicated hard logic, but saves the
user all the work and also all the thinking.
And we tested it with 10^14 resynchronization cycles at a 500 MHz clock
rate, with no failure.
Peter Alfke, Xilinx Applications (from home on a rainy Saturday)
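The Gray-coded counters Peter mentions work because adjacent counts differ in exactly one bit, so a pointer sampled in the other clock domain is off by at most one. A quick Python sketch of the standard binary/Gray conversion (illustrative, not Xilinx's actual circuit):

```python
def bin_to_gray(n):
    """Reflected binary (Gray) code: adjacent values differ in one bit."""
    return n ^ (n >> 1)

def gray_to_bin(g):
    """Inverse: XOR-fold the Gray value back down to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Because only one bit changes per count, a FIFO pointer crossing clock
# domains can be mis-sampled by at most one position -- the comparison
# logic then errs on the safe (conservative) side.
for i in range(4):
    print(i, format(bin_to_gray(i), '03b'))
```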
 
austin wrote:

If you used our soft core FIFO (previous to V4), you would also be safe.
But many looked at the FIFO IP, and said to themselves "I am smarter
than that -- look at all the wasted logic to do such a simple job!"
Well, sometimes it is the case. Have a look at Altera's NCO/DDC
megafunction and related examples. They explicitly generate a pair
of sine and cosine waves and then pass it to a mixer composed
of two rapid multipliers. This consumes many LEs, even on Stratix,
if the NCO uses the CORDIC approach. I have done a better and
smaller DDC/DUC without multipliers on Cyclone. Simply, the
algorithmic approach is a bit more sophisticated.

Best regards
Piotr Wyderski
 
Piotr,

A comment on the soft cores (IP) of Altera, or Xilinx in general.

Basically, soft IP (cores, or functions) need to be:

* bullet-proof (fool-proof)
* easy to use
* versatile
* configurable

What these constraints lead to are cores that are generally not optimal
for any specific function, but do the job.

Any given core can probably be trimmed, modified, sped up, and improved.

The FIFO core is one where that would be very dangerous, unless you were
familiar with FIFO design, synchronizers, and asynchronous logic
fundamentals.

Many engineers are very smart, and can do a better job on any specific
function (for their particular application). I would caution against it
when dealing with synchronizing elements, like FIFOs.

But, I would also caution that if you think you can do a better job,
remember that verification is 90% of the work. Anything you modify, or
improve makes you responsible for its verification.

As I design for the next technology node, I look at what others did
before me, and if I think I can improve upon it, I first ask:

- why did they do it this way?
- what may be other functions of this block that I am ignoring?
- is there something I am not considering?
- am I prepared to verify what I have changed?
- do I have access to the test benches and simulations that were done by
the prior owners?
- what will I accomplish by my changes?
- am I better off working on something else?
- does it make business sense?

Remember the classic engineering maxim:

"If it ain't broke, don't fix it."

Austin


 
In addition to async clock domains as other have mentioned, I have a
case where a data stream is split into MSBs and LSBs. The LSBs undergo
some processing that takes several hundred clock cycles. The MSBs pass
through, but need to be aligned with the output associated with the
original LSBs with which they were paired. I *could* make 500 F/F long
shift registers, but a block RAM based FIFO is much more efficient.

I had another case where a PC would send commands to a FIFO, and the
circuitry would fetch and process the commands in its own time, but
guess that's just another example of async clocks.

I have one case now where the clock is the same frequency on both sides
of the FIFO, but the input clock is a fairly low quality
(unavoidable... long story) while the output clock is a low phase noise
uberclock divided from a faster clock. The clocks are coherent, but
only through a large external loop involving a fractional-N PLL and
other synthesis. Rather than trying to time up two clocks like this,
it's far easier to just pass the data through a FIFO and not even worry
about it. So it serves two functions: reclocking to a *cleaner* clock
domain and avoiding a gnarly timing situation.
 
In article <1108855173.976052.110150@f14g2000cwb.googlegroups.com>,
Peter Alfke <alfke@sbcglobal.net> wrote:

Here are my 2 cents:
I call a FIFO an ideal "Black Box", where you clock data in on one
side, and pull it out with a different clock on the other side. And you
need not worry about frequency or phase differences between the clocks,
and how much or how little data is in the FIFO (until you reach full or
empty.)
The real fun is if you have a situation like one of my designs where
the data propagation delay through the FIFO has to be identical from
reset to reset within a certain spec.

Synchronous reset of input and output pointers? Synchronous to which
clock? Say you use the input clock. When that reset deasserts, the
rising edge of the output clock can be anywhere within the period of
the input clock (it's *asynchronous*) thus causing a prop delay
uncertainty of one period on the input clock.

Does not seem a big deal until I realize the clock period is 40 ns and
my uncertainty spec is 5 ns. :(

I did solve it, BTW, but as it might be patentable, I cannot speak of
it. :)
 
Thanks KCL. I used a state machine to model that part of the design. Seems
like it's working. It's only a piece of a bigger design. I am trying to have
this display module be one of the modules for a VHDL vending machine. Do
you know of any materials on the internet that can help me design this
vending machine? It has the following features:
1) 5 products price - 55/60/65/70/75c
2) 3 different coin inputs -25 c/10c/5c
3) Need to display the product price and price entered via the 3 coin
inputs.
4) When the value of product selected is reached, it should be dispensed
and any change displayed.
5) System should reset after this and also reset if done asynchronously.
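No pointer to tutorials, but the coin-accumulation logic in features 1-4 is small enough to prototype before writing VHDL. A behavioral Python sketch (not synthesizable; the clocked FSM version would add explicit states and the asynchronous reset of feature 5):

```python
PRICES = {0: 55, 1: 60, 2: 65, 3: 70, 4: 75}  # product -> price in cents
COINS = {25, 10, 5}                            # accepted coin values

def vend(product, coins):
    """Accumulate coins until the selected product's price is reached.
    Returns (dispensed, change_in_cents). Behavioral model only."""
    price = PRICES[product]
    total = 0
    for c in coins:
        assert c in COINS, "only 25/10/5 cent coins accepted"
        total += c                 # display would show this running total
        if total >= price:
            return True, total - price   # dispense and show change
    return False, 0                # still waiting for more coins
```

In the VHDL version each accepted coin would be one state transition, with `total` as a registered accumulator driving the display module.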
 
Matt wrote:
Hi,

I'm a student new to fpga design and am trying to design a board with a
spartan-3, 2x(1Mx16) SRAM's, a 2Mx16b FLASH, and a coolrunner-II cpld;
all sharing a common address/data bus. I'm interested in accomplishing
the following:

1. The ability to configure the spartan-3 from FLASH using the CPLD.
2. Access to SRAM, FLASH, and a few memory mapped cpld registers using
the Xilinx EDK/EMC core.

My confusion is with regards to how the address bus should be connected
to the FLASH and SRAM's. The designs I have seen seem to connect the
FLASH A0 to A1 or A2 of the FPGA address bus. Similarly the RAM is
then connected to A2. My suspicion is that this is related to the data
width of each device or the size of the data that is allowed to be
written? I should note that my SRAM's are being used as a single
32b-wide memory and the FLASH and CPLD are only 16b wide. Anyway if
someone could clarify the issue it would be much appreciated.

Matt
As I found out the hard way, this required careful reading of the FLASH
and SRAM datasheets. The address bus coming from the EMC is a byte
address. In our case, the FLASH address bits were a byte address, but
the SRAM address bits are a word address. So the FLASH needed A0 -> A0,
etc, but the SRAM A0 goes to A2 of the FPGA bus (assuming you are using
2 16-bit SRAM for a 32-bit bus). The CE bits to the SRAM act as A1. A0
is not used. So if you want something at byte address 12, the FPGA bus
will have a 12, but this is only the 12/2 = 6th word in a single SRAM
(much like a C array of short integers), or 12/4 for two with the CE
saying which to use. Not using A0 and A1 effectively divides by 4 to
give a word address. Your chips may be different, so you'll have to read
the datasheet.

Note that Xilinx chose an incredibly confusing convention for bit
ordering with EDK, so what I'm calling A0 may actually be A22 or
whatever your low order bit is.
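The divide-by-2 / divide-by-4 arithmetic above just amounts to dropping the low-order byte-select bits; a trivial sketch in Python (assuming nothing else about the particular chips -- as noted, read the datasheets):

```python
def word_address(byte_addr, bytes_per_word):
    """Map an EMC-style byte address to a device word address by
    dropping byte-select bits: a 16-bit device ignores A0 (divide by 2),
    a 32-bit arrangement ignores A0 and A1 (divide by 4).
    A byte-addressed FLASH keeps A0 -> A0 (divide by 1)."""
    return byte_addr // bytes_per_word

# Byte address 12 from the example above:
print(word_address(12, 1))   # byte-addressed FLASH
print(word_address(12, 2))   # one 16-bit SRAM
print(word_address(12, 4))   # two 16-bit SRAMs forming a 32-bit word
```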
 
Hi, Thanks Bart for that answer. Yeah, that's actually what I am working on right now, and I'm stuck on one part: how do you know when it fails, and how do you know what the bit error rate is? In simulation I can see everything, but when I actually download the code to the FPGA, I don't know what's going on in there.

I tried reading the result back via a JTAG register, but it didn't work; BSCAN JTAG only allows me to read back a register with a very simple program. With a program this complicated, it didn't work.

In addition to this bit error rate measurement, my boss wants a DNL and INL measurement, so as soon as I get done with the bit error rate measurement, I have to work on the DNL and INL part. Greatly appreciate it if anyone can help! Thanks, Ann
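For what it's worth, the on-chip part is usually a comparator against a known pattern plus an error counter you can read back; the host-side arithmetic, once you can get the counts out, is just (a Python sketch, function names made up):

```python
def bit_errors(sent, received, width=32):
    """Count the bit positions where two words differ: XOR then popcount."""
    return bin((sent ^ received) & ((1 << width) - 1)).count("1")

def bit_error_rate(pairs, width=32):
    """BER = total differing bits / total bits compared."""
    errs = sum(bit_errors(s, r, width) for s, r in pairs)
    return errs / (len(pairs) * width)

# Example: two 8-bit words compared, one single-bit error in total.
print(bit_error_rate([(0x00, 0x00), (0x00, 0x01)], width=8))
```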
 
Hi Jason, But isn't that for simulation only? I want something to display after I download the program to the FPGA and run it. Thanks, Ann
 
I think you're confused about the capabilities of FPGAs. You can think
of an FPGA as a bunch of logic with input wires and output wires. To
display any kind of output message you will need some kind of output
peripheral, such as an LCD. Or you can go even simpler than that and
use a few 7-segment displays. You can then interface your display
device with the FPGA and then write HDL code to drive the display and
make it display what you want.

Asking an FPGA to display a message directly is kind of like asking the
power socket on the wall to play your DVD - the power socket can drive
a DVD player, but it cannot directly play a DVD. Similarly, an FPGA can
drive a display device but it cannot directly display a message.

 
To print text on a display I need to have a font in a ROM.

Could you explain how to create a file to copy into the ROM?
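One common approach: store each character as a small bitmap (one byte per row) and emit the rows as hex, one per line, in a ROM init file. A Python sketch -- the glyph and the file format here are made up for illustration, not taken from any standard font:

```python
# Each character is an 8x8 bitmap: 8 bytes, one per row, MSB = leftmost pixel.
FONT = {
    'T': [0xFF, 0x18, 0x18, 0x18, 0x18, 0x18, 0x18, 0x00],
}

def rom_init_lines(text):
    """Emit one hex byte per line for each character's rows -- the sort
    of thing you could paste into a VHDL constant array or load with a
    memory-init mechanism. Format is illustrative only."""
    lines = []
    for ch in text:
        for row in FONT[ch]:
            lines.append(f"{row:02X}")
    return lines

# The display logic then indexes the ROM with (char_code * 8 + row)
# and shifts the byte out pixel by pixel.
print("\n".join(rom_init_lines("T")))
```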
 
Antti Lukats wrote:

Hi all

I re-found once again my own "Rules of Life", which I first published 21 Aug
2001:

1 No Promises.
2 Keep Promises.
3 Give away what you do not need.
4 Do what you want to do.
5 Be Happy.

In order to comply with Rules [5], [4] and especially [3] from the above
list, I am giving a promise (thus breaking rule #1) that I will make all
projects from my past available as public domain. That includes all I can
publish (i.e., all the IP that belongs to me and is not covered by 3rd party
agreements), with the exception of maybe a few selected projects I am
actually working on at the moment.

In order to comply with [2] first project is made public today at:
http://gforge.openchip.org

There is an OPB I2C IP core that uses the OpenCores I2C core by implementing
an OPB-to-Wishbone adapter.
Just noticed that in your VHDL code you don't use inout ports, resulting
in 200% bloating of a normal inout port declaration. I presume this is
because XST is too lazy to parse inouts, so that we have to do some kind
of backend annotation alongside the HDL programming, resulting in not very
elegant code.

This is probably the price to pay for such a cheap tool, so I should not
really complain. Synplify will allow you to use inouts in sub modules,
but it costs much more than XST.

TonyF
 
bob wrote:
So I guess I am asking if anybody has some VHDL code for a parallel in
SPI out latch, or a SPI output counter.
In Project Navigator 6.3i choose from the menu:
Edit -> Language Templates -> VHDL -> Synthesis Constructs -> Coding
Examples -> Shift Registers -> Parallel In, Serial Out

Kolja Sulimma
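The template generates VHDL; the parallel-in, serial-out idea itself is easy to model behaviorally. A Python sketch (names illustrative -- the hardware version is a loadable shift register clocked by the SPI clock):

```python
def piso(parallel_word, width=8, msb_first=True):
    """Parallel-in, serial-out: load a word, then shift out one bit per
    clock. Returns the serial bit sequence an SPI master would see."""
    bits = [(parallel_word >> i) & 1 for i in range(width)]  # LSB first
    return bits[::-1] if msb_first else bits

# 0b10110001 shifted out MSB-first, as most SPI devices expect:
print(piso(0b10110001))
```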
 
On Fri, 18 Feb 2005 21:55:42 +0100, "IgI" <igorsath@hotmail.com> wrote:

Hi!

I'm using a Virtex-II (XC2V1000-FF896-4C) in one of the products which we
have been selling for over 3 years. Recently we got a "new" batch of
Virtex-II chips and problems started to arise.
PCBs with chips from batch B and C are working fine; on the other hand, none
of the 42 PCBs where chips from batch A are used are working.
We are currently using ISE 5.2 SP3 for this design. I have verified the bit
stream by reading it back from the chip and it's ok.

I can't use ISE 6.1 or newer because the routing is not successful or ISE
simply doesn't meet the timing constraints (the chip is 99% full).
Have you re-run timing analysis on the 5.2 design, but using the latest
timing analyser and latest speed files?

Sometimes the speed files are changed to reflect new information about
the devices ... usually in the "right" direction. But if the old
(formerly successful) design fails with new speed files, that might
point you towards a solution.

With 6.1, have you tried MPPR (multi-pass placement and routing)?
Sometimes modifying the placement (in FPGA editor) of failing paths and
re-running "re-entrant routing" can fix problems, if there are only a
small number of failing paths.

- Brian
 
"TonyF" <not@valid.address> schrieb im Newsbeitrag
news:Wp%Rd.1219$%F6.1075@newsfe4-gui.ntli.net...

Just noticed that in your VHDL code you don't use inout ports, resulting
in 200% bloating of a normal inout port declaration. I presume this is
because XST is too lazy to parse inouts so that we have to do some kind
Nonsense. XST can handle inouts quite well.

Regards
Falk
 
"TonyF" <not@valid.address> wrote in message
news:Wp%Rd.1219$%F6.1075@newsfe4-gui.ntli.net...
There is a school of thought that all off chip IO should be
inferred/instantiated at the top level, and not in sub-modules.

-Newman
 
Falk Brunner wrote:

"TonyF" <not@valid.address> schrieb im Newsbeitrag
news:Wp%Rd.1219$%F6.1075@newsfe4-gui.ntli.net...


Just noticed that in your VHDL code you don't use inout ports, resulting
in 200% bloating of a normal inout port declaration. I presume this is
because XST is too lazy to parse inouts so that we have to do some kind


Nonsense. XST can handle inouts quite well.
Only if they are at the top level. If they are in a sub-module, XST will
complain about not finding the *_I, *_O and *_T ports in your sub-module
(see my other post).

TonyF
 
newman5382 wrote:

There is a school of thought that all off chip IO should be
inferred/instantiated at the top level, and not in sub-modules.
In the end, everything is flattened and becomes top-level, but in your
HDL code it is useful to have sub-modules for clarity, code maintenance
and reusability. It should be possible to tell a synthesis
tool that an inout port in your sub-module really is an external port.

TonyF
 
