EDK : FSL macros defined by Xilinx are wrong

It's an AMIRIX AP1070 board. It has a PCI bridge external to the FPGA
which acts as an interface between the FPGA and the 10/100 Mbps
Ethernet, the PMC module, and the 64-bit PCI bus. I can send data to
the PCI bridge; now I have to find out how to address this data to the
host through the 64-bit PCI. I can send data to the Ethernet over the
PCI bridge. I don't know whether the PCI bridge will generate the
control signals needed for the data transfer to the host, since this is
not in my control; I can just send data to the bridge. I am also not
sure how to confirm that the data has indeed reached the host.
Nitesh
 
Nitesh,
I took a quick look at the Amirix site: interesting board...
The board guide (developboard.pdf) claims that the board is provided with
all the necessary material, in particular "apcontrol-windows" (the board's
Windows driver plus a test app).
If PCI transfers are single reads/writes initiated from the host, some
signals will be updated, and your FPGA design should exploit them. To
simplify: a transfer flag, a transfer direction, and a transfer address.
When the host reads data, you must provide it in the appropriate register(s).
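For instance, a minimal sketch of such a register decode; every name here
(clk, transfer_flag, transfer_dir, transfer_addr, bridge_din/bridge_dout,
the registers) is an assumption to be checked against the bridge doc:

-- Hypothetical decode of the bridge's transfer signals
process(clk)
begin
    if rising_edge(clk) then
        if transfer_flag = '1' then
            if transfer_dir = '1' then               -- '1' = host read (assumed)
                case transfer_addr(3 downto 2) is    -- word-aligned decode
                    when "00"   => bridge_dout <= status_reg;
                    when "01"   => bridge_dout <= data_reg;
                    when others => bridge_dout <= (others => '0');
                end case;
            else                                     -- host write
                if transfer_addr(3 downto 2) = "01" then
                    data_reg <= bridge_din;
                end if;
            end if;
        end if;
    end if;
end process;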

For other PCI transfers (burst / bus mastering), it is a bit more
complicated...

The main thing for you is to look at the PCI controller of your board
(documentation, programming method).

"Nitesh" <nitesh.guinde@gmail.com> wrote in message
news:1134675967.936128.110380@g44g2000cwa.googlegroups.com...
ITs an AMIRIX AP1070 board . It has a pci bridge external to fpga which
acts as an interface between (10/100mbps etherenet, pmc module , 64 bit
pci) and the fpga. . I can send data to the pci bridge . i got to find
out how to address this data to the host through the 64 it pci. I can
send data to the etherenet over the pci brdge..I dont know whther the
pci bridge will generate the control signals needed for data transfer
to the host since this is not in my control.I can just send data to the
bridge . I am also not sure how to confirm that the data has indeed
reached the host.
Nitesh
 
"Bart" <bart_trzynadlowski@yahoo.com> schrieb im Newsbeitrag
news:1134717308.862664.31090@o13g2000cwo.googlegroups.com...
Hi,

First time poster here. I'm using Xilinx's cheap Spartan 3 Starter Kit
Board to create a simple VGA "controller" (framebuffer output) that I
want to interface to 8051 and Z80 systems I've constructed myself
(breadboard projects).

[snip]

Generic advice: if you are in this kind of trouble, it is usually VERY
helpful to "look" into the FPGA, so get the ChipScope eval, add the ILA to
some of the signals and see what's actually happening.

You can use one DCM to get, say, 150MHz and use that as the ILA clock, so
you would see several samplings per system clock cycle.
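For example, on the Spartan-3 something along these lines (a sketch only;
a 50 MHz board clock is assumed, unused DCM inputs are tied off, and it
requires library unisim):

-- 50 MHz x 3 = 150 MHz sampling clock for the ILA
ila_dcm : DCM
    generic map (
        CLKFX_MULTIPLY => 3,     -- 50 MHz * 3 / 1 = 150 MHz
        CLKFX_DIVIDE   => 1,
        CLKIN_PERIOD   => 20.0   -- ns
    )
    port map (
        CLKIN    => clk_50mhz,
        CLKFB    => clk0_buf,    -- feedback keeps CLK0 locked
        RST      => '0',
        DSSEN    => '0',
        PSCLK    => '0',
        PSEN     => '0',
        PSINCDEC => '0',
        CLK0     => clk0_unbuf,
        CLKFX    => clk150_unbuf
    );

fb_bufg  : BUFG port map (I => clk0_unbuf,   O => clk0_buf);
ila_bufg : BUFG port map (I => clk150_unbuf, O => clk_150mhz);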

Antti
 
Bart -

Have you checked the quality of the input(4) signal? If it is slow or
has noise, glitches, etc., the RAM write address could get incremented
multiple times for each write operation.

Also, you mentioned trying to synchronize the sram_do_write signal
to the 50MHz clock - you must do that! Otherwise, you've got an async
signal feeding into your state machine - it's just a matter of time
before it screws up.
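A classic fix is a two-flop synchronizer plus edge detect, something like
this sketch (the signal names are mine, not from Bart's code):

-- async input -> two sync flops -> one-cycle write pulse
signal wr_meta, wr_sync, wr_prev : std_logic := '0';

process(clk_50mhz)
begin
    if rising_edge(clk_50mhz) then
        wr_meta <= input(4);       -- first flop may go metastable
        wr_sync <= wr_meta;        -- settled by the second flop
        wr_prev <= wr_sync;
        sram_do_write <= wr_sync and not wr_prev;   -- pulse on each rising edge
    end if;
end process;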

Hope this helps!

John Providenza
 
Hi kl31n,

My design requires me to acquire data from an ADC and then,
after some processing, to do a division between a couple of
floating-point numbers every 200 ns.

The performance of the core isn't high enough to use just
one, so I implemented a core which feeds several dividers
(made with the Xilinx core) and then reserializes it all.
What exactly is your performance requirement? How often do you need to start
a new divide operation? How long can you afford to wait for the result?

The Xilinx speed-optimized floating-point divider will run at well over
250 MHz and allows you to initiate a new divide on every cycle (i.e. every
4 ns) if you wish.
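For completeness, the interleaving kl31n describes would look roughly like
this; fp_div is a placeholder for the generated divider core, and its
ports, fixed latency and the handshake signals (new_sample, result_valid)
are all assumptions:

-- Round-robin across four dividers, results reserialized in order
type word_vec is array (0 to 3) of std_logic_vector(31 downto 0);
signal q_vec   : word_vec;
signal ce_vec  : std_logic_vector(3 downto 0) := (others => '0');
signal sel_in  : integer range 0 to 3 := 0;
signal sel_out : integer range 0 to 3 := 0;

gen_divs : for i in 0 to 3 generate
    div_i : fp_div                          -- hypothetical core wrapper
        port map (clk => clk, ce => ce_vec(i),
                  a => a_in, b => b_in, q => q_vec(i));
end generate;

process(clk)
begin
    if rising_edge(clk) then
        ce_vec <= (others => '0');
        if new_sample = '1' then            -- feed the dividers in turn
            ce_vec(sel_in) <= '1';
            sel_in <= (sel_in + 1) mod 4;
        end if;
        if result_valid = '1' then          -- collect in the same order
            q_out   <= q_vec(sel_out);
            sel_out <= (sel_out + 1) mod 4;
        end if;
    end if;
end process;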

The design works fine until I pass numbers with a period
down to 260 ns; going to lower periods, the results get weird.
I'm afraid I don't understand what you mean by "pass numbers with a period
down to 260 ns"; could you explain your circuit to me?

-Ben-
 
I would suggest trying to get a simplified system to run first.
Use internal SRAM as the frame buffer. This will not be enough for a full
screen, but that does not matter at first, considering the magnitude of
your difficulties.

The internal SRAM is dual-ported, so you can write with one state machine
and read with another, using different clocks (see the sketch below). If
you still see problems, it is very likely that the quality of the input
signal is the problem.

If not, you have a problem with synchronization or SRAM access. Try to
think of a design change that can separate the two.
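As a sketch, a dual-clock RAM in this style is inferred as block RAM by
the Xilinx tools (sizes and names assumed; needs ieee.numeric_std):

type ram_t is array (0 to 4095) of std_logic_vector(7 downto 0);
signal frame_ram : ram_t;

write_port : process(z80_clk)        -- input-side state machine
begin
    if rising_edge(z80_clk) then
        if wr_en = '1' then
            frame_ram(to_integer(unsigned(wr_addr))) <= wr_data;
        end if;
    end if;
end process;

read_port : process(pixel_clk)       -- VGA-side state machine
begin
    if rising_edge(pixel_clk) then
        rd_data <= frame_ram(to_integer(unsigned(rd_addr)));
    end if;
end process;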

Kolja
 
John and Kolja: The quality of the input signal is something I
suspected initially but I ruled out any major interference problems
early on. Here's my line of thinking:

- The bad pixels are always the same color as the image data. Corrupted
colors do not occur. If I create a white background with a blue box in
the center, there will be some white pixels scattered inside the box
and blue pixels scattered around outside.

- Each time the input is clocked (by the external Z80), the internal
write pointer is incremented. If there were any glitches on this line,
the image would be offset, and for my test cases (boxes and such), it
would be very noticeable.

What I initially wanted to do to fix the problem is write some code
like this:

if clk_50mhz'event and clk_50mhz = '1' then          -- main clock
    if input_clk_prev = '0' and input_clk = '1' then
        -- input clock edge detected: sample the data
    end if;
    input_clk_prev <= input_clk;
end if;

It didn't work. Although it may have sort of helped, it was triggering
much more often than it should, causing the write pointer to increment
incorrectly and thus garbling up the image.

I did a test where I had an 8051 send a set number of clock pulses to
the FPGA and I displayed the number on the LEDs. It was always correct.
When I used the above code, it would be off by 1 every now and then.

johnp wrote:
Also, you mentioned trying to synchronize the sram_do_write signal
to the 50MHz clock - you must do that! Otherwise, you've got an async
signal feeding into your state machine - just a matter of time before
it screws up.
This is probably the issue. Last night, I may have fixed it. What's
unusual is that I had tried similar code before to synchronize the
signal and it failed to make a difference.

I'll try to show you what I did. I don't have my code with me at the
moment so this is going off of memory. Any thoughts on whether this is
a sound method? Is it too much of a kludge?

-- toggle flop clocked by the asynchronous input clock
process(input)
begin
    if input(4)'event and input(4) = '1' then    -- input(4) is input CLK
        do_write <= not do_write;
    end if;
end process;

-- detect the toggle in the 50 MHz domain
-- (note: strictly, do_write should be double-registered here first)
process(clk_50mhz)
begin
    if clk_50mhz'event and clk_50mhz = '1' then
        if do_write_prev /= do_write then
            sram_do_write <= '1';
        end if;

        if sram_state = SRAM_WRITE then
            sram_do_write <= '0';
        end if;

        do_write_prev <= do_write;
    end if;
end process;

The idea is that I monitor do_write and when it changes, I signal
sram_do_write in sync with the FPGA master clock.

Oddly enough, I tried code almost exactly like this earlier (both in
this process and in the SRAM state machine) and it never worked! I was
using an XOR to detect a change (if the XOR result was 1, a change
occurred) -- would that make a difference? To me, the two seem
functionally equivalent.

Thanks for the input so far!
 
Bart -

I suspect the problem is that you are supplying no setup/hold time
on the SRAM address bus, and also no hold time on the data bus:
sram_addr <= sram_write_addr;
sram_ce <= '0';
sram_we <= '0';
sram_io_t <= '0';

Try using a 3-clock-cycle sequence to write the RAMs, as sketched below:
a) assert the correct address
b) assert data, CE, and WE
c) deassert CE and WE

If your RAM is fast enough, you can shorten this to 2 clock cycles by
using a negedge clock to assert/deassert WE.
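A sketch of the three-cycle sequence, with CE/WE active low as in Bart's
code and all state/signal names assumed:

type wr_state_t is (WR_ADDR, WR_DATA, WR_END);
signal wr_state : wr_state_t := WR_ADDR;

process(clk_50mhz)
begin
    if rising_edge(clk_50mhz) then
        case wr_state is
            when WR_ADDR =>                    -- a) address only, CE/WE inactive
                sram_addr <= sram_write_addr;
                sram_ce   <= '1';
                sram_we   <= '1';
                if sram_do_write = '1' then
                    wr_state <= WR_DATA;
                end if;
            when WR_DATA =>                    -- b) drive data, assert CE and WE
                sram_io_t <= '0';
                sram_ce   <= '0';
                sram_we   <= '0';
                wr_state  <= WR_END;
            when WR_END =>                     -- c) deassert CE/WE; address and
                sram_ce   <= '1';              --    data still held for hold time
                sram_we   <= '1';
                wr_state  <= WR_ADDR;
        end case;
    end if;
end process;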

Good luck!

John Providenza
 
Nitesh,
It is the PCI controller of your board that initiates the burst transfer
(bus mastering).
Before that, either your application (C/Linux) or your design (VHDL) must
provide the transfer length AND the physical memory start address.
Look at your doc to see how the actual 'go' signal (i.e. start DMA
transfer) is provided to the controller (on the board I use at work, it is
up to the C application to write to a register of the DMA controller)
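On the FPGA side those parameters typically land in a small register
block. Purely as a hedged sketch (the offsets, names and bus signals are
all invented):

-- Hypothetical DMA control registers written by the host application
process(clk)
begin
    if rising_edge(clk) then
        dma_go <= '0';                                     -- one-cycle strobe
        if reg_wr = '1' then
            case reg_addr(3 downto 2) is
                when "00"   => dma_start_addr <= reg_wdata;    -- physical start address
                when "01"   => dma_length     <= reg_wdata;    -- transfer length
                when "10"   => dma_go         <= reg_wdata(0); -- 'go' bit starts the burst
                when others => null;
            end case;
        end if;
    end if;
end process;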



"Nitesh" <nitesh.guinde@gmail.com> wrote in message
news:1134779159.628317.217180@g43g2000cwa.googlegroups.com...
Yes; the main thing was that the apcontrol program (I am using the
Linux-based one) can read data from the onboard memory. But I want to do
burst transfers to the host from a module in the FPGA, and I need to
capture this data on the host side. I am integrating a PLB master module
to transfer data to the PCI and then will look into the host-side
communication. My doubt initially was how the transfer between the PCI
bridge and the host PCI takes place. Should I initiate some control
signals? However, there are no extra control signals... i.e. I can send a
ping to the Ethernet over the PCI bridge with no additional signals, just
address and data. Hopefully it will be the same in the PCI case. Any other
advice on what I should be looking into...
 
<xavier.tastet@gmail.com> wrote in message
news:1134858416.979455.73070@o13g2000cwo.googlegroups.com...
Hi all!
I bought an S3board from Digilent. I ran some programs to test it,
including a serial-parallel multiplier :) I'm currently thinking of a
new project "just for fun". When I learned VHDL at university, we started
a project by coding a very simple processor, which looked like a very
simple PicoBlaze.
Do you think it's doable to implement a PicoBlaze, connect it
(internally in the FPGA) to a UART, and run a kind of monitor?
For example, like a 68K board, where from the console you can access
memory, dump or modify it, run some assembler commands, etc. I would
like to run a kind of very simple SBC with this system.
I read the documentation about PicoBlaze, but I'm afraid the
program must be fixed when compiling the VHDL? Could we load the
executable code, for example into the SRAM?
Thanks a lot, Xavier.

Look, there are several examples at
http://www.dulseelectronics.com/

BTW, we have a GUI for the PicoBlaze-based logic analyzer on the Dulse
website
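On Xavier's loading question: the PicoBlaze program lives in a block RAM,
so giving that RAM a second write port lets a UART loader replace the code
at run time. A hedged sketch, assuming KCPSM3 (18-bit instructions, 10-bit
addresses) and invented loader signals:

type prog_mem_t is array (0 to 1023) of std_logic_vector(17 downto 0);
signal prog_mem : prog_mem_t;

process(clk)
begin
    if rising_edge(clk) then
        instruction <= prog_mem(to_integer(unsigned(address)));  -- PicoBlaze fetch port
        if loader_we = '1' then                                  -- UART loader write port
            prog_mem(to_integer(unsigned(loader_addr))) <= loader_data;
        end if;
    end if;
end process;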

Antti
 
This should fit your needs....

http://altera.com/education/univ/materials/boards/unv-de2-board.html

Greets, Karl.

Markus Knauss wrote:
Hi,

I am looking for a FPGA based video prototype board with
the following components:

1 Video decoder for 1 color composite Video in
1 FPGA preferably Altera
About 1 Megabyte SRAM
Some spare IO pins
1 Color composite video out

The boards I have found are too expensive
and have a lot of other hardware like multiple FPGAs or DSPs,
which I don't need.

Anyone have an idea?

Thanks!

Markus
 
It is the PCI controller of your board that initiates the burst transfer
(bus mastering).
Before that, either your application (C/Linux) or your design (VHDL) must
provide the transfer length AND the physical memory start address.
Look at your doc to see how the actual 'go' signal (i.e. start DMA
transfer) is provided to the controller (on the board I use at work, it is
up to the C application to write to a register of the DMA controller)

So... let's say that the FPGA was capable of bus mastering, and thus
could initiate a PCI write transaction. Let's say that the main system
processor (x86 or whatever) had already provided a set of physical
addresses to the FPGA through some other means (perhaps as a totally
separate PCI write transaction from the x86 to the FPGA). My
understanding is that the FPGA is then capable of initiating a PCI
write transaction, where it can write data to, say, SDRAM that is
connected up to the main system processor, without any intervention
from the system processor itself. I think the order of events would go
like this:
1) FPGA arbitrates for the PCI bus.
2) FPGA is granted the PCI bus.
3) FPGA starts a write transaction, with the target address being that
of the SDRAM hooked up to the system processor.
4) Each data word is then clocked onto the PCI bus and received by the
PCI controller hooked up to the system processor, which then arbitrates
for the processor's local bus in order to write the data to the local
SDRAM.
5) When the FPGA has written all the data it wants, it asserts one of
the INTx lines to indicate to the system controller that data is now
available at the previously assigned physical address.

Thus, no intervention from the system processor at all here... any flaws
in this logic? Also, I assume that if the FPGA can do a burst write in
step 4 above, all of the data burst onto the bus by the FPGA will end up
in sequential physical memory locations in the SDRAM, until either the
system processor/PCI controller stops the transaction or the FPGA simply
ends it.
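To make the handshake concrete, here is a grossly simplified sketch of the
master side of a memory-write burst. The PCI signal names are real, but
the state machine ignores parity, retries, disconnects and the latency
timer, and the data-path signals (target_addr, data_word, last_word) are
invented:

type pci_state_t is (IDLE, ADDR_PHASE, DATA_PHASE, TURNAROUND);
signal state : pci_state_t := IDLE;

process(pci_clk)
begin
    if rising_edge(pci_clk) then
        case state is
            when IDLE =>
                req_n <= '0';                     -- request the bus
                if gnt_n = '0' then               -- arbiter granted it
                    state <= ADDR_PHASE;
                end if;
            when ADDR_PHASE =>
                frame_n <= '0';                   -- start of transaction
                ad      <= target_addr;           -- address phase
                cbe_n   <= "0111";                -- Memory Write command
                state   <= DATA_PHASE;
            when DATA_PHASE =>
                irdy_n <= '0';                    -- master has data ready
                ad     <= data_word;
                cbe_n  <= "0000";                 -- all byte lanes enabled
                if trdy_n = '0' then              -- target accepted this word
                    if last_word = '1' then
                        frame_n <= '1';           -- mark the final data phase
                        state   <= TURNAROUND;
                    end if;
                end if;
            when TURNAROUND =>
                irdy_n <= '1';                    -- release the bus
                req_n  <= '1';
                state  <= IDLE;
        end case;
    end if;
end process;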

Does this make sense?

TIA,

John
 
You can try with universalscan software

"smu" <pas@d.adresse> wrote in message
news:clih72$b8f$1@s5.feed.news.oleane.net...
Hello,

I am developing an FPGA (BGA package) board.

Is it possible to check the connections between two pins on two
different FPGAs with the Boundary scan?

If so, is there a tool that is able to make this kind of test
using the board schematic?

Thank you in advance.

smu

Yes, checking connections is definitely possible, as are many other
things, like programming non-JTAG memories and other non-JTAG devices if
those are connected to FPGAs or devices with JTAG boundary scan.

Check out
www.jtag.com

I guess they are pretty expensive tools.

If you don't mind some programming yourself, I have a Windows application
that uses 2 PLDs in a JTAG chain to program an FPGA (not over boundary
scan!) and then, via that FPGA, a parallel Flash memory. This application
could be modified to test your custom board. The schematics extraction
would be manual, or if you can import a netlist from the schematic, a
netlist reader could be added to provide the link to your schematics.

Antti
 
Peter Alfke wrote:
Marco, I am sure that you will not find anything beyond very basic
tutorial information.
These are the "crown jewels" of any FPGA company, and these jewels are
well guarded, but also polished daily. The quality of these tools
determines the success of our companies, and each of us wants to be at
least a step ahead of the other company.
BTW, the continuous investment by companies like Xilinx and Altera (to
name just the two biggest) is enormous, and it is unlikely that an
individual engineer can provide significant improvements, unless you
are a genius and address the problem in a very unconventional way.
I totally disagree.

There is a difference between an algorithm and an implementation with
tweaked parameters for a given architecture.
I have been doing EDA-algorithm research for quite some time, and most
substantial progress has been made by individuals or very small teams.

These people do not provide industrial-grade implementations that can do
a better job on any given FPGA than the vendor tools, but they provide
enough evidence that a certain approach might be better suited.

I hear Pathfinder variants are used a lot for FPGA routing. Two authors
only.
http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/fpga/1995/2550/00/2550toc.xml&DOI=10.1109/FPGA.1995.15

And without FlowMap, Xilinx would probably be building MUX-based FPGAs now.

You can be grateful that Leiserson and Saxe did not patent retiming
and sell it to Altera.
http://www.springerlink.com/(equ2gbjfwu2v44ebrbkulu55)/app/home/contribution.asp?referrer=parent&backto=issue,2,47;journal,56,61;linkingpublicationresults,1:400117,1
Otherwise ISE would have to be a web-based application with servers
running in Europe.

Xilinx used placement based on simulated annealing in the past
(UC Berkeley; Carl Sechen, Sangiovanni-Vincentelli, et al.).
What is it now? Quadratic placement imported from Munich?


The OP should turn to the University of Toronto, where a lot of FPGA
placement and routing research has been done. Also, scholar.google.com
tells me that André DeHon at Caltech has published in the area of
hardware-accelerated placement and routing recently (2003)

Kolja Sulimma
 
Thank you very much for the information Kolja.
I greatly appreciate the information that you included in your reply.
Best Regards,
Marco.
 
On Wed, 21 Dec 2005 22:44:22 GMT, "John_H" <johnhandwork@mail.com>
wrote:

My opinion is that the process of mapping separately from place & route is
archaic (to use kind words) and that spreading the logic out so each slice
has just one LUT is *not* the way to alleviate the problem.
Yes. Xilinx has added "map -timing" to do just that. Mapping logic
is now done together with placement, and the result works rather better.


--
Phil Hays
 
Austin Lesea wrote:
It is true that "significant gains" are still (often, not always)
realized by some careful hand placement, and some careful partitioning.

That suggests that the design languages lack something important, as the
intent of the designer is not being communicated to the tools.

....

The software folks here at Xilinx are amazing: they have managed to
improve performance every generation while reducing the time to
compile the designs, all the while we in IC design follow Moore's law
and double the density. Not to mention that we add more custom IP, and
customers are getting more demanding.

Austin,

What is missing is geographic relationships between parts of the
circuit. Perhaps the biggest piece missing in the current tools is
utilization of the hierarchy in a design. The current Xilinx tools
flatten the design before they even start on the place and route
problem, and that greatly increases the workload and time to complete
while also degrading performance. The tools have an opportunity to use
the hierarchy in the design to treat each hierarchical layer as a
mini-design, essentially breaking the problem into smaller problems in a
way that is consistent with the way the designer broke up the design.
Going to a true hierarchical place and route would improve both the
quality of results as well as the run times.

Now, I do disagree with your assertion that each generation of the tools
improves both run time and quality of results. I have indeed seen
improvements in run time, but more often than not the quality of results
has taken a step backwards with each major release of ISE. Yes, I
suppose for flat, RTL-only designs the results have gotten somewhat
better, but that is mostly due to large improvements in synthesis and
small incremental improvements in the automatic placement (which, BTW,
still does a dismal job with placing non-LUT resources, data paths,
and logic that has multiple levels of LUTs between FFs). In the
meantime, the routing algorithms have gotten lazy, apparently in the
interest of speeding up run times. For designs with poor placement, the
effects of poor routing are not as apparent as they are for well-placed
(e.g. carefully floorplanned) designs. For my floorplanned designs, I
have seen a steady erosion in the max speeds attainable by the tools
with each new release since 4.1.

One of the biggest steps backward came from eliminating delay based
clean-up (IIRC that happened in 5.2). The result there is that the
tools just stop as soon as all routes meet timing. Every route in the
design is potentially a critical route. The routes to nearby
destinations often take circuitous routes that congest the routing
resources and unnecessarily drive the power dissipation up considerably.
With the current emphasis on power dissipation, I would think that the
Xilinx team would be looking at reinstituting the delay based clean-up.
Based on my empirical observations, that could pick up a 15-20%
improvement in power dissipation for designs that are clocked in the
upper half of the performance envelope.
 
Ray Andraka <ray@andraka.com> writes:
A simple algorithm is to choose one side and just follow that wall.
For example, if you choose the left side, you put your left hand on the
wall and walk, always keeping it on the wall. Eventually you will get
to the destination. On average, you'll visit half the maze before
reaching it.
That doesn't work if the goal is on an "island" in the maze, which is
allowed in most Micromouse rules.
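For what it's worth, the wall-follower update itself is tiny. A hedged
sketch, with the heading encoded as a 2-bit unsigned (00=N, 01=E, 10=S,
11=W, wrapping via ieee.numeric_std) and the cell-advance logic elided:

signal heading : unsigned(1 downto 0) := "00";

process(clk)
begin
    if rising_edge(clk) then
        if step = '1' then
            if wall_left = '0' then
                heading <= heading - 1;    -- opening on the left: turn left
            elsif wall_front = '0' then
                null;                      -- keep going straight
            else
                heading <= heading + 1;    -- blocked: turn right and re-check
            end if;
        end if;
    end if;
end process;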
 
Ray,

Some comments,

Austin

-snip-

What is missing is geographic relationships between parts of the
circuit. Perhaps the biggest piece missing in the current tools is
utilization of the hierarchy in a design.
As I said, there is a lot of room for improvement here. You are
assuming that the hierarchy is well done, and that the results from
working on each piece alone will be better. I just don't know if that is
true. A good area for work, I would agree.

Now, I do disagree with your assertion that each generation of the tools
improves both run time and quality of results.
I have to differ here. I understand your issues, but if we deal with
the ever expanding "standard suite" of test designs with better
performance, and better run time, I have to assert that it is better.
Is everything better? Of course not.

One of the biggest steps backward came from eliminating delay based
clean-up (IIRC that happened in 5.2).
I happen to agree with you here; my personal opinion is that the tools
should allow you to choose to go to the extra effort of finding the best
paths, and not stop as soon as the aggregate requirements are met (or
stop and give up if they can't be met). I think you will appreciate that
what was done did provide for a much faster time to get the design.

We do make the parts bigger every generation, and you may have noticed,
processor power is not keeping up anymore.
 
Jim,

Some comments,

Austin

-snip-

Austin, perhaps if you used engineering measurements for SW results,
rather than the words like "wizards" and "magic", then the SW might have
a chance to really improve with each release ?
The software group has a very rigorous quality-of-results metrics
(measurement) system for evaluating their work. I get to use the
superlatives; they do not.

I did wonder how Altera suddenly found power savings in SOFTWARE -
We still beat them on power; ask your FAE for the presentation. They
took a really lousy situation and made it just plain lousy. We still
have a 1 to 5 watt advantage, AFTER they run their power cleanup.

Given the enormous investment the companies claim, these field results
seem rather abysmal - it seems the HW is carrying the SW?
Rather, the software is now (often) carrying the hardware. It is very
hard to get the latest technology to be any faster than the previous one
without architecture and software. If the software buys a speed grade,
that is all the customer cares about. The silicon gets less expensive
with the shrink to the next generation. Who cares if the performance
came from software, hardware, or both?

Still, it does seem there is indeed a lot of 'fat' in Place & Route SW,
so we can expect further 'double digit improvement' claims.... :)
I agree. Until the tools do a better job than a room full of experts,
the tools are just not 'there'. It reminds me of compilers for high-level
languages many years ago: there was a time when I could write assembly
code that was faster, better, and smaller than any compiled high-level
language (anyone recall PLM86?). Then after a while, the compilers got
better and better, until finally I had to agree that all that work was
not worth it: often the compiler yielded a better solution than my
hand-written assembly code.
 
