EDK : FSL macros defined by Xilinx are wrong

Joey · Apr 21, 2006

I haven't tried it with MicroBlaze, but you should be able to do it. I did it
with my PowerPC on the BRAM as well as the SDRAM. After all, it's just
reading and writing a memory location, so it must work with MicroBlaze as
well.

"Marco" <marcotoschi_no_spam@email.it> wrote in message
news:d7ku3f$kvu$1@news.ngi.it...

"Joey" <johnsons@kaiserslautern.de> wrote in message
news:d7kpmg$m53$2@news.uni-kl.de...

You can just make use of pointers, and that's easy enough, isn't it?

"Marco" <marcotoschi_no_spam@email.it> wrote in message
news:d7kfc5$fup$1@news.ngi.it...
"John Williams" <jwilliams@itee.uq.edu.au> wrote in message
news:newscache$w5ndhi$t3a$1@lbox.itee.uq.edu.au...
Hi Marco,

Marco wrote:
Which C function should I use to read from or write to block RAM
(connected to the OPB bus through an OPB bus controller)?

XIo_In8 and XIo_Out8?

Not necessary - just read and write it like normal memory.

Regards,

John

Could you explain, please?

Normally, when I write a C program, I create variables, and everything is
stored in memory, but it is implicit.

So, what should I do to read or write into memory? How can I save,
for example, a matrix into block RAM?

Thanks
Marco

Can I use C pointers to point to the address space mapped by the MicroBlaze?

MM · Apr 21, 2006

When you have compiler errors, always look at the very first one first:

In file included from inbyte.c:2

../../../include/xuartlite_l.h:48:26: xbasic_types.h: No such file or directory

In this case the compiler simply can't find an include file. Make sure it
exists and make sure the include path(s) are set for the project wherever they
are supposed to be set. Include paths can always be passed to the compiler on
the command line.

/Mikhail

Apr 21, 2006

Jim George <send_no_spam_to_jimgeorge@gmail.com> writes:

Jorge wrote:
Hi,

I am trying to implement the PCI Bridge on a Spartan-II board (XC2S200). The problem is that when I try to program the board I get this error:

ERROR:iMPACT:583 - '1': The idcode read from the device does not match the idcode in the bsdl File.

I read in the manual that I need to associate the ISP PROM with a dummy.mcs file or a .bsd file to allow the JTAG programming software to pass data through the ISP PROM. I don't know how to do that to resolve my problem. If someone knows how to resolve it, please help me.

Thank you Jorge

Jorge,
When Impact asks you to associate a file with the PROM, click the
"BYPASS" button in the file open dialog. This eliminates the need for a
dummy .MCS file for the PROM. I found that when Impact starts up, it
does not scan the chain, you must do this manually by right-clicking an
empty space in the Impact window and choosing "Initialize Chain". Hope
this helped!

Also, don't start several instances of the Impact program. Only the first one
will recognize the cable.

MB
[at least for me, with spartan3 starter kit and parallel cable, 6.3]

--
Michel BILLAUD billaud@labri.fr
LABRI-Université Bordeaux I tel 05 4000 6922 / 05 5684 5792
351, cours de la Libération http://www.labri.fr/~billaud
33405 Talence (FRANCE)

Peter Ryser · Apr 21, 2006

The PPC caches are built into the processor core. They can be enabled or
disabled for any 128MB memory region in real mode (MMU not used) or for
any page size in virtual mode (MMU is used). Unless you are running an
OS you will most likely run in real mode.

I assume that you hook the BRAM to the DSOCM port of the PPC. The DSOCM
is not cacheable, never. This is even true if the DSOCM is mapped into a
cacheable region. DSOCM has similar access characteristics as cache and
thus does not need to be cached.

You might want to look at the OCM section in the PowerPC processor block
reference guide to learn more about OCM. See
http://www.xilinx.com/bvdocs/userguides/ug018.pdf

- Peter

Pit wrote:

Hi,

I've got a question concerning the use of BRAM connected to the Data
Cache Unit. Does the processor still use the internal D-Cache Array
when BRAM is used? If that were the case, is there any possibility to
disable the D-Cache Array, so that the processor is forced to use the
connected BRAM?

Thx in advance,

Pit

John Larkin · Apr 21, 2006

On Thu, 2 Jun 2005 22:18:30 +0200, "new.online.de"
<birkelg@computer.corg> wrote:

Suppose I have an AC power system, and I can digitize a pair of
voltage and current waveforms. I want to report everything: trms
volts/amps, true power, reactive power, phase angle. The line
frequency could vary from maybe 20 to 80 Hz for a stationary
generator, or 200-800 for an aircraft system (including startup and
weird situations.) I'll digitize to 16 bits, at maybe 20K
samples/second or something. I'm considering doing all the signal
processing in an FPGA, crunching maybe 8 voltage+current pairs.

For the rms volts/amps, we could just square the samples, filter, and
let my pokey uP occasionally pick that up and take the square root.

True power is just the product of the e*i samples, lowpass filtered.
Easy.

What's tricky is the reactive power/phase angle thing.

Reactive power is defined as:

reactive power = sqrt(apparent power^2 - true power^2)
cos phi = true power / apparent power
reactive power = sin phi * apparent power

It was defined this way because, if voltage and current are not purely
sinusoidal, the reactive power contains power products of harmonics. These
cannot be calculated using the Hilbert transform of the fundamental only.

In most ac power systems, the voltage waveform is reasonably
sinusoidal, so I can get away with defining reactive power as the
averaged product of current * (90 degree shifted voltage) waveforms.

(A good Hilbert shifts all frequencies 90 degrees!)

But I do need the phase angle. Arc-cos is ambiguous.
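
The quantities discussed above can be sketched roughly in Python. This is only an illustration, not anyone's actual FPGA implementation. It assumes the capture holds a whole number of line cycles and that a quarter cycle is an integer number of samples, so the 90 degree shift becomes a plain sample delay (a stand-in for a Hilbert transformer that is only reasonable for a near-sinusoidal voltage at a known frequency). The name `power_stats` and the test waveform are made up for the example.

```python
import math

def power_stats(v, i, samples_per_cycle):
    """Return (v_rms, i_rms, p, s, q_signed, phase_rad) for equal-length lists.

    Assumption: len(v) covers a whole number of cycles and samples_per_cycle
    is divisible by 4, so a quarter-cycle delay is an integer sample shift.
    """
    n = len(v)
    v_rms = math.sqrt(sum(x * x for x in v) / n)
    i_rms = math.sqrt(sum(x * x for x in i) / n)
    p = sum(a * b for a, b in zip(v, i)) / n   # true power: mean of e*i
    s = v_rms * i_rms                          # apparent power
    quarter = samples_per_cycle // 4
    # Signed reactive power: current times the voltage delayed by 90 degrees.
    # The index wraps around, which is valid because the capture is periodic.
    q = sum(i[k] * v[(k - quarter) % n] for k in range(n)) / n
    phase = math.atan2(q, p)                   # positive when the current lags
    return v_rms, i_rms, p, s, q, phase

# Hypothetical test signal: 50 Hz at 20 kS/s (400 samples per cycle),
# 10 cycles, 100 V peak, 10 A peak, current lagging by 60 degrees.
SPC = 400
N = 10 * SPC
v = [100.0 * math.sin(2 * math.pi * k / SPC) for k in range(N)]
i = [10.0 * math.sin(2 * math.pi * k / SPC - math.pi / 3) for k in range(N)]
v_rms, i_rms, p, s, q, phase = power_stats(v, i, SPC)
```

Taking `atan2` of the signed reactive and true power gives a phase angle with the correct sign, which sidesteps the arc-cos ambiguity. For distorted waveforms the sqrt(S^2 - P^2) definition and this shifted-product value will generally differ.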

Reactive power is the square root of the sum of squares of the fundamental
reactive power, the reactive products of equal harmonics, and the products of
unequal harmonics. Reactive products of equal harmonics are products of equal
harmonics with integral zero (i.e. harmonic reactive power).
The time integrals of products of unequal harmonics are always zero. This is
called distortion power, but it is part of the reactive power.

True power is the square root of the sum of squares of the fundamental real
power and the real products of equal harmonics. This is the same as your
"True power is just the product of the e*i samples, lowpass filtered".

Most important is to use a sampling rate that is high enough to satisfy
Nyquist's theorem. A sampling frequency of 100 times the line frequency may
be a starting point, but 1000 times or more may be necessary.

Actually, no. Nyquist is irrelevant: we're not trying to reconstruct
the waveform, but merely gather statistics on it. One can make a very
nice, accurate power meter that samples at a fraction of the line
frequency.
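
A small numerical sketch of that claim, in Python. It assumes an idealised, perfectly steady sinusoidal waveform whose frequency is known exactly, which a real meter would not have; the function name `undersampled_power` and all the numbers are illustrative only. Each sample is taken slightly more than one full cycle after the previous one, so the samples step evenly through the phase of the waveform even though the sample rate is below the line frequency.

```python
import math

def undersampled_power(f_line, n_points, v_peak, i_peak, phi):
    """Average e*i over n_points samples taken slower than the line frequency.

    Each sample comes one full cycle plus 1/n_points of a cycle after the
    previous one, so n_points samples cover one cycle's worth of phase.
    """
    t_step = (1.0 + 1.0 / n_points) / f_line   # longer than one line period
    f_sample = 1.0 / t_step
    acc = 0.0
    for k in range(n_points):
        t = k * t_step
        e = v_peak * math.sin(2 * math.pi * f_line * t)
        i = i_peak * math.sin(2 * math.pi * f_line * t - phi)
        acc += e * i
    return f_sample, acc / n_points

# 50 Hz line, 200 samples, 100 V peak, 10 A peak, 60 degrees lag.
f_sample, p = undersampled_power(50.0, 200, 100.0, 10.0, math.pi / 3)
```

With these values the sample rate comes out just under 50 Hz, and the averaged product still matches the true power of 0.5 * 100 * 10 * cos(60 degrees) = 250 W.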

John

Jonathan Bromley · Apr 21, 2006

On 27 May 2005 01:28:02 -0700, "Johnschool" <tanceqi@yahoo.com> wrote:

Hi Jonathan,

No, not me; you're mailing to the newsgroup!

I just read the material about Flancter circuit.
Since the short pulse is shorter than one clock cycle, it is
impossible to connect the short pulse to SET_CE and RESET_CE. So I have
to connect the short pulse to SET_CLK, and the system clock to
RESET_CLK. Is this right? Thanks!

I think that's the right idea. I don't have in front of me
the document you're reading, so I'm not sure; but I think it's right.
The Flancter is a twisted ring-of-2 with separate clocks on the two
flops. An active edge on one of the clocks will make the two
flops' outputs different; an active edge on the other clock makes
the two flops' outputs the same. A simple XOR of the two outputs
then tells you which of the two clocks happened most recently.
Both flops can have a clock enable, but each enable must be in
the clock domain of the corresponding flop's clock. The XOR'd
output is an asynchronous signal, and it is sometimes necessary
to resynch it back into one or the other clock domain using a
traditional resynchroniser circuit.
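
The behaviour described above can be modelled in a few lines of Python. This is a purely behavioural sketch with made-up names (`Flancter`, `set_edge`, `reset_edge`); it ignores clock enables, metastability and the resynchroniser, and is only meant to show why the XOR tells you which clock fired last.

```python
class Flancter:
    """Behavioural model of a twisted ring of two flops with separate clocks."""

    def __init__(self):
        self.q_set = 0     # flop clocked by the "set" clock
        self.q_reset = 0   # flop clocked by the "reset" clock

    def set_edge(self):
        # Active edge on the set clock: load the inverse of the other flop,
        # which makes the two outputs differ.
        self.q_set = 1 - self.q_reset

    def reset_edge(self):
        # Active edge on the reset clock: copy the other flop,
        # which makes the two outputs equal again.
        self.q_reset = self.q_set

    @property
    def flag(self):
        # XOR of the two outputs: 1 if the set clock happened most recently.
        return self.q_set ^ self.q_reset


f = Flancter()
history = [f.flag]
for event in ("set", "set", "reset", "set", "reset"):
    if event == "set":
        f.set_edge()
    else:
        f.reset_edge()
    history.append(f.flag)
```

Repeated edges on the same clock leave the flag unchanged, so a second short pulse before the flag is cleared is simply absorbed.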
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Neill A · Apr 21, 2006

Well I finally got around to trying it out, but didn't notice any real
difference.

The following summary gives an idea of the size of the design I tried
out:

Importer Summary
===============
Part-Package: APA600-BG456
Core Slots: 21504
RAM/FIFO Slots: 56
I/O Slots: 356 (Globals: 4) (PLLs: 2)

Core Cells: 11965 --> Usage: 55.6 percent
RAM/FIFO Cells: 6 --> Usage: 10.7 percent
IOs: 352 --> Usage: 99.4 percent
PLLs: 2 --> Usage: 100.0 percent

Constraints processed:
IO constraints: 351
Path constraints: 0
Placement constraints: 0
Net constraints: 4

I/O Cells:
Input IOs:       87
Bidir IOs:       80
Output IOs:      185
Global IOs:      0
Internal Global: 0
-----------------------
Total IOs:       352

Core cells:
         | Instances | Gates | Tiles
---------|-----------|-------|------
Logic    |      8777 | 20183 |  8777
Storage  |      3185 | 25215 |  3188
RAM/FIFO |         6 | 54144 |    48
---------|-----------|-------|------
Total    |     11968 | 99542 | 12013

The Windows machine used for the test was a Pentium 4 2.4GHz with 512MB
RAM running Windows XP SP2.

The Linux machine was an Athlon XP2200+ with 512MB RAM running CentOS 4
(RHEL 4 clone).

In both cases the run time for layout was ~50 mins, so it seems the
information I received was clearly wrong.

Ken McElvain · Apr 21, 2006

Correct. Synplify Proto's versions are all RTL (instead
of gate level tuned for ASIC mapping) implementations so
they can be efficiently mapped to FPGAs.

Jon Beniston wrote:

I thought the deal with Proto was that it provides its own version of
lots of designware modules... Maybe I'm wrong.

Cheers,
Jon

Martin Thompson · Apr 21, 2006

Sean Durkin <smd@despammed.com> writes:

Hi *,

I keep coming across answer records and script files containing the
setting of undocumented environment variables, such as
XIL_ROUTE_ENABLE_DATA_CAPTURE, XIL_BITGEN_VIRTEX2ES,
XIL_XST_HIDEMESSAGES and so on.

Is there a complete list of these hidden cheat codes? Any "official"
documentation at all?

Whenever I'm stuck in a design, and find out that some magical
environment variable just fixes my problem, I wonder if maybe there is
something like the Holy Grail... something like the "Answer to Life, the
Universe and Everything", as in a "XIL_MAKE_EVERYTHING_WORK"- or
"XIL_42"-variable or something. Haven't found it but thought I could ask.

No help for you Sean, just a small rant:

I've never understood this approach to things - it makes version
control a bit of a nightmare! Surely there's a better way, even if
it's a file called enable-budges.txt in the project directory!

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt, Solihull, UK
http://www.trw.com/conekt

Ben Jones · Apr 21, 2006

Hi John,

Now the interesting fact I found is that a 32/16 divider from the Xilinx Core
Generator can be synthesised (using XST) to 150MHz easily for a
Virtex-II (XC2V2000) FPGA with just a one-stage pipeline.

Not true, I'm afraid. The Coregen pipelined divider is really, really
pipelined. The pipeline depth (i.e. latency) of a divide will always be at
least one clock cycle per bit of dividend width (in your case, 32 clock
cycles). The field called "Clocks per division" is actually not very well
named; it is really the "initiation interval".

So, if you generate a 32/16 divider with Clocks per division set to '1', and
it runs at 150MHz, then you'll be able to start one division every clock
(6.67ns), but each one will take, say, 34 clock cycles (~227ns) to complete.

If you generate the same divider but with CPD set to '8', it will most
likely run at a similar speed and the latency will be about the same
(~227ns). However, you can now kick off only one division every eight cycles
(53ns). (The circuit will be considerably smaller than the first example.)
This compares favourably with your Synopsys example, which would appear to
take 50ns*3 = 150ns to do a divide (about 33% less than the Coregen one),
and does one division every 50ns (about the same as the Coregen one). So,
Synopsys' divider is not rubbish, but it's nothing to shout about in
performance terms.
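
The arithmetic behind those figures, as a short Python sketch. The function name `divider_timing` is made up, and the 34-cycle latency is just the "say, 34" figure from the text above, not a number taken from the datasheet.

```python
def divider_timing(f_clk_mhz, latency_cycles, clocks_per_division):
    """Return (latency_ns, initiation_interval_ns) for a pipelined divider."""
    period_ns = 1000.0 / f_clk_mhz
    latency_ns = latency_cycles * period_ns          # time for one result
    interval_ns = clocks_per_division * period_ns    # time between new starts
    return latency_ns, interval_ns

# Clocks per division = 1: a new division can start every clock.
fully_pipelined = divider_timing(150.0, 34, 1)
# Clocks per division = 8: same latency, but one start every eight clocks.
shared = divider_timing(150.0, 34, 8)
```

Latency depends only on the pipeline depth and clock rate, while "Clocks per division" changes only how often a new divide can be started.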

The Coregen divider doesn't currently allow you to vary the pipeline depth
manually (to get your 20MHz implementation, for example, by pulling out a
load of registers). However, you can - with care - run the divider on a
faster clock than the surrounding logic and rate match the two.

Hope this helps,

-Ben-

Gabor · Apr 21, 2006

Jim George wrote:

Sean Durkin wrote:
Hi *,

I keep coming across answer records and script files containing the
setting of undocumented environment variables, such as
XIL_ROUTE_ENABLE_DATA_CAPTURE, XIL_BITGEN_VIRTEX2ES,
XIL_XST_HIDEMESSAGES and so on.

Is there a complete list of these hidden cheat codes? Any "official"
documentation at all?

Whenever I'm stuck in a design, and find out that some magical
environment variable just fixes my problem, I wonder if maybe there is
something like the Holy Grail... something like the "Answer to Life, the
Universe and Everything", as in a "XIL_MAKE_EVERYTHING_WORK"- or
"XIL_42"-variable or something. Haven't found it but thought I could ask.

cu,
Sean

XIL_XST_HIDEMESSAGES is supposed to suppress the warnings that cause my
report files to grow to several tens of thousands of lines long... IT
DOESN'T WORK!

-Jim

I'm using XIL_XST_HIDEMESSAGES with 6.1i and it works fine. You just
can't pick which messages it hides. One thing I noticed was that I
usually need to add these variables to the user environment because for
some reason they are not always picked up by the tools when placed in
the system environment.

-Gabor

Apr 21, 2006

Jedi,

Sorry I did not see this sooner. I just tried a test with Nios II 5.0
of the timer peripheral -- starting with a 10ms period as you
indicated, and tried an experiment to modify the timer period as well
as take snap-shots based on pressing/releasing a button on my dev
board. It all worked fine. Send me an email if this is still a problem
and I'll send you the source code I'm using for the test!

One scenario that comes to mind that may have caused this is if
your code is not set up to use I/O instructions to peripherals
(recommended for all Nios II software), and you're using the data
cache: there have been modifications to the Nios II data cache in 5.0
-- whether you're using the data cache or not, it's best to ensure that
you're doing an "I/O" read/write to external peripherals rather than
a simple load/store (as you'd get with a regular pointer dereference).
This stuff is described in the Nios II Software Developer's manual.

Jesse Kempa
Altera
jkempa -at- altera -dot- com

Jedi · Apr 21, 2006

kempaj@yahoo.com wrote:

Jedi,

Sorry I did not see this sooner. I just tried a test with Nios II 5.0
of the timer peripheral -- starting with a 10ms period as you
indicated, and tried an experiment to modify the timer period as well
as take snap-shots based on pressing/releasing a button on my dev
board. It all worked fine. Send me an email if this is still a problem
and I'll send you the source code I'm using for the test!

One scenario that comes to mind that may have caused this is if
your code is not set up to use I/O instructions to peripherals
(recommended for all Nios II software), and you're using the data
cache: there have been modifications to the Nios II data cache in 5.0
-- whether you're using the data cache or not, it's best to ensure that
you're doing an "I/O" read/write to external peripherals rather than
a simple load/store (as you'd get with a regular pointer dereference).
This stuff is described in the Nios II Software Developer's manual.

Everything is working fine with the peripherals (o;

It wasn't the cores but the Nios II toolchain producing wrong code
on AMD64 Linux platforms...

Compiling everything on OSX now (o:

best regards
rick

Ben Jones · Apr 21, 2006

"vinch" <vincesusu@yahoo-dot-fr.no-spam.invalid> wrote in message
news:KvednQrWrplsyjjfRVn_vg@giganews.com...

At the end of the first page, -9/4 is translated into binary in a
strange way.
First, 9 is 1001 in unsigned, but -9 doesn't exist in signed binary as
far as I can remember....
in the datasheet :
-9/4=9/-4=-(2 1/4)
this corresponds to :
(1)0111/0100 or 1001/1100
isn't that wrong ?

Well, 10111 would be -9 in 5-bit binary, so it depends what you mean by
"wrong". "Sloppy" might be a better word. I think it's a red herring though.

The point it's illustrating is that if the result of a signed division is
negative, the quotient will always be negative (or zero). However, if you
choose an *integer* remainder then it may differ in sign from the quotient;
if you choose a *fractional* remainder then it will always have the same
sign as the quotient.
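
A quick Python check of both points. It assumes truncating (round-toward-zero) signed division, which matches the -(2 1/4) result quoted from the datasheet; whether the core behaves exactly like this in every case is an assumption. The helper names `twos_complement` and `signed_divide` are made up for the example.

```python
from fractions import Fraction

def twos_complement(value, bits):
    """Bit string of `value` in `bits`-bit two's complement."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    return format(value & ((1 << bits) - 1), "0{}b".format(bits))

def signed_divide(dividend, divisor):
    """Truncating division.

    Returns (quotient, integer_remainder, fractional_remainder).
    """
    magnitude = abs(dividend) // abs(divisor)
    same_sign = (dividend < 0) == (divisor < 0)
    quotient = magnitude if same_sign else -magnitude
    # The integer remainder takes the sign of the dividend.
    remainder = dividend - quotient * divisor
    # The fractional remainder is the integer remainder over the divisor.
    return quotient, remainder, Fraction(remainder, divisor)

minus_nine = twos_complement(-9, 5)   # needs 5 bits; 4 bits only reach -8
minus_four = twos_complement(-4, 4)
case_a = signed_divide(-9, 4)
case_b = signed_divide(9, -4)
```

Both divisions give a quotient of -2 and a fractional remainder of -1/4, but the integer remainder is -1 for -9/4 and +1 for 9/-4, so only the integer form can disagree in sign with the quotient.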

Cheers,

-Ben-

Apr 21, 2006

I don't know if anyone is still reading this thread, but could I ask a
couple more questions?

I am using (or trying to use) the iterative CORDIC algorithm written in
software. I've read Ray Andraka's paper on designing a bit serial
processor, in which he writes that when considering whether or not to
use a bit-serial design:

"...the application for the processor must be able to tolerate any
pipeline delay introduced by the serial processor. The latency in a
parallel system is frequently as high or higher than the equivalent
serial system so this is rarely a concern."

I find this statement confusing. I thought that the advantage of the
bit-parallel was that it has a much lower latency = number of
iterations, while the bit-serial has a latency = word width * number of
iterations. So why is the "latency in a parallel system as high or
higher?"

Thank you,
Mees

Austin Franklin · Apr 21, 2006

Mounard,

...by not knowing VHDL you'll be stuck
doing simple designs and will need my consulting services to do anything
remotely complex.

Would you consider the Alpha processor remotely complex, or a simple design?

Austin

yanglongzi · Apr 21, 2006

Symon · Apr 21, 2006

"calaf" <calaf_calaf_calaf@yahoo-dot-es.no-spam.invalid> wrote in message
news:SfOdnUiCl8oD9jXfRVn_vg@giganews.com...

Hi all, I am new to this forum, and I have not found a question like the
one stated below. Sorry if it has already been asked. I have been designing
with Spartan-3, and as a consequence of the number of different power
supplies and the pinout distribution on the board, it is impossible for me
to have a very reduced number of PCB layers. Allowing for simetries
between power and gnd layers in the stack, I almost can't go below
ten. Is there any idea I am missing? Maybe, as 2.5 V is only
used in configuration, I can create islands on the 3.3V layer and
share the ground layer return between both power supplies?
I think there must be something else that allows me to work efficiently
with fewer layers.
Thanks in advance

Without knowing what PCB technology you're using, it's impossible to say.

Here are a few questions off the top of my head.
Can you use microvias? What's your minimum track width? Minimum gap between
tracks? What's the biggest BGA package on the board? Are you prepared to
swap pins on the FPGA to aid the routing process? Is your volume enough to
make it worth spending a lot of time on the layout? How fast are your
risetimes and how long are your traces? Do you have Hyperlynx? Are you gonna
insist on a plane for each of your power supplies, or do you know what
you're doing? ;-)

256 pin BGA, 4mil track/gap, uvias, two ground planes, routed powers,
quick(ish) turnaround, long and fast traces = 6 layers, maybe even 4 with
one ground plane if you're quite talented and have a lot of time.
Cheers, Syms.
p.s. Simetry (sic)? Pah, I spit on symmetry.

Austin Lesea · Apr 21, 2006

Calaf,

See below,

Austin

calaf wrote:

Hi all, I am new to this forum, and I have not found a question like the
one stated below. Sorry if it has already been asked. I have been designing
with Spartan-3, and as a consequence of the number of different power
supplies and the pinout distribution on the board, it is impossible for me
to have a very reduced number of PCB layers. Allowing for simetries
between power and gnd layers in the stack, I almost can't go below
ten. Is there any idea I am missing?

No. I am sure you have looked at it carefully, and the result is that
the pcb's are definitely more complex for the newer technologies.

Maybe, as 2.5 V is only used in configuration, I can create islands on the
3.3V layer and share the ground layer return between both power supplies?

Vccaux is used for the DCM, the IO predrivers, and the pass gate
regulators (for interconnect). As such, noise on Vccaux will add to
overall jitter. A dedicated plane for Vccaux may be too expensive, and
not required. It may be shared with 2.5V IO Vcco banks, for example,
but bypassing must be done well (see the SI web pages for pcb guidelines
and bypassing guidelines on support.xilinx.com).

http://www.xilinx.com/products/design_resources/signal_integrity/resource/si_power.htm

Using more advanced bypass capacitors like X2Y style:

http://www.x2y.com/

or the other advanced caps:

http://www.avxcorp.com/docs/Catalogs/licc.pdf

results in less area for the caps, and better bypassing overall, and may
allow reduced plane area, as plane C from power to ground plane is
effective above 100 MHz, where bypass capacitors are hardly able to do
anything at all.

I think there must be something else that allows me to work efficiently
with fewer layers.
Thanks in advance

As mentioned above, there are techniques you can use, and tradeoffs you
can make. Please work with your local Xilinx FAE, or Xilinx Disti FAE,
as they have resources they can use to address the issue. In fact, my
FPGA Lab supports the field in their endeavours, along with other groups
within Xilinx to provide the best solutions.

Ray Andraka · Apr 21, 2006

m_oylulan@hotmail.com wrote:

I find this statement confusing. I thought that the advantage of the
bit-parallel was that it has a much lower latency = number of
iterations, while the bit-serial has a latency = word width * number of
iterations. So why is the "latency in a parallel system as high or
higher?"

Thank you,
Mees

At a given clock frequency, it is true that the bit parallel will have
a lower latency (that should be obvious), however a totally bit serial
design can generally be clocked faster than an equivalent bit parallel
design. In certain pipelined bit serial designs, you can also begin the
next stage before the previous one is completed, hiding some of the
latency, so the overall latency is only a little longer than the bit
parallel latency. Unfortunately, CORDIC is not one of those because you
need the sign (last bit generated) of one stage before you start the
processing for the next stage. Nevertheless, at the time that paper was
written, a bit serial design in the then current FPGAs could be clocked
much faster than a bit parallel arithmetic design in the same part, so
while the number of clocks of latency was greater, the higher clock
frequency makes up for much of that latency in terms of absolute time.
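
To put rough numbers on that trade-off, here is a small Python sketch. The iteration count, word width and both clock rates are hypothetical values chosen only to show the arithmetic; they are not figures from the paper or from any particular FPGA.

```python
def latency_ns(clocks, f_clk_mhz):
    """Absolute latency in nanoseconds for a given number of clock cycles."""
    return clocks * 1000.0 / f_clk_mhz

# Hypothetical design parameters (assumptions, not measured values).
ITERATIONS = 16          # CORDIC iterations
WORD_WIDTH = 16          # bits per word
F_PARALLEL_MHZ = 40.0    # assumed clock for the bit-parallel design
F_SERIAL_MHZ = 200.0     # assumed clock for the bit-serial design

parallel_clocks = ITERATIONS                # one clock per iteration
serial_clocks = WORD_WIDTH * ITERATIONS     # one clock per bit per iteration

parallel_ns = latency_ns(parallel_clocks, F_PARALLEL_MHZ)
serial_ns = latency_ns(serial_clocks, F_SERIAL_MHZ)

clock_ratio = serial_clocks / parallel_clocks   # 16x more clocks
time_ratio = serial_ns / parallel_ns            # only 3.2x more time
```

With these made-up numbers the serial design needs 16 times as many clocks but only 3.2 times as long in absolute terms, which is the sense in which the faster clock makes up for much of the latency.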

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
