EDK : FSL macros defined by Xilinx are wrong

Hi,

I did a quick test with MicroBlaze.
With 125 MHz and 64kbyte of local memory, it takes MicroBlaze 6.8s to run
the benchmark.

I added two defines in the program.
#define printf xil_printf
#define double float
The first define gives a smaller code footprint, since the default printf
is bloated and no floating-point values are printed.
The second define makes the compiler use the MicroBlaze FPU's
single-precision floating-point compare and conversion instructions.
Neither define changes the program's result, since there are no actual
floating-point calculations, just compares and conversions.

Actually, the program prints out a relatively large number of characters; if
I remove the printf statement that is part of the loop, the program executes
in 6.1 s.
The baud rate will affect the measured execution speed if too many prints
exist in the timed section.

Göran

"Tommy Thorn" <tommy.thorn@gmail.com> wrote in message
news:f005305a-30b9-4ca2-ae01-7fd3e2622853@l17g2000pri.googlegroups.com...
I'm trying to get a feel for how the performance of my (so far
unoptimized) soft-core stacks up against the established competition,
so it would be a great help if people with convenient access to Nios
II / MicroBlaze respectively would compile and time this little app:
http://radagast.se/othello/endgame.c (It's an Othello endgame solver.
I didn't write it) and tell me the configuration.

In case anyone cares, mine finished this in 100 seconds in this
configuration: 8 KiB I$, 16 KiB D$, 48 MHz clock frequency, async
sram. (My Mac finished this in ~ 0.5 sec :)

Thanks
Tommy
 
Hi,

Actually, the use of floating-point at all seems unnecessary in the program.
I think this is a legacy of the PC program, where the use of double (or
float) is not performance-critical the way it is on a CPU without an FPU.

I think it's safe to change the doubles in the program to int without any
change in the result.
The program would not run faster on a Mac/PC with this change, but it will
have a drastic effect on your CPU.

Göran

"Göran Bilski" <goran.bilski@xilinx.com> wrote in message
news:fv70te$7s01@cnn.xsj.xilinx.com...
Hi,

I did a quick test with MicroBlaze.
With 125 MHz and 64kbyte of local memory, it takes MicroBlaze 6.8s to run
the benchmark.

I added two defines in the program.
#define printf xil_printf
#define double float
The first define gives a smaller code footprint, since the default
printf is bloated and no floating-point values are printed.
The second define makes the compiler use the MicroBlaze FPU's
single-precision floating-point compare and conversion instructions.
Neither define changes the program's result, since there are no actual
floating-point calculations, just compares and conversions.

Actually, the program prints out a relatively large number of characters;
if I remove the printf statement that is part of the loop, the program
executes in 6.1 s.
The baud rate will affect the measured execution speed if too many prints
exist in the timed section.

Göran

"Tommy Thorn" <tommy.thorn@gmail.com> wrote in message
news:f005305a-30b9-4ca2-ae01-7fd3e2622853@l17g2000pri.googlegroups.com...
I'm trying to get a feel for how the performance of my (so far
unoptimized) soft-core stacks up against the established competition,
so it would be a great help if people with convenient access to Nios
II / MicroBlaze respectively would compile and time this little app:
http://radagast.se/othello/endgame.c (It's an Othello endgame solver.
I didn't write it) and tell me the configuration.

In case anyone cares, mine finished this in 100 seconds in this
configuration: 8 KiB I$, 16 KiB D$, 48 MHz clock frequency, async
sram. (My Mac finished this in ~ 0.5 sec :)

Thanks
Tommy
 
<digi.megabyte@gmail.com> wrote in message
news:dae35d89-754a-46dc-a797-9902770bcf21@k13g2000hse.googlegroups.com...
Please, somebody help me. I am trying to write VHDL code and I need
to use floating-point arithmetic and a logarithm. I have downloaded the
IEEE 2006 library, but it does not synthesize in Xilinx ISE. I have
been struggling with this for a while now. I also downloaded the FP
libraries from http://www.ens-lyon.fr/LIP/Arenaire/Ware/FPLibrary and
http://www.eda.org/fphdl/vhdl.html but have not had any luck. I do not
know how to instantiate the log or floating-point arithmetic from
http://www.ens-lyon.fr/LIP/Arenaire/Ware/FPLibrary/libraries, and I did
not have any luck with http://www.eda.org/fphdl/vhdl.html either.

Thank you in advance
Try this one:

http://sourceforge.net/projects/libhdlfltp

Hans
www.ht-lab.com
 
"Jeff Cunningham" <jcc@sover.net> wrote in message
news:48168a98$0$19838$4d3efbfe@news.sover.net...
My guess is that the first part is pre CES4 and the last part is post
CES4.
You are basically right. Your second part is the full commercial silicon,
which is also known as CES5. Your first part is probably CES1. I guess at
that point Xilinx didn't have plans to release different versions of
engineering samples, so they didn't include the number in the marking.
Starting with CES2 the number was included in the marking.

/Mikhail
 
When I started with ROMs circa ISE 6.2, the COE thing
wasn't working past some small number of values.
Someone promptly directed me to infer the ROM and not
use the core generator at all.

The VHDL looks something like this:

type i2c_array_type is array(natural range <>) of natural;
constant i2c_data_array : i2c_array_type := (
  0, 0, 0  -- put your constants here
);
signal i2c_bit_index0 : unsigned(2 downto 0);
signal i2c_data1      : natural;
begin
-- to_integer (ieee.numeric_std) converts the unsigned index for the array lookup
i2c_data1 <= i2c_data_array(to_integer(i2c_bit_index0));

You can use natural, integer, enumerated data, std_logic_vectors,
or records to fit your design's data type. All the simulation
data will be visible and the ISE reports will tell you how
many BRAMs were used. I don't think any deeper level of
simulation will provide you more confidence.

If you post your code, I'm sure we can help you debug some
of the issues regarding conversions and the like.

Brad Smallridge
AiVision
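
A related point on getting such an array into block RAM rather than LUTs: XST
generally wants a synchronous (registered) read for BRAM inference. A minimal
sketch, assuming XST-style inference rules; the entity, port, and signal names
here are only illustrative:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rom_sync is
  port (
    clk  : in  std_logic;
    addr : in  unsigned(9 downto 0);
    data : out natural range 0 to 255
  );
end entity rom_sync;

architecture rtl of rom_sync is
  type rom_type is array (natural range <>) of natural;
  constant rom : rom_type(0 to 1023) := (0 => 17, 1 => 42, 2 => 99, others => 0);
begin
  -- Registered (synchronous) read: lets the synthesizer map the constant
  -- array onto a block RAM; an unclocked read usually ends up in LUTs.
  process (clk)
  begin
    if rising_edge(clk) then
      data <= rom(to_integer(addr));
    end if;
  end process;
end architecture rtl;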
 
Hi Tommy,

It depends on how you want to benchmark: only using features that your CPU
has (i.e. lacking large local memory)?
The code footprint when using the optimized printf is around 50 KB including
data.
Using a processor with an 8 KB icache and a 16 KB dcache on an application
that is just twice that size doesn't seem valid.
Cache efficiency is more likely to show when there is at least a 10-50x
factor between code size and cache size.
Using caches also pulls the external memory type and memory controller into
the benchmark numbers, so I guess they are not apples to apples between you
and me.
Using fast async SRAM as the external memory is not the same as using SDRAM.

Yes, my results were with float instead of double. I don't think you need to
change the type to long, since the values seem to fit well within a byte.

I used my board connected to my laptop, which is an ML505 (Virtex-5, slowest
speed grade), and I didn't push the clock frequency.

Göran

"Tommy Thorn" <tommy.thorn@gmail.com> wrote in message
news:0d6ce282-f79a-4dd2-b968-0af4ae735aba@1g2000prg.googlegroups.com...
Thanks Göran,

that's very impressive. You are right about the double precision and the
output. With the below patch applied, I now clock in at 42.5 s. Could
you try it again? (I assume your numbers were with floats.)

Using local memory, however, doesn't make for an apples-to-apples
comparison, as this benchmark is memory-heavy and local memory (as
opposed to cache + slow memory) gives MB a large advantage.

Thanks
Tommy
PS: Which FPGA was this on?



On Apr 29, 5:31 am, "Göran Bilski" <goran.bil...@xilinx.com> wrote:
Hi,

Actually, the use of floating-point at all seems unnecessary in the
program. I think this is a legacy of the PC program, where the use of
double (or float) is not performance-critical the way it is on a CPU
without an FPU.

I think it's safe to change the doubles in the program to int without any
change in the result.
The program would not run faster on a Mac/PC with this change, but it will
have a drastic effect on your CPU.

Göran

"Göran Bilski" <goran.bil...@xilinx.com> wrote in message
news:fv70te$7s01@cnn.xsj.xilinx.com...

Hi,

I did a quick test with MicroBlaze.
With 125 MHz and 64kbyte of local memory, it takes MicroBlaze 6.8s to run
the benchmark.

I added two defines in the program.
#define printf xil_printf
#define double float
The first define gives a smaller code footprint, since the default
printf is bloated and no floating-point values are printed.
The second define makes the compiler use the MicroBlaze FPU's
single-precision floating-point compare and conversion instructions.
Neither define changes the program's result, since there are no actual
floating-point calculations, just compares and conversions.

Actually, the program prints out a relatively large number of characters;
if I remove the printf statement that is part of the loop, the program
executes in 6.1 s.
The baud rate will affect the measured execution speed if too many prints
exist in the timed section.

Göran

"Tommy Thorn" <tommy.th...@gmail.com> wrote in message
news:f005305a-30b9-4ca2-ae01-7fd3e2622853@l17g2000pri.googlegroups.com...
I'm trying to get a feel for how the performance of my (so far
unoptimized) soft-core stacks up against the established competition,
so it would be a great help if people with convenient access to Nios
II / MicroBlaze respectively would compile and time this little app:
http://radagast.se/othello/endgame.c (It's an Othello endgame solver.
I didn't write it) and tell me the configuration.

In case anyone cares, mine finished this in 100 seconds in this
configuration: 8 KiB I$, 16 KiB D$, 48 MHz clock frequency, async
sram. (My Mac finished this in ~ 0.5 sec :)

Thanks
Tommy
 
msn444@gmail.com wrote:
Hi everyone,

We've been shipping a Virtex4 FX20 based product for a few months now
with relatively few problems. However, we're seeing a 60+% failure
rate in our latest batch of boards, characterized by the V4 DCMs not
locking or providing any output at all unless the chip is freezing
cold. Neither the design, board house or CM shop has changed at all
from our previous working batch, which generally runs pretty hot
(~60C) with no known problems.

The details I have found are: when powering up from room temperature,
other logic implemented in the chip seems to work, but the DCMs have
no output and are not locked. If I give the FPGA a good shot of cold
spray and power up the device, it becomes fully functional, until the
temperature rises somewhat and the DCM ceases completely. Among the
samples I've tried, the temperature at which it stops working ranges
from really cold to slightly below room temperature.

The clock source is a 25MHz crystal oscillator -- I tried a function
generator and nothing changes. I tried twiddling VCCAux as I've seen
suggested and no difference there, either. Has anyone seen such a
dramatic DCM failure before, or have any ideas what might be causing
this?

Thanks,
Mike.
Hi Mike,
It probably isn't NBTI.
http://www.xilinx.com/support/documentation/white_papers/wp224.pdf
Google :- nbti group:comp.arch.fpga
But, a bake at 150C for 48 hours would prove that it isn't!

You also might like to read this thread from a few days back about reset.
Google :- "Virtex 4 DCM problem" group:comp.arch.fpga

Good luck, Syms.
 
Mike,

Have you double-checked the range settings of the DCM? That would be my
first guess...

/Mikhail
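
For reference, the "range settings" live in the DCM attributes/generics. A
minimal sketch of a DCM_BASE instantiation with the frequency-related generics
spelled out, assuming the Virtex-4 UNISIM names (the entity and signal names
here are only illustrative; check the Libraries Guide and data sheet for the
exact names and legal ranges):

library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

entity clkgen is
  port (
    clkin  : in  std_logic;   -- e.g. the 25 MHz oscillator
    rst    : in  std_logic;
    clk0   : out std_logic;
    locked : out std_logic
  );
end entity clkgen;

architecture rtl of clkgen is
  signal clk0_dcm, clkfb : std_logic;
begin
  dcm_inst : DCM_BASE
    generic map (
      CLKIN_PERIOD       => 40.0,   -- must match the real input clock period (ns)
      DLL_FREQUENCY_MODE => "LOW",  -- pick the mode whose input range covers CLKIN
      DFS_FREQUENCY_MODE => "LOW",
      CLK_FEEDBACK       => "1X"
    )
    port map (
      CLKIN  => clkin,
      CLKFB  => clkfb,
      RST    => rst,
      CLK0   => clk0_dcm,
      LOCKED => locked
    );

  -- Feed CLK0 back through a global buffer, as the deskew loop expects
  fb_bufg : BUFG port map (I => clk0_dcm, O => clkfb);
  clk0 <= clkfb;
end architecture rtl;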
 
Thanks for the replies. I don't need help with the code for the ROM.
No problem with my writing the code and letting ISE infer the ROM; however,
what if I wanted to use block memory for a dual-port memory?
You mention ROM in your original post. That's what I answered.

This is a non-trivial coding effort.
I haven't done any elaborate dual-port BRAMs because everything
I have done fits into a single BRAM, or maybe two. Yeah, I suppose
spreading init data among several BRAMs, and combining the
addresses and outputs, is not trivial. So maybe you should spell out
your requirements and see if someone can help?

I am wondering what use the
memory cores are if I am better off writing the code myself? I thought
the purpose of the cores is not only to help the user by having the
code pre-written by Xilinx's experts, but also to ensure that the
design is optimal for fitting into the FPGA.
That may be true.

I used a core adder/
subtractor and comparator with no problems.
And did the core work better than an inferred adder/subtractor?

The trick with the memory
seems to be simulating the memory with the contents loaded, so maybe
the dual-port memory would work OK.


Thanks,

Charles
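
If it helps, a dual-port block RAM can usually be inferred without the core
generator at all: two clocked processes sharing one array, with the signal's
initial value supplying the contents. A minimal sketch under XST-era inference
rules; the entity, port, and signal names are only illustrative:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dp_ram is
  port (
    clk_a  : in  std_logic;
    we_a   : in  std_logic;
    addr_a : in  unsigned(9 downto 0);
    din_a  : in  std_logic_vector(15 downto 0);
    dout_a : out std_logic_vector(15 downto 0);
    clk_b  : in  std_logic;
    addr_b : in  unsigned(9 downto 0);
    dout_b : out std_logic_vector(15 downto 0)
  );
end entity dp_ram;

architecture rtl of dp_ram is
  type ram_type is array (0 to 1023) of std_logic_vector(15 downto 0);
  -- The initial value becomes the BRAM INIT contents, much like a COE file
  -- would do with CORE Generator.
  signal ram : ram_type := (others => (others => '0'));
begin
  port_a : process (clk_a)  -- write + read side
  begin
    if rising_edge(clk_a) then
      if we_a = '1' then
        ram(to_integer(addr_a)) <= din_a;
      end if;
      dout_a <= ram(to_integer(addr_a));
    end if;
  end process;

  port_b : process (clk_b)  -- read-only side
  begin
    if rising_edge(clk_b) then
      dout_b <= ram(to_integer(addr_b));
    end if;
  end process;
end architecture rtl;

The ISE map report will then show how many BRAMs the array consumed, the same
way it does for the single-port case.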
 
"XSterna" <XSterna@gmail.com> wrote in message
news:6e540612-5af3-4031-851e-ed4a27d20bc8@m3g2000hsc.googlegroups.com...
I will work on all ideas because I'm a beginner in all that, so I need
time to understand everything :)
Basically your supervisor told you that the FPGA will be used to store an
arbitrary waveform and to send it to the DAC. No math is supposed to be done
in it. You are supposed to use MATLAB to design the waveform and then upload
(or is it download? :)) it to the memory in the FPGA. The FPGA will then
simply read it back at the DAC sample rate... If you don't need to change
your waveform quickly, and/or you need to be able to "play back" some other
waveforms, then this is the way to go. Otherwise you could drop the MATLAB part
and design your own hardware chirp generator as Kevin described.

/Mikhail
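
As an illustration of that flow, a minimal sketch of the playback side (the
entity, port, and signal names are only illustrative, and a 12-bit DAC is
assumed): a counter steps through a block RAM holding the MATLAB-generated
samples at the DAC sample clock:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity waveform_player is
  port (
    dac_clk  : in  std_logic;                      -- DAC sample-rate clock
    rst      : in  std_logic;
    dac_data : out std_logic_vector(11 downto 0)   -- assumes a 12-bit DAC
  );
end entity waveform_player;

architecture rtl of waveform_player is
  type sample_mem is array (0 to 1023) of std_logic_vector(11 downto 0);
  -- In a real design these values would come from MATLAB (pasted in as
  -- constants or loaded into the BRAM some other way); zeros are placeholders.
  signal waveform : sample_mem := (others => (others => '0'));
  signal addr     : unsigned(9 downto 0) := (others => '0');
begin
  process (dac_clk)
  begin
    if rising_edge(dac_clk) then
      if rst = '1' then
        addr <= (others => '0');
      else
        addr <= addr + 1;                 -- wraps around, looping the waveform
      end if;
      dac_data <= waveform(to_integer(addr));
    end if;
  end process;
end architecture rtl;

MATLAB is then only responsible for producing the sample values themselves.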
 
"XSterna" <XSterna@gmail.com> wrote in message
news:6e540612-5af3-4031-851e-ed4a27d20bc8@m3g2000hsc.googlegroups.com...
I will work on all ideas because I'm a beginner in all that, so I need
time to understand everything :)
Basically your supervisor told you that the FPGA will be used to store an
arbitrary waveform and to send it to the DAC. No math is supposed to be done
in it. You are supposed to use MATLAB to design the waveform and then upload
(or is it download? :)) it to the memory in the FPGA. The FPGA will then
simply read it back with the DAC sample rate... If you don't need to change
your waveform quickly and/or you need to be able to "play back" some other
waveforms then it is the way to go. Otherwise you could drop the MATLAB part
and design your own hardware chirp generator as Kevin described.

/Mikhail
 
"Bob" <rsg.uClinux@gmail.com> wrote in message
news:294e5ba1-814c-4e98-a4fd-d331ec15ab7e@w74g2000hsh.googlegroups.com...
Okay, but I'm not sure where to look for this! I've played the file
using iMPACT, and it works - so it would seem iMPACT adds such a "post-
amble" automagically? How do I get this into my svf (and hence, xsvf)
files?
Bob,

Try creating the xsvf directly from iMPACT. Simply choose XSVF file in the
output menu. When finished writing, go to the same menu and choose Finish
Writing. This works for me...

/Mikhail
 
"Back in the day", ASIC gate counts were mostly estimated by extrapolating
from previous ASIC designs. If you don't have any such information, your
guess will be as bad as mine.

Do as Dilbert does, and just make up the numbers ;-) After all, it's only
the shareholders' money, and what have they ever done for you?
 
<bishopg12@gmail.com> wrote in message
news:50337b3d-7acd-4285-9de3-0e19f98fab96@t54g2000hsg.googlegroups.com...
We are trying to develop a system that utilizes a Xilinx XC4VFX12 chip
and platform flash. One of our main goals is to be able to remotely
upgrade the bitfiles in the platform flash through ethernet and
rs232. Is it possible to program the platform flash from the fpga/
powerpc core using the jtag chain? My idea was to have the powerpc
get the bitfile from whatever source, store it in ram, then send it to
the platform flash using some type of jtag interface (soft core or
software on the powerpc). Many of the examples I have seen involve
using cplds and other external logic, not this way.

The answer is XAPP058 :) Also read the recent thread about the xsvf player.

/Mikhail
 
Bob,

The startup clock problem didn't occur to me because I am actually
programming a Platform Flash rather than the FPGA, so CCLK is the correct
clock in my case....

Looking at the Properties dialog for the "Generate Programming File"
process, I see "FPGA Start-Up Clock" under "Startup Options", which
is indeed set to CCLK - seems this is the default setting, is that
true? So you are suggesting I change this to "JTAG Clock", right?
That's what you have to do.

And just in case, where do I "check the startup options to check how
many clocks you need"? Is this in the data sheet?
Under the same "Start Options" you can see Done (Output Events) set to
probably 4 and Release Write Enable set to probably 6. The last number, I
believe, is the number of extra clocks required after the bitstream has been
shifted into the chip. However, I don't think this is your problem, as I am
sure iMPACT adds all the required cycles to your xsvf file. I am pretty sure
your problem is with the startup clock as Gabor noticed.


/Mikhail
 
