EDK : FSL macros defined by Xilinx are wrong

inesviskic@gmail.com wrote:
Hello all,
I'm new to Chipscope Pro 7.1 and need some help in how to use
it correctly.
I have a Virtex-2 Multimedia Board and EDK version 7.1 connected to it.
The project contains an MB processor and a 2-bit LED peripheral, and includes
a simple application program that switches LEDs on and off. I would
like to get the exact trace of the waveform of the program, so I'm
using Chipscope 7.1i to analyze the OPB bus.
I placed the data and instruction memory on 2 BRAM blocks and connected
them to the OPB bus. I also connected the chipscope ICON and OPB-IBA
core.
However, I don't know how to set the trigger correctly so as to see the
entire trace of the program on the bus. I tried using these triggers:
1. OPB_ABUS <> 0
2. OPB_ABUS > 0 && OPB_ABUS < 0x000000C0
If the program has an infinite loop (LEDs on and off always), I see
only the loop. If the program is short (LEDs on only once, with or
without the use of an exit(1) command), all I see is 2 repeating
commands on addresses 0x000000c0 and 0x000000c8 (the diodes are already
lit, and the same instruction is read over and over again).
Any comments and suggestions would be greatly appreciated!
Thank you in advance!
Ines
It depends on how your application executes.
Do a "mb-objdump -S" and look at the code.
What program is initialized into the BRAM in the bitstream?
Do you download your program using XMD and then execute it?


Göran
 
Tommy Thorn schreef:

Having just realized that ISE WebPack support ends at the XC3S1500, I set
out to locate a Cyclone II EP2C70 based dev kit, but found nothing. I
thought these parts were launched quite a while ago; what happened?

(The XC3S5000 based Zefant DDR looked great until I was reminded of the
lack of WebPack support. My budget cannot stretch to the full ISE.)

Thanks,
Tommy -- fpga (at) numba-tu.com
Look at the EP2C35F672 based development kits and swap the device. You
will lose around 50 I/Os on the EP2C70, but if you are lucky the DDR
interfaces remain OK (I did not check this).

Happy soldering !

Karl.
 
"Frank" <frank@yahoo.com.cn> writes:
|> I am looking for textbook with detailed information on AES encryption
|> and decryption. I have Bernard Sklar's Digital Communications, but it
|> contains DES only.

For a detailed textbook about AES by its designers, have a look at

Joan Daemen, Vincent Rijmen:
The Design of Rijndael: AES - The Advanced Encryption Standard
Springer, 2002, ISBN 3540425802.

Most newer entry-level cryptography textbooks (e.g.,
Douglas R. Stinson's Cryptography: Theory and Practice,
3rd edition) contain a couple of pages on AES.

Which textbook is best suited for you really depends on
your background and the reason you want to learn
about AES (just implement it? start a PhD project to
attack it? teach a class?).

Markus

--
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain
 
"Peter Monta" <pmonta@pmonta.com> wrote in message
news:_qsWf.10094$tN3.3272@newssvr27.news.prodigy.net...
Hi Marco,

If I use the linker script, is there a way to "choose" which variables
are stored into a certain region of memory?

Yes, you can use a linker script for this, together with
a gcc attribute to indicate which segment should receive
a given variable.

For example, somewhere in the linker script you could add

. = 0x40000000;
.ocm : { *(.ocm) }

This creates a new segment, "ocm", starting at address 0x40000000
(or wherever your OCM is mapped). To actually cause variables
to be allocated to this segment, use something like

#define __ocm__ __attribute__((aligned(32),section(".ocm")))

and then declare your variables:

int __ocm__ foo;
int __ocm__ bar[128];

You can verify that these ended up in the ocm segment with objdump.
And you'll want "ld -T my_linker_script.ld"; google linker-script
for tutorials.
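
Putting those pieces together, a minimal sketch might look like the
following (the file name and the 0x40000000 base are just placeholders;
point them at wherever your OCM is actually mapped, and link with the
-T my_linker_script.ld option mentioned above):

/* ocm_vars.c -- minimal sketch combining the fragments above.
 * Assumes the linker script already contains:
 *     . = 0x40000000;
 *     .ocm : { *(.ocm) }
 */

#define __ocm__ __attribute__((aligned(32),section(".ocm")))

int __ocm__ foo;        /* placed in the .ocm output section */
int __ocm__ bar[128];   /* likewise */

int main(void)
{
    bar[0] = foo;       /* ordinary accesses; placement is invisible here */
    return 0;
}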

Cheers,
Peter
Many Thanks for your reply!
Marco
 
1) No, they didn't. Because I did all the synthesis without the I/O
buffers. So for the adder itself, there is no time delay on the input
signals. And if I add registers before that, there will be some DFF gate
delay and wiring delay.

2) In XST, I didn't use register balancing but I still got fewer logic
levels. In Synplify Pro, it seems there is no such option, and I don't
know whether register balancing is a default operation or not.

Thank you for your reply.
 
Hi Dumak,

Zara was right (thank you :)).
I found the .cdc file in "implementation\chipscope_opb_iba_0_wrapper".
It helps a lot.

Thnx,

Guru
 
Hi Michiel,

To correctly use a tristate you need 3 ports in your peripheral design
(VHDL)
for a single port ("port") in MPD and UCF file:
port_I port Input
port_O port Output
port_T port Three-state control (driving it low (0) enables the output driver, i.e. writes port_O to the pad)

and in MPD: PORT port = "", DIR = INOUT, ENABLE=SINGLE,
THREE_STATE=TRUE

in UCF: NET port LOC = xx;

I hope this helps.

Guru
 
Precisely, anyone who has done FPGA CPU design knows how limiting FPGAs
can be, i.e. 20-120 MHz is typical for unfriendly architectures.

If you're not in that very small club of Intel, AMD, and IBM, then even full
custom is pretty limiting, given the extreme expense of it all. The
top tier may still be directly instantiating transistors as well as
flops. But then transistor-level design is still able to significantly
outperform standard-cell logic using a variety of mostly NMOS
differential techniques, I guess by a factor of 3. At Sun's level, they
are much closer to full standard cell with synthesis, with a fraction of
the clock of P4s, but they make up for it by going to massive threading
and latency hiding to bring out the throughput.

There's the clue there. The same can be done in an FPGA CPU: a
multithreaded architecture simplifies the design so that you are not
limited by 32-bit carry ripples. In my Transputer design I was seeing
300 MHz on the PEs because it could use 2 clocks per basic opcode and
8 clocks for 4 thread instructions; a lot of cycle-limiting logic
just vanishes, i.e. no hazard logic or register forwarding paths. The
hardware design of the MMU hasn't started, so there is nothing to
release.

For information, my PE used 500 LUTs & 1 BlockRAM and a few hundred LOC
of RTL Verilog. Given that a V4 can hold up to 554 BlockRAMs, I could
instance quite a few of these PEs too. In some ways it is quite similar
to the Niagara/SPARC; what's the difference between sliding register
files with stack spilling versus register files in memory that are
cached on demand (process swapped) into register caches, as the T9000
did?

If I wanted to see a Niagara core in an FPGA, I think I would go back to
the SPARC architecture documents and maybe LEON, and see if a threaded
design could be done from scratch that executes the ISA but possibly
makes some very different choices, so the FPGA version wouldn't get
crippled. I wouldn't be constrained to 1 opcode per clock either; using
more clocks lowers per-clock PE performance but allows a much faster
clock and much less logic, so more PE cores.

I am surprised that we haven't seen a lot more native FPGA MTA designs,
though.

John Jakson
Transputer guy
 
Hi

I have tested it and got this error:
ERROR:MDT - test_I (IO_0) - D:\Thesis\Test\XPS_versie8\IO\system.mhs
line 358 -
port is driven by a sourceless connector

this is what I have done
first I added this in the vhdl
test_I : in std_logic_vector (7 downto 0);
test_O : out std_logic_vector (7 downto 0);
test_T : out std_logic_vector (7 downto 0);
and this
s_test <= test_I;
test_O <= s_test;
test_T <= "00001111";

then I have added this in the MPD file
PORT test = "", DIR = INOUT, ENABLE=SINGLE, THREE_STATE=TRUE, VEC =
[7:0]
PORT test_I = "", DIR = IN, VEC = [7:0]
PORT test_O = "", DIR = OUT, VEC = [7:0]
PORT test_T = "", DIR = OUT, VEC = [7:0]

and this in the ucf file
Net IO_0_test_pin Loc = "N6";
Net IO_0_test_pin IOSTANDARD = LVTTL;

then I get this error
ERROR:MDT - test_I (IO_0) - D:\Thesis\Test\XPS_versie8\IO\system.mhs
line 358 -
port is driven by a sourceless connector

There must be something I missed; can you tell me what it is?

Greets
Mich
 
Hi Dale,

dale.prather@gmail.com wrote:

I'm having great difficulty interfacing my FSL to my external (from
microblaze point of view) VHDL. I want an FSL to communicate between
Microblaze and my external VHDL. I want to be able to import the .xmp
file to my ISE project and then create the instantiation template
(wrapper) for the XMP file (*_stub.vhd). In the file I need to see the
FSL control signals, data etc. I cannot get this to happen. I'm
thinking there needs to be some kind of interface VHDL inside of the
EDK project and then make those signals external. Please offer any
help you can. I'm frustrated.... at the end of my rope :).
You'll probably need to create explicit FSL signals and bring these out
as toplevel ports in the EDK / MicroBlaze design.

And at the top of the MHS file, you'll have typical port declarations

PORT FSL0_S_CLK = fsl0_s_clk, DIR = O
PORT FSL0_S_DATA = fsl0_s_data, DIR = I
PORT FSL0_S_CONTROL = fsl0_s_control, DIR = I
PORT FSL0_S_EXISTS = fsl0_s_exists, DIR = I
PORT FSL0_S_READ = fsl0_s_read, DIR = O

then later


begin microblaze
....
PARAMETER C_FSL_LINKS=1
....
PORT FSL0_S_CLK = fsl0_s_clk
PORT FSL0_S_DATA = fsl0_s_data
PORT FSL0_S_CONTROL = fsl0_s_control
PORT FSL0_S_EXISTS = fsl0_s_exists
PORT FSL0_S_READ = fsl0_s_read
....
end

This is for a MicroBlaze slave (incoming) port; it will be a little
different for outgoing ports, but the idea should be the same.

Does that help?

John
 
Isaac Bosompem wrote:
I have to ask, are you the head developer of this project?
I founded the FpgaC project, and drive it to some extent, but view it
as a group effort.

Also, I see the University of Toronto is heavily involved with your project.
While I do not go to that school (I go to a lesser-known university in
the area), do you have any contacts there?
They have not been active in FpgaC, other than providing the BSD-licensed
TMCC sources we used as a foundation. I exchange emails with the
original author from time to time, but he is too busy with other projects
to participate at this time.
 
JJ wrote:
Precisely, anyone who has done FPGA CPU design knows how limiting FPGAs
can be, i.e. 20-120 MHz is typical for unfriendly architectures.

If you're not in that very small club of Intel, AMD, and IBM, then even full
custom is pretty limiting, given the extreme expense of it all. The
top tier may still be directly instantiating transistors as well as
flops. But then transistor-level design is still able to significantly
outperform standard-cell logic using a variety of mostly NMOS
differential techniques, I guess by a factor of 3. At Sun's level, they
are much closer to full standard cell with synthesis, with a fraction of
the clock of P4s, but they make up for it by going to massive threading
and latency hiding to bring out the throughput.
I think you're being overly generous to Sun here.

There's the clue there. The same can be done in an FPGA CPU: a
multithreaded architecture simplifies the design so that you are not
limited by 32-bit carry ripples. In my Transputer design I was seeing
300 MHz on the PEs because it could use 2 clocks per basic opcode and
8 clocks for 4 thread instructions; a lot of cycle-limiting logic
just vanishes, i.e. no hazard logic or register forwarding paths. The
hardware design of the MMU hasn't started, so there is nothing to
release.
I think this is pretty well known, although no less true. However,
as Amdahl put it, "What would you rather use to plow a field? Two oxen
or a thousand chickens?" In your world things are of course different, as
you're coming from a paradigm of many, many threads. However, the rest of
the world is only slooowly moving to multiple threads.

It is interesting though that by giving up half the speed on single
thread performance, you can gain 3-4 times the throughput for free.
I'll definitely play with that.

I am surprised that we haven't seen a lot more native FPGA MTA designs,
though.
In addition to what I mentioned, there are surely more inertia issues and
the complication of multi-threaded software (assuming you can even take
advantage of it).

My $0.01
Tommy
 
In article <1143687230.719515.136950@j33g2000cwa.googlegroups.com>,
<fpga_toys@yahoo.com> wrote:
The next part of this project, given a truth table, is to find the
best way to decompose it into LUTs and MUXes.
What format is the truth table in? (BLIF, PLA, some other internal data
structure)

So is this the tech mapping step? You've already created the truth table that
represents some part of the design and you need to map it into the underlying
architecture. Sounds interesting.


This project is
available for anyone who would like it. Since it's pretty isolated
from the compiler guts, and will be implemented in the device-specific
output functions, it's a good project for someone wanting to get their
feet wet as an internals developer.

This isn't exactly easy to do well, but almost any implementation will
be better than the current fitting strategy of LUT-only mapping, so
refinement can happen over time as the developer's skills build in this
technology area. This is an area of active and competitive research, so
it might make an interesting undergraduate project, or a graduate-level
thesis.

Interesting reading is related to the Quine-McCluskey algorithm
(implemented in FpgaC) and general logic synthesis algorithms. See:

http://www.eecs.berkeley.edu/~alanmi/abc/
Ah, yes, Alan is on the cutting edge with this And-Inverter Graph stuff. He's
also a very good programmer.
Can you make use of what is already in ABC? While I would hesitate to use
code from some other university projects that have appeared in the past, I
would not hesitate to use Alan's code; it's generally engineered very well.

Phil
 
Phil Tomson wrote:
What format is the truth table in? (BLIF, PLA, some other internal data
structure)
Currently just a 2^n-bit table, with a linked list for the associated
vars.

So is this the tech mapping step? You've already created the truth table that
represents some part of the design and you need to map it into the underlying
architecture. Sounds interesting.
Internally, each signal in TMCC/FpgaC has always been a truth table
with a var list, limited to 4 inputs, and mapped to LUTs at netlist
output time. Simple optimizations, like don't cares and duplicate
removal, have been applied to this 4-LUT set to generate a reasonable
netlist, though one that is a little deeper than can be achieved by
allowing the internal representation to be wider than a 4-LUT. Widening
the internal representation allows the F5/F6 MUXes to be easily used,
controlling the depth of the LUT tree, and gives better don't-care
removal, at the cost of more effort to decompose the truth table at the
technology mapping step.
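
As a rough illustration of that decomposition step (just a sketch, not
FpgaC's actual code; the names and the bit packing are made up here), a
wide truth table can be split by Shannon cofactoring on one variable at
a time until the pieces fit in 4-LUTs, with the split variables driving
the F5/F6 MUX tree:

/* shannon_sketch.c -- illustration only.
 * A truth table over n vars (n <= 6) is stored as 2^n bits, LSB first:
 * bit i is the output for input pattern i (variable k = bit k of i). */
#include <stdint.h>
#include <stdio.h>

/* Split 'tt' over 'n' vars on its highest variable (n-1): 'lo' gets the
 * cofactor with that variable = 0, 'hi' the cofactor with it = 1.  In
 * hardware the two cofactors feed an F5/F6-style MUX selected by that
 * variable. */
static void shannon_split(uint64_t tt, int n, uint64_t *lo, uint64_t *hi)
{
    uint64_t half = 1ull << (n - 1);       /* table entries per cofactor */
    uint64_t mask = (1ull << half) - 1;
    *lo = tt & mask;
    *hi = (tt >> half) & mask;
}

/* Count how many 4-LUTs a naive MUX-tree mapping of 'tt' needs; the MUX
 * levels themselves are assumed free (dedicated F5/F6 muxes). */
static int map_cost(uint64_t tt, int n)
{
    if (n <= 4)
        return 1;                          /* fits in a single 4-LUT */
    uint64_t lo, hi;
    shannon_split(tt, n, &lo, &hi);
    return map_cost(lo, n - 1) + map_cost(hi, n - 1);
}

int main(void)
{
    uint64_t parity5 = 0x96696996ull;      /* 5-input parity as a 32-bit table */
    printf("4-LUTs needed: %d\n", map_cost(parity5, 5));  /* prints 2 */
    return 0;
}

A real mapper would of course also look for shared cofactors and better
split variables, but the recursion is the basic shape of the problem.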

Ah, yes, Alan is on the cutting edge with this And-Inverter Graph stuff. He's
also a very good programmer.
Can you make use of what is already in ABC? While I would hesitate to use
code from some other university projects that have appeared in the past, I
would not hesitate to use Alan's code; it's generally engineered very well.
I've looked at a number of routing and mapping solutions over the last
couple of years, with conflicting project goals for FpgaC. Both excellent
static map/route and excellent fast dynamic compile-load-and-go or
Just-In-Time styles seem to be needed. Because of the frequently larger
word widths, there are some additional twists, like replication to
minimize fan-out, which become useful at some point. Where to start has
always been an interesting problem. It seems that maybe just looking at
the technology mapping in a general way for specific devices is a good
start.

The ideas in ABC, and related educational projects, are certainly a
good grounding to start with.
 
Marco,
We just designed a board that does just what you've described above.
We have a signal going in to CCLK, which is also tied to a GPIO for use
after configuration. It's stated in the datasheet that all inputs to
CCLK after configuration are ignored, so that's not a problem. However
I wouldn't assume that the GPIO is Hi-Z during configuration.
It depends on what you've done with the HSWAP_EN pin. We have that pin
tied to GND which gives all of the GPIO weak pullups during
configuration. This has not been a problem for us, so having external
logic is not necessary. We are using parallel slave config mode.
After configuration D0-D7 are used for the normal DSP data bus.
Another thing to keep in mind: what is the signal voltage of your DSP?
If it's not 2.5V, I would recommend having a look at this app note from
Xilinx.

http://www.xilinx.com/xlnx/xil_ans_display.jsp?iLanguageID=1&iCountryID=1&getPagePath=20477

Hope this helps.

Dale
 
fpga_toys@yahoo.com wrote:
Phil Tomson wrote:
What format is the truth table in? (BLIF, PLA, some other internal data
structure)

Currently just a 2^n-bit table, with a linked list for the associated
vars.
Does this bit table represent two or three states per bit?
(i.e. True, False, or True, False, Don't Care)

While on a sequential processor it might not make sense to worry about
the don't-care state and simply enumerate the table, in logic it can
make a very big difference. You have probably already thought of this.
The use of "bit" just raised a red flag for me, as the bit type in C
generally has only two states.
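
For what it's worth, a cheap way to get that third state without leaving
a packed-bit representation is to keep two parallel tables, one holding
the value and one marking which entries are actually specified. Just a
sketch to show the idea (names are made up; this is not from FpgaC):

/* Three states per table entry via two parallel bit tables:
 *   care bit = 0 -> don't care (value bit is ignored)
 *   care bit = 1 -> specified, output given by the value bit   */
#include <stdint.h>

typedef struct {
    uint64_t value;   /* output for input pattern i is bit i    */
    uint64_t care;    /* 1 = specified, 0 = don't care          */
} tt3;

/* Two tables are compatible (could share one LUT) if they agree
 * everywhere both are specified. */
int tt3_compatible(tt3 a, tt3 b)
{
    uint64_t both = a.care & b.care;
    return ((a.value ^ b.value) & both) == 0;
}

/* Merge compatible tables: specified entries win over don't cares. */
tt3 tt3_merge(tt3 a, tt3 b)
{
    tt3 m;
    m.care  = a.care | b.care;
    m.value = (a.value & a.care) | (b.value & b.care);
    return m;
}

The don't cares are what let a minimizer (espresso, or the fitter's own
merging) collapse otherwise distinct tables into one.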


Regards,
Erik.

---
Erik Widding
President
Birger Engineering, Inc.

(mail) 100 Boylston St #1070; Boston, MA 02116
(voice) 617.695.9233
(fax) 617.695.9234
(web) http://www.birger.com
 
"Angelos" <aamanat@ee.duth.gr> writes:
|> Hi all,
|>
|> I would like to ask if anyone has implemented a physical USB interface on an
|> Altera development board that does not have a USB interface mounted.
|>
|> Cores for USB 1.1 and 2.0 are available, but do I need a certain physical
|> layer to implement USB?
|>
|> I heard that for low speed I don't need to implement anything, just drive the
|> wires into the FPGA, but for high speed I need to implement the PHY interface.

If you need a High Speed (480 Mbit/s) USB 2.0 interface, it is probably
safest and least trouble to pick an existing integrated PHY such as

http://www.smsc.com/main/catalog/gt3200.html

Markus
 
Göran Bilski wrote:
It depends on how your application executes.
Do a "mb-objdump -S" and look at the code.
What program is initialized into the BRAM in the bitstream?
Do you download your program using XMD and then execute it?


Göran
I got it to work now, thanks.
The problem was that the delays between switching the diodes on and off
were too long, so Chipscope was displaying the waveform of one delay (I
mistakenly thought the program had ended, when instead it was in the
idle for-loop). That's why I wasn't seeing the instructions for loading
values into the diode register.
Thanks for your help!

Ines
 
Erik Widding wrote:
<snip>
If a default value is placed into the table, it artificially
over-specifies the logic and will lead to a suboptimal result. There is
no problem with equations of no more than four terms, as your basic unit
is a 4-LUT. But when you start getting into complex state machines
with dozens of terms, this can leave you an order of magnitude
off in utilization in specific areas of a design. I bring this up now
because the early test cases are going to be simpler, so it may be some
time before you see that there is really a problem.

So I am not disagreeing with you that FpgaC may not care about this for
now. I am suggesting that you allow for this future optimization, which
you will almost certainly need, by specifying that your fitter take as
input the additional state of "don't care". All of the work out of
Berkeley (espresso, etc.) supports this additional state, as it is
crucial to quality of result.
snip

I'll expand this a little by adding that there can also be
an 'inferred else' operation, which depends on the register used.

Thus a .D FF state engine will tend to the 00000 state if
it hits a non-covered instance, whilst a .T FF state engine
will stay where it was.
If you want to include glitch/noise recovery, that can
matter...

-jg
 
