EDK : FSL macros defined by Xilinx are wrong

Brian Davis · Apr 21, 2006

Sean Durkin wrote:

We use a lot of ADS527X-ADCs from TI. Those parts output 12bit/sample
via LVDS-DDR-links running at up to 480Mbit/s. Up to now, using Virtex-2
Pro, getting this into the FPGA is a little tricky (see xapp774). In
short, the current way is to feed the serial data into two carefully
hand-placed 6-bit-shift-registers that are clocked with 180-degrees
shifted clocks, and read those shift registers out in parallel once all
12 bits have arrived. Takes quite a bit of hand-placement, you have to
be careful which I/Os you use to connect the clocks and data, you need
DCMs to do phase-shifting, etc. Kind of tricky, but it works.
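A rough software model of the two-shift-register scheme described above may make the bit bookkeeping easier to follow. This is only a Python sketch, not HDL. The name `deserialize_ddr` is made up for illustration, and both the MSB-first bit order and the choice that the first bit of a sample lands on a rising edge are assumptions to check against the ADC datasheet.

```python
def deserialize_ddr(bits):
    """Rebuild 12-bit samples from a DDR bit stream.

    Assumptions (not from the datasheet): MSB first, and the first bit
    of every sample is captured on a rising clock edge.
    """
    if len(bits) % 12 != 0:
        raise ValueError("bit stream must contain whole 12-bit samples")
    samples = []
    for start in range(0, len(bits), 12):
        rise_reg = 0  # 6-bit shift register clocked on rising edges
        fall_reg = 0  # 6-bit shift register clocked on falling edges
        for cycle in range(6):
            rise_reg = (rise_reg << 1) | bits[start + 2 * cycle]
            fall_reg = (fall_reg << 1) | bits[start + 2 * cycle + 1]
        # Parallel read-out: interleave the two registers back into one word.
        word = 0
        for i in range(5, -1, -1):
            word = (word << 2) | (((rise_reg >> i) & 1) << 1) | ((fall_reg >> i) & 1)
        samples.append(word)
    return samples
```

Feeding it the twelve bits of 0xABC, MSB first, returns `[0xABC]`; the hard part in real hardware is the placement and clock phasing, which this model ignores entirely.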

(posted without having looked at the ADS datasheet or V4 IO clocking)

I've successfully done this sort of thing in V2-ish parts using one
of the nifty _DIFF_OUT buffers ( or hand built equivalent) to create
complementary local clocks to the DDR IOB registers (XAPP609),
with the next CLB register stage constrained only by a MAX_DELAY,
and the DCM clocks only used for the half rate logic.

This makes it fairly easy to hit IOB DDR timing without needing
any funky DCM phase shift delay calibration logic, only LOC's on
the I/Os to the proper local clocking region.

At 480 Mbps, I'd advise sticking with LVDS & DT terminators.

have fun,
Brian

Symon · Apr 21, 2006

"Brian Davis" <brimdavis@aol.com> wrote in message
news:1138387310.365316.257760@z14g2000cwz.googlegroups.com...

Sean Durkin wrote:
We use a lot of ADS527X-ADCs from TI. Those parts output 12bit/sample
via LVDS-DDR-links running at up to 480Mbit/s. Up to now, using Virtex-2
Pro, getting this into the FPGA is a little tricky (see xapp774). In
short, the current way is to feed the serial data into two carefully
hand-placed 6-bit-shift-registers that are clocked with 180-degrees
shifted clocks, and read those shift registers out in parallel once all
12 bits have arrived. Takes quite a bit of hand-placement, you have to
be careful which I/Os you use to connect the clocks and data, you need
DCMs to do phase-shifting, etc. Kind of tricky, but it works.

(posted without having looked at the ADS datasheet or V4 IO clocking)

I've successfully done this sort of thing in V2-ish parts using one
of the nifty _DIFF_OUT buffers ( or hand built equivalent) to create
complementary local clocks to the DDR IOB registers (XAPP609),
with the next CLB register stage constrained only by a MAX_DELAY,
and the DCM clocks only used for the half rate logic.

This makes it fairly easy to hit IOB DDR timing without needing
any funky DCM phase shift delay calibration logic, only LOC's on
the I/Os to the proper local clocking region.

At 480 Mbps, I'd advise sticking with LVDS & DT terminators.

have fun,
Brian

Guys,

It's Friday night and I hear the siren call of the pub, so excuse the
briefness of this answer!

I've also had great success with DDR links without resorting to DCM phase
shift complexity. Check out XAPP233 figs. 9 and 10. The key insight is to
use a latch (NOT a FF) to align the clock enables in the rising and falling
edge clock domains. The latches are fast enough to meet the timing way
beyond 480M, my stuff is working great at 622M.
Agree totally with Brian's recommendation re. LVDS_DT.
HTH, Syms.

Symon · Apr 21, 2006

"Phil Tomson" <ptkwt@aracnet.com> wrote in message
news:drdphr02ac0@enews1.newsguy.com...

In article <43da132a$0$15788$14726298@news.sunsite.dk>,
Symon <symon_brewer@hotmail.com> wrote:
Alternatively, for SIN/COS you might consider the Sunderland algorithm and
the sine-phase difference algorithm.

Often, for sin and cos you can get by with a lookup table. Depending on how
much accuracy you actually need your lookup table may not even need to be all
that large, especially since you only need from 0 to 90 degrees in the table.

Phil

Yep, the algorithms I mentioned dramatically reduce the size of the lookup
table for a given accuracy.
Cheers, Syms.
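For reference, the "0 to 90 degrees" point can be sketched in a few lines of Python. This is plain quarter-wave symmetry only, not the Sunderland or sine-phase difference algorithms mentioned above, and the table size and the names `QUARTER` and `sin_lut` are illustrative assumptions.

```python
import math

TABLE_BITS = 6                 # hypothetical size: 64 entries for 0..90 degrees
N = 1 << TABLE_BITS

# Entry i holds the sine at the middle of step i, which keeps the
# mirror symmetry between quadrants exact.
QUARTER = [math.sin((i + 0.5) * (math.pi / 2) / N) for i in range(N)]

def sin_lut(phase):
    """Sine for an integer phase in 0 .. 4*N-1 (one full turn)."""
    quadrant, index = divmod(phase % (4 * N), N)
    if quadrant & 1:           # quadrants 1 and 3 read the table backwards
        index = N - 1 - index
    value = QUARTER[index]
    return -value if quadrant >= 2 else value
```

The two top phase bits select the quadrant and the rest index the table, so a full-circle lookup costs a quarter of the memory plus an index mirror and a sign flip.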

Duane Clark · Apr 21, 2006

shane wrote:

Hi, can anyone tell me how I can use opb emc for controlling more than
two memory banks? opb emc can be used for 8 memory banks, and each bank
can have a different data width. One memory bank already in use is an
external SRAM with a 32-bit data width, but I want to add flash
memory to the EMC with a data width of 8. Is there any tutorial on how I
can add a memory bank via EMC?

I would suggest that you not attempt to add it as a separate bank on an
existing opb emc. Instead, put in a separate opb emc instance just for
the flash. Here are settings I use for one particular flash, from the
system.mhs file:

# TE28F640-J3, flash mem
BEGIN opb_emc
PARAMETER INSTANCE = my_flash
PARAMETER HW_VER = 2.00.a
PARAMETER C_NUM_BANKS_MEM = 1
PARAMETER C_OPB_CLK_PERIOD_PS = 10000
PARAMETER C_MEM0_BASEADDR = 0x94000000
PARAMETER C_MEM0_HIGHADDR = 0x94FFFFFF
PARAMETER C_MEM0_WIDTH = 32
PARAMETER C_INCLUDE_DATAWIDTH_MATCHING_0 = 0
PARAMETER C_TCEDV_PS_MEM_0 = 120000
PARAMETER C_TAVDV_PS_MEM_0 = 120000
PARAMETER C_THZCE_PS_MEM_0 = 35000
PARAMETER C_THZOE_PS_MEM_0 = 15000
PARAMETER C_TWC_PS_MEM_0 = 120000
PARAMETER C_TWP_PS_MEM_0 = 70000
BUS_INTERFACE SOPB = opb_bus
PORT Mem_A = flash_a
PORT Mem_DQ = flash_dq
PORT Mem_CEN = flash_cs_l
PORT Mem_OEN = flash_oe_l
PORT Mem_WEN = flash_we_l
PORT Mem_RPN = flash_rst_l
END

Apr 21, 2006

Symon wrote:

Hi Pete,
There's a guy here on CAF who's something to do with FpgaC.
http://fpgac.sourceforge.net/
Perhaps you could ask him to help implement the COS function, he seems to be
at a loose end! ;-)

Actually it would be a fun project, and down the line of where
we are headed long term anyway. The FpgaC project has
a way to go to be fully featured, but it seems that this project
could be done as a series of user written functions with little
trouble.

As far as being at a loose end, I'll take that as a compliment.

Life is very disappointing when you sit around and wait for things
to happen. Stirring things up a little is sometimes required to
shake the cobwebs off that are holding everyone stuck in place.

Brian Davis · Apr 21, 2006

johnp wrote:

I don't see any documentation on the DIFF_OUT buffers you mention.
Do you have any info on them or pointers to doc?

All the V2-ish differential input buffers have a complementary output
available, which can be used to create a 180 degree clock without
needing a DCM.

These can also be used just to invert a differential input without
needing any other logic (or board cuts & jumps).

Look at the DIFFS component in fpga_editor to see what's going on;
besides the normal 'phantom' route from the DIFFS to the DIFFM,
there's also a route from the DIFFM to a differential receiver in the
DIFFS that outputs the complement signal.

I first spotted these when they showed up in early versions of
XAPP622 as a hard macro.

Support & tool bugs for these have varied version to version,
see Answer Record 21958 for recent problems.

I've banged into various other problems in using them over the years;
if I get a chance this weekend, I'll try to dig up some old webcase
code showing how to create one out of two normal IBUF{G}DS's as
a work around.

These can be used on regular IOB inputs as well as global clock
inputs, but you've generally needed to LOC the global input buffer
and bufg's to allowed sites to get this to work.

search for
ibufgds_diff_out
ibufds_diff_out

Brian

Johan Bernspang · Apr 21, 2006

Without knowing the particular package you use, but having
written a fullblown tcp/ip implementation for PPC, I would say
that either your client is messed up (does not send the right
ack segments so you have to manually send them) or, more
likely, some of the segments the server sends after the first
one get lost and thus the client never sends an ACK.
If the IP layer is doing fragmentation/defragmentation, this
could be a place to look at. Also, I have encountered similar
behaviour when I have had simply physical layer problems
(too many lost packets), but in most cases the TCP retransmission
would take care of that.
Hope this is of some help...

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------

Hi Dimiter and thanks for your input,

The client is made in .NET by a colleague, and I don't think that the
socket package is behaving badly. I have also tried to connect with a
terminal program to my server as well with the same results.

I have been sniffing the communication with Ethereal and what happens
is that the server starts listening on a certain port (i.e. accept is
polled until a client connects). The initial SYN and ACKs are carried
out correctly and the server is waiting for the client to send it some
sort of command. This is done in the same loop as accept on the
established connection. When the client tells the server to start
sending data a xilsock_send command is added to the loop. That is, each
pass through the loop first does an accept to check for packets, then
parses the available data (if any), and finally sends data. In Ethereal
I have seen that the first data packet is sent successfully followed by
an ACK from the client, the second data packet is sent too (with the
correct seqno and ackno), but the ACK to that data packet is a duplicate
of the first ACK (thus with the wrong ackno). The server then tries to
retransmit the failing packet without success... I wouldn't be surprised
if my server application is lacking some feature, but I really can't see
what it is.
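When reading a capture like this, it can help to compute the ACK number each data segment should provoke and compare it with what the client actually sent. The Python sketch below does only that arithmetic; `expected_ack` and `check_acks` are hypothetical helper names, and the model assumes one ACK per segment, whereas a real receiver may acknowledge several segments at once.

```python
def expected_ack(seq, payload_len, syn=False, fin=False):
    """ACK number the peer should return after receiving this segment.

    SYN and FIN each consume one sequence number; arithmetic is modulo 2**32.
    """
    return (seq + payload_len + int(syn) + int(fin)) % 2**32

def check_acks(segments, acks):
    """Compare observed ACK numbers with the expected ones.

    segments: list of (seq, payload_len) in the order they were sent.
    acks:     list of ACK numbers seen in reply, one per segment (assumed).
    """
    results = []
    for (seq, length), ack in zip(segments, acks):
        want = expected_ack(seq, length)
        results.append("ok" if ack == want else f"expected {want}, got {ack}")
    return results
```

For two 100-byte segments starting at sequence number 1, replies of 101 and 101 give `["ok", "expected 201, got 101"]`, which is the duplicate-ACK pattern described above: the client is saying it has still only accepted the first segment.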

Johan (from the living room couch on a Friday night)

--
-----------------------------------------------
Johan Bernspång, xjohbex@xfoix.se
Research engineer

Swedish Defence Research Agency - FOI
Division of Command & Control Systems
Department of Electronic Warfare Systems

www.foi.se

Please remove the x's in the email address if
replying to me personally.
-----------------------------------------------

dp · Apr 21, 2006

Johan Bernspang wrote:

...... In Ethereal
I have seen that the first data packet is sent successfully followed by
an ACK from the client, the second data packet is sent too (with the
correct seqno and ackno), but the ACK to that data packet is a duplicate
of the first ACK (thus with the wrong ackno). The server then tries to
retransmit the failing packet without success... I wouldn't be surprised
if my server application is lacking some feature, but I really can't see
what it is.

Johan (from the living room couch on a Friday night)

Hi Johan,

this really sounds like the segment you call "second packet" is either
bad when being sent or lost at the receiving side, thus the server
retransmits. Since it does have the right seq and ack numbers,
the reason must be elsewhere - could be the tcp checksum,
IP header checksum, some other bad field, damaged IP, misrouted,
you name it. Please feel free to contact me privately if you feel like it,
I'd be happy to look into the IP packet exchange (but I do think
it will just take you another hour or so of thorough
inspection and you'll figure it out, these things are bulky yet
pretty straight forward).
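One of the checks suggested above, the checksum, is easy to redo by hand on bytes copied out of a capture. The sketch below is the standard ones-complement Internet checksum as described in RFC 1071, written in Python; the function name is arbitrary. Note the assumption about its input: for an IP header you pass the header bytes, while for TCP you must pass the pseudo-header followed by the whole segment, which this sketch does not assemble for you.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # big-endian 16-bit words
    while total >> 16:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A quick sanity test: running the function over a block that already contains its correct checksum field returns 0, so a non-zero result on the "second packet" would point at a corrupted or miscomputed checksum.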

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------

Eric Smith · Apr 21, 2006

John Adair wrote:

EDK 8.1 is available from late last week to existing customers by the
download mechanism.

I just spoke to someone in Xilinx Customer Service, and she said that
although they had expected the official release of EDK 8.1 to occur this
past Monday, that apparently it did not, and she's not sure when it will
be.

The same thing happened with ISE 8.1; some customers were able to
download it before the official release, but others were not.

It seems reasonable to expect that the official release will occur
soon.

Eric

Brian Davis · Apr 21, 2006

Symon wrote:

The key insight is to use a latch (NOT a FF) to align the clock enables
in the rising and falling edge clock domains. The latches are fast enough
to meet the timing way beyond 480M, my stuff is working great at 622M.

Alas, I swore off Xilinx latches after the Great Latch Inversion of '00

Brian

p.s.
Actually, it's not the actual latches I distrust; it's the software
guys configuring the latches, having to interpret the latch enable
sense when it's not properly documented in the datasheet.

GoogleGroups "CLB latch clock inversion"

from latch_bug1.txt:
: For an active low latch in an EDIF input file, you get
: the following results when tracing the clock signal through
: the tool flow:
:
: tool stage       XC4000EX derived     Virtex derived
: -------------------------------------------------------
: EDIF in          active low           active low
: MAP out          CLK:CLKNOT           CLK:CLK
: EPIC display     dot clock            normal clock
: SIMPRIM model    X_INV -> X_LATCH     X_INV -> X_LATCH
: real HW          transparent low      transparent low
:
: note: X_LATCH SIMPRIM models are transparent high
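To spell out what "transparent high" versus "transparent low" means in the table, here is a tiny behavioural model in Python. The class names are invented for illustration and this is not the SIMPRIM code itself; it only shows that a transparent-high latch with an inverter on its gate behaves as a transparent-low latch.

```python
class Latch:
    """Level-sensitive latch: Q follows D while the gate is at the transparent level."""

    def __init__(self, transparent_level=1):
        self.transparent_level = transparent_level
        self.q = 0

    def update(self, gate, d):
        if gate == self.transparent_level:
            self.q = d          # transparent: output follows input
        return self.q           # otherwise the old value is held

class InvertedGateLatch:
    """Inverter in front of a transparent-high latch (the X_INV -> X_LATCH row)."""

    def __init__(self):
        self._latch = Latch(transparent_level=1)

    def update(self, gate, d):
        return self._latch.update(1 - gate, d)
```

With `InvertedGateLatch`, the output follows D while the gate is 0 and holds while it is 1, so a tool stage that silently drops or adds the inversion flips which half of the clock the latch is open for.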

Simon Peacock · Apr 21, 2006

I think you are taking the wrong approach. I'm sure you can get the C code
running. But in reality if you have to implement sin/cos & sqrts, then the
C code will be far from optimal, and you might need a Virtex to do the job
of a Spartan.

Simon

"Pete Hudson" <pete.hudson@baesystems.com> wrote in message
news:1138358605.349842.85640@z14g2000cwz.googlegroups.com...

I have been presented with a c program to implement on an fpga.
I am investigating the possible processes/tools I could employ rather
than a straight rewrite in VHDL.

Current candidates are:

Impulse C
Handel-C
Xilinx System Generator

The algorithm is littered with sin cos sqrt & divides. So I expect that
I require some of the xilinx IP cores that come with my ISE tool.
(That's why XSG is getting a look in)

Q. How do I implement this algorithm's cos functions (for example) in
Impulse C so that it is represented in the resultant HW?

Sean Durkin · Apr 21, 2006

Erik Widding wrote:

Use the ISERDES setup you describe here to get 6 bits at a time.
Really quick pseudo code for a solution:

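signal left   : std_logic_vector(5 downto 0);
signal sample : std_logic_vector(11 downto 0);

-- capture the first 6 bits on the falling edge of the sample clock
process (sample_clock) begin
  if falling_edge(sample_clock) then
    left <= ISERDES_out(5 downto 0);
  end if;
end process;

-- on the rising edge, concatenate them with the next 6 bits
process (sample_clock) begin
  if rising_edge(sample_clock) then
    sample <= left & ISERDES_out(5 downto 0);
  end if;
end process;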

Sample clock needs to be properly phased to the ISERDES output, and a
timing constraint with 1/2 the period of sample_clock is placed between
ISERDES_out and the two destination registers. The tools will take it
from here.

I see what you're getting at. But I'd still need to supply the ISERDES
with a 1/3-divided clock to get a 6bit-output (this is the clock the
ISERDES output is registered with). That clock I don't have, I'd need to
use a DCM or something to generate it. The sample clock on the other
hand I get from the ADCs along with the data. The whole point of using
the ISERDES is to save on clock nets, DCMs etc., and simply use what I'm
supplied with.
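The clock ratios in play can be worked out with a few divisions. The Python sketch below assumes a DDR link whose forwarded bit clock toggles at half the bit rate, 12 bits per sample and a 6-bit deserializer word, as discussed in this thread; the function name and dictionary keys are made up.

```python
def ddr_clock_plan(bit_rate_hz, bits_per_sample=12, word_bits=6):
    """Clock frequencies implied by a DDR serial link (assumed model)."""
    bit_clock_hz = bit_rate_hz / 2          # DDR: two bits per clock cycle
    return {
        "bit_clock_hz": bit_clock_hz,
        "word_clock_hz": bit_rate_hz / word_bits,       # rate of 6-bit words
        "sample_rate_hz": bit_rate_hz / bits_per_sample,
        "word_clock_divider": word_bits / 2,            # bit clock / word clock
    }

plan = ddr_clock_plan(480_000_000)
```

At 480 Mbit/s this gives a 240 MHz bit clock, an 80 MHz word clock (the bit clock divided by 3) and a 40 MHz sample rate, which is why a 6-bit output needs a clock that is not supplied with the data.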

cu,
Sean

Kolja Sulimma · Apr 21, 2006

Sean Durkin wrote:

The whole point of using the ISERDES is to save on clock nets, DCMs etc.,
and simply use what I'm supplied with.

The point of the ISERDES is to meet timing that you would not meet with a
LUT-based SERDES.

The point of the IO delay lines is to save DCMs and clock nets. Without
them you would need a DCM per ADC anyway to do the phase alignment,
even if you could use the frequency 1:1.

Kolja Sulimma

Apr 21, 2006

Simon Peacock wrote:

I think you are taking the wrong approach. I'm sure you can get the C code
running. But in reality if you have to implement sin/cos & sqrts, then the
C code will be far from optimal, and you might need a Virtex to do the job
of a Spartan.

Actually, I find it not that different in many cases from similar VHDL/Verilog,
and I have the advantage of being able to tweak the output in FpgaC when
I don't like it. Especially when I do a boolean design, it goes right
through FpgaC as boolean logic equations to the LUT. The C may look a bit ugly
that way, but you certainly have nearly exact control over the output. For example:

[jbass@fastbox tests]$ cat example1.c
int a:1,b:1,c:1;
#pragma inputport (a,a9)
#pragma inputport (b,a10)

int sum_of_products:1;
#pragma outputport (sum_of_products,a11)

main()
{

c = (sum_of_products = (a&b) + c);
}

produces:

[jbass@fastbox tests]$ fpgac -target cnf example1.c
[jbass@fastbox tests]$ cat example1.cnf
_example1_Running^CLK = VCC;
_a^CLK = port(_a,"a9");
_b^CLK = port(_b,"a10");
_c^(CLK*_example1_main__state_C1) = (~_c*_b*_a)+(_c*~_a)+(_c*~_b);
port(_sum_of_products,"a11")^(CLK*_example1_main__state_C1) =
(~_c*_b*_a)+(_c*~_a)+(_c*~_b);
_example1_main__state_C1 = ~_example1_Running;

which in xnf/edif is a bit more verbose (so I'll just take the meat of it):

SYM, _sum_of_products-OBUF, OBUF
PIN, I, I, _sum_of_products
PIN, O, O, sum_of_products
END
EXT, sum_of_products, O,, LOC=_a11
SYM, FFin-_sum_of_products, EQN, EQN=((~I0*I1*I2)+(I0*~I2)+(I0*~I1))
PIN, I2, I, _a
PIN, I1, I, _b
PIN, I0, I, _c
PIN, O, O, FFin-_sum_of_products
END
SYM, _sum_of_products, DFF
PIN, D, I, FFin-_sum_of_products
PIN, C, I, CLK
PIN, CE, I, _example1_main__state_C1
PIN, Q, O, _sum_of_products
END
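As a cross-check on the output above, the sum-of-products equation emitted for `_c` can be compared with the 1-bit addition written in example1.c by brute force over all eight input combinations. This is a Python sketch with invented function names, assuming that `+` on 1-bit variables wraps modulo 2.

```python
from itertools import product

def fpgac_equation(a, b, c):
    """Sum of products from example1.cnf: (~c*b*a)+(c*~a)+(c*~b)."""
    return ((1 - c) & b & a) | (c & (1 - a)) | (c & (1 - b))

def one_bit_add(a, b, c):
    """(a&b) + c truncated to a single bit, as in the C source."""
    return ((a & b) + c) & 1

def equivalent():
    """True if both forms agree for every combination of a, b and c."""
    return all(
        fpgac_equation(a, b, c) == one_bit_add(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```

Both forms reduce to `c XOR (a AND b)`, so the three-term equation is what the LUT needs to hold for this statement.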

Apr 21, 2006

Larry Doolittle wrote:

No, I don't think the ISE license is, or should be, OSD-compliant.
The discussion here is whether code that I write, based on the
documentation given in ISE, can be released under an OSD-compliant
license, like BSD or GPL.

The question goes far past that. Consider that BYU's JHDL has XDL interfaces
in it. As does the University of Massachusetts VPR for Virtex + JBits Interface
project built on top of the University of Toronto work. Besides the VPR work
at UofT, several other projects like EVE have XDL interfaces as well. As
does the UC Berkeley work for Post-Placement C-slow Retiming. As does
Peter Sutton's work "JPG - A Partial Bitstream Generation Tool to Support
Partial Reconfiguration in Virtex FPGAs" at UQ Australia. As does the
DAGGER work done at universities in Greece. The MIT team doing 3-D FPGA
architecture research was using it. The French researchers at IRIAS are
doing FPGA array layouts with it. As is research and thesis work at
Lund Institute of Technology, Sweden in dynamic reconfiguration. As is
research and thesis work at Virginia Polytechnic Institute. As is research
and thesis work at Seoul National University, Korea. As is research work
at Pennsylvania State University. As is research and thesis work at UCLA.
As is research and thesis work at Stanford. ... and the list goes on, and on.

Many of these projects have released source and documentation which
has XDL formats and xilinx device interfaces embedded.

In searching, we also find that XDL is discussed in classroom lectures, and
is the subject of class exercises.

Many of these projects describe XDL as an open interface, and treat it freely
as such. That is clearly stated in many places including the Xilinx/BYU work
for Los Alamos SEU project. Some of these projects include Xilinx co-authors.

So, clearly the cat is out of the bag, and Xilinx should either properly protect
its IP, or just sit down and release the XDL formats and library interfaces,
along with the exposed chip architectures and routing that all of the above
listed projects have already fully disclosed.

Larry Doolittle · Apr 21, 2006

[Pardon me while I partially mix threads, I blame it on an erratic
news server. So I'm using one article as a surrogate to paste in
comments I retrieved via Google. So while the author attributions
are correct, the text doesn't come from the articles listed in the
header.]

On 2006-01-28, fpga_toys@yahoo.com <fpga_toys@yahoo.com> wrote:

Larry Doolittle wrote:
Second concern: can open source software be published that
works with XDL? IANAL (honest), but we have to look at this
from a lawyer's perspective. [chop]

The answer unfortunately is that the EULA NDA restrictions and
open source are mutually exclusive. The EULA NDA is so restrictive
that you can not even talk to anyone about your "performance" with
the experience as benchmarking includes everything from bugs, to
comparative results, to non-comparative objective results.

While I despise terms like this in an EULA, they almost make sense
when discussing the running software itself. The most egregious
consequences come from trying to apply them to documentation, as
Xilinx's license does.

UC Berkeley['s] Post-Placement C-slow Retiming, [chop] Peter Sutton's
JPG, [chop] Greek universities [unspecified] DAGGER, [chop] IRIAS
FPGA array layouts, [chop, plus] research and thesis work [in
Sweden, VPI, Seoul, Pennsylvania State, UCLA, Stanford, all of which
use XDL].

That's a long list. I should spend more time with Google.

Xilinx should either properly protect its IP,

By pulling its head out of the sand, sending C&D letters to all
of the above projects, and posting a FAQ (oops, "answer record")
saying "No, you can't publish code that speaks XDL because of the
terms of the ISE EULA".

or just sit down and release the XDL formats and library
interfaces, along with the exposed chip architectures and
routing that all of the above listed projects have already
fully disclosed.

Whoa, where did "library interfaces" come from? Now you're talking
software, not just documentation. Drop that item from your wish
list. You'll raise fewer hackles.

The "exposed chip architectures" are very sensitive to Xilinx, but
I think they would listen to a reasoned argument that information
of that type can not be put back in the bottle. Too much of it is
already out there in the patent literature, for example.

Xilinx to post the
XDL documentation on-line. Why isn't it already in the "Xilinx
ISE 8 Software Manuals and Help"? Or maybe it is, and I'm just
too blind to see it?

I searched more fully, it's not there. And posting the material
(obviously NDA-free, unlike the ambiguous stuff in the ISE download)
would quickly end this discussion. I have to assume Xilinx would
like that result.

- Larry

Duane Clark · Apr 21, 2006

shane wrote:

hi

Thanks for replying

I am using MicroBlaze for the first time, so please bear with my questions.

I am using a data width of 8 for the flash device, but does opb emc support
data width matching for flash devices?

There is a section in the opb_emc document entitled "Data-Width Matching
for Flash Memories".

If it does support it, does updating the settings in the system.mhs file take
care of it, or do I need to change any other settings?

If you are using the GUI tools, then I don't know whether changes made
in the system.mhs file will get overwritten. I don't use the GUI, so I
make all changes by editing that file, and that is the only place
changes need to be made.

What values should I use for these parameters: C_THZCE_PS_MEM_0,
C_TWP_PS_MEM_0, C_TCEDV_PS_MEM_0, C_TAVDV_PS_MEM_0, C_TWC_PS_MEM_0?

You need to look at the data sheet for the flash you are using, and
compare that to the definitions in the opb_emc data sheet. The values I
gave as an example are only good for that particular flash.
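Most of the work here is unit conversion, since flash data sheets usually quote timings in nanoseconds. The Python sketch below assumes the `_PS` suffix means picoseconds, which is consistent with C_OPB_CLK_PERIOD_PS = 10000 being the period of a 100 MHz clock; the helper names are invented, and which datasheet timing maps to which parameter still has to come from the opb_emc data sheet.

```python
def ns_to_ps(ns):
    """Convert a datasheet timing in nanoseconds to picoseconds."""
    return round(ns * 1000)

def clock_period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return round(1_000_000 / freq_mhz)

def mhs_lines(timings_ns):
    """Format {parameter name: nanoseconds} as PARAMETER lines for system.mhs."""
    return [f"PARAMETER {name} = {ns_to_ps(ns)}" for name, ns in timings_ns.items()]
```

For instance, `mhs_lines({"C_TWP_PS_MEM_0": 70})` yields `PARAMETER C_TWP_PS_MEM_0 = 70000`, matching the form of the example settings given earlier for the TE28F640-J3.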

Jan Coombs · Apr 21, 2006

Antti Lukats wrote:

"Ed McGettigan" <ed.mcgettigan@xilinx.com> wrote in message
news:drbp92$nh99@cliff.xsj.xilinx.com...

fpga_toys@yahoo.com wrote:

I've actually told
everyone who I am, and it's not that difficult to figure it out,

I can't find any reference to you, so who are you then.

Ed

Ed,

you are right, he hasn't told that; he actually told me that I should get
used to people (like him) hiding their identity.

Earlier today in this thread, 11:50 uk time.

but if I am not mistaken then his name is: John Bass

Handles are good deflectors of casual interest. I found the discussion very
interesting, and had noticed John's name and interest before today.

Jan Coombs

Mahmoud · Apr 21, 2006

What are you trying to do (design)?
Why did you choose Handel-C? Why not VHDL or Verilog?

Roberto wrote:

Hi all.
I have to develop software for a Digilent 2-SB (with a Xilinx Spartan 2E chip)
coupled with a Digilent Digital I/O 4.
I decided to use Handel-C for development, but I don't know what I need to
study to get started.
I downloaded the manuals for both Digilent devices (only 14 pages).
Could you recommend any books or links for beginners?
Thanks very much

Sean Durkin · Apr 21, 2006

Bob wrote:

You still need the /12 clock, from the ADC, in order to locate each sample
boundary, right?

Yupp, that you would have to generate using a DCM. But the clever thing
about this is that since you're running the FPGA with the sample clock
from the ADC, and the DCM generates a perfectly edge-aligned /12 clock,
you have perfect phase relationship between all clocks and data
channels, it all fits together perfectly, regardless of the number of
channels (assuming that the sample clock is the same for all channels).

cu,
Sean
