EDK : FSL macros defined by Xilinx are wrong

John Adair · Apr 21, 2006

Not quite an entire solution but you can store "fixed" parameters in the
programming flash memory. Xapp482 has coverage of this and is here
http://www.xilinx.com/bvdocs/appnotes/xapp482.pdf .

John Adair
Enterpoint Ltd. - Home of Broaddown2. The Ultimate Spartan3 Development
Board.
http://www.enterpoint.co.uk

"Quiet Desperation" <nospam@nospam.com> wrote in message
news:060420051752320025%nospam@nospam.com...

Any chance of there ever being an FPGA where one or more of the
SelectRAM blocks is nonvolatile?

I design a lot of stuff that is programmable and reconfigurable beyond
the FPGAs. I commonly need to store the setting of digital delay chips
and switch settings and other control lines so that a unit powers up
with everything in the state desired by the end user.

Currently I use little automotive serial EEPROMs, but, dang but it'd be
nice to have a little EEPROM inside an FPGA. Just one 18Kbit block
would do wonders.

Jon Beniston · Apr 21, 2006

I'm sad to hear that there is a patent on the ARM Thumb instruction set
that extends to the general principle, because something like it is
what the PowerPC architecture desperately needs - so that people can
use it the way IBM wants, without compromising the architecture.

I didn't think it was on Thumb instructions, but on the method of
switching between ARM and Thumb execution (using the LSB of the PC to
indicate which mode you're in).

Forget Thumb anyway, it's rubbish. What patents there are on Thumb2
should be more interesting.

Cheers,
Jon

Nick Maclaren · Apr 21, 2006

In article <e87b9ce8.0504070204.5f80c342@posting.google.com>,
jon@beniston.com (Jon Beniston) writes:
|>
|> > I'm sad to hear that there is a patent on the ARM Thumb instruction set
|> > that extends to the general principle, because something like it is
|> > what the PowerPC architecture desperately needs - so that people can
|> > use it the way IBM wants, without compromising the architecture.
|>
|> I didn't think it was on Thumb instructions, but on the method of
|> switching between ARM and Thumb execution (using the LSB of the PC to
|> indicate which mode you're in).

Totally different from using the top bit, as was done on the IBM 370
range and others, of course.

Regards,
Nick Maclaren.

Symon · Apr 21, 2006

"Jim Granville" <no.spam@designtools.co.nz> wrote in message
news:4254b2ad@clear.net.nz...

Alex wrote:
These 'school boy' issues with the more mature CPLDs certainly makes
their testing program look thin/non-existent.
Perhaps it is all a ploy to move designs to the latest "hot new
thing" - or maybe they have too many ex-microsoft employees ?

Jim,
Indeed. You'd think they'd have a bunch of hardware with all these parts on
them and run a regression test with new releases. As in an earlier
thread, I'll be waiting for a service pack or three before swapping!
Cheers, Syms.

Nick · Apr 21, 2006

On Thu, 7 Apr 2005 16:17:10 +0800, "kingkang" <305liuzg@163.net>
wrote:

Hi
I wrote an SDRAM controller which has passed RTL simulation.
But when it comes to the Altera Cyclone board, the read/write
data are wrong. I wrote some data to the SDRAM and then
read it back, but the data read is not equal to
what was written. One or more bits are
wrong, and the bit errors are random! I don't know what's wrong. Is it
the clock? Board delay? Something else? Please help me out!
Thanks and Regards!

Did you set up the right delay for the SDRAM clock? I use a
288 deg phase shift on my PLL to feed the SDRAM, at 50 MHz. It may be
something else for you, though.

That's one of the most important things.

Regards
Nick

Apr 21, 2006

Quiet Desperation wrote:

Any chance of there ever being an FPGA where one or more of the
SelectRAM blocks is nonvolatile?

I design a lot of stuff that is programmable and reconfigurable beyond
the FPGAs. I commonly need to store the setting of digital delay chips
and switch settings and other control lines so that a unit powers up
with everything in the state desired by the end user.

Currently I use little automotive serial EEPROMs, but, dang but it'd be
nice to have a little EEPROM inside an FPGA. Just one 18Kbit block
would do wonders.

The Altera MAX II parts can do this. They contain 8 Kbits of flash that
can be both read and written by your design to store anything you want.
See
http://www.altera.com/products/devices/cpld/max2/features/flash/mx2-flash_memory.html
for details on how to use it.

Vaughn
Altera
[v b e t z (at) altera.com]

Brijesh · Apr 21, 2006

Peter Alfke wrote:

If you use STROBE as a rising-edge clock input, then excessive noise
might be superimposed on its falling edge, such that the falling edge
actually contains a rising edge clock, which would give you weird
timing.
Just a wild guess...
Peter Alfke

Hello Peter,

It's a DDR scheme. Data is clocked in both on the rising edge and falling edge.

My main question is still how slow is too slow?

Thanks

Brijesh

John Mashey · Apr 21, 2006

Everett M. Greene wrote:

Eric Smith <eric@brouhaha.com> writes:
"JJ" <johnjakson@yahoo.com> writes:
If some processor that is not any way MIPs like can otherwise perform a
register ld/st for a word on any byte boundary is that a problem, or
only if the rest of it is MIPs like too.

It's not the unaligned load/store per se that is patented; many processors
have done that.

What MIPS invented and patented was the idea that instead of having the
hardware deal with unaligned bus accesses, they require software to
issue *two* instructions to do an unaligned access. One does the
"left part" and one does the "right part" of the word.

This must be a misstatement or it's a ridiculous patent.
How can a patent be issued for NOT doing something?
I can get a patent for an anchor that requires the
addition of propulsion and lifting surfaces so that
it can fly on its own?

There's also the "obvious to anyone experienced with
the technology" thing.

The normal MIPS load and store instructions require alignment just as
on most other RISC processors.

It's a poor statement; it's not a ridiculous patent, although, as I
have explained before in comp.arch [search: mashey lwl], in retrospect,
I'd just as soon we hadn't done it. It was easy at the time
[Summer 1985], but it turned out to get far less use than we'd expected,
at least partly because the rise of RISCs with strict alignment got
many people to clean up code. As noted elsewhere, if I were doing it
again, I'd probably do something different.

The patent is 4,814,976 by Hansen & Riordan.

The *point* of the patent was that if you have a straightforward RISC
pipeline that supports caches and paged virtual memory, then requiring
hardware to do all the work of handling arbitrarily-aligned data [i.e.,
crossing cache-line or worse, page boundaries] adds *greatly* to the
implementation complexity, and one doesn't want to do this. [The
implementation penalty for some microcoded CISCs can be much less.]

The MIPS solution was some much simpler hardware [very minimal
additions, and nothing tricky] beyond what was there, that allowed
compilers to generate code to deal with unaligned accesses that
sometimes came up from legacy code without burdening the base hardware
design.

[This is one of those classic "It takes a bunch of hardware to make
this both fast and right, and it means a lot of checking for cases that
hardly ever happen, but if the hardware is wrong, the bugs are horrible
to find." cases that hardware designers hate.]
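The two-instruction approach described in this thread (a "left part" load and a "right part" load, the lwl/lwr pair on MIPS) can be modeled in C. This is only a sketch under stated assumptions: big-endian byte order, 32-bit words, and hypothetical helper names (`load_aligned_word`, `lwl_model`, `lwr_model`, `load_unaligned`) that model the behaviour rather than quote any MIPS documentation.

```c
#include <stdint.h>
#include <stddef.h>

/* Read the aligned big-endian word that contains byte address addr. */
static uint32_t load_aligned_word(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)3;
    return ((uint32_t)mem[base] << 24) | ((uint32_t)mem[base + 1] << 16) |
           ((uint32_t)mem[base + 2] << 8) | (uint32_t)mem[base + 3];
}

/* "Left part": bytes from addr to the end of its word go to the top of rt;
   the low bytes of rt are kept. */
static uint32_t lwl_model(uint32_t rt, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (unsigned)(addr & 3);
    uint32_t keep = shift ? (0xFFFFFFFFu >> (32u - shift)) : 0u;
    return (load_aligned_word(mem, addr) << shift) | (rt & keep);
}

/* "Right part": bytes from the start of the word up to addr go to the
   bottom of rt; the high bytes of rt are kept. */
static uint32_t lwr_model(uint32_t rt, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (3u - (unsigned)(addr & 3));
    uint32_t mask = 0xFFFFFFFFu >> shift;
    return (load_aligned_word(mem, addr) >> shift) | (rt & ~mask);
}

/* Unaligned 32-bit load built from the two partial loads. */
static uint32_t load_unaligned(const uint8_t *mem, size_t addr)
{
    uint32_t rt = 0;
    rt = lwl_model(rt, mem, addr);
    rt = lwr_model(rt, mem, addr + 3);
    return rt;
}
```

Each partial load touches only one aligned word, so in this model neither access crosses a cache-line or page boundary by itself, which is the property the hardware argument above relies on.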

Brijesh · Apr 21, 2006

Symon wrote:

Hello Symon,

Brijesh,
Have you considered using the strobe signal as a latch enable rather than a
clock? Rise time is then unimportant. The IOBs' storage elements can be
latches, IIRC.

It's a DDR scheme, so a simple latch scheme won't work, as data is only stable
during the rising/falling edge of the strobe.

Even otherwise, as I mentioned, there are multiple channels and all of
them work just fine, except this one. That's led me to believe it's not a
design problem, but an SI issue.

As for pin impedances, you need to use the IBIS files to determine this. Do
you have HyperLynx?

I have not really worked with IBIS files. I did peek into IBIS files for
the first time and found they do not have a direct number.

Don't have HyperLynx, will read up about it.

Thanks
Brijesh

Cheers, Syms.

Peter Alfke · Apr 21, 2006

There is no fundamental limit. A flip-flop will be clocked, even if the
clock takes seconds or minutes to rise. But the longer the transition
time, the bigger the chance of picking up noise, and creating a
double-pulse. And the fact that you use DDR does not invalidate my
previous response. Noise then has a chance to disturb either edge or
both.
Peter Alfke
============
Brijesh wrote:

Peter Alfke wrote:

If you use STROBE as a rising-edge clock input, then excessive noise
might be superimposed on its falling edge, such that the falling edge
actually contains a rising edge clock, which would give you weird
timing.
Just a wild guess...
Peter Alfke

Hello Peter,

It's a DDR scheme. Data is clocked in both on the rising edge and falling edge.

My main question is still how slow is too slow?

Thanks

Brijesh

John Mashey · Apr 21, 2006

David wrote:

On Wed, 06 Apr 2005 11:08:53 +1200, Jim Granville wrote:

I heard that the NIOS II is 'very similar' to MIPS - can anyone who
knows both cores in detail comment ?

-jg

I don't know the MIPS in detail (long ago, I read a book on the
R2000/R3000 architecture which I picked up in a second-hand bookshop), but
there are certainly some fundamental similarities. Each is a 32-bit RISC
core, with 32x32-bit registers, orthogonal instruction set, etc. The NIOS
II is a little odd (IMHO) in that it has some registers with dedicated
purposes and supervisor-only access. I think, however, that quite a lot
of 32-bit RISC architectures (ARM, MicroBlaze) would be "very similar" at
this level.

See http://www.altera.com/literature/lit-nio2.jsp.

NIOS II is clearly different from MIPS I, but I'd guess that whoever
designed NIOS II was quite familiar with MIPS, as a lot of nomenclature
(register names, many of the register allocations, some opcode names)
are rather MIPS-reminiscent. Of course, lots of ISAs have borrowed
from each other, and an ADD is an ADD. However, of 32 registers,
at least 25 [0-23, 31] are allocated exactly as in MIPS, and generally
have the same names, even when those aren't obvious.

People may recall my comments in the WIZ discussion about wishing not
to have done MUL and DIV the way we did in MIPS, but to have done it
more like the way Alpha did it later, and NIOS II does that.
NIOS II has no (MIPS) LUI instruction, but has ANDHI, ORHI, XORHI. We
almost did ANDHI or ORHI (as they subsume LUI, but have other uses),
but thought about it a little too late [Summer 1985] to get in.

In general, NIOS II feels like a sensible design for a small embedded
core, a bit more like MIPS than like any other RISC, but where it does
things differently, the differences match things that, in
retrospect, we might well have done differently.

Nick Maclaren · Apr 21, 2006

In article <1112905289.368801.161230@g14g2000cwa.googlegroups.com>,
John Mashey <old_systems_guy@yahoo.com> wrote:

The MIPS solution was some much simpler hardware [very minimal
additions, and nothing tricky] beyond what was there, that allowed
compilers to generate code to deal with unaligned accesses that
sometimes came up from legacy code without burdening the base hardware
design.

And the Nick Maclaren comment is that most of those codes are so
horrible that they probably aren't getting right answers anyway.
I strongly disapprove of fixing up alignment in software - it is
much better to diagnose the failure and get the programmer to fix
the broken code.

Notice that this requires the hardware to do even less.

Regards,
Nick Maclaren.

Terje Mathisen · Apr 21, 2006

John Mashey wrote:
[re. misaligned load patent]

It's a poor statement; it's not a ridiculous patent, although, as I
have explained before in comp.arch [search: mashey lwl], in retrospect,
I'd just as soon we hadn't done it. It was easy at the time
[Summer 1985], but it turned out to get far less use than we'd expected,
at least partly because the rise of RISCs with strict alignment got
many people to clean up code. As noted elsewhere, if I were doing it
again, I'd probably do something different.

The patent is 4,814,976 by Hansen & Riordan.

The *point* of the patent was that if you have a straightforward RISC
pipeline that supports caches and paged virtual memory, then requiring
hardware to do all the work of handling arbitrarily-aligned data [i.e.,
crossing cache-line or worse, page boundaries] adds *greatly* to the
implementation complexity, and one doesn't want to do this. [The
implementation penalty for some microcoded CISCs can be much less.]

The MIPS solution was some much simpler hardware [very minimal
additions, and nothing tricky] beyond what was there, that allowed
compilers to generate code to deal with unaligned accesses that
sometimes came up from legacy code without burdening the base hardware
design.

I have never seen either the patent or the relevant MIPS asm code
generated, but it seems to me that the hw I'd want would look like this:

a)

LoadAligned r1=[r0]

where any low-order bits in r0 would be ignored.

b)

either

LoadAligned r2=[r0+regsize]

or

LoadAlignedRight r2=[r0]

Both of these would load the next aligned word

c) (The somewhat tricky one!)

ShiftToAlign r3=r1,r2,r0

which is defined to merge r1 & r2, using the low-order bits from r0 to
determine the number of bytes to shift.

Since this opcode takes four register operands, I'd suggest forcing the
destination to be the same as the low-order source register, i.e.:

ShiftToAlign r1=r2,r0

defined as

r1 = (r1 >> (r0 & 7)*8) | (r2 << (8-(r0 & 7))*8);

Doing it this way makes it easy to process an array of misaligned data,
since the second source register is unmodified, so it can be used as the
primary (modified) source during the next iteration.
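For anyone wanting to try the merge step in software, here is a minimal C sketch of it. It assumes 64-bit registers and little-endian byte order, and the function name `shift_to_align` is made up for illustration. One caveat: taken literally as C, the formula as posted shifts by 64 bits when the low-order address bits are zero, which C leaves undefined, so the sketch handles that case separately; a hardware definition could of course specify the result directly.

```c
#include <stdint.h>

/* Merge two adjacent aligned 64-bit words (lo sits at the lower address)
   into the unaligned word that starts at byte offset (addr & 7).
   Little-endian byte order is assumed. */
static uint64_t shift_to_align(uint64_t lo, uint64_t hi, uint64_t addr)
{
    unsigned bytes = (unsigned)(addr & 7);
    if (bytes == 0)
        return lo;  /* avoid a shift by 64, which is undefined in C */
    return (lo >> (bytes * 8)) | (hi << ((8 - bytes) * 8));
}
```

When walking an array, the `hi` word of one step becomes the `lo` word of the next, so each step needs only one new aligned load.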

Terje

--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Everett M. Greene · Apr 21, 2006

Eric Smith <eric@brouhaha.com> writes:

"JJ" <johnjakson@yahoo.com> writes:
If some processor that is not any way MIPs like can otherwise perform a
register ld/st for a word on any byte boundary is that a problem, or
only if the rest of it is MIPs like too.

It's not the unaligned load/store per se that is patented; many processors
have done that.

What MIPS invented and patented was the idea that instead of having the
hardware deal with unaligned bus accesses, they require software to
issue *two* instructions to do an unaligned access. One does the
"left part" and one does the "right part" of the word.

This must be a misstatement or it's a ridiculous patent.
How can a patent be issued for NOT doing something?
I can get a patent for an anchor that requires the
addition of propulsion and lifting surfaces so that
it can fly on its own?

There's also the "obvious to anyone experienced with
the technology" thing.

The normal MIPS load and store instructions require alignment just as
on most other RISC processors.

--

----------------------------------------------------------------------
Everett M. Greene (The Mojave Greene, crotalus scutulatus scutulatus)
Ridgecrest, Ca. 93555 Path: mojaveg@IWVISP.com

Murphy's law of aviation and large over-the-road vehicles:
Whichever direction you're going, it'll be into a stiff headwind.

Murphy's law of farming and earthmoving: Whichever direction
you're going, there's a tailwind of the same speed and direction
as your movement.

JJ · Apr 21, 2006

I don't buy this argument, at least some of the time.

I can think of at least one application which guarantees that most
accesses are not aligned, and for which a SW-aligned version would be ugly,
either by doing the align in SW or by accessing sequential bytes.

Example: a parser performing lexing of string matches, straight from the
lcc compiler by Hanson-Fraser.

In lcc each pattern match is something like this, mostly auto generated
for a given dictionary, so for matching "while"

if (cp[0]=="while"[0] &&
cp[1]=="while"[1] &&
cp[2]=="while"[2] &&
cp[3]=="while"[3] &&
cp[4]=="while"[4]) xxxx

This obviously requires 5 possible byte matches and 5 conditional
branches, and most matches will fail until the right production rule is
reached and all matches pass.

The VC6 compiler produces good code without trying: lots of byte cmp and
bxx pairs.

In a rewrite I would (and do) use
if (same5(cp,"while")) xxxx

where same5 is an inlined match of 2 longs, at byte offset 0, then 1.
Now that's 2 long matches and 2 branches.
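A sketch of what such a same5 could look like in portable C, purely as an illustration: the helper `load32` is a hypothetical name, and `memcpy` is used so the compiler can emit a plain unaligned load on targets that allow one, without the undefined behaviour of casting a `char *` to a wider pointer. This is an assumption about how the match might be written, not the code from the post.

```c
#include <stdint.h>
#include <string.h>

/* Load 4 bytes from any address; memcpy lets the compiler choose an
   unaligned load where the target supports it. */
static uint32_t load32(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Compare 5 characters with two overlapping 4-byte compares,
   one at byte offset 0 and one at byte offset 1. */
static int same5(const char *cp, const char *kw)
{
    return load32(cp) == load32(kw) && load32(cp + 1) == load32(kw + 1);
}
```

Both arguments must have at least 5 readable bytes. The second compare overlaps the first by three bytes, which is harmless and avoids a separate single-byte test.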

Now C might have a tiny dictionary of mostly small words but the
Verilog language has 200 plus words and many can be as long as 20chars
with many words giving false initial matches.

So there is an inline sameN as fn() for 1 to 20 chars which takes up to
5 matches, i.e. 4x less work. Again VC6 produces good asm, nearly 4x less
than using byte-serial checking.

Of course I know this is a problem for RISCs that don't do nonaligned
accesses, but I really hate to see code 4x slower even if it's probably
<1% of the total runtime. Now the CPU has to do a bit more work some
of the time, as JM pointed out.

A few more mostly-unaligned kernels come to mind pretty quickly without
thinking too hard: compression etc.

I wonder if the RISC criteria for extreme simplicity need to be
reexamined when most RISCs are way more complex in other depts that are
far less visible to the programmer.

regards

johnjakson at usa dot com
transputer2 at yahoo dot com

Jim Granville · Apr 21, 2006

Brijesh wrote:

Symon wrote:

Hello Symon,

Brijesh,
Have you considered using the strobe signal as a latch enable rather
than a clock? Rise time is then unimportant. The IOBs' storage
elements can be latches, IIRC.

It's a DDR scheme, so a simple latch scheme won't work, as data is only stable
during the rising/falling edge of the strobe.

Even otherwise, as I mentioned, there are multiple channels and all of
them work just fine, except this one. That's led me to believe it's not a
design problem, but an SI issue.

I'd look to see why that channel is slower.
Slow edges also mean timing skew.
You could also deliberately slow a good one down, to see if that causes
similar errors, and slow the poor one a little more, to see if it
worsens.
-jg

John Mashey · Apr 21, 2006

All of this was fairly well-covered in a *1988* comp.arch thread called
"RISC data alignment", including the reasons why computer *vendors*
were forced to deal with these alignment issues, i.e., IBM (& then DEC
VAX) FORTRAN (interaction of EQUIVALENCE & COMMON, INTEGER*2, and
sometimes call-by-reference).

Anton Ertl · Apr 21, 2006

Terje Mathisen <terje.mathisen@hda.hydro.com> writes:

I have never seen either the patent or the relevant MIPS asm code
generated, but it seems to me that the hw I'd want would look like this:

Well, what you describe is pretty much how Alpha does it. MIPS does
it differently.

a)

LoadAligned r1=[r0]

where any low-order bits in r0 would be ignored.

Alpha calls this instruction ldq_u (u for unaligned).

c) (The somewhat tricky one!)

ShiftToAlign r3=r1,r2,r0

Alpha uses three instructions for that. Two extq instructions for
shifting and masking r1 and r2, and an or instruction to combine the
results. Overall an unaligned load looks like this on Alpha:

lda at,0(t0)
ldq_u t9,0(at)
ldq_u t10,7(at)
extql t9,at,t9
extqh t10,at,t10
or t9,t10,t3

The lda (for computing the effective address) could be optimized away
in nearly all cases, but that effort was apparently not expended by
gas. It is interesting that the offset for the second ldq_u is 7, not
8 (and the extqh must match that). My guess is that this is done so
that you do not get an exception when you use this sequence for
loading the last word of a page with an aligned address.
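The sequence above can be modeled in C to check the guess about the offset of 7. This is a sketch under assumptions: the helper names simply mirror the mnemonics, memory is a little-endian byte array, and extqh is taken to shift left by (64 - 8*offset) modulo 64, which is how I understand the Alpha definition.

```c
#include <stdint.h>
#include <stddef.h>

/* ldq_u: load the aligned quadword containing addr (low 3 address bits
   ignored). Bytes are assembled little-endian so the model does not
   depend on the host byte order. */
static uint64_t ldq_u(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)7;
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | mem[base + (size_t)i];
    return v;
}

/* extql: shift right by the byte offset held in the low 3 address bits. */
static uint64_t extql(uint64_t v, size_t addr)
{
    return v >> (8u * (unsigned)(addr & 7));
}

/* extqh: shift left by (64 - 8*offset) modulo 64, so offset 0 is no shift. */
static uint64_t extqh(uint64_t v, size_t addr)
{
    return v << ((64u - 8u * (unsigned)(addr & 7)) & 63u);
}

/* The ldq_u / ldq_u / extql / extqh / or sequence for an unaligned load. */
static uint64_t load_unaligned_q(const uint8_t *mem, size_t at)
{
    uint64_t t9 = extql(ldq_u(mem, at), at);
    uint64_t t10 = extqh(ldq_u(mem, at + 7), at);
    return t9 | t10;
}
```

With an aligned address, `at + 7` still falls in the same quadword, so both loads read the same word, extqh shifts by zero, and the OR returns that word unchanged. In this model the sequence never touches the following quadword in that case, which is consistent with the page-boundary explanation.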

Hmm, this requires two instructions, which are just used for this
purpose AFAIK: extqh and extql (ldq_u is also used for byte loads
etc. on the Alpha). How much longer would the sequence be if we
allowed only one 2-in-1-out special-purpose instruction, or none (but
slightly more general-purpose shift-and-mask-byte instructions)?

I can see how to do it with one less instruction with two
special-purpose instructions: extqh does not need to set the low-order
byte (this can be covered by extql in every case), so it could store
the low-order bits of the address there. Then extql could be modified
to take the result of extqh instead of the address, and perform the
merge. The sequence would look like:

lda at,0(t0) #can be optimized away
ldq_u t9,0(at)
ldq_u t10,7(at)
extqhx t10,at,t10
extqlor t9,t10,t3

This probably would have required additional muxes in the data path,
though.

Followups set to comp.arch.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Torben Ægidius Mogensen · Apr 21, 2006

"John Mashey" <old_systems_guy@yahoo.com> writes:

The *point* of the patent was that if you have a straightforward RISC
pipeline that supports caches and paged virtual memory, then requiring
hardware to do all the work of handling arbitrarily-aligned data [i.e.,
crossing cache-line or worse, page boundaries] adds *greatly* to the
implementation complexity, and one doesn't want to do this. [The
implementation penalty for some microcoded CISCs can be much less.]

The MIPS solution was some much simpler hardware [very minimal
additions, and nothing tricky] beyond what was there, that allowed
compilers to generate code to deal with unaligned accesses that
sometimes came up from legacy code without burdening the base hardware
design.

ARM has an "interesting" way of handling unaligned word addresses: If
the address is not word aligned, it uses the rounded-down address to
load a word but then rotates the word such that the byte at the
unaligned address is the LSB of the resulting word (this is for
little-endian mode). The behaviour is probably a side-effect of the
byte load instruction (which ANDs with 0xFF after the rotate). I once
wrote a fast string copier that exploited this behaviour, but I don't
think it makes unaligned word access any faster.
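The rotate behaviour described above can be written down as a small C model. It is a sketch only: little-endian mode is assumed, as in the description, and the names `rotr32` and `arm_rotated_load` are invented for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t rotr32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Model of the described load: fetch the aligned little-endian word at the
   rounded-down address, then rotate so that the byte at the requested
   address ends up in the low 8 bits of the result. */
static uint32_t arm_rotated_load(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)3;
    uint32_t w = (uint32_t)mem[base] | ((uint32_t)mem[base + 1] << 8) |
                 ((uint32_t)mem[base + 2] << 16) | ((uint32_t)mem[base + 3] << 24);
    return rotr32(w, 8u * (unsigned)(addr & 3));
}
```

Masking the result with 0xFF gives the byte at the unaligned address, which matches the idea that a byte load is the same operation followed by an AND.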

Torben

Brijesh · Apr 21, 2006

Peter,

I understand that DDR does not invalidate your response. I just mentioned
it to clear things up.

I just wanted to know if I was heading in the right direction by
suspecting that slower rise time may be causing the problem.

I didn't know how slow is too slow, hence the question.
So now I know there is no fundamental limiting factor on how slow an edge
can be on V2. I will concentrate my efforts on identifying if the strobe
line is being corrupted by noise or cross talk.

I was hoping there was some rule of thumb from experience that edges
slower than X ns/V are inviting trouble.

Jim Granville,

Thanks for the response. Yes, I am going to slow the edge on other
channels and also on this channel and see if the errors occur more
frequently.

Thanks
Brijesh

Peter Alfke wrote:

There is no fundamental limit. A flip-flop will be clocked, even if the
clock takes seconds or minutes to rise. But the longer the transition
time, the bigger the chance of picking up noise, and creating a
double-pulse. And the fact that you use DDR does not invalidate my
previous response. Noise then has a chance to disturb either edge or
both.
Peter Alfke
============
Brijesh wrote:

Peter Alfke wrote:

If you use STROBE as a rising-edge clock input, then excessive noise
might be superimposed on its falling edge, such that the falling edge
actually contains a rising edge clock, which would give you weird
timing.
Just a wild guess...
Peter Alfke

Hello Peter,

It's a DDR scheme. Data is clocked in both on the rising edge and falling edge.

My main question is still how slow is too slow?

Thanks

Brijesh
