EDK : FSL macros defined by Xilinx are wrong

Symon · Dec 15, 2006

"Austin Lesea" <austin@xilinx.com> wrote in message
news:eluk9r$fjj2@cnn.xsj.xilinx.com...

All,

I have previously posted on the differential input circuit that we
(Xilinx) use.

I will repeat what I have said before: the differential input circuit
is a full CMOS differential comparator. It will operate (function) from
rail to rail on its inputs. Its performance has only been characterized
for LVDS, and low voltage LVPECL common mode voltages and swings.

Now for the new part: there are no configuration bits to select
anything. The comparator is the comparator, and it is the same circuit
regardless of standard selected. If it is differential, it is the same
circuit.

Hope this helps,

Austin

FWIW, I think the OP is talking about the MGT clock inputs. But the same
applies, I guess.
Cheers, Syms.

Austin Lesea · Dec 15, 2006

Symon,

Thanks for pointing this out -- I was answering the wrong question!

Unfortunately, no, my previous post does not apply to the dedicated
clock inputs for the MGTs.

Those differential comparators are designed to their own specific
requirements. However, I do believe they operate for LVDS or low
voltage PECL (2.5V). I would have to go read the data sheet, and also
the MGT user's guide.

Austin

Symon · Dec 18, 2006

"Brian Davis" <brimdavis@aol.com> wrote in message
news:1166385670.603702.192040@73g2000cwn.googlegroups.com...

Austin wrote:

The specification is min 3pF; max 10 pF for 3E.

Try looking at the IBIS files, which show a total max Cin
( C_pkg plus C_comp ) of a mere 2.4 pF for a VQ100.

Measured reflections off a fast LVDS driver in the lab
also agree with a Cin value in that range.

Which makes no distinction between input only pins, I/O pins,
global clock input pins, left/right side clock pins, dual mode
config pins, Vref pins, dedicated config pins, etc

Brian

Hi Brian,

Always good to read your posts, I hope all's well with you and yours.

So, it was particularly interesting to read about your measurements on S3e.
Also, Austin, your comment that the pin capacitance is affected a lot by the
presence of the high-power drivers gives me hope that the S3e 'dedicated
inputs', i.e. input-only IOBs, may have especially low Cpin. Brian, I don't
suppose that was part of your measuring exercise when you looked at the
LVDS? I looked at the IBIS files (ver2.1) and didn't find the input-only
pins.

Thanks, Syms.

Finally, it was interesting glancing through the IBIS files. For anyone who
still believes the various myths surrounding bypass caps (see CAF passim),
the lead inductance of the various packages makes enlightening reading. I
guess the power leads have comparable inductance to the signal pin leads, so
don't forget to include these data in your 'resonant bypass array'
calculations!

Oh, and if anyone is planning on using a PQ208, make sure
that any cost savings you make in assembly is spent on an SI simulator...

Tim · Dec 18, 2006

Symon wrote:

Finally, it was interesting glancing through the IBIS files. For anyone who
still believes the various myths surrounding bypass caps (see CAF passim),
the lead inductance of the various packages makes enlightening reading. I
guess the power leads have comparable inductance to the signal pin leads, so
don't forget to include these data in your 'resonant bypass array'
calculations! Oh, and if anyone is planning on using a PQ208, make sure
that any cost savings you make in assembly is spent on an SI simulator...

The capacitor manufacturers have impressive figures for the parasitic
inductance improvement (i.e. reduction) you can get by moving to reverse
geometry capacitors and to multi-pad interdigitated capacitors. The
numbers seem to be approx 600pH for 0805, 300pH for 0508, 100pH for
eight pad 0508.
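
To put those figures in perspective, here is a small sketch that converts each quoted inductance into an inductive impedance at a spot frequency and into a series self-resonant frequency. The 100 nF capacitance and the 100 MHz spot frequency are assumptions chosen purely for illustration; neither comes from this thread.

```python
import math

# Approximate parasitic inductances quoted above, in henries.
ESL = {
    "0805": 600e-12,
    "0508": 300e-12,
    "0508 eight-pad": 100e-12,
}

def inductive_reactance(l_henry, f_hz):
    """Magnitude of the inductive impedance, |Z| = 2*pi*f*L, in ohms."""
    return 2 * math.pi * f_hz * l_henry

def self_resonant_freq(l_henry, c_farad):
    """Series self-resonance, f = 1 / (2*pi*sqrt(L*C)), in hertz."""
    return 1.0 / (2 * math.pi * math.sqrt(l_henry * c_farad))

C_ASSUMED = 100e-9   # assumed capacitor value (not from the thread)
F_SPOT = 100e6       # assumed frequency of interest (not from the thread)

for name, esl in ESL.items():
    z = inductive_reactance(esl, F_SPOT)
    f0 = self_resonant_freq(esl, C_ASSUMED)
    print(f"{name}: |Z_L| = {z:.3f} ohm at 100 MHz, SRF = {f0 / 1e6:.1f} MHz")
```

Because the self-resonant frequency scales with 1/sqrt(L), a sixfold drop in inductance only raises it by about 2.4 times for the same capacitance.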

We use reverse geometry caps and we were thinking of trying the
interdigitated parts. Does anyone have any warnings?

And if I could go back to the ever-popular discussion of optimal board
layout... Presumably the best strategy for the power planes is to
have one of them reachable via an in-pad microvia from the component
layer. Should we also put voids in this power plane so that a second
microvia can punch down to a second power plane?

Or even interdigitate the power plane to get power and ground
microvia-reachable from the component?

Tim

Austin Lesea · Dec 18, 2006

Brian,

OK, now I see it. Spartan 3/3E/3A dropped HSTL IV, which is a 48 mA
drive strength standard. III is 24 mA. That means the area required is
cut in half, so the C could be half that of the Virtex series IOB's
(that have HSTL IV).

I wonder how many people need IV?

Austin

Austin wrote:

Brian,

That would make sense (dropping a low voltage strong standard reduces
pin C). Dropping DCI would hardly save anything except area, however.

I will take it that you did not file a webcase. With a webcase number,
it could be tracked and solved, or escalated (and solved). Or even
reported to myself or Peter (as some have done).

Now that Spartan has "grown up", they are making their own decisions as
to what to spend their silicon on, so I have lost track of some of their
features. I should not have replied for them, as I really don't know
what they are doing (without asking).

Austin

Brian Davis wrote:

Austin wrote:

From what I know about the design, the only way to reduce
the C is to leave out the LVDS output driver (0.5pF less), or
make the IO drive strength smaller. <snip>
It may be that 3E drops some of these (I will check).

The simplified S3E I/O drops DCI and some of the higher
drive standards vs. S3, and adds DIFF_TERM support.

Which makes no distinction between input only pins, I/O pins,
global clock input pins, left/right side clock pins, dual mode
config pins, Vref pins, dedicated config pins, etc.

The Vref pins do not have any more load, until they are programmed
to be a Vref, so we do not specify their C as a Vref <snip>

Sometimes a missing specification is just something not needed.

Some inputs, such as global clocks or the MGT clock inputs of this
thread, are likely to differ from general I/O, but are not documented
as such in that single datasheet spec, nor in the IBIS files.

Perhaps I should file a WebCase asking Xilinx to update
their datasheet to match their IBIS files

But, if you filed it as a case, you should have received a reply as to
why the specification did not require an update. Case #?

If the "perhaps" and the smiley didn't give it away,
that was intended as a humorous observation for those
of us who have filed WebCases.

Brian

Symon · Dec 19, 2006

"Austin Lesea" <austin@xilinx.com> wrote in message
news:em6o57$gg73@cnn.xsj.xilinx.com...

Brian,

OK, now I see it. Spartan 3/3E/3A dropped HSTL IV, which is a 48 mA
drive strength standard. III is 24 mA. That means the area required is
cut in half, so the C could be half that of the Virtex series IOB's
(that have HSTL IV).

I wonder how many people need IV?

I would guess fewer than the number who would like the I/O performance
doubled?

Congrats to the Spartan group!

There should be some way to disconnect unwanted output standards before
assembly. Some system with fuses integrated on the FPGA in series with the
FETs , a test fixture, a car battery and a big red button marked "Cpin
enhance".

Cheers, Syms.

p.s. Like this (Fixed font!):-

   Fusable FET      Regular FET
        \
            |              |
          |-             |-
      ----|           ---|
          |-             |-
            |              |
            |              |
            -------fuse-----
            |              |
            |              |
Vfuse--|>|---              -----I/O Pad
         \
          Diode

Vfuse is an external pin grounded during normal operation.

Symon · Dec 19, 2006

"Brian Davis" <brimdavis@aol.com> wrote in message
news:1166502377.044055.250990@a3g2000cwd.googlegroups.com...

I also haven't spotted any detail on those, or any limits for the
S3E diff_term range.

BTW, if you check the S3A datasheet, they provide better numbers
for the DIFF_TERM range and spec them for usage at 3.3V VCCIO;
they've also gone to a top/bottom left/right split between the
"goes to eleven" drivers and the lightweight S3E versions.

Brian

That's a bit more like it for documentation.

Thanks for the pointer! Syms.

Peter Alfke · Jan 18, 2007

It's very simple:
Take two Logic Cells (each a LUT and a Flip-flop) and feed both Q
outputs to the inputs of both LUTs.
Imagine any one (of many possible) sequence on the two Q outputs that
repeats after 3 states.
Then implement the required logic in each LUT.
I don't want to make it too trivially simple for you. A little thinking
strengthens the brain.
Peter Alfke
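
As an illustration of the recipe above, the following sketch derives the two LUT truth tables from a chosen 3-state sequence and then clocks the resulting state machine. The particular sequence (00, 01, 10) and the choice to steer the unused state back to the first state are assumptions made here; any other 3-cycle would serve equally well.

```python
# Hypothetical 3-state sequence on (Q1, Q0); any 3-cycle works.
SEQUENCE = [(0, 0), (0, 1), (1, 0)]

def lut_tables(sequence):
    """Return (lut1, lut0): 4-entry truth tables indexed by (Q1 << 1) | Q0.

    Each LUT sees both Q outputs and drives the D input of one flip-flop.
    States outside the sequence are steered to the first state so the
    divider recovers if it ever starts in an unused state.
    """
    next_state = {s: sequence[(i + 1) % len(sequence)]
                  for i, s in enumerate(sequence)}
    lut1, lut0 = [0] * 4, [0] * 4
    for q1 in (0, 1):
        for q0 in (0, 1):
            n1, n0 = next_state.get((q1, q0), sequence[0])
            lut1[(q1 << 1) | q0] = n1
            lut0[(q1 << 1) | q0] = n0
    return lut1, lut0

def run(lut1, lut0, state, cycles):
    """Clock the two flip-flops `cycles` times and return the visited states."""
    q1, q0 = state
    trace = []
    for _ in range(cycles):
        trace.append((q1, q0))
        idx = (q1 << 1) | q0
        q1, q0 = lut1[idx], lut0[idx]
    return trace

lut1, lut0 = lut_tables(SEQUENCE)
print("LUT for D1:", lut1, " LUT for D0:", lut0)
print(run(lut1, lut0, (0, 0), 6))
```

With this sequence, either Q output is high for one input period in three, so it is a divide-by-3 with a 33% duty cycle.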

On Jan 17, 9:20 pm, "K. Sudheer Kumar" <ksudheerku...@gmail.com>
wrote:

Hi,

I need to generate a 70MHz clock from 210MHz. Is there any way to
generate it rather than using a DCM.

Thanks,

Sudheer

sudheer · Jan 18, 2007

Hi Peter,

Thanks for your suggestion. I would appreciate your providing me a copy
of your article "Unusual Clock Dividers".

Sudheer.

Peter Alfke wrote:

It's very simple:
Take two Logic Cells (each a LUT and a Flip-flop) and feed both Q
outputs to the inputs of both LUTs.
Imagine any one (of many possible) sequence on the two Q outputs that
repeats after 3 states.
Then implement the required logic in each LUT.
I don't want to make it too trivially simple for you. A little thinking
strengthens the brain.
Peter Alfke

On Jan 17, 9:20 pm, "K. Sudheer Kumar" <ksudheerku...@gmail.com>
wrote:
Hi,

I need to generate a 70MHz clock from 210MHz. Is there any way to
generate it rather than using a DCM.

Thanks,

Sudheer

Antti · Jan 18, 2007

sudheer wrote:

Hi Peter,

Thanks for your suggestion. I would appreciate your providing me a copy
of your article "Unusual Clock Dividers".

Sudheer.

Dear Sudheer,

Isn't Google your friend too?

http://www.nalanda.nitc.ac.in/industry/appnotes/xilinx/documents/xcell/xl33/xl33_30.pdf

Antti

Symon · Jan 18, 2007

"gallen" <arlencox@gmail.com> wrote in message
news:1169101376.940807.115180@v45g2000cwv.googlegroups.com...

You could probably find it through some googling, but this brings up
another point: Why would Xilinx remove its archives? It's not like
the material was dated.

Stop whining and start searching!

http://web.archive.org/web/20050404010919/www.xilinx.com/xcell/xl33/xl33_30.pdf

HTH, Syms.

Jan 18, 2007

Hi,
I studied this article. It is very interesting, and the resources
consumption is very low.
For a general purpose, I think Anydivider can help.
In this case, just enter "3", and then get the verilog code and the
waveform.
For more features, please visit
http://www.topweaver.com/doc/tad/tad.htm
Download http://www.topweaver.com/download.htm

TAD

"Antti" wrote:

sudheer wrote:

Hi Peter,

Thanks for your suggestion. I would appreciate your providing me a copy
of your article "Unusual Clock Dividers".

Sudheer.

Dear Sudheer,

Isn't Google your friend too?

http://www.nalanda.nitc.ac.in/industry/appnotes/xilinx/documents/xcell/xl33/xl33_30.pdf

Antti

Peter Alfke · Jan 18, 2007

I see all these references to my old article in XCell magazine, and I
enjoy the positive comments.
But: In almost all cases, there is no need for 50% duty cycle. The
natural 33/66% duty cycle of a simple divide-by-three circuit is
acceptable, especially at such low frequencies as 70 MHz.

Here is one of the simplest implementations:
Two flip-flops QA and QB, QA feeds the D-input of QB (shift register)
The NOR of QA and QB feeds the D input of QA.
This circuit also recovers from the illegal state of both QA and QB
being High.
Peter Alfke
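
A quick behavioral check of the two-flip-flop circuit described above, written as a plain Python model rather than HDL. It only models ideal rising-edge flip-flops; the function names are made up for this sketch.

```python
def step(qa, qb):
    """One rising clock edge: QA <= NOR(QA, QB), QB <= QA."""
    return int(not (qa or qb)), qa

def trace(qa, qb, edges):
    """Return the (QA, QB) states seen over `edges` clock edges."""
    states = []
    for _ in range(edges):
        states.append((qa, qb))
        qa, qb = step(qa, qb)
    return states

print(trace(0, 0, 6))   # normal operation: a 3-state loop
print(trace(1, 1, 4))   # starting from the illegal state (1, 1)
```

In the model, QA is high for one state in three, and the illegal state (1, 1) falls into the normal loop after a single clock edge.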

On Jan 18, 9:26 am, "visiblepulse" <t...@visiblepulse.com> wrote:

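// Divide-by-3 with a 50% duty cycle output, using both clock edges.
module clock_div3
(
    clock_in,
    clock_out
);

input  clock_in;
output clock_out;

reg       clock_out;
reg [2:1] d_pos;   // rising-edge counter:  00 -> 01 -> 11 -> 00
reg [2:1] d_neg;   // falling-edge counter: 00 -> 01 -> 10 -> 00

// Free-running 3-state counter on the rising edge.
always @ (posedge clock_in)
    case (d_pos)
        2'b00:   d_pos[2:1] <= 2'b01;
        2'b01:   d_pos[2:1] <= 2'b11;
        default: d_pos[2:1] <= 2'b00;
    endcase

// Falling-edge counter, started while d_pos is 2'b11 so that d_neg[1]
// goes high 1.5 input periods after clock_out is set.
always @ (negedge clock_in)
    case (d_neg)
        2'b00:   if (d_pos[2]) d_neg[2:1] <= 2'b01;
        2'b01:   d_neg[2:1] <= 2'b10;
        default: d_neg[2:1] <= 2'b00;
    endcase

// clock_out is set on the rising edge that leaves d_pos == 2'b00 and is
// cleared asynchronously while d_neg[1] is high and clock_in is low.
// The clear term is combinational, so check it for glitches before
// relying on it in hardware.
always @ (posedge clock_in or posedge (d_neg[1] & !clock_in))
    if (d_neg[1] & !clock_in)
        clock_out <= 1'b0;
    else
        if (!d_pos[1]) clock_out <= 1'b1;

endmodule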

sudheer · Jan 19, 2007

Thanks to all for sending their comments and useful links.

I'm getting worried about the phase misalignment of the divided clock w.r.t.
the source clock because of the combinational logic associated with the
output.

I would welcome your suggestions/comments on this.

Note: A 33%/66% duty cycle will not work in our case, so I badly need a
clock with a 50% duty cycle.

Thanks a lot once again,
Sudheer.
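
For reference, one commonly used way to get a 50% duty cycle from an odd divider is to OR a rising-edge pulse with a copy of it re-sampled on the falling edge. The sketch below models that idea at half-period resolution. It is an illustrative model with made-up signal names, not necessarily the circuit from the article. It also still has an OR gate on the output, so its delay relative to the source clock would have to be checked, and the result is only 50% if the source clock itself is close to 50%.

```python
def divide_by_3_50(half_cycles):
    """Model a 50% duty divide-by-3 at half-period resolution.

    Even steps are rising edges of the fast clock, odd steps are falling
    edges. `p` is a rising-edge flip-flop that is high for one input
    period in three; `n` is a falling-edge flip-flop that re-samples `p`
    half a period later. Their OR is high for 1.5 input periods.
    """
    count, p, n = 0, 0, 0
    out = []
    for step in range(half_cycles):
        if step % 2 == 0:            # rising edge of the fast clock
            p = 1 if count == 0 else 0
            count = (count + 1) % 3
        else:                        # falling edge of the fast clock
            n = p
        out.append(p | n)
    return out

wave = divide_by_3_50(12)
print(wave)
```

In this model `p` falls while `n` is still high, so the OR output does not glitch at the hand-over between the two flip-flops.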

Symon · Jan 24, 2007

Google Subversion. And if you use Windows, Google Tortoise.
HTH, Syms.
p.s. This chap recommends it on his blog.
http://www.cambriandesign.com/

Guenter · Jan 24, 2007

On Jan 24, 1:12 am, "Symon" <symon_bre...@hotmail.com> wrote:

Google Subversion. And if you use Windows, Google Tortoise.
HTH, Syms.
p.s. This chap recommends it on his blog. http://www.cambriandesign.com/

A nice pair with Subversion is Trac:

http://trac.edgewall.org/

It provides a web view interface to the subversion repository with a
ticket system and wiki.

The Trac page itself is using Trac, so you can get a feeling from it for
what you get. Using the "Timeline", for example, will show you what
commits have been made to the repository and which tickets have been
created or closed.

"Browse Source", for example, lets you browse the repository via the web
interface.

Cheers,

Guenter

Sylvain Munaut · Jan 25, 2007

Well, there won't be a schema that fits every possible transform ...
(if there was that would mean the SDRAM would be as flexible as SRAM
....)

Can't you narrow down the type of access you want to do?

Gabor · Jan 25, 2007

I've done something similar in the past. In my project I was
doing small-angle rotation, so I knew ahead of time the maximum
line-to-line skew of pixels that became vertical in the output
image, and it was small (like 1). When I started the project,
however I had the idea that the best way to accomplish the
general case of rotation is to make a cache memory in
the FPGA. The parts I was using at the time (XCV50's)
were a bit small to implement a decent cache, but I would
think newer parts could do this quite handily.

Also important is using the minimum burst size in the
SDRAM to reduce your cache-line access time.

HTH,
Gabor

On Jan 24, 2:36 pm, "wallge" <wal...@gmail.com> wrote:

I am doing some embedded video processing, where I store an incoming
frame of video, then based on some calculations in another part of the
system, I warp that buffered frame of video. Now when the frame goes
into the buffer
(an off-FPGA SDRAM chip), it is simply written in one pixel at a time
in row major ordering.

The problem with this is that I will not be accessing it in this way. I
may want to do some arbitrary image rotation. This means
the first pixel I want to access is not the first one I put in the
buffer; it might actually be the last one in the buffer. If I am doing
full page reads, or even burst reads, I will get a bunch of pixels that
I will not need to determine the output pixel value. If I just do
single reads, this wastes a bunch of clock cycles setting up the SDRAM,
telling it which row to activate and which column to read from. After
the read is done, you then have to issue the precharge command to close
the row. There is a high degree of inefficiency to this. It takes 5,
maybe 10 clock cycles just to retrieve one
pixel value.

Does anyone know a good way to organize a frame buffer to be more
friendly (and more optimal) to nonsequential access (like the kind we
might need if we wanted to warp the input image via some
linear/nonlinear transformation)?

jbnote · Jan 25, 2007

Hello,

It all depends on your needs, of course, but block-style ordering can
help a bit to relieve the problem by breaking the one-dimensional
nature of raster-scan order.

For instance, you can pack pixels in groups of 16, each representing a 4x4
square in your image. When
retrieving the data, you get data from both dimensions, which will have
much better spatial locality
than a line of 16 pixels. This may help you quite a bit.
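
A minimal sketch of such a 4x4 block ordering, assuming the image width is a multiple of 4 and that one tile maps to one 16-pixel burst. The function name and the raster ordering of the tiles themselves are choices made for this example.

```python
TILE = 4  # 4x4 pixels per 16-pixel group, as suggested above

def tiled_address(x, y, width):
    """Map pixel (x, y) to a linear pixel address with 4x4 tiles.

    `width` is the image width in pixels and is assumed to be a
    multiple of TILE. Tiles are stored in raster order; pixels inside
    a tile are stored row by row.
    """
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    offset = (y % TILE) * TILE + (x % TILE)
    return tile_index * TILE * TILE + offset

# Vertical neighbours now land in the same 16-pixel group.
print(tiled_address(5, 4, 640), tiled_address(5, 5, 640))
```

With power-of-two tile and image sizes, the same mapping reduces to rearranging address bits, which costs nothing in FPGA logic.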

Peano-style or quadtree-style walking of the image could also be
investigated,
but as I remember it, that's quite a bit more complicated...
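
As one example of a quadtree-style ordering, the sketch below computes a Z-order (Morton) index by interleaving the bits of x and y. Z-order is my choice of illustration here; it is not a Peano curve, and the 16-bit coordinate width is an assumption.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y (x in the even bit positions)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index

# The four pixels of each aligned 2x2 block get consecutive indices.
print([morton_index(x, y) for y in range(2) for x in range(2)])
```

In hardware the interleave is pure wiring, so the extra complexity is mostly in generating the access pattern rather than in the address mapping.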

JB

Pete Fraser · Jan 25, 2007

"Gabor" <gabor@alacron.com> wrote in message
news:1169736163.476029.150290@l53g2000cwa.googlegroups.com...

I've done something similar in the past. In my project I was
doing small-angle rotation, so I knew ahead of time the maximum
line-to-line skew of pixels that became vertical in the output
image, and it was small (like 1). When I started the project,
however I had the idea that the best way to accomplish the
general case of rotation is to make a cache memory in
the FPGA. The parts I was using at the time (XCV50's)
were a bit small to implement a decent cache, but I would
think newer parts could do this quite handily.

Another option (depending on your mapping) would be to do
it in two passes. There's a transpose in the middle, so
it would probably be best to do it in small sections to an on-chip
transpose buffer, before writing it out to the intermediate store.
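
The small-section transpose can be sketched as follows. This is a software model under stated assumptions: the image is a row-major list of lists, its dimensions are multiples of the tile size, and the `buf` list stands in for the on-chip transpose buffer.

```python
def transpose_tiled(image, tile=4):
    """Transpose a row-major 2D list tile by tile.

    Each tile is read into a small buffer (standing in for on-chip RAM),
    transposed there, and written to the mirrored tile position of the
    output. Image dimensions are assumed to be multiples of `tile`.
    """
    rows, cols = len(image), len(image[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Read one tile: `tile` short bursts along the source rows.
            buf = [image[r0 + r][c0:c0 + tile] for r in range(tile)]
            # Write it back column-first: bursts along the output rows.
            for c in range(tile):
                out[c0 + c][r0:r0 + tile] = [buf[r][c] for r in range(tile)]
    return out

image = [[r * 8 + c for c in range(8)] for r in range(4)]
print(transpose_tiled(image)[0])
```

Both the reads and the writes touch only short runs of consecutive addresses, which is what keeps the external memory accesses burst-friendly.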

Have you thought about what order of filtering you need?

Check out Digital Image Warping by Wolberg, or one of
Alvy Ray Smith's scan line ordering papers.
