ModelSim is the best simulator you can buy; it's the industry standard.
Unfortunately, it's very expensive.
You're putting the wishbone interface externally ? To connect it to what ?
Hello,
Just wondering if anyone can let me know if I'm going about this the
right way -
I'm trying to implement the OpenCores Ethernet MAC on a Xilinx FPGA,
but the board I have has too few I/Os. So, I want to reduce the width
of the 32-bit data inputs and outputs in the Wishbone interface to
accommodate this (I'm about 70 IOBs short). It seems like this should be
feasible since the data is delivered to the interface in 1-byte chunks.
If I work around the byte counter and write the data to the FIFO right
away without assembling the bytes into 32 bits...
I'm not so keen with Verilog, so feel free to let me know if I'm going
about this all wrong, or if I'm forgetting something. Or you can tell
me I'm smoking something and that I should just not be cheap and get a
board that has enough I/Os.
thanks!
pei
Thanks Alvin,
The problem is to assign one of 5 colours for each of the integers from
1 upwards such that if any two integers have the same colour the
integer that they sum to must have a different colour:
eg.
If we have
1-green 2-blue 3-green
then 4 cannot be green or blue as 1+3 = 4 and 2+2 = 4.
A correct sequence of 160 digits for 5 colours is known. I wish to find
a sequence of 162 digits.
I'm doing an exhaustive search on a severely restricted subset of all
the possible sequences. The sequence is built up one colour at a time
until you get to the point where you know that somewhere down the track
there will be no possible colour, then you go back one in the sequence
and try a different colour etc.
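As a rough picture of that search loop, here is a minimal C sketch (my
illustration, not the poster's program, and without the look-ahead pruning
or the chip-to-chip work splitting described below): each integer is tried
with each of the 5 colours in turn, and a dead end backs up one position.

/* Minimal backtracking sketch: find colour[1..TARGET] in 1..5 such that
 * whenever colour[x] == colour[y], colour[x+y] differs (x == y allowed). */

#include <stdio.h>

#define TARGET  162
#define COLOURS 5

static int colour[TARGET + 1];          /* colour[1..TARGET], 0 = unassigned */

/* May integer n take colour c?  Only pairs summing to n need checking;
 * pairs summing beyond n are checked when those integers are reached. */
static int allowed(int n, int c)
{
    for (int x = 1; x <= n / 2; x++)
        if (colour[x] == c && colour[n - x] == c)
            return 0;                   /* x + (n - x) = n share colour c */
    return 1;
}

static int extend(int n)
{
    if (n > TARGET)
        return 1;                       /* reached the target length */
    for (int c = 1; c <= COLOURS; c++) {
        if (allowed(n, c)) {
            colour[n] = c;
            if (extend(n + 1))
                return 1;
            colour[n] = 0;              /* dead end: back up, try next colour */
        }
    }
    return 0;
}

int main(void)
{
    if (extend(1))
        for (int i = 1; i <= TARGET; i++)
            printf("%d", colour[i]);
    else
        printf("no sequence of length %d", TARGET);
    printf("\n");
    return 0;
}

Splitting the work across chips then amounts to handing each one a fixed
prefix and a small depth limit instead of starting extend() at 1.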
This is quite easy to split up: each chip should only increase
the sequence by a few digits at a time (due to memory constraints) and report
each possible final sequence back to the coordinating chip, which dishes
each of these out to other chips when they request more work. Of course
there needs to be a prioritisation of sequences such that the sequence
queue doesn't get too big (ie. always dish out the longest sequences
which will be exhaustively searched and therefore removed quickest).
All of this stuff isn't too hard.
To the points you made,
1) efficient communication. Each chip needs to get a sequence per unit
of work which is no biggy, but it will need to report back each
sequence it ends up at..... This could in fact be quite a few - maximum
5^(#of digits sequence was increased by) but usually a lot less. For
each of these it needs to return (sequence length)/2 bytes. I think I
will need to consider this point in some more detail...
2) fault tolerance. I wish to find a single correct sequence and
believe (hope!) that there are many of these (expected running time is
time to first sequence not completion time for exhaustive search which
would in fact take 100s of years!!!), whilst missing one sequence due
to a faulty device/communications would be very bad, this wouldn't be
disastrous. I'm not trying to say that no sequence of 162 digits
exists.
I don't know anything about FPGAs or how these would apply, do you
happen to have some useful links?
I've got a pure math problem implemented in C that will take about 3
years to solve using all 5 PCs available to me (the algorithm is about
as efficient as it will get without some major mathematical insights).
Wow! They pay you so much you have to worry about overflow of
Hi, Jim..
We stopped after a week because we were satisfied. In one week, we
proved 10e14, it would take 10 weeks to prove 10e15, and 2 years to
prove 10e16. Diminishing returns... But we definitely did NOT stop
because we found an error. No cheating on my watch!
For some strange reason (fixed in "Virtex-5") there is a
one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
FULL is not as important as EMPTY, since a properly designed system
should never overflow the FIFO, whereas it might be nice to empty it
completely. (I often use the savings-account analogy).
Hi,
kha59@student.canterbury.ac.nz wrote:
Thanks Alvin,
The problem is to assign one of 5 colours for each of the integers from
1 upwards such that if any two integers have the same colour the
integer that they sum to must have a different colour:
eg.
If we have
1-green 2-blue 3-green
then 4 cannot be green or blue as 1+3 = 4 and 2+2 = 4.
I can follow the 1+3=4 green case, but the 2+2=4 blue case seems to extend your rule ?
A correct sequence of 160 digits for 5 colours is known. I wish to find
a sequence of 162 digits.
So that's a string of ~160 characters, where each character can be one
of 5 values ?
I'm doing an exhaustive search on a severely restricted subset of all
the possible sequences. The sequence is built up one colour at a time
until you get to the point where you know that somewhere down the track
there will be no possible colour, then you go back one in the sequence
and try a different colour etc.
This is quite easy to split up: each chip should only increase
the sequence by a few digits at a time (due to memory constraints) and report
each possible final sequence back to the coordinating chip, which dishes
each of these out to other chips when they request more work. Of course
there needs to be a prioritisation of sequences such that the sequence
queue doesn't get too big (ie. always dish out the longest sequences
which will be exhaustively searched and therefore removed quickest).
All of this stuff isn't too hard.
To the points you made,
1) efficient communication. Each chip needs to get a sequence per unit
of work which is no biggy, but it will need to report back each
sequence it ends up at..... This could in fact be quite a few - maximum
5^(#of digits sequence was increased by) but usually a lot less. For
each of these it needs to return (sequence length)/2 bytes. I think I
will need to consider this point in some more detail...
2) fault tolerance. I wish to find a single correct sequence and
believe (hope!) that there are many of these (expected running time is
time to first sequence not completion time for exhaustive search which
would in fact take 100s of years!!!), whilst missing one sequence due
to a faulty device/communications would be very bad, this wouldn't be
disastrous. I'm not trying to say that no sequence of 162 digits
exists.
This reads a little like sorting primes.
The data set would certainly fit into a (very) small microcontroller
- you can even pack into nibbles, and consume just 80 bytes, but
the problem with many small uC will be ensuring there are no overlaps,
or holes, in their scan coverage.
ie the task is simple enough, but multi-uC management is likely to
be a nightmare.
Something like i2c for the backplane is also likely to be a serious
bottleneck.
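As a concrete picture of that nibble packing (a throwaway sketch, not
anyone's actual reporting format): five colour values need only 4 bits
each, so a sequence of len digits packs into (len + 1) / 2 bytes - 80
bytes for 160 digits.

#include <stdint.h>

/* Pack colour[0..len-1] (values 1..5) two per byte; buf needs (len+1)/2 bytes. */
static void pack_nibbles(const uint8_t *colour, int len, uint8_t *buf)
{
    for (int i = 0; i < len; i += 2) {
        uint8_t lo = colour[i] & 0x0F;
        uint8_t hi = (i + 1 < len) ? (uint8_t)(colour[i + 1] & 0x0F) : 0;
        buf[i / 2] = (uint8_t)((hi << 4) | lo);
    }
}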
I don't know anything about FPGAs or how these would apply, do you
happen to have some useful links?
Look at Altera, Lattice, Xilinx - there are many demo/eval boards and
tool sets.
Also look at the soft CPUs: Xilinx PicoBlaze and Lattice Mico8.
FPGAs can do hugely parallel tasks, and on a small data set like this,
you have no memory bandwidth issues.
With an FPGA, you could do exclusion mapping - that is, do not store the
Colour@integer, but instead keep an array of N x 5 booleans, which are the
excluded colours. [ALL 5 => Whoops, go back!]
An FPGA could scan for all the look-ahead exclusions very efficiently indeed.
One of the small soft CPUs could manage the re-seed process.
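Here is a small software model in C of that exclusion-mapping idea (my
sketch of the bookkeeping only, not an FPGA design; it keeps the colour
array alongside the exclusion counts so the updates can be undone when
backtracking):

#define N       162
#define COLOURS 5

static int colour[N + 1];                   /* 0 = unassigned               */
static int excl[2 * N + 1][COLOURS + 1];    /* exclusion counts, sums <= 2N */

/* Give integer n colour c and exclude c from every sum n + x that now
 * has a same-coloured pair (x == n covers the 2+2=4 style case). */
static void assign(int n, int c)
{
    colour[n] = c;
    for (int x = 1; x <= n; x++)
        if (colour[x] == c)
            excl[n + x][c]++;
}

static void retract(int n)
{
    int c = colour[n];
    for (int x = 1; x <= n; x++)
        if (colour[x] == c)
            excl[n + x][c]--;
    colour[n] = 0;
}

/* "ALL 5 => Whoops, go back!" test for the next integer to place. */
static int stuck(int n)
{
    for (int c = 1; c <= COLOURS; c++)
        if (excl[n][c] == 0)
            return 0;
    return 1;
}

assign()/retract() replace the pair-scan in the earlier sketch; an FPGA
version would update all N x 5 flags in parallel instead of looping.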
I've got a pure math problem implemented in C that will take about 3
years to solve using all 5 PCs available to me (the algorithm is about
as efficient as it will get without some major mathematical insights).
The algorithm is always where the biggest speed gains can be made,
especially in efficiently mapping the algorithm to the hardware it runs
on.
In an FPGA you could set up 'algorithm races', where (e.g.) you code
4 algorithms in ~1/4 of the chip each, run them for a couple of days,
and compare their Attained String Lengths.
If the present best is a length of 160, don't just think about 162, look
to smash it!
I've added comp.arch.fpga, as this really sounds more like an FPGA + smart
algorithm problem than a "sea of uC" one.
-jg
Sorry, cryptic mode... WDOG = WatchDog = monostable timer, that
Jim Granville wrote:
Peter's stuff snipped
Interesting - so for sustained throughput on these, you are best to avoid
going empty, which probably means two operating modes: fastest, and
clean-out-the-last-byte(s).
I see some UARTs have WDOGs in their FIFOs that allow simpler streaming
code, and they generate a time-out interrupt as well as the normal
threshold one.
The timeout is normally some multiple of CHAR times, so the end-of-message
chars are dealt with without needing polling.
-jg
We upgraded a Zilog Z8530 serial port to a Z85230 serial port, because
the Z85230 has deeper FIFOs. The 85230's recv port has an 8-byte FIFO
vs the 8530's 3-byte FIFO. I had hoped that the CPU would be
interrupted a lot less using the 85230. The 85230 can be set-up to
interrupt when 1 char is recv'd or when there are 4 bytes in the recv
FIFO (half full). This sounded really great and even worked quite
well, until I discovered that, when there are _3_ bytes in the FIFO and
the chip is set up to interrupt when half-full, the chip does not
interrupt until another byte is received, even if that's minutes later.
ARGH, WIPA!
BTW what's a "WDOG"?
Jeorg's question on sci.electronics.design for an under $2 DSP chip got
me to thinking:
How are 1-cycle multipliers implemented in silicon? My understanding is
that when you go buy a DSP chip a good part of the real estate is taken
up by the multiplier, and this is a good part of the reason that DSPs
cost so much. I can't see it being a big gawdaful batch of
combinatorial logic that has the multiply rippling through 16 32-bit
adders, so I assume there's a big table look up involved, but that's as
far as my knowledge extends.
Yet the reason that you go shell out all the $$ for a DSP chip is to get
a 1-cycle MAC that you have to bury in a few (or several) tens of cycles
worth of housekeeping code to set up the pointers, counters, modes &c --
so you never get to multiply numbers in one cycle, really.
How much less silicon would you use if an n-bit multiplier were
implemented as an n-stage pipelined device? If I wanted to implement a
128-tap FIR filter and could live with 160 ticks instead of 140 would
the chip be much smaller?
Or is the space consumed by the separate data spaces and buses needed to
move all the data to and from the MAC? If you pipelined the multiplier
_and_ made it a two- or three- cycle MAC (to allow time to shove data
around) could you reduce the chip cost much? Would the amount of area
savings you get allow you to push the clock up enough to still do audio
applications for less money?
Obviously any answers will be useless unless somebody wants to run out
and start a chip company, but I'm still curious about it.
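For what "an n-stage pipelined device" buys you, here is a behavioural C
model (purely illustrative - fast multipliers are built from partial-product
reduction trees such as Wallace or Dadda, not big lookup tables): a 16x16
multiply split across 4 stages of 4 partial products each, so a new operand
pair can enter every cycle and each product emerges 4 cycles later.

#include <stdint.h>
#include <stdio.h>

#define STAGES         4
#define BITS_PER_STAGE 4                /* 16 bits / 4 stages */

typedef struct {
    uint16_t a, b;                      /* operands riding down the pipe  */
    uint32_t acc;                       /* partial sum accumulated so far */
    int      valid;
} stage_t;

static stage_t pipe[STAGES];            /* pipeline registers, start invalid */

/* One clock: capture the finished result, shift the pipe, feed a new
 * multiply in (if feed != 0), and let every stage add its own group of
 * partial products. */
static uint32_t clock_pipe(uint16_t a, uint16_t b, int feed, int *done)
{
    stage_t out = pipe[STAGES - 1];     /* completed result (if valid) */

    for (int s = STAGES - 1; s > 0; s--)
        pipe[s] = pipe[s - 1];
    pipe[0].a = a;
    pipe[0].b = b;
    pipe[0].acc = 0;
    pipe[0].valid = feed;

    for (int s = 0; s < STAGES; s++) {
        if (!pipe[s].valid)
            continue;
        for (int i = 0; i < BITS_PER_STAGE; i++) {
            int bit = s * BITS_PER_STAGE + i;
            if ((pipe[s].b >> bit) & 1)
                pipe[s].acc += (uint32_t)pipe[s].a << bit;
        }
    }

    *done = out.valid;
    return out.acc;
}

int main(void)
{
    const uint16_t av[] = { 3, 1234, 65535 };
    const uint16_t bv[] = { 5, 4321, 65535 };
    int done;

    /* 3 multiplies in, then enough extra clocks to drain the pipe */
    for (int t = 0; t < 3 + STAGES; t++) {
        int feed = (t < 3);
        uint32_t r = clock_pipe(feed ? av[t] : 0, feed ? bv[t] : 0, feed, &done);
        if (done)
            printf("%lu (expected %lu)\n", (unsigned long)r,
                   (unsigned long)av[t - STAGES] * bv[t - STAGES]);
    }
    return 0;
}

Note the trade this actually makes: each pipeline stage still has its own
group of adders, so a fully pipelined multiplier saves little combinational
area over the single-cycle one. The area saving comes if you instead iterate
one stage's worth of adders over 4 cycles, at the cost of accepting one
result every 4 cycles rather than every cycle.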
Bevan Weiss wrote:
Kolja Sulimma wrote:
Bevan Weiss wrote:
Getting single-cycle high-speed multipliers is a very challenging
prospect, and one on which much research is still ongoing.
Actually, if you cannot do full custom circuit optimizations
(e.g. because you do standard cell design or because you are using
LUTs in an FPGA) swapping wires is the only possible structural
optimization. All other multiplier transformations can be reduced to
swaps.
An extremely nice property of swapping wires is that it can be done
after placement. This is such a huge advantage that we were able to beat
sophisticated multiplier generators with a simple greedy algorithm when
applying it after placement:
http://eis.eit.uni-kl.de/eis/research/publications/papers/iccd04.pdf
I was referring to custom design, not the use of standard cells or
FPGAs. It is certainly obvious that if you can't design your cells from
scratch then you're just arranging the cells that you have available.
What is that supposed to mean?
Even if your standard cell library consists of only a NAND-gate in one
size there are still many degrees of freedom in circuit design.
For many design problems there are architectures that trade off the
number of cells for power or speed.
Not so for single cycle multipliers. For any practical multiplier size
the number of 1-bit adders is fixed and there exists a complete set
of transformations to automatically reach all possible setups even after
placement.
You are right, my definition was not exact enough. What I wrote applies
Not so for single cycle multipliers. For any practical multiplier size
the number of 1-bit adders is fixed and there exists a complete set
of transformations to automatically reach all possible setups even after
placement.
So you're saying it makes no difference if Booth encoding is used, or
any form of carry-ripple reduction? That it's all just a rearranging of
wires? Surely not; using a Booth encoder requires different components
to a simple ripple counter and so has broken that theory.
After all it is a LUT, so why not describe it as a LUT? List the output
I found that in Xilinx patents, all lookup table equations are
described in AND/OR/Multiplexer circuits in its claims. Describing a
logic connection for a lookup table in claims is much more complex in
English than presenting an equivalent logic equation.
For example, a lookup table has the equation:
Out <= (A*B) + (C*D);
It is much more concise and simpler than describing the circuit in
terms of AND/OR gates.
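To make the "after all it is a LUT" point concrete, here is a short C
sketch (mine, not from any patent; reading * as AND and + as OR, and
taking A as the least significant select bit purely as an assumption for
the example, not Xilinx's documented bit ordering) that lists the output
for all 16 input combinations and packs them into the 16-bit truth-table
word a 4-input LUT would hold.

#include <stdio.h>

int main(void)
{
    unsigned init = 0;

    for (unsigned i = 0; i < 16; i++) {
        unsigned a = (i >> 0) & 1;
        unsigned b = (i >> 1) & 1;
        unsigned c = (i >> 2) & 1;
        unsigned d = (i >> 3) & 1;
        unsigned out = (a & b) | (c & d);   /* Out = (A*B) + (C*D) */

        printf("A=%u B=%u C=%u D=%u -> Out=%u\n", a, b, c, d, out);
        init |= out << i;                   /* bit i of the truth table */
    }
    printf("truth table / INIT = 0x%04X\n", init);
    return 0;
}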
Hi,
glen herrmannsfeldt wrote:
porterboy76@yahoo.com wrote:
I am looking for the homepage of Xilinx Research Labs, but Google is
not helping me. Does anybody know if they even have a homepage.
From what I know (i.e. academic perspective) Xilinx folks mostly focus
I'd like to know what type of research they do at Xilinx, whether it is all
at the solid state and IC level, or whether they undertake higher level
algorithmic research as well.
Post to comp.arch.fpga and ask there.
-- glen
if you want to 'see' the technical then take a look at ML300 on
<jaxato@gmail.com> wrote in message
news:1133913172.512557.30990@o13g2000cwo.googlegroups.com...
Hi, I was just wondering some technicalities about a board.
Ive got the XUP virtex-II pro from digilent and I believe it was
designed by the XRL.
Well I was wondering what kind of CAD (schematic capture + pcb layout)
software does the team used. And also, out of curiosity, what PCB
manufacturer do they use and the number of layers involved. A friend of
mine thinks it's around 12. Which I doubt. So to make things clear, I
am asking the experts here.
Thanks for the tip, but I'm impressed by the job and I wanna know what
it takes to do that.
JA
Ps we don't have an external web page, yet, but we are evaluating
options, so please send us suggestions if there's something specific
you'd like to see.
Stephen wrote:
Ps we don't have an external web page, yet, but we are evaluating
options, so please send us suggestions if there's something specific
you'd like to see.
My 2 cents,
Maybe electronic versions of Xilinx publications in academic
conferences, for those who are not registered at ACM/IEEE digital libraries.
Regards,
Steven
IEEE have no problem with their articles being made available provided
Thank you Steven,
I will make sure this happens, assuming we post an external web site
and that the papers aren't copyrighted by an ACM/IEEE conference. Good
suggestion for us to follow up on.
Regards,
Stephen