
"bestyang" <bestyang@gmail.com> wrote in message
news:2fd96256.0505260627.3a00ded2@posting.google.com...
I use ModelSim on a 32-bit Windows system,
and I want to simulate for a time of at least 10000000000,
but ModelSim gives the following error: "10000000000 exceeds 32-bit
capacity".
How can I solve the problem?
According to the ModelSim manual, time is represented as a 64-bit
integer. So if you run the simulation for 1,000,000,000 units (whatever your time
resolution is) ten times, you will get 10,000,000,000 units of simulation
time. I never tried it, but it should work if the manual is right.
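The arithmetic behind this suggestion: the requested count does not fit in 32 bits (signed or unsigned), but each run of 10^9 does, and the accumulated total is far below the 64-bit limit:

$$10^{9} < 2^{31}-1 = 2\,147\,483\,647 < 2^{32}-1 = 4\,294\,967\,295 < 10^{10} = 10 \times 10^{9} \ll 2^{64}-1$$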

HTH,
Jim
 
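try "run 10 ms" (the run command accepts a time unit, so at a 1 ps resolution this covers 10,000,000,000 time steps while the number you type stays small)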
 
I think you are probably overanalyzing the problem. Most synchronous
circuits (or at least sub-circuits) use only a single clock for all flops in
the circuit. On the rising edge of CLK, all the flip flops update; they take
the values on their D inputs and place them on the Q outputs. Then the
signals have a full clock period to propagate through the combinational
logic connected to the Q outputs (and the primary inputs of the circuit),
before showing up on the D inputs prior to the next clock.
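In simulation, this single-clock behaviour is what nonblocking assignments (<=) model. A minimal sketch, assuming two flops chained on the same rising edge (the module and signal names are illustrative): every flop samples its D input before any Q output changes, so `q2` always lags `q1` by exactly one clock, which is the ideal that a CLK->Q delay larger than the hold time gives you in real hardware.

```verilog
// Two flip-flops in series, both clocked on the rising edge of clk.
module two_stage (
  input  wire clk,
  input  wire d,
  output reg  q1,
  output reg  q2
);
  always @(posedge clk) begin
    q1 <= d;   // first flop captures the primary input
    q2 <= q1;  // second flop captures the OLD q1, not the value just sampled
  end
endmodule
```

Writing the second assignment as a blocking `q2 = q1;` after `q1 = d;` would collapse the two stages into one in simulation, which is why clocked logic is normally written with `<=`.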

While you cannot ignore issues resulting from hold times, these are, for the
most part, taken care of by the library vendor and/or the tools used for
synthesis. Most ASIC libraries (and FPGA flip flops) have very small hold
time requirements - some/most are even negative. Furthermore the CLK->Q
propagation time for most flip-flops is usually larger than the hold time
requirement of the flip-flops. So, assuming you have a "reasonable" clock (one
that doesn't have excessive skew), the CLK->Q time is longer than the worst
case clock skew plus the worst case hold time requirement. In this case, it
is impossible to have a hold time violation. In the cases where this isn't
true, it is still quite rare to have a hold time problem; combinational
logic and even signal routing will all "delay" a change on one flip-flop's Q
before it reaches the next flip-flop's D, so that it doesn't violate the hold
time requirement.

Finally, synthesis tools understand hold time requirements, and can be asked
to "fix" hold time violations, which they will do by adding delay on paths
from Q->D that would otherwise violate a hold time requirement.

Because of all these reasons, hold time issues can be dealt with without
requiring two different clocks (or any other heroic measures); for your
counter, you should be able to simply code it as

always @(posedge clk)
begin
  if (!rst_n)
    count <= 0;
  else
  begin
    if (count == MAX_VAL)
      count <= 0;
    else
      count <= count + 1'b1;
  end
end
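The fragment above assumes `count`, `MAX_VAL`, `clk` and `rst_n` are declared elsewhere. A self-contained version, as a sketch with illustrative `WIDTH` and `MAX_VAL` parameters that are not from the original post, could look like this:

```verilog
// Modulo counter: counts 0..MAX_VAL, then wraps to 0, one step per clock.
// rst_n is a synchronous, active-low reset. WIDTH and MAX_VAL are
// illustrative; WIDTH must be large enough to hold MAX_VAL.
module mod_counter #(
  parameter WIDTH   = 4,
  parameter MAX_VAL = 9
) (
  input  wire             clk,
  input  wire             rst_n,
  output reg  [WIDTH-1:0] count
);
  always @(posedge clk) begin
    if (!rst_n)
      count <= {WIDTH{1'b0}};
    else if (count == MAX_VAL)
      count <= {WIDTH{1'b0}};
    else
      count <= count + 1'b1;
  end
endmodule
```

Everything here runs on a single edge of a single clock, so timing analysis only has to check the count -> compare -> count path within one period.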

Avrum


"Chloe" <chloe_music2003@yahoo.co.uk> wrote in message
news:1117152046.680027.140360@o13g2000cwo.googlegroups.com...
The problem is, I'm only using one clock, whose frequency cannot be
doubled. The limitations of the design are such that I can only use one
clock, and this clock drives a counter at the same speed, i.e. each
count is one clock period. This will still pass functionally, but I
worry about setup and hold violations later during synthesis.

I tried using two flip-flops - one triggered on the posedge, and the other on
the negedge of clk.

Any suggestions?
 
john.deepu@gmail.com wrote:


Can anyone give me some clues regarding how to implement a cube root
function in RTL for synthesis?
There are square root and cube root algorithms for an abacus.
I found one in a little book that came with a very cheap abacus.

While it is in decimal, I believe a binary version of the algorithm
would also work. I don't think it will be too hard to find.
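For reference, a minimal binary sketch, assuming a 32-bit unsigned input. This is not the abacus method itself but the simpler bit-by-bit trial variant: the result is built MSB first, and each bit is kept only if the cube of the candidate still does not exceed the input. The module and function names are hypothetical.

```verilog
// Integer cube root, floor(cbrt(x)), for a 32-bit unsigned input.
// Fully combinational as written (11 steps, each with a multiply); a real
// design would pipeline it or iterate one step per clock.
module icbrt32 (
  input  wire [31:0] x,
  output wire [10:0] y      // cbrt(2^32 - 1) < 2^11, so 11 bits suffice
);
  function [10:0] cbrt_floor;
    input [31:0] v;
    reg   [10:0] root, trial;
    reg   [35:0] cube;      // 2047^3 < 2^36
    integer      b;
    begin
      root = 11'd0;
      for (b = 10; b >= 0; b = b - 1) begin
        trial = root | (11'd1 << b);     // tentatively set this bit
        cube  = trial * trial * trial;   // evaluated at the 36-bit LHS width
        if (cube <= v)
          root = trial;                  // keep the bit if we did not overshoot
      end
      cbrt_floor = root;
    end
  endfunction

  assign y = cbrt_floor(x);
endmodule
```

An iterative version would hold `root` in a register and test one bit per clock, giving a result every 11 cycles with a single multiplier chain.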

--glen
 
(comp.lang.verilog added)

Chris F Clark wrote:

glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

It seems that verilog is one language that doesn't require a variable
to be declared before it is used.

The actual rules about when variables must be declared are quite
convoluted in Verilog. Some statements implicitly cause declarations
of the referenced variables if they weren't already declared, others
don't. However, if a variable is declared, it must be declared
"lexically" before it is used (that is the declaration must appear in
the source text before the use).
As I understand it, they don't have to be declared lexically before
use.

Because statements are not executed in order, but more or less in
parallel, it is easy to accidentally declare a variable lexically
later.

I often move code around to make it more readable, and sometimes don't
move the declaration.

This will all be a little confusing to comp.compilers readers, but
there are two important kinds of variables in verilog, and the
compiler needs to know which is which. One kind, reg variables, keep
their value when they are not being assigned, as do variables in most
programming languages. The other kind, wire variables, are used with
a continuous assignment statement, and so keep their value due to the
continuous assignment. If I say:

wire c;
assign c=!(a & b);

the logic is that of a 7400 style NAND gate. c changes soon after a
or b, just like a real NAND gate would work. The position of the
assign statement in the program (assuming a legal position) has no
effect on the logic.
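A small sketch of the two kinds side by side, using the NAND example above (the module name is illustrative). Both outputs compute the same function; note that each name is declared before it is used, which is legal in every Verilog version.

```verilog
// The same NAND function written with a net and with a variable.
module nand_two_ways (
  input  wire a,
  input  wire b,
  output wire c_wire,  // net: value comes from the continuous assignment
  output reg  c_reg    // variable: holds whatever was last assigned to it
);
  assign c_wire = !(a & b);

  // Re-evaluated whenever a or b changes, so c_reg also behaves like the
  // gate, even though it is declared reg.
  always @(a or b)
    c_reg = !(a & b);
endmodule
```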

It is supposed to work if the wire c; statement comes after the assign
statement.

-- glen
 
Did you use the "Pipelined Divider" with a setting of 1 result every
clock? Every 8 clocks? I saw two versions of the Pipelined Divider for the
Core Generator when I did a quick search on the Xilinx website. There
should be full data sheets there for reference.

FPGAs do a decent job with carry chains. Since that structure is used
often, it's pretty fast. A test I ran a while back
for my own implementation suggested I could get a single-cycle 16/16 divider
running at about 80 ns per clock without any constraints in a
Spartan-2E device, if I recall correctly. A faster speed grade Virtex-2
may perform much better.

You *are* dividing by a variable, not a constant, right?
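For comparison with the cores, a minimal sketch of the kind of sequential divider that trades clocks for area: a textbook restoring divider, assuming a 32-bit dividend and a 16-bit variable divisor, producing one quotient bit per clock (32 clocks per result). It is not the Xilinx or DesignWare implementation, and the module and port names are hypothetical.

```verilog
// Sequential restoring divider: 32-bit dividend / 16-bit divisor,
// one quotient bit per clock. Division by zero is not handled
// (the quotient comes out all ones).
module seq_div32by16 (
  input  wire        clk,
  input  wire        rst,        // synchronous, active-high
  input  wire        start,      // begin a division; ignored while busy
  input  wire [31:0] dividend,
  input  wire [15:0] divisor,
  output wire [31:0] quotient,   // valid from done until the next start
  output wire [15:0] remainder,
  output reg         done        // one-cycle pulse when the result is ready
);
  reg [31:0] q;     // dividend bits shift out of the top, quotient bits in at the bottom
  reg [15:0] rem;   // partial remainder, always < divisor
  reg [15:0] d;     // divisor captured at start
  reg [5:0]  cnt;   // iterations left
  reg        busy;

  // One restoring step: shift the next dividend bit into the remainder,
  // then subtract the divisor if it fits.
  wire [16:0] trial = {rem, q[31]};
  wire        fits  = (trial >= {1'b0, d});
  wire [16:0] diff  = trial - {1'b0, d};

  assign quotient  = q;
  assign remainder = rem;

  always @(posedge clk) begin
    done <= 1'b0;
    if (rst) begin
      busy <= 1'b0;
    end else if (start && !busy) begin
      q    <= dividend;
      rem  <= 16'd0;
      d    <= divisor;
      cnt  <= 6'd32;
      busy <= 1'b1;
    end else if (busy) begin
      rem <= fits ? diff[15:0] : trial[15:0];
      q   <= {q[30:0], fits};
      cnt <= cnt - 1'b1;
      if (cnt == 6'd1) begin
        busy <= 1'b0;
        done <= 1'b1;
      end
    end
  end
endmodule
```

The critical path is a single 17-bit compare/subtract, which maps onto a carry chain; unrolling k steps per clock (or pipelining them) trades area for fewer clocks per result.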


john.deepu@gmail.com wrote:
Hi all,
I wanted to use a 32/16 divider circuit in one of my designs. I found
that Synopsys DesignWare provides pipelined dividers and decided to use one.
I synthesised the DW divider and found that a 3-stage pipeline was required
to meet my timing requirement of 20 MHz (50 ns) in TSMC 0.13u technology.

Since I wanted to do FPGA prototyping for my ASIC, I thought of using the
Core Generator divider while synthesising for a Xilinx FPGA.

Now the interesting fact I found is that a 32/16 divider from the Xilinx Core
Generator can be synthesised (using XST) to 150 MHz easily for a
Virtex-2 (XC2V2000) FPGA with just a one-stage pipeline.

At the same time, DC-Ultra 2004.06-1 is struggling to make the DesignWare
Foundation divider meet a timing of 20 MHz with a 3-stage
pipeline.

I am confused. I always thought ASIC synthesis gives a higher
frequency for the same RTL code.

What I can assume is that the Synopsys DesignWare divider is a very bad
implementation of a divider.

Any comments/clues are welcome.


Thanks
Deepu John
 
The question is how many values you want to sort... Basically, this is an
iterative algorithm, so for a large set, a big memory plus a controller
will be small but slow.

As you posted on comp.arch.fpga, you might like this one:
http://www.xilinx.com/xcell/xl23/xl23_16.pdf

Otherwise, google "rank order filter".

john.deepu@gmail.com wrote:
Hi all ,
I wanted to implement a fast and low area sorting algorithm in Verilog
RTL, does anyone have any suggestions?
Any links to IEEE papers or articles are highly welcome...

regards
Deepu John
 
<john.deepu@gmail.com> wrote in message
news:1118147185.854541.278530@g49g2000cwa.googlegroups.com...
Hi Stephane.
I wanted to sort 48 8bit unsigned numbers
Just some ideas. Hold the numbers in a BRAM. Create a small FSM to pull out
the numbers, store them in a temporary register and compare. Bubble sort is
usually the easiest but slowest approach in software. Maybe this is also
true for an FPGA. Misc sort is fastest, so it should be for hardware too. You
could use both ports of a BRAM to simultaneously access two pieces of data
to increase speed.

Regards
Falk
 
Bubble sort should actually be quite fast - you can store all 48 values in
registers, then compare and swap (if necessary) odd pairs on odd clock cycles
and even pairs on even cycles. After 48 cycles the registers should hold
a sorted data set.

Probably this approach is about as fast as you can get? The speedup comes
from the fact that many bubble sort operations can occur in
parallel. Is this the most hardware-efficient sort that runs at this speed,
though?

If possible, data should be shifted sequentially into and out of (or at least
out of) the registers, as a readout mux would be a resource hog. Using a
generate statement in VHDL (or the equivalent in Verilog), I'm guessing the
sort will come to less than 100 lines of code.
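The scheme described above is usually called odd-even transposition sort. A minimal sketch, assuming all N values are loaded in parallel from a packed bus (the module and port names are hypothetical); it runs N compare-and-swap phases, alternating even and odd neighbour pairs, and pulses `done` when finished:

```verilog
// Odd-even transposition sort ("parallel bubble sort") of N values of W bits.
// Each clock, disjoint neighbour pairs are compared and swapped in parallel:
// even pairs (0,1),(2,3),... on even phases, odd pairs (1,2),(3,4),... on
// odd phases. N phases are enough to sort N values.
module oet_sort #(
  parameter N = 48,
  parameter W = 8
) (
  input  wire           clk,
  input  wire           load,   // capture din and start sorting
  input  wire [N*W-1:0] din,    // value i at din[i*W +: W]
  output wire [N*W-1:0] dout,   // ascending: smallest at index 0
  output reg            done
);
  reg [W-1:0] v [0:N-1];
  reg [7:0]   phase;            // enough for N <= 256
  reg         busy;
  integer     i;

  initial begin                 // power-up values (simulation / FPGA init)
    busy = 1'b0;
    done = 1'b0;
  end

  genvar g;
  generate
    for (g = 0; g < N; g = g + 1) begin : g_out
      assign dout[g*W +: W] = v[g];
    end
  endgenerate

  always @(posedge clk) begin
    if (load) begin
      for (i = 0; i < N; i = i + 1)
        v[i] <= din[i*W +: W];
      phase <= 8'd0;
      busy  <= 1'b1;
      done  <= 1'b0;
    end else if (busy) begin
      // Only pairs starting at an index with the phase's parity are active,
      // so no register is written twice in the same clock.
      for (i = 0; i + 1 < N; i = i + 1)
        if ((i % 2) == phase[0] && v[i] > v[i+1]) begin
          v[i]   <= v[i+1];
          v[i+1] <= v[i];
        end
      phase <= phase + 1'b1;
      if (phase == N - 1) begin
        busy <= 1'b0;
        done <= 1'b1;
      end
    end
  end
endmodule
```

For N = 48 this builds a comparator for each of the 47 neighbour pairs plus the wide load and readout buses, which is the resource cost raised in the replies below; loading and unloading serially would remove the wide buses at the cost of extra cycles.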

cheers
cds
 
"c d saunter" <christopher.saunter@durham.ac.uk> wrote in message
news:d84pvd$5pu$1@heffalump.dur.ac.uk...

: Bubble sort should actually be quite fast - you can store all 48 values in
: registers, then compare and swap (if necessary) odd pairs on odd clock cycles
: and even pairs on even cycles. After 48 cycles the registers should hold
: a sorted data set.

This sounds like an almost fully parallel approach. It could be quite fast, but
also quite resource hungry.

: Probably this approach is about as fast as you can get? The speedup comes
: from the fact that many bubble sort operations can occur in
: parallel. Is this the most hardware-efficient sort that runs at this speed,
: though?
:
: If possible, data should be shifted sequentially into and out of (or at least
: out of) the registers, as a readout mux would be a resource hog.

That's why a BRAM is quite handy. Using both ports you can pull two data words
out in one cycle, compare them, and write them back on a second cycle.

Regards
Falk
 
Falk Brunner (Falk.Brunner@gmx.de) wrote:
: "c d saunter" <christopher.saunter@durham.ac.uk> wrote in message
: news:d84pvd$5pu$1@heffalump.dur.ac.uk...

: > Bubble sort should actually be quite fast - you can store all 48 values in
: > registers, then compare and swap (if necessary) odd pairs on odd clock cycles
: > and even pairs on even cycles. After 48 cycles the registers should hold
: > a sorted data set.

: This sounds like an almost fully parallel approach. It could be quite fast, but
: also quite resource hungry.

Yup. Unless the OP posts some details about desired performance, it's impossible
to know which one is best...

: That's why a BRAM is quite handy. Using both ports you can pull two data words
: out in one cycle, compare them, and write them back on a second cycle.

Indeed. There is a nice intermediate level of parallelism available by using
LUT RAMs to build units 16 words deep, each of which sorts sequentially, with
every other set of sorts crossing the LUT RAM boundaries. This would also be
the most complicated case to code :)

cheers
cds
 
Davy wrote:

Hi all,

How do I write a barrel shifter in Verilog?

Any easy approaches will be appreciated!
Best regards,
Davy
See my posting on comp.arch.fpga!
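As a sketch for readers without that posting, here is one common structure: a logarithmic barrel rotator, assuming a rotate-left by a binary shift amount (module, parameter and port names are hypothetical). Stage k rotates by 2^k when bit k of the shift amount is set, so log2(WIDTH) layers of 2:1 multiplexers cover every shift amount.

```verilog
// Logarithmic barrel rotator: dout = din rotated left by shamt positions.
// Assumes 2^(SHW-1) < WIDTH (e.g. WIDTH = 32, SHW = 5).
module barrel_rotl #(
  parameter WIDTH = 32,
  parameter SHW   = 5
) (
  input  wire [WIDTH-1:0] din,
  input  wire [SHW-1:0]   shamt,
  output wire [WIDTH-1:0] dout
);
  // stage[k] is the data after shift-amount bits 0..k-1 have been applied
  wire [WIDTH-1:0] stage [0:SHW];
  assign stage[0] = din;

  genvar k;
  generate
    for (k = 0; k < SHW; k = k + 1) begin : g_stage
      // rotate left by 2^k: low bits move up, the top 2^k bits wrap around
      assign stage[k+1] = shamt[k]
        ? {stage[k][WIDTH-1-(1<<k):0], stage[k][WIDTH-1:WIDTH-(1<<k)]}
        : stage[k];
    end
  endgenerate

  assign dout = stage[SHW];
endmodule
```

For a logical shift instead of a rotation, replace the wrapped-around part with `{(1<<k){1'b0}}`. A synthesis tool will also build a comparable structure from the behavioural form `{din, din} << shamt` (taking the upper WIDTH bits), so the explicit stages mainly help when you want to pipeline between layers.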
 
john.deepu@gmail.com wrote:
I wanted to sort 48 8bit unsigned numbers

As others suggested, the big question is: how are your numbers presented?
If you have 48 bit-serial inputs that present the data MSB first, you can
build a systolic 48x6 array of units, each containing two LUTs and two
registers. As each cell connects only to three neighbours, it can run at
extremely high clock rates. The throughput in your case is 7 words per
clock.

If your data arrives in words, one word at a time, consider a systolic
priority queue (similar to a parallel heap sort). The simplest version
has about 48x8 registers and can still run at fairly high speeds. The
throughput is one word per clock.

If your data is already in RAM (on or off chip), sort as you would on a
CPU. Counting sort comes to mind. In your case it uses a single BRAM and
is guaranteed to complete in 256+2N clock cycles on a single BRAM port.
If you use both ports, the number of cycles is halved.
3.7 clock cycles per word is not much better than other sort algorithms
for a small data set like yours, but the implementation of counting sort
is very simple, and the number of clock cycles per word will actually
decrease as the number of words increases.
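A minimal sketch of counting sort for 8-bit keys, assuming the 256-entry count memory is modeled as a register array so that each increment takes one clock (a BRAM version would need a read-modify-write per key). The module and port names are hypothetical, and the count width `CW` must hold the largest possible count (6 bits covers N = 48).

```verilog
// Counting sort for 8-bit keys.
// Count phase: each in_valid cycle increments count[in_data].
// Output phase (after start_out): scan bins 0..255 and emit each bin value
// as many times as it was counted. Decrementing while emitting also clears
// the counts for the next data set. Output takes roughly 256 + N cycles.
module counting_sort8 #(
  parameter CW = 6
) (
  input  wire       clk,
  input  wire       rst,        // synchronous, active-high
  input  wire       in_valid,
  input  wire [7:0] in_data,
  input  wire       start_out,
  output reg        out_valid,
  output reg  [7:0] out_data,
  output reg        out_done
);
  reg [CW-1:0] count [0:255];
  reg [8:0]    bin;             // 0..256; 256 means the scan is finished
  reg          scanning;
  integer      i;

  initial for (i = 0; i < 256; i = i + 1) count[i] = {CW{1'b0}};

  always @(posedge clk) begin
    out_valid <= 1'b0;
    out_done  <= 1'b0;
    if (rst) begin
      scanning <= 1'b0;
    end else if (!scanning) begin
      if (in_valid)
        count[in_data] <= count[in_data] + 1'b1;
      if (start_out) begin
        scanning <= 1'b1;
        bin      <= 9'd0;
      end
    end else if (bin == 9'd256) begin
      scanning <= 1'b0;
      out_done <= 1'b1;
    end else if (count[bin[7:0]] != {CW{1'b0}}) begin
      out_valid       <= 1'b1;      // emit one copy of this bin value
      out_data        <= bin[7:0];
      count[bin[7:0]] <= count[bin[7:0]] - 1'b1;
    end else begin
      bin <= bin + 1'b1;            // bin exhausted: move to the next one
    end
  end
endmodule
```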

Kolja Sulimma
 
A BRAM-based "software like" solution will surely be the most area
efficient; however, for signal processing, John might need real time?

Here is another solution, to find the MEDIAN of M values of N bits.

The inputs are stored in M registers. (You might want to shift them along
with your sampled values.)
Design an M-bit-input, 1-bit-output module. The output value is 1 if
the number of ones is greater than the number of zeroes.
Feed it all the MSbs; the output is the MSb of the final result.

Now design almost the same module, but with Mx2+1 inputs. The Mx2 are
from your M values, starting with D_M(N downto N-1); the lonely input
'i' is the previous module's output.
For each pair, if i=D_M(N), keep D_M(N-1) as is;
if not, take not(D_M(N-1)).
Take the possibly modified M bits and plug them into an M-->1 module
just like in step 1.
You instantiate N-1 of such modules and chain them. The N outputs make up
the requested median value.

For small M, you can do it in one cycle. With 48, you'll get poor
performance, so you'll design a sequential counter, or pipeline the
modules. If you try it, give feedback; it's interesting.
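A combinational sketch of the bitwise majority idea, assuming an odd M. Rather than reproduce the chained modules exactly, this variant keeps an explicit per-value flag: once a value's effective bit disagrees with the median bit just found, that value is known to be above or below the median and is treated as all ones (above) or all zeros (below) for the remaining bits. The module name and input packing are hypothetical, and the circuit may differ in detail from the one described above.

```verilog
// Median of M values (M odd) of N bits each, by bitwise majority vote, MSB first.
module bitwise_median #(
  parameter M = 5,   // number of values (odd)
  parameter N = 8    // bits per value
) (
  input  wire [M*N-1:0] din,   // value j at din[j*N +: N]
  output reg  [N-1:0]   med
);
  reg [M-1:0] eff;      // effective bit of each value at the current position
  reg [M-1:0] decided;  // value already known to be above or below the median
  reg [M-1:0] dirbit;   // for decided values: 1 = above (all ones), 0 = below
  integer b, j, ones;

  always @* begin
    decided = {M{1'b0}};
    dirbit  = {M{1'b0}};
    eff     = {M{1'b0}};
    med     = {N{1'b0}};
    for (b = N - 1; b >= 0; b = b - 1) begin
      // Majority vote on the effective bits at position b.
      ones = 0;
      for (j = 0; j < M; j = j + 1) begin
        eff[j] = decided[j] ? dirbit[j] : din[j*N + b];
        ones   = ones + eff[j];
      end
      med[b] = (ones > M / 2);
      // Values that disagree with the median bit are now above or below it.
      for (j = 0; j < M; j = j + 1)
        if (!decided[j] && eff[j] != med[b]) begin
          decided[j] = 1'b1;
          dirbit[j]  = eff[j];
        end
    end
  end
endmodule
```

The loop unrolls into N majority stages in series, so for M = 48 the same structure would normally be pipelined one bit per stage, as suggested above.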

-------------------------------
Now consider this: the method was patented in 92 by Thomson. I don't
know if it is still pending... How can they dare patent such a method
for a given algorithm? Would you dare patent, for example, a fast carry
adder??? I think it is a weak patent (I haven't checked the claims),
because it is just an implementation of a well-known algorithm, and prior
art should be findable.


 
Chloe,

A minor point. You might want to change rst to rst_n to indicate that
it's active-low. It's very helpful for integration (especially when
someone else is using your module).

In addition, is this general purpose IP, or are you targeting a
particular architecture? Some architectures have built-in clock
dividers. You may also ask yourself whether this is going to be a local
clock with low fan-out, or a global clock. In the latter case you might
want to balance the load to reduce clock skew.

cheers,

jz
 
1. A variable in Verilog not declared but used is by default assumed to
be a scalar (i.e. width = 1) of one of the 'net' types (such as, wire,
tri, wor etc.). The actual net type can be modified by the compiler
directive `default_nettype (Section 3.5 and 19 of IEEE Std. 1364-2001).

2. The same LRM says (Section 3.2.2) that '(i)t is illegal to redeclare
a name already declared by a net, parameter or variable declaration.'

Given 1 and 2, it will be illegal to declare a variable after its use.

- Swapnajit.
--
SystemVerilog, DPI, Verilog PLI and all other good stuffs.
Project VeriPage: http://www.project-veripage.com
Get information on new articles:
<URL: http://www.project-veripage.com/list/?p=subscribe&id=1>
 
This should be clarified. An undeclared identifier creates an implicit
net declaration only if it appears in certain specific contexts: an
instance port expression, or in Verilog-2001, the LHS of a continuous
assignment. In other contexts, such as within procedural code, a
reference to an undeclared identifier is simply an error.
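A minimal illustration of the instance-port case (the module name is hypothetical): `y` is never declared, so connecting it to a gate terminal creates an implicit 1-bit wire. Under the rules above, adding `wire y;` after these instances would be an illegal redeclaration, and compiling with `default_nettype none in effect turns the implicit declaration itself into an error.

```verilog
// y is never declared: its first appearance is in a primitive instance's
// terminal list, so an implicit scalar wire y is created.
module implicit_net_demo (
  input  wire a,
  input  wire b,
  input  wire c,
  output wire z
);
  and g1 (y, a, b);   // implicit: wire y;
  and g2 (z, y, c);   // z = a & b & c
endmodule
```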

After an implicit net declaration, a later explicit declaration would
indeed be an illegal redeclaration of the identifier.

So yes, it is illegal to declare a variable or net after its use.

One oddity occurs with module ports. A Verilog-1995 port declaration
includes two declarations: the declaration as a port, and the
declaration as a net or variable. If there is no declaration as a net
or variable, then it is implicitly declared as a net. In Verilog-XL
and any simulator that matches it, you can have the port declaration,
then uses of the identifier, then later a declaration of the identifier
as a net or variable. This isn't really a use before declaration,
since the port part of the declaration appeared before the use.
However, the identifier has not been fully declared at that point.

Functions are a special case. Verilog-XL allows the use of a simple
name for a function when it is actually a hierarchical reference to the
function. This allows declaring a set of shared functions in a
top-level module and referring to them from anywhere, without having to
use a hierarchical name. In effect, all function names are processed
as if they were hierarchical function names. So if you have a call to
a function that is not declared yet, this is OK, because it is treated
as a hierarchical name. Then during the upward search to find the
function, if it is declared in the local module, that will be found
first. This has the effect of allowing functions to be declared after
their use. I don't think that this treatment of functions is specified
in the IEEE standard, but since Verilog-XL does it, it is a de facto
standard.
 
TMU wrote:

I want to model large memories in Verilog. But you know that for
modeling, for example, a 1M RAM, about 32M of system memory is required,


so it would be nice if we could model the RAM elements in PLI.

I assume here that you are not using every element in the memory,
because otherwise the whole memory is needed anyway (some space can be
saved by modeling just 0 and 1).

There are a few ways to do this; it depends on your tools. If you have
Cadence tools, check the $CDS_HOME/verilog/examples/pli/damem directory
in your installation. With VCS and ModelSim you can specify that
the memories are sparse with special comments in the code; check
the manual for the exact syntax.

If you have access to Synopsys DesignWare, you can use the MemPro
models, which allocate memory as needed. Or if you have access to a
SystemVerilog-capable simulator, just use the associative array
capabilities of the language. Or use VHDL, which has dynamic memory
allocation capabilities :)


--Kim
 
marcas@ireland.com wrote:

I have a ROM model, and I use $readmemh to read its memory file.

I have multiple instances of this ROM, and I want each to read in a
different memory file.

I know I can do this statically (at compile time) using parameters as
follows:

parameter FILE_1 = "file_1.dat";
parameter FILE_2 = "file_2.dat";

rom #(FILE_1) rom_1( ... );
rom #(FILE_2) rom_2( ... );

Is there a standard way to do this dynamically (i.e., at run time),
without having to re-compile/elaborate? I've tried a few things, but
none has been satisfactory...

I have not figured out a standard way to do this. But in some simulators
this is possible to do.

If you are using ModelSim, this is easy. Just put the parameter inside
the ROM model, and then from the vsim command line you can set it with -G,
i.e. '-G/path/to/my/instance/FILENAME="file_1.dat"'

Another way is to assign each ROM a unique file name in the compiled code,
and then play with symbolic links in the filesystem (assuming that you are
using Unix).

--Kim
 
When you generate your core in Coregen, tick the "display core footprint"
option. If you do that, then on successful generation of the core, the tool
will show you the resource usage.

Also, based on the available BRAM size, you can calculate the number of
BRAMs. For example, in your case, you will need at least 8 BRAMs because of
the width requirement you have.
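A worked version of that estimate, assuming 18 Kbit block RAMs whose widest configuration is 512 x 36 (32 data + 4 parity bits), as on Virtex-II and Spartan-3; other families, and the FIFO core's own options, may change the count:

$$N_{\text{BRAM}} = \left\lceil \frac{256}{32} \right\rceil \times \left\lceil \frac{32}{512} \right\rceil = 8 \times 1 = 8, \qquad \left\lceil \frac{256}{36} \right\rceil = 8$$

So the 32-word depth uses only a small fraction of each BRAM; the count is set entirely by the 256-bit width, even if the parity bits are used as data.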
Hope this helps.
Cheers,
Adarsh Kumar Jain



rootz wrote:
Guys.
This must seem quite trivial, but how can I tell how many block RAMs I
am using in a Xilinx coregen'ed FIFO? For example, how many block RAMs are
used in a 256-bit by 32-deep FIFO?

thanks
 

Welcome to EDABoard.com
