How to select an FPGA size (beginner)

Paul Marciano

Hi, I am learning Verilog with a mind to implementing a simple 1980s
style video generator, hooked up to a W65C02S microprocessor. Very
80s.

The processor interface will have a dozen or so 8-bit registers, and
the memory interface at least a 16-bit memory address register, some
state registers and a few counters.

The VGA timing interface will have a couple or three counters.

As you can instantly tell, it hasn't been thought through yet. This
is day two of my "do something concrete" plan.


In looking at FPGA specifications on the Xilinx site, I see gate
counts and CLB counts. The specs suggest that a CLB can hold two
registered bits. I figure two things:

1. I need a device with at least nRegisterBits/2 CLBs.
2. nRegisterBits/2 is probably grossly optimistic.

So my questions are:
1. How do you get a coarse feel for the size of FPGA you need for a
design such as this?
2. Do CLBs map 1:1 with registered bits, or 2:1 as the spec suggests?

I was hoping to be able to keep to a sub-$50 FPGA, but it's not
looking promising. I'd appreciate any advice or pointers from real
engineers on how to go about selecting the right device.


Thanks,
Paul.
 
Hi!

Each logic element basically holds a D flip-flop and a 4-input LUT, meaning
that you can build any logic that uses a single flip-flop fed by a function
of at most 4 inputs. They also have a carry chain, so a simple counter
will use one logic element per bit. Each Xilinx CLB has two of these.
Each Altera LE is one of these. Gate count is useless.
For your registers, don't forget the embedded RAM blocks.
Both Altera Cyclone and Xilinx Spartan-IIE should achieve what you want at
around $20 for a single FPGA chip (Digi-Key price for Xilinx, Arrow price
for Altera).
The best way to select a device is usually to take Altera Quartus or
Xilinx WebPACK, build a first version and see where it fits. Then double
that to leave plenty of room for adjustments, corrections and improvements.
If it were a product, choose the smallest device that is not too tight a
fit for production.

Hope to have helped...
Ricardo
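
To illustrate the embedded RAM suggestion, here is a minimal sketch, assuming a bank of sixteen 8-bit CPU-visible registers. The module and port names are made up for illustration. Written with a registered read like this, a bank is the sort of thing synthesis tools can usually map to a RAM block instead of 128 flip-flops, though whether that actually happens depends on the tool and the device.

```verilog
// Hypothetical 16 x 8 register bank; all names here are illustrative.
module reg_bank (
    input  wire       clk,
    input  wire       we,      // write strobe from the CPU interface
    input  wire [3:0] addr,    // register select
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] mem [0:15];      // 128 bits of storage

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;  // synchronous write
        dout <= mem[addr];     // synchronous (registered) read
    end
endmodule
```

This only suits registers that are read one at a time. Bits that must drive hardware continuously (mode or control bits, for example) still need real flip-flops.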
 
Paul Marciano wrote:

> Hi, I am learning Verilog with a mind to implementing a simple 1980s
> style video generator, hooked up to a W65C02S microprocessor. Very
> 80s.

That would be just a few PALs ;-)

> The processor interface will have a dozen or so 8-bit registers, and
> the memory interface at least a 16-bit memory address register, some
> state registers and a few counters.
>
> The VGA timing interface will have a couple or three counters.
>
> As you can instantly tell, it hasn't been thought through yet. This
> is day two of my "do something concrete" plan.

Have a look at www.fpgacpu.org. Jan Gray put a 16-bit RISC, DMA,
memory control & video in a chip which is so small, it isn't
even supported by the Xilinx tools anymore ;-)

Just as an idea of how to do things ...

> In looking at FPGA specifications on the Xilinx site, I see gate
> counts and CLB counts. The specs suggest that a CLB can hold two
> registered bits. I figure two things:

If you're new to this FPGA business, forget about that at the beginning.

> I was hoping to be able to keep to a sub $50 FPGA, but it's not
> looking promising.

You look fine, as long as you are talking about $50 for the FPGA alone.

> I'd appreciate any advice or pointers from real
> engineers on how to go about selecting the right device.

Just start with your design, and look at the place & route statistics.
Then you really get a feeling for what resources you use on what function,
and you will probably even notice that you implemented something in a way
that is not so efficient for an FPGA.

And as soon as you have a solid design,
you can still run place & route on different families & chips;
then you really see what the difference is.

Hope it helps at least a little,
good luck
 
The ratio of LUTs to flip-flops depends heavily upon your design. For
current Xilinx FPGAs, there are two flip-flops per slice, and either two
or four slices per CLB (two for Virtex, four for Virtex-II). Even the
smallest FPGAs (XC2S15) have sufficient resources for a simple video
text display generator if you are clever with the design (e.g., be smart
about the load values for the counters to make the decodes easy, so that
they do not require a full decode of the counter). Depending on the size of
your page and which FPGA you use, you may need some external RAM for the
page memory.
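
One possible reading of the "load values" hint, as a sketch only: count down from a preloaded value and use the wrap-around as the terminal condition, so the end-of-line test is a single bit instead of a 10-bit compare. The 800-clock line length is taken from the counter discussed elsewhere in this thread; the module name, port names and the extra counter bit are assumptions for illustration.

```verilog
// Hypothetical horizontal counter with a one-bit terminal decode.
module hcount_easy (
    input  wire       clk,
    input  wire       reset,
    output reg [10:0] hcnt,     // one extra bit on top of the 10 needed
    output wire       line_end  // high for one clock per 800-clock line
);
    // After counting down through 0 the counter wraps to all ones,
    // so bit 10 alone marks the end of the line.
    assign line_end = hcnt[10];

    always @(posedge clk)
        if (reset || line_end)
            hcnt <= 11'd798;    // 798..0 plus the wrap state = 800 states
        else
            hcnt <= hcnt - 11'd1;
endmodule
```

The trade-off is that the count runs backwards, so any logic that wants a pixel position has to work with the descending value.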

Count the register bits in the design. Look at each register, take a
rough guess at the number of inputs to the logic leading up to that
register, and map that to 4-input logic cells. Anything with 4 or fewer
inputs is free, as it comes with the register. Logic with more than 4
inputs that is not arithmetic (add/subtract) adds roughly another LUT for
each 4 inputs. With that you can get a fair guess at the number of LUTs and
flip-flops needed. It is easier to let the synthesis tool do it for you if
you've gotten far enough on the design.
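
As a rough illustration of that counting rule, here are two registers side by side. The signal names and the 8'hA5 constant are invented, and the LUT counts in the comments are rule-of-thumb guesses, not numbers from any synthesis report.

```verilog
// Illustrative only: LUT counts in the comments are rule-of-thumb
// estimates, not figures from a real synthesis run.
module lut_estimate_demo (
    input  wire       clk,
    input  wire       a, b, c, d,
    input  wire [7:0] addr,
    output reg        q_small,
    output reg        q_wide
);
    always @(posedge clk) begin
        // 4 inputs: fits the LUT in front of the flip-flop, so "free".
        q_small <= (a & b) | (c ^ d);

        // 8 inputs: does not fit one 4-input LUT, so expect extra LUTs
        // (on the order of two or three in total) ahead of the flip-flop.
        q_wide  <= (addr == 8'hA5);
    end
endmodule
```

The tally for this fragment would be 2 flip-flops and somewhere around 3 to 4 LUTs. Running it through synthesis is the way to see what the tool really does with it.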



--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
Lots of responses already so I'll just add a few bits that have been missed
...

"Paul Marciano" <pm940@yahoo.com> wrote in message
news:d5bc3deb.0405181044.7821e99@posting.google.com...
> 1. I need a device with at least nRegisterBits/2 CLBs.
> 2. nRegisterBits/2 is probably grossly optimistic.

Actually I've yet to run out of CLBs in anything I've done (not a lot of
complicated stuff). However, you can easily run out of "clock domains"
(global clocks), routing resources (no way to connect two parts of your
design), and I/O pins.

So perhaps a better way to approach this is to figure out how many I/Os you
need. Get the free WebPACK tools and select a device that has that many
I/Os. Start developing your Verilog and synthesize it after each
major subsystem is done. If you run out of CLBs, go to the next bigger
device; if there isn't a bigger one in the same package, try the next
larger package, then the next "family." The cost -> complexity path is:

9500 CPLD -> CoolRunner CPLD -> Spartan 2 -> Spartan 2E -> Spartan 3 ->
Spartan3E

Since you can get Spartan 3's with a bazillion CLBs I know you won't get
that far up the food chain.

The latest Digi-Key catalog sells Spartan 2's with 50K "gates" for $22 qty 1.
You can do a complete CPU + video etc. in one of those. Actually you should
probably google for "Commodore One", where Jeri Ellsworth has implemented
FPGA replacements for the Commodore 64 special function chips.

On a related note, does anyone have a decent S-video circuit that one could
use an FPGA to implement? Lots of FPGA kits have VGA connectors (simple RGB
+ Sync output) but I'd like something I could use to make video on my
television ...

--Chuck
 
Chuck McManis wrote:
(snip)

> On a related note, does anyone have a decent S-video circuit that one could
> use an FPGA to implement? Lots of FPGA kits have VGA connectors (simple RGB
> + Sync output) but I'd like something I could use to make video on my
> television ...

The old trick for generating composite video should still work for
S-video, except that the chrominance and luminance are not combined.
Look at how the IBM CGA did it, for example. The dot clock is
4x the color subcarrier (3579545 Hz), so that the result comes
out as a specific color depending on its phase.
Use D/A converters on each, so that you can generate more than just
on/off for each color.

It might be that you can do a little better with a higher
multiple of 3579545 Hz, but the 4x crystals are easier
to find. (They still exist in any PC with an ISA bus.)

-- glen
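
A very rough sketch of the phase idea, not a working S-video encoder: with a clock at 4x the subcarrier (4 x 3579545 Hz = 14318180 Hz), a 2-bit counter steps through the four quarter-cycles, and a hue value selects which quarter-cycle the chroma square wave starts on. The module, the `hue` and `enable` inputs, and the single-bit output are all assumptions for illustration. Color burst, sync, blanking, luminance and the D/A stage are left out.

```verilog
// Hypothetical phase-selected chroma square wave.
module chroma_phase (
    input  wire       clk4x,   // assumed 14.31818 MHz = 4 x 3579545 Hz
    input  wire       reset,
    input  wire [1:0] hue,     // illustrative: phase offset in quarter cycles
    input  wire       enable,  // illustrative: 0 = no chroma (gray)
    output wire       chroma   // square wave at the subcarrier rate
);
    reg [1:0] phase;

    always @(posedge clk4x)
        if (reset) phase <= 2'd0;
        else       phase <= phase + 2'd1;

    // Subtracting hue delays the waveform by that many quarter cycles.
    wire [1:0] shifted = phase - hue;

    // Bit 1 is high for two of every four clocks: 50% duty at clk4x / 4.
    assign chroma = enable & shifted[1];
endmodule
```

Four phases only give four hues. A higher multiple of the subcarrier, as mentioned above, would give finer phase steps.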
 
If you are doing a synchronous design you normally shouldn't run out of clock
resources. What are you doing that uses so many clocks? I suspect your
clocking is also causing your routing woes. The Virtex parts have abundant
routing resources; it is not that easy to use them up.


--
--Ray Andraka, P.E.
 
"E.S." <emu@ecubics.com> wrote in message news:<C9vqc.149$Ou.77@fe39.usenetserver.com>...
>> The VGA timing interface will have a couple or three counters.
>>
>> As you can instantly tell, it hasn't been thought through yet. This
>> is day two of my "do something concrete" plan.
>
> Have a look at www.fpgacpu.org. Jan Gray put a 16bit
> RISC,DMA,MemoryControl & Video in a chip which is so small, it isn't
> even supported by xilix tools anymore ;-)

That's very interesting. His VGA timing block uses half as many
registers as my first go at it. I think I'm over-using registers
(where wires would work fine).

Jan uses an LFSR counter for his horizontal and vertical counters. I
read a thread on this newsgroup about such counters from, I think,
around 2001, talking about wide 100 MHz counters. The general view
was that LFSR counters use fewer CLBs than binary counters, and run
faster due to the lack of a carry chain, but come with caveats. A
post near the end of the thread said, effectively, "use straight
binary counters and let the synthesis tool figure it out - modern
FPGAs are fast and the tools are good".


The VGA dot clock is around 25 MHz. I need a 10-bit horizontal counter
and a 9-bit vertical counter.

    reg [9:0] xcnt;   // counts from 0 to 799

    // `TP: delay macro, assumed to be defined elsewhere
    always @(posedge clk)
        if (reset || xcnt == 799)
            xcnt <= #`TP 10'h0;
        else
            xcnt <= #`TP xcnt + 1;

Is that sufficient for a 10-bit 25 MHz counter in the real world for
this application, or should I already be looking at LFSR counters?


> Just start with your design, and look the place & route statistics.
> Then you really get a feeling what resources you use on what function,
> and probably you even notice, that you implement it in a not so
> efficient way for an FPGA.
>
> And as soon you have some solid design,
> you still can run the place & route on different families & chips,
> then you really see what the difference is.

Thanks. That sounds like a plan.

Paul.
 
Using LFSRs for video counters comes from the days when 25-30 MHz
performance took a lot of work, especially with earlier families like the
3000 and 3100 that did not have carry chains (in which case a binary
counter was not only slow, but also took up a lot of resources: more than
one level of logic). 10-bit counters in modern FPGAs can run at several
hundred MHz, and the carry logic is free. For 25 MHz video, it is not worth
the extra headache of working with an LFSR. It gains you nothing in this
case. If you look at my Dynamic Video hardware paper from ca. 1996, it also
used LFSRs in the video timing logic. That was done in National CLAY FPGAs,
which are structurally similar to the Atmel 6000 series parts: no carry
chain, very limited interconnect, and simple logic cells that did not do
random logic well.
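
For reference, a sketch of what the plain binary-counter approach might look like for both axes. It assumes the commonly quoted 640x480 at 60 Hz timing (800 clocks per line, 525 lines per frame, pixel clock near 25.175 MHz); the module and signal names are illustrative. Note that with a 525-line total, a vertical counter that counts every line needs 10 bits rather than 9.

```verilog
// Sketch of 640x480 timing with plain binary counters.
// Assumes a pixel clock near 25.175 MHz; names are illustrative.
module vga_timing (
    input  wire       clk,
    input  wire       reset,
    output reg  [9:0] xcnt,    // 0..799
    output reg  [9:0] ycnt,    // 0..524 (needs 10 bits)
    output wire       hsync,   // active low
    output wire       vsync,   // active low
    output wire       visible
);
    wire line_end  = (xcnt == 10'd799);
    wire frame_end = (ycnt == 10'd524);

    always @(posedge clk)
        if (reset || line_end) xcnt <= 10'd0;
        else                   xcnt <= xcnt + 10'd1;

    always @(posedge clk)
        if (reset)         ycnt <= 10'd0;
        else if (line_end) ycnt <= frame_end ? 10'd0 : ycnt + 10'd1;

    // 640 visible + 16 front porch + 96 sync + 48 back porch = 800
    assign hsync   = ~((xcnt >= 10'd656) && (xcnt < 10'd752));
    // 480 visible + 10 front porch + 2 sync + 33 back porch = 525
    assign vsync   = ~((ycnt >= 10'd490) && (ycnt < 10'd492));
    assign visible = (xcnt < 10'd640) && (ycnt < 10'd480);
endmodule
```

The comparisons are written for readability, not minimum logic. At 25 MHz that should not matter, which is the point made above.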




--
--Ray Andraka, P.E.
 
"Ray Andraka" <ray@andraka.com> wrote in message
news:40AB48D9.C3F88E92@andraka.com...
> If you are doing a synchronous design you normally shouldn't run out of
> clock resources. What are you doing that uses so many clocks? I suspect
> your clocking is also causing your routing woes. The virtex parts have
> abundant routing resources; it is not that easy to use them up.

Some assumptions in there, Ray:

"...If you are doing a synchronous design ..." but these days people are
doing SoC designs that might have a video clock, a CPU clock, a clock
driving an Ethernet PHY, and perhaps some refresh logic for their DRAM
controller. I'm not disagreeing with you; my point was that "gates" are
generally not the thing you run out of first.

Next up, "...The virtex parts..." However, the original poster wanted to
stay away from the "expensive" chips, so starting from CPLDs and moving up,
you're somewhat constrained there. Lots of clock resources in a Virtex-II
Pro, but then again in single quantities the chip is $300. :)

--Chuck
 
Even the smallest FPGAs have four or more clock nets, and that is going
back to the 4000s. Clock enables can do a lot for you; in most cases there
is not really a need for a proliferation of clocks. Multiple clock domains
give rise to potential timing constraint issues, as well as problems
crossing clock domain boundaries. Not that any of that is insurmountable,
just that it requires extra diligence to make sure you do it right, and
that the tools don't mess up your good work.
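
A minimal sketch of the clock-enable idea, assuming some piece of logic only needs to advance at a quarter of the clock rate. Everything stays on the one clock net, and a one-cycle enable pulse decides when the slower logic updates. The names and the divide-by-4 ratio are invented for illustration.

```verilog
// One clock, one enable: logic that should run at clk/4 still uses clk.
module ce_demo (
    input  wire       clk,
    input  wire       reset,
    output reg  [7:0] slow_count   // advances once every 4 clocks
);
    reg [1:0] div;
    wire ce = (div == 2'd3);       // one-clock-wide enable pulse

    always @(posedge clk)
        if (reset) div <= 2'd0;
        else       div <= div + 2'd1;

    always @(posedge clk)
        if (reset)   slow_count <= 8'd0;
        else if (ce) slow_count <= slow_count + 8'd1;
endmodule
```

Because both registers share `clk`, there is no clock domain crossing between them and only one clock to constrain.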

When I said Virtex parts, I was referring to the Virtex architecture, which
includes all the current families. Spartan-II is the original Virtex
architecture, Spartan-IIE is Virtex-E, and Spartan-3 is a mutation of
Virtex-II. The point is that all of these families have ample routing.

Now, whether the tools make efficient use of that routing is another
question altogether. They don't: the tools now do a 'lazy' routing that
only improves the routing until it is good enough. The problem is that in a
dense design, the circuitous routing artificially congests the routing
resources, which can make it appear that you do not have the routing to
make timing. Poor placement can also aggravate the routing. The placer is
*still* very bad at placing logic when a signal goes through multiple LUTs
between flip-flops, often placing the LUTs without flip-flops far from the
destinations, and well out of the way between the source and destination.
The result is again unnecessary congestion of the routing resources, and
pathetic timing results. Floorplanning will relieve enough of the problems
caused by the placer that it is very hard to run out of routing.

BTW, the routing is generally more stressed with larger devices rather than
the smaller ones. Required routing goes up roughly with the square of the
number of LUTs, yet the routing network is virtually unchanged across the
family. Additionally, placement is more critical with larger devices
because haphazard placement will incur large routing delay penalties, and
that makes the job of the router that much harder (possibly leading to a
no-route situation due to timing). With small FPGAs, it is the memories
followed by logic that get used up first. I would argue that the routing in
the cheap FPGAs is even more abundant.

CPLDs are a different animal altogether. There, the routing between
macrocells is generally sparse, and without careful planning it can be easy
to use up the routing there.

I stand by my earlier comments.


--
--Ray Andraka, P.E.
 
Just to reinforce what Ray says in his response, if you have designs that
have a lot of input clocks, I have found that some effort up front to retime
these clocks to one higher frequency masterclock (which is clock enabled for
each source clock domain) can often save you from a world of pain later on.
Especially if you're transferring data between domains. The DCMs are great
for making this masterclock.
Cheers, Syms.
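
A sketch of one common way to do this kind of retiming, assuming the master clock runs comfortably more than twice as fast as the incoming clock: sample the slow clock as if it were data, and turn each rising edge into a one-cycle enable in the master domain. The module and signal names are hypothetical, and this is not necessarily the exact scheme described above.

```verilog
// Hypothetical: bring a slow external clock into the master clock domain
// and turn its rising edges into one-cycle enables.
// Assumes mclk is comfortably more than twice the frequency of slow_clk.
module retime_enable (
    input  wire mclk,
    input  wire slow_clk,   // treated as data, never used as a clock
    output wire slow_ce     // one mclk cycle per slow_clk rising edge
);
    reg [2:0] sync;

    // Two flops to synchronize, a third to remember the previous sample.
    always @(posedge mclk)
        sync <= {sync[1:0], slow_clk};

    assign slow_ce = sync[1] & ~sync[2];
endmodule
```

Downstream logic then runs on `mclk` and uses `slow_ce` as its clock enable. Data arriving alongside the slow clock still has to be stable when it is sampled, which this sketch does not address.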
 
