Lattice Announces EOL for XP and EC/P Product Lines

rickman wrote:

> I have been looking at these parts for some time and I never
> realized they don't include distributed RAM using the LUTs.

Also of note, the iCE40 Block RAM's two ports consist of
one read-only port and one write-only port, vs. the two
independent read+write ports of many other FPGA families.
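
A minimal sketch of a RAM matching that port model - one write-only
port, one read-only port - in Verilog; the module, signal, and
parameter names are illustrative, not from any vendor library:

  // Simple dual-port RAM: one write-only port + one read-only port,
  // the iCE40-style BRAM port model.
  module sdp_ram #(
      parameter AW = 8,    // address bits
      parameter DW = 16    // data bits
  ) (
      input               wclk,
      input               we,
      input  [AW-1:0]     waddr,
      input  [DW-1:0]     wdata,
      input               rclk,
      input  [AW-1:0]     raddr,
      output reg [DW-1:0] rdata
  );
      reg [DW-1:0] mem [0:(1<<AW)-1];

      always @(posedge wclk)       // port A: write only
          if (we) mem[waddr] <= wdata;

      always @(posedge rclk)       // port B: read only, registered
          rdata <= mem[raddr];
  endmodule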

> Lattice has a license on many Xilinx owned patents because
> they bought the Orca line from Lucent who had gotten all
> sorts of licensing from Xilinx in a weak moment.
> snip
> I'll never understand why they licensed their products to Lucent.

I'd reckon AT&T/Lucent had a large semiconductor patent
portfolio with which to apply strategic "leverage" for a
favorable cross-licensing agreement.

> If the processor were integrated into the FPGA, then we
> are back to a single simulation, schweet!

As a yardstick, a system build for my homebrew RISC,
including 4 Kbyte BRAM, UART and I/O, fits snugly into
one of the 1280 LUT4 XO2 devices:

: Number of logic LUT4s: 890
: Number of distributed RAM: 66 (132 LUT4s)
: Number of ripple logic: 110 (220 LUT4s)
: Number of shift registers: 0
: Total number of LUT4s: 1242
:
: Number of block RAMs: 4 out of 7 (57%)

The core proper (32 bit datapath, 16 bit instructions)
is currently ~800 LUT4 in its default configuration.
[ I miss TBUF's when working on processor datapaths.]

I don't have the XO2 design checked in, but the similar
XP2 version is in the following code repository, under
trunk/hdl/systems/evb_lattice_xp2_brevia :

http://code.google.com/p/yard-1/

The above is still very much a work-in-progress, but
far enough along to use for small assembly projects
( note that interrupts are currently broken ).

-Brian
 
On 9/2/2013 9:56 PM, Brian Davis wrote:
> rickman wrote:

>> I have been looking at these parts for some time and I never
>> realized they don't include distributed RAM using the LUTs.

> Also of note, the iCE40 Block RAM's two ports consist of
> one read-only port and one write-only port, vs. the two
> independent read+write ports of many other FPGA families.

The iCE family of products has a number of shortcomings compared to the
larger parts sold elsewhere, but for a reason: the iCE lines are very,
very low power. You can't do that if you have a lot of "fat" in the
hardware. So they cut to the bone. This is not the only area where the
parts are a little short. The question is how much does it matter? For
a long time I've heard how brand X or A or whatever is better because of
this feature or that feature. So the iCE line has few of these fancy
features; how well do designs work in them?


>> Lattice has a license on many Xilinx owned patents because
>> they bought the Orca line from Lucent who had gotten all
>> sorts of licensing from Xilinx in a weak moment.
>> snip
>> I'll never understand why they licensed their products to Lucent.

> I'd reckon AT&T/Lucent had a large semiconductor patent
> portfolio with which to apply strategic "leverage" for a
> favorable cross-licensing agreement.

Possible, but I don't think so. Any number of folks could have had
semiconductor patents and no one else got anything like this. I would
speculate that Xilinx needed a second source for some huge customer or
maybe they were at a critical point in the company's growth and just
needed a bunch of cash (as opposed to cache). Who knows?


>> If the processor were integrated into the FPGA, then we
>> are back to a single simulation, schweet!

> As a yardstick, a system build for my homebrew RISC,
> including 4 Kbyte BRAM, UART and I/O, fits snugly into
> one of the 1280 LUT4 XO2 devices:

> : Number of logic LUT4s: 890
> : Number of distributed RAM: 66 (132 LUT4s)
> : Number of ripple logic: 110 (220 LUT4s)
> : Number of shift registers: 0
> : Total number of LUT4s: 1242
> :
> : Number of block RAMs: 4 out of 7 (57%)

> The core proper (32 bit datapath, 16 bit instructions)
> is currently ~800 LUT4 in its default configuration.
> [ I miss TBUF's when working on processor datapaths.]

> I don't have the XO2 design checked in, but the similar
> XP2 version is in the following code repository, under
> trunk/hdl/systems/evb_lattice_xp2_brevia :

> http://code.google.com/p/yard-1/

> The above is still very much a work-in-progress, but
> far enough along to use for small assembly projects
> ( note that interrupts are currently broken ).

The trick to datapaths in CPU designs is to minimize the number of
inputs onto a "bus", which is implemented as multiplexers. Minimizing
inputs gains speed and minimizes logic. When possible, put the muxes
inside some on-chip RAM to good use. I got sidetracked on my last
iteration of a CPU design, which was going to use a block RAM as the
"register file" and stack in one. Since then I've read about some other
designs which use similar, though not identical, ideas.
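
A minimal Verilog sketch of that idea (names and widths illustrative,
not from any particular core): infer the register file as a small RAM,
so the operand-read mux is absorbed into the RAM's address decoding
instead of costing LUTs:

  // Register file as RAM: the read "mux" is the RAM address decode.
  module regfile #(
      parameter DW = 32,
      parameter AW = 4              // 16 registers
  ) (
      input               clk,
      input               we,
      input  [AW-1:0]     waddr,
      input  [AW-1:0]     raddr,
      input  [DW-1:0]     wdata,
      output [DW-1:0]     rdata
  );
      reg [DW-1:0] rf [0:(1<<AW)-1];

      always @(posedge clk)
          if (we) rf[waddr] <= wdata;

      // Asynchronous read: maps to LUT-based distributed RAM; use a
      // registered read instead to target a block RAM.
      assign rdata = rf[raddr];
  endmodule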

Why did you roll your own RISC design when each FPGA maker has their
own? The Lattice version is even open source.

--

Rick
 
rickman wrote:
>>> I'll never understand why they licensed their products to Lucent.

>> I'd reckon AT&T/Lucent had a large semiconductor patent
>> portfolio with which to apply strategic "leverage" for a
>> favorable cross-licensing agreement.

> Possible, but I don't think so. Any number of folks could
> have had semiconductor patents and no one else got anything
> like this. I would speculate that Xilinx needed a second source

There was definitely a second source in the XC3000 days,
first from MMI (bought by AMD), later AT&T; but I don't
remember there being anyone second sourcing the XC4000.

IIRC, as Xilinx introduced the XC4000, AT&T went their
own way with the ORCA, with similar features (distributed RAM,
carry chains), but using the Neocad software.

My speculation is that at this juncture, AT&T leveraged
rights to the Xilinx FPGA patents.

Back in 1995, the AT&T press release responding to the
Neocad acquisition was re-posted here:

https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ

and stated:
"
" When AT&T Microelectronics decided not to second source
" the Xilinx 4000 family of FPGAs, we accelerated the
" introduction of the ORCA family.
"

-----------------

> The trick to datapaths in CPU designs is to minimize
> the number of inputs onto a "bus" which is implemented
> as multiplexers.

Yes, that's why I miss the TBUF's :)

In the XC4000/Virtex days, the same 32 bit core fit into
300-400 LUT4's, and a good number of TBUF's.

The growth to ~800 LUT4 is split between the TBUF
replacement muxes and new instruction set features.
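
For concreteness, a sketch of what that replacement looks like in
Verilog (all names here are illustrative, not the actual YARD-1
sources):

  // The old internal tristate bus, one TBUF per source per bit:
  //   assign result = oe_alu ? alu_y : 32'bz;
  //   assign result = oe_shf ? shf_y : 32'bz;
  // ...and its LUT4 replacement, a one-hot AND-OR mux:
  module result_bus (
      input  [31:0] alu_y, shf_y, mem_y,     // datapath sources
      input         oe_alu, oe_shf, oe_mem,  // one-hot enables
      output [31:0] result
  );
      assign result = ({32{oe_alu}} & alu_y)
                    | ({32{oe_shf}} & shf_y)
                    | ({32{oe_mem}} & mem_y);
  endmodule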


> Why did you roll your own RISC design when each FPGA
> maker has their own?

When the YARD core blinked its first LED in 1999,
there wasn't much in the way of free vendor RISC IP.

Being a perpetually-unfinished spare-time project,
I never got the loose ends tidied up enough to
make the sources available until recently.

> The Lattice version is even open source.
At the initial announcement, yes; but when I looked
a couple years ago, the Lattice Mico source files
had been lawyered up with a "Lattice Devices Only"
clause, see the comments on this thread:

http://latticeblogs.typepad.com/frontier/2006/08/open_source.html

-Brian
 
On 9/3/2013 6:27 PM, Brian Davis wrote:
> rickman wrote:

>>>> I'll never understand why they licensed their products to Lucent.

>>> I'd reckon AT&T/Lucent had a large semiconductor patent
>>> portfolio with which to apply strategic "leverage" for a
>>> favorable cross-licensing agreement.

>> Possible, but I don't think so. Any number of folks could
>> have had semiconductor patents and no one else got anything
>> like this. I would speculate that Xilinx needed a second source

> There was definitely a second source in the XC3000 days,
> first from MMI (bought by AMD), later AT&T; but I don't
> remember there being anyone second sourcing the XC4000.

> IIRC, as Xilinx introduced the XC4000, AT&T went their
> own way with the ORCA, with similar features (distributed RAM,
> carry chains), but using the Neocad software.

> My speculation is that at this juncture, AT&T leveraged
> rights to the Xilinx FPGA patents.

> Back in 1995, the AT&T press release responding to the
> Neocad acquisition was re-posted here:

> https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ

> and stated:
> "
> " When AT&T Microelectronics decided not to second source
> " the Xilinx 4000 family of FPGAs, we accelerated the
> " introduction of the ORCA family.
> "

Yes, that is what we are discussing. Why did *Xilinx* give out the
family jewels to Lucent? We know it happened, the question is *why*?


> -----------------

>> The trick to datapaths in CPU designs is to minimize
>> the number of inputs onto a "bus" which is implemented
>> as multiplexers.

> Yes, that's why I miss the TBUF's :)

> In the XC4000/Virtex days, the same 32 bit core fit into
> 300-400 LUT4's, and a good number of TBUF's.

> The growth to ~800 LUT4 is split between the TBUF
> replacement muxes and new instruction set features.

My understanding is that TBUFs may have been a good idea when LUT delays
were 5 ns and routing was another 5 to 10 between LUTs, but as they made
the devices more dense and faster they found the TBUFs just didn't scale
in the same way; in fact, the speed got worse! The capacitance being
driven didn't go down much, and the TBUFs needed to shrink, which means
they had less drive. So they would have actually gotten slower. No,
they are gone because TBUFs just aren't your friend when you want to
make a dense, fast chip.


>> Why did you roll your own RISC design when each FPGA
>> maker has their own?

> When the YARD core blinked its first LED in 1999,
> there wasn't much in the way of free vendor RISC IP.

> Being a perpetually-unfinished spare-time project,
> I never got the loose ends tidied up enough to
> make the sources available until recently.

Ok, that makes sense. I rolled my first CPU around 2002 and, like you,
it may have been used, but it still is not finished.


>> The Lattice version is even open source.

> At the initial announcement, yes; but when I looked
> a couple years ago, the Lattice Mico source files
> had been lawyered up with a "Lattice Devices Only"
> clause, see the comments on this thread:

> http://latticeblogs.typepad.com/frontier/2006/08/open_source.html

Oh, that is a horse of a different color. So the Lattice CPU designs
are out! No big loss. The 8-bitter doesn't have a C compiler (not that
I care) and good CPU designs are a dime a dozen... I guess, depending on
your definition of "good".

--

Rick
 
rickman <gnuarm@gmail.com> wrote:
"

> Yes, that is what we are discussing. Why did *Xilinx* give out the
> family jewels to Lucent? We know it happened, the question is *why*?

(snip)
>> Yes, that's why I miss the TBUF's :)

>> In the XC4000/Virtex days, the same 32 bit core fit into
>> 300-400 LUT4's, and a good number of TBUF's.

>> The growth to ~800 LUT4 is split between the TBUF
>> replacement muxes and new instruction set features.

> My understanding is that TBUFs may have been a good idea when LUT delays
> were 5 ns and routing was another 5 to 10 between LUTs, but as they made
> the devices more dense and faster they found the TBUFs just didn't scale
> in the same way; in fact, the speed got worse! The capacitance being
> driven didn't go down much, and the TBUFs needed to shrink, which means
> they had less drive. So they would have actually gotten slower. No,
> they are gone because TBUFs just aren't your friend when you want to
> make a dense, fast chip.

That is probably enough, but it is actually worse than that.

At about 0.8 micron, the wiring has to switch to a distributed RC model.

Above that, you can treat it as driving a capacitor with a current source.
All points are, close enough, at the same voltage, and the only thing
that matters is what that voltage is. (LC delay is pretty low.)

Below 0.8 micron, besides the fact that the lines are getting
longer, the resistance is also significant. The line is then modeled as
series resistors with capacitors to ground, all the way down the line.
(As well as I remember, the inductance is less significant than the
resistance, but I haven't thought about it that closely for
a while now.)
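
For reference, a back-of-envelope version of that comparison (the
standard Elmore-delay result, added here, not from the original post):
for a line of length L with resistance r and capacitance c per unit
length, driven by a driver of output resistance R_drv,

  \[
    t_{\text{lumped}} \approx R_{\text{drv}}\, c L
    \qquad \text{vs.} \qquad
    t_{\text{wire}} \approx \tfrac{1}{2}\, r c L^{2}
  \]

Once the wire's own rcL^2/2 term dominates, a stronger driver no
longer helps, and the delay grows quadratically with line length
unless the line is broken up with buffers.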

-- glen
 
rickman wrote:
[snip]
> My understanding is that TBUFs may have been a good idea when LUT delays
> were 5 ns and routing was another 5 to 10 between LUTs, but as they made
> the devices more dense and faster they found the TBUFs just didn't scale
> in the same way; in fact, the speed got worse! The capacitance being
> driven didn't go down much, and the TBUFs needed to shrink, which means
> they had less drive. So they would have actually gotten slower. No,
> they are gone because TBUFs just aren't your friend when you want to
> make a dense, fast chip.

I think TBUFs went away along with "long lines" due to capacitive delay,
as you noted. Modern FPGAs use buffered routing, and tristates don't
match up with that sort of routing network, since the buffered routes
become unidirectional. The silicon for line drivers is now much faster
than routing prop delays, making the buffered network faster than a
single point driving all that line capacitance. So the new parts have
drivers in every switch box instead of just pass FETs. I think the
original Virtex line was the first to use buffered routing, part of
the Dyna-Chip acquisition by Xilinx. They still had long lines and
TBUFs, but that went away on Virtex 2.

--
Gabor
 
Gabor wrote:
> I think TBUFs went away along with "long lines" due to capacitive delay

I appreciate the rationale.
Yet still I miss their functionality for processor designs.
[ "Lament of the TBUF" would make an excellent dirge title ]

> Modern FPGAs use buffered routing, and tristates don't match up with that

I think I once read that the last generation or few of
TBUF's were actually implemented with dedicated muxes/wired
OR's, or something similar.

I wish that had been continued on a reduced scale: TBUF's
every 4 or 8 columns, matching the carry chain pitch,
spanning some horizontal fraction of a clock region.

-Brian
 
Brian Davis <brimdavis@aol.com> wrote:
> Gabor wrote:
(snip)
>> Modern FPGAs use buffered routing, and tristates don't
>> match up with that

> I think I once read that the last generation or few of
> TBUF's were actually implemented with dedicated muxes/wired
> OR's, or something similar.

As far as I know, they are still implemented by the synthesis
tools as either OR or AND logic. I don't know any reason to remove
that ability, as it doesn't depend on the hardware. Then again, it
isn't hard to write the logic directly.

-- glen
 
In article <l08jta$9m8$1@speranza.aioe.org>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Brian Davis <brimdavis@aol.com> wrote:
>> Gabor wrote:
> (snip)
>>> Modern FPGAs use buffered routing, and tristates don't
>>> match up with that

>> I think I once read that the last generation or few of
>> TBUF's were actually implemented with dedicated muxes/wired
>> OR's, or something similar.

> As far as I know, they are still implemented by the synthesis
> tools as either OR or AND logic. I don't know any reason to remove
> that ability, as it doesn't depend on the hardware. Then again, it
> isn't hard to write the logic directly.

We do this now in Verilog - declare our read data bus (and similar signals) as
"wor" nets. Then you can tie them all together as needed. Saves you the
hassle of actually creating/managing individual return data, and muxing it
all.

The individual modules must take care to drive 0's on the read_data when not
in use. Then you're really creating multi-source signals (like past bus
structures), but relying on the "wor" to resolve the net.

Works in Xilinx XST and Synplicity. Don't know about others. Don't know if
this trick would work in VHDL.
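
A minimal sketch of the idea (module and signal names are
illustrative, and the two drivers would normally live in separate
modules):

  module wor_bus_demo (
      input  [7:0] uart_rdata, timer_rdata,
      input        sel_uart, sel_timer,
      output [7:0] cpu_rdata
  );
      wor [7:0] read_data;   // multiple drivers, resolved by OR

      // Each source ANDs its data with its select, so it drives
      // all zeros when deselected.
      assign read_data = {8{sel_uart}}  & uart_rdata;
      assign read_data = {8{sel_timer}} & timer_rdata;

      assign cpu_rdata = read_data;
  endmodule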

--Mark
 
Mark Curry <gtwrek@sonic.net> wrote:

(snip, I wrote)
>> As far as I know, they are still implemented by the synthesis
>> tools as either OR or AND logic. I don't know any reason to remove
>> that ability, as it doesn't depend on the hardware. Then again, it
>> isn't hard to write the logic directly.

> We do this now in Verilog - declare our read data bus
> (and similar signals) as "wor" nets. Then you can tie them
> all together as needed. Saves you the hassle of actually
> creating/managing individual return data, and muxing it all.

> The individual modules must take care to drive 0's on the
> read_data when not in use. Then you're really creating
> multi-source signals (like past bus structures), but
> relying on the "wor" to resolve the net.

I think you can also do it with traditional tri-state gates,
but it comes out pretty much the same: AND with the enable,
and then onto the WOR line.

> Works in Xilinx XST and Synplicity. Don't know about others.
> Don't know if this trick would work in VHDL.

I can usually read VHDL but don't claim to write it.

-- glen
 
The standard data type (std_logic) is tri-statable in VHDL,
so that would be the preferred choice, rather than WAND or
WOR. It does come in handy in that a single bidirectional
port in RTL can represent both input and output wires, and
part of the mux, at the gate level.

Tri-state bidirectional ports allow distributed address
decoding in the RTL (give a module the address and a generic
to tell it what addresses to respond to), even though at the
gate level it will all get optimized together at the muxes.

Some synthesis tools can even "register" tri-state values to
allow you to simplify pipelining in the RTL. Synthesis takes
care of the details of separating out the tri-state enable
from the data, and registering both appropriately.
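
A minimal sketch of that distributed-decode idiom, in VHDL since
that is the language under discussion here (entity, generic, and
signal names are mine, purely illustrative):

  library ieee;
  use ieee.std_logic_1164.all;

  entity periph is
    generic ( MY_ADDR : std_logic_vector(7 downto 0) := x"40" );
    port (
      addr     : in  std_logic_vector(7 downto 0);
      rd       : in  std_logic;
      reg_val  : in  std_logic_vector(7 downto 0);
      read_bus : out std_logic_vector(7 downto 0)  -- shared, multi-driven
    );
  end entity;

  architecture rtl of periph is
  begin
    -- Drive only when addressed, release to 'Z' otherwise; std_logic
    -- resolution handles the multiple drivers in simulation, and
    -- synthesis rebuilds the shared bus as ordinary muxes.
    read_bus <= reg_val when (rd = '1' and addr = MY_ADDR)
                else (others => 'Z');
  end architecture;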

Andy
 
