Nios II Going Live...

Hopefully the size of your paycheck will be the third possible
size, not the first ;-)

Eric Crabill wrote:

Goran Bilski wrote:

There should only be 3 different numbers used as sizes:
0, 1 or infinity. Any other number will create barriers
that will be reached and have impacts on the system.

I'm headed over to payroll right now!

Eric
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin,
1759
 
Austin Lesea wrote:

Tim,

Low blow.

S3 shipped a lot of parts last quarter. A whole lot of parts.
Sorry Austin,

I just asked: for the xc3s400-fg456 I can get CES samples now.
I can probably get parts in 14 weeks, so I am staying with Spartan-II.

No one expected the product to gather that many orders that fast. Even
the optimists among us were made to look like pessimistic fools.

If we had only believed our own sales pitch that S3 was a better
deal than an ASIC in volume (which it is), we might have been at least
partially prepared.
So, Xilinx did something right, and it is wrong again?
;-)

Austin

Tim wrote:

"Austin Lesea" wrote

At lunch the other day we were reminiscing about how the Z8000 never
took off because they changed their architecture and instruction set
completely from the Z80 and immediately alienated all of their customers
(who were still programming in assembly language in those days).
Not quite getting it into production may have troubled some customers...

Not like that anymore.
The Spartan-3 of its day ;-)
 
ES,

Yes. Xilinx just can not do anything at all "right."

The most FPGAs shipped in the history of FPGAs (Virtex family), the only
FPGA with embedded processors, the first FPGA ever with 10 Gb/s
transceivers, lowest interrupt latency of any soft processor core (and
even better than most hard processors), 40% speed improvement in our
tools, over 250K seats of software shipped, XCell Magazine with a
subscription larger than the premier electronics mag......

Such a bummer, I guess we must just keep striving to be better and better!

Austin
 
In deeply embedded systems (i.e. no RTOS), the use of the windowed
registers is extremely useful due to its speed. When you start using
the processor in applications that have an RTOS, it's a different
story. Each time you have to do a context switch, unless the RTOS is
really clever, you have to save out the whole set of registers
associated with the task that is getting swapped out and read in the
set of registers for the task that is getting swapped back in.

Initially soft-core microprocessors on FPGAs were used as simple
control processors by the HW engineers in place of state-machines. So
only rarely did someone want to run an OS on them. But as they have
gained more acceptance, engineers want the same tools that other
microprocessors provide.

Having a compiler option allowed you to choose between using the
windowed registers vs. a flat register. With Nios II we optimized for
size and speed, and the architecture we chose did not use the register
windows.

-Joel-
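
To put rough numbers on the context-switch cost described above, here is a minimal sketch in C. It assumes a naive RTOS that stores every register of the outgoing task and loads every register of the incoming one. The 256-entry windowed file and the 32-register flat file are figures mentioned elsewhere in this thread, and the function name is made up for illustration.

```c
#include <assert.h>

/* Words moved by a naive context switch: every register of the outgoing
 * task is stored and every register of the incoming task is loaded.
 * Hypothetical helper, not part of any Nios toolchain. */
unsigned words_per_context_switch(unsigned registers_saved)
{
    assert(registers_saved > 0);
    return 2u * registers_saved;  /* one store pass plus one load pass */
}
```

Under these assumptions a 256-register windowed file moves 512 words per switch, against 64 for a flat 32-register file. An RTOS that saves only the frames actually in use would land somewhere in between.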


Jim Granville <no.spam@designtools.co.nz> wrote in message news:<43Qqc.3057$FN.324468@news02.tsnz.net>...
Goran Bilski wrote:

It creates a weird situation in embedded processing where you reach
the limit of the window.
There should only be 3 different numbers used as sizes: 0, 1 or infinity.
Any other number will create barriers that will be reached and have
impacts on the system.
On reaching the limit of the register window, you have a big chunk of
data to save and load which isn't nice to have when you need to have a
deterministic system.

I'm lost - since the register count is finite at around 32 in most RISC
designs, how does removing a feature improve the situation?

I don't know the specific NIOS details, but Register window/Frame
Pointer/Register Bank select schemes have been around for years, and
can greatly help code density and reaction speed if done properly.
I think SPARC had a clever partial frame pointer that allowed some
registers to carry calling/return parameters, and some to act as local variables.
The compiler needs 'to be on its toes', but that's a SW
housekeeping issue.
Another nice feature of register frame pointers is that if you are
uncomfortable with them, you can just ignore them, and you have
a 'vanilla RISC' core.

-jg
 
Joel A. Seely wrote:
In deeply embedded systems (i.e. no RTOS), the use of the windowed
registers is extremely useful due to its speed. When you start using
the processor in applications that have an RTOS, it's a different
story. Each time you have to do a context switch, unless the RTOS is
really clever, you have to save out the whole set of registers
associated with the task that is getting swapped out and read in the
set of registers for the task that is getting swapped back in.
The problem here is because you accept/target a 'less than really
clever' RTOS, you also compromise the available peak performance.

Initially soft-core microprocessors on FPGAs were used as simple
control processors by the HW engineers in place of state-machines. So
only rarely did someone want to run an OS on them. But as they have
gained more acceptance, engineers want the same tools that other
microprocessors provide.
....but the first group have not 'gone away' ?

Having a compiler option allowed you to choose between using the
windowed registers vs. a flat register. With Nios II we optimized for
size and speed, and the architecture we chose did not use the register
windows.
One advantage of an FPGA core is you CAN change it as tools evolve :)

One path that appeals for embedded design is the hyperthread/switched
CPU approach that is now appearing in mainstream MPUs (and so tools will
follow, over time).
E.g. Ubicom divides its newest CPU into (IIRC) 64 time slots, and
tasks/processes can have N, M etc. of those slots assigned. The result is good
granularity of horsepower allocation, and very hard real-time performance.
With an FPGA, you could assign the hard real-time stuff to one core,
with register pointer features ON, and running the small, time paranoid
code.
Time muxed on the other core, you can run the softer-time stuff, on a
RTOS, with register pointer features OFF.
In this approach, you are really multiplexing at the slowest memory
BUS pivot, rather than context thrashing a single, fast core.
-jg
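
The slot-based allocation described above can be sketched as a cyclic table that maps each time slot to a task. This is only an illustration of the idea, not Ubicom's actual mechanism: the table layout, the function name and the 8-slot example are assumptions. The longest gap between two slots owned by the same task bounds how long that task can be kept waiting, which is where the hard real-time behaviour comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Longest run of consecutive slots (cyclically) in which `task` does NOT run.
 * slot_owner[i] is the task id that owns slot i. Returns n_slots if the task
 * owns no slot at all. Hypothetical helper for illustration only. */
size_t worst_case_wait(const unsigned char *slot_owner, size_t n_slots,
                       unsigned char task)
{
    size_t first = n_slots, prev = 0, worst = 0;
    assert(slot_owner != NULL && n_slots > 0);

    for (size_t i = 0; i < n_slots; i++) {
        if (slot_owner[i] != task)
            continue;
        if (first == n_slots) {
            first = i;                 /* remember the first owned slot */
        } else if (i - prev - 1 > worst) {
            worst = i - prev - 1;      /* gap between two owned slots */
        }
        prev = i;
    }
    if (first == n_slots)
        return n_slots;                /* task never scheduled */

    /* Wrap-around gap from the last owned slot back to the first one. */
    size_t wrap = (n_slots - 1 - prev) + first;
    return wrap > worst ? wrap : worst;
}
```

With the table {0, 1, 0, 2, 0, 1, 0, 2}, task 0 never waits more than one slot, while task 1 can wait up to three. Giving a time-critical task more, evenly spread slots tightens its bound without touching the other tasks.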
 
Austin Lesea <austin@xilinx.com> wrote in message news:<c8im4g$cfc1@cliff.xsj.xilinx.com>...
ES,

Yes. Xilinx just can not do anything at all "right."

The most FPGAs shipped in the history of FPGAs (Virtex family), the only
FPGA with embedded processors, the first FPGA ever with 10 Gb/s
transceivers, lowest interrupt latency of any soft processor core (and
even better than most hard processors), 40% speed improvement in our
tools, over 250K seats of software shipped, XCell Magazine with a
subscription larger than the premier electronics mag......

Such a bummer, I guess we must just keep striving to be better and better!

Austin
Mr. Lesea, this is not a flame, but to correct an error in your
statement:

"the only FPGA with embedded processors" is far from the truth. The
following come to mind immediately (I'm sure I'm forgetting several):
- Nios & Excalibur (Introduced June, 2000, that was FOUR years ago,
and a year ahead of the competition; my how time flies!)
- QuickLogic
- The company you just acquired (I'll leave my theories out of this
post)

All are processors on FPGAs. These are commercial offerings, there are
numerous 3rd party & free cores out there too.

Your comments on ISR latency can be debated if you like, but I won't
get into it now; there is already a thread discussing the
architectural pros & cons that affect this.

Boy, all this stuff makes me feel like I did yesterday when a guy
dropped in on me while surfing.

Regards,

Jesse Kempa
Altera Corp.
jkempa at altera dot com
 
Austin Lesea wrote:

lowest interrupt latency of any soft processor core (and
even better than most hard processors)
that must be red rag to a bull for john jackson and the other
transputer folk.

and why are there so many transputer people in fpgaland?
 
"Joel A. Seely" <jseely@altera.com> wrote in message
news:9bded7a8.0405200947.28b2d90c@posting.google.com...
In deeply embedded systems (i.e. no RTOS), the use of the windowed
registers is extremely useful due to its speed. When you start using
the processor in applications that have an RTOS, it's a different
story. Each time you have to do a context switch, unless the RTOS is
really clever, you have to save out the whole set of registers
associated with the task that is getting swapped out and read in the
set of registers for the task that is getting swapped back in.

Initially soft-core microprocessors on FPGAs were used as simple
control processors by the HW engineers in place of state-machines. So
only rarely did someone want to run an OS on them. But as they have
gained more acceptance, engineers want the same tools that other
microprocessors provide.

Having a compiler option allowed you to choose between using the
windowed registers vs. a flat register. With Nios II we optimized for
size and speed, and the architecture we chose did not use the register
windows.

-Joel-
Hi Joel,

(trying to tune out the trolls here)
I'm going to be porting what you call a "deeply embedded" interrupt-driven
application from Nios I to Nios II shortly. Can you contrast the two in terms
of interrupt latency?

The app was originally developed on a dual ColdFire system and I can say a
single Cyclone-based Nios I handles things very nicely. I'm looking forward
to the new IDE and even more performance.

Whatever the naysayers say, Motorola is not sending me a new
higher-performance CPU to download to my *existing boards*. This is great stuff!

TIA,
Ken
 
Jesse,

Processors, plural.

I'm still right.

Austin

 
Austin Lesea wrote:
ES,

Yes. Xilinx just can not do anything at all "right."

The most FPGAs shipped in the history of FPGAs (Virtex family), the only
FPGA with embedded processors, the first FPGA ever with 10 Gb/s
transceivers, lowest interrupt latency of any soft processor core (and
even better than most hard processors), 40% speed improvement in our
tools, over 250K seats of software shipped, XCell Magazine with a
subscription larger than the premier electronics mag......

Such a bummer, I guess we must just keep striving to be better and better!
I am just curious Austin, do you think this message helped either you or
Xilinx?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
 
Goran Bilski wrote:
It seems that Altera has created a MicroBlaze as well.
They have finally realized that an FPGA-based soft processor should have
- 32 bit ISA
- 32 registers
- 3 operand instruction format
- JTAG based HW debugging
- HW divider

The weird register window mechanism from NIOS (is it called NIOS1 now?)
didn't work well in embedded processing markets.

Göran Bilski
Actually, it works quite well if used correctly. It isn't used correctly
in the implementations I've seen (from Altera and from an OS vendor).
I modified the OS to change the register spill strategy: Rather than
spilling the entire register set, we only spill one register frame.
Restores are done normally. This results in a "run time optimization" of
the top of the register window for programs. This works very well in
practice because after initialization and task startup, a task's
register window is at the top of the register file. For a 256 register
file that means you get 14 function calls before a register spill occurs.

I'm a little sad that we'll lose the register windows in Nios2.
Performance, etc. will make up for it. ;-)

-Rich
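
The difference between the two spill strategies described above can be illustrated with a small model in C. This is a sketch under stated assumptions, not the actual OS code: the register file is treated as holding a fixed number of whole frames, an overflow spills either every resident frame or just one, and an underflow restores a single frame. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum spill_policy { SPILL_ALL, SPILL_ONE };

/* Count register frames copied to or from memory while replaying a trace.
 * trace[i] > 0 is a function call, anything else is a return.
 * `capacity` is how many frames the register file can hold at once. */
unsigned long frames_moved(const int *trace, size_t n, unsigned capacity,
                           enum spill_policy policy)
{
    unsigned resident = 1;    /* frames held in registers (the task's base frame) */
    unsigned in_memory = 0;   /* frames currently spilled to the stack */
    unsigned long moved = 0;

    assert(trace != NULL && capacity >= 2);
    for (size_t i = 0; i < n; i++) {
        if (trace[i] > 0) {                       /* call */
            if (resident == capacity) {           /* window overflow */
                unsigned victims = (policy == SPILL_ALL) ? resident : 1;
                resident -= victims;
                in_memory += victims;
                moved += victims;
            }
            resident++;
        } else {                                  /* return */
            assert(resident + in_memory > 1);     /* never return past the base frame */
            resident--;
            if (resident == 0) {                  /* window underflow */
                in_memory--;                      /* restore just the caller's frame */
                resident = 1;
                moved++;
            }
        }
    }
    return moved;
}
```

For a task that hovers just past a four-frame register file (four calls, then alternating return and call), this model moves one frame under the spill-one policy and five under spill-all. Real figures depend on the window size and on how the overflow handler is written.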
 
On Fri, 21 May 2004 00:09:34 +0100, "Tim"
<tim@rockylogic.com.nooospam.com> wrote:

Austin Lesea wrote:

lowest interrupt latency of any soft processor core (and
even better than most hard processors)

that must be red rag to a bull for john jackson and the other
transputer folk.
Tee hee. Interrupt latency is a joke number. I wrote a
piece about twelve years ago for one of the embedded-system
comics, pointing out how insignificant is the processor's
own interrupt latency - there are many things that are
orders of magnitude more important to interrupt performance.
Here as in many other things, the transputer was on the
right track. Sadly, limitations of design culture and
available technology doomed it to commercial failure.

Just for the record, here's Bromley's First and Second Law
of commercial failure in a technological product:

First Law:
Probability of commercial failure is increased if the
product meets any of the following criteria:
1) It employs concepts and techniques that will become
popular more than a decade later.
2) Its design is based on technically, logically or
mathematically sound principles.
3) Its creators are British.

Second Law:
The probability of commercial failure is unity if two
or more of the above criteria are met.

and why are there so many transputer people in fpgaland?
Perhaps because they know a good thing when they see one?

Getting more and more cynical as time rolls by...
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
jseely@altera.com (Joel A. Seely) wrote in message news:<9bded7a8.0405200947.28b2d90c@posting.google.com>...
When you start using
the processor in applications that have an RTOS, it's a different
story. Each time you have to do a context switch, unless the RTOS is
really clever, you have to save out the whole set of registers
associated with the task that is getting swapped out and read in the
set of registers for the task that is getting swapped back in.
Yes, but without the windows those would have been swapped out to the
stack already anyway, so you lose nothing.

Also note how much you gain: For example for a bifurcating recursion
even a single level of register windows saves 50% of the register
spills, regardless of how deep the recursion is. Two levels save 75%.
And so on...
For non-recursive scenarios the numbers are even better. (5 levels
save almost all spills)

BTW: This whole discussion is OT and belongs in comp.arch.

Kolja Sulimma
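
The 50% and 75% figures above can be reproduced with a simple counting model. The model is an assumption of this sketch, not something stated in the post: the recursion is a full binary tree, and a call's frame has to be spilled only if the calls beneath it go at least `levels` deep, so that the windows overflow. Setting `levels` to 0 means every frame is saved, which stands in for the no-windows baseline.

```c
#include <assert.h>

/* Frames that must be spilled in a full binary recursion whose root sits
 * `height` calls above the leaves, given `levels` register-window levels.
 * Simplified counting model, for illustration only. */
unsigned long spilled_frames(unsigned height, unsigned levels)
{
    assert(height < 32);                 /* keep the count within range */
    if (height < levels)
        return 0;                        /* whole subtree fits in the windows */
    if (height == 0)
        return 1;                        /* leaf with no windows: frame is saved */
    return 1 + 2 * spilled_frames(height - 1, levels);
}
```

For a tree of height 10 the model gives 2047 spills with no windows, 1023 with one level, 511 with two and 63 with five, so roughly 50%, 75% and 97% of the spills disappear. The ratio is close to 2^-levels whatever the depth, which matches the "regardless of how deep the recursion is" remark.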
 
Rick,

You are correct. I just lashed out. I apologize (to the newsgroup).

Now that we are the "gorilla" I need to be 5X more humble. We win with
listening to customers and always placing them first.

I can't say I won't over-react again, but I can say I will try to improve.

Austin

-snip-
I am just curious Austin, do you think this message helped either you or
Xilinx?
 
In article <h8cra09qmgc17ulbvh1t4bk80dr1it2g39@4ax.com>,
Jonathan Bromley <jonathan.bromley@doulos.com> wrote:

Tee hee. Interrupt latency is a joke number. I wrote a
piece about twelve years ago for one of the embedded-system
comics, pointing out how insignificant is the processor's
own interrupt latency - there are many things that are
orders of magnitude more important to interrupt performance.
Here as in many other things, the transputer was on the
right track. Sadly, limitations of design culture and
available technology doomed it to commercial failure.
I remember doing a bit of due diligence for a relative who was
looking at a job at a company which was making similar claims (they
were using a shadow-register setup).

I basically did an Amdahl's law workup and gave the advice of "this is
why it is bogus", and the observation that, since the company HAD
funding, it might be good for a year but nothing beyond that.

and why are there so many transputer people in fpgaland?

Perhaps because they know a good thing when they see one?
More importantly, if we ever "solve" the tool problem for general
purpose computation on FPGAs, we solve it for Transputers.

--
Nicholas C. Weaver nweaver@cs.berkeley.edu
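
An Amdahl's law workup of this kind might look like the sketch below. The numbers are illustrative assumptions, not figures from the post: if interrupt entry accounts for a fraction `p` of the total interrupt-handling time and a shadow-register scheme makes that part `s` times faster, the overall speedup is 1 / ((1 - p) + p / s).

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `p` of the work
 * (0 <= p <= 1) is accelerated by a factor `s` (s > 0). */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.1 and s = 10 the overall gain is only about 1.1x, which is the flavour of argument for why a headline latency number says little on its own.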
 
Austin Lesea <austin@xilinx.com> wrote in message news:<c8jeee$cfc3@cliff.xsj.xilinx.com>...
Jesse,

Processors, plural.

I'm still right.

Austin
My sincere apologies. I would drop this, but as it's a public forum,
I want the reading public to know the truth. Some further elaboration:

Multiple embedded processorS on an FPGA (plural) have been
technologically feasible, supported, and implemented by customers --
with Nios -- since its inception (I'm sure the same could be said of
other offerings prior to that date, too), and we continue to support
that. That has been extended in the most recent release of our
product. As an example, the user can debug many (we have tested up to
8) processorS (plural) simultaneously via a single JTAG connection and
a nice IDE environment.

That's the real beauty of an FPGA, as we all know... you have logic
you can put to any use, including the same use several times over to
do interesting things.

And if for some reason a "soft" processor does not equal a "hard" one,
well, I suppose that is a matter of debate. They both take compiled C
code and do useful tasks, so I think they're both processorS.

Regards,

Jesse Kempa
Altera Corp.
jkempa at altera dot com
 
You're really sad? Take a look at the terribly broken setjmp/longjmp
implementation for Nios I. Register windows work ok if you
never switch stacks (say for threads or to have a separate exception
stack). A correct implementation of context switching requires that
you spill all the register windows on the task being switched out and
restore to the previous depth the windows on the task being switched in.
setjmp/longjmp together should behave as a context switch.

If your interrupt processing model is "all processing related to an
interrupt happens in the interrupt service routine", you might be happy
with register windows (unless you are unfortunate enough to have the
exception occur when the windows are full). On the other hand, if your
model is "do only the things that must be done in the service routine,
then enable a thread to do the rest", then you probably aren't too happy.

I'm quite pleased that they dumped this feature and took the lean approach.

Geoffrey



 
Geoffrey Brown <geobrown@cs.indiana.edu> writes:
You're really sad ? Take a look at the terribly broken setjmp/longjmp
implementation for Nios I. Register windows work ok if you
never switch stacks (say for threads or to have a separate exception
stack). A correct implementation of context switching requires that
you spill all the register windows on the task being switched out and
restore to the previous depth the windows on the task being switched in.
setjmp/longjmp together should behave as a context switch.
The officially defined semantics of setjmp and longjmp do not require
that they be usable for switching stacks; they only are defined to
unwind a stack.

I ran into exactly this problem when I ported the Telebit Netblazer
operating system to the AMD 29000 back in 1991. The 29000 typically
uses register windows, although it can also use the entire set of 128
local registers as "normal" non-windowed registers. I had to rewrite
the setjmp and longjmp implementation exactly as you describe.

However, I wouldn't claim that this is because the setjmp/longjmp
implementation was broken. It was behaving exactly as specified.
Rather, the problem is with using setjmp/longjmp for something
other than unwinding the stack.

I think a case could be made that the next revision of the C standard
should have new library functions for context switching.
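
For reference, the use that the C standard does define looks like the sketch below: `longjmp` returns control to a `setjmp` call in a function that is still active further up the same stack. The function and variable names are made up for illustration.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recovery_point;
static volatile int failure_code;   /* set before jumping, read after */

/* Recurse `levels` deep, then abandon all those frames in one jump. */
static void descend(int levels)
{
    if (levels <= 0) {
        failure_code = 42;
        longjmp(recovery_point, 1);  /* unwind to the enclosing setjmp */
    }
    descend(levels - 1);
}

/* Returns the code recorded by descend() once the stack has been unwound. */
int run_with_unwind(int levels)
{
    assert(levels >= 0);
    if (setjmp(recovery_point) == 0) {
        descend(levels);
        return 0;                    /* not reached: descend() always jumps */
    }
    return failure_code;
}
```

Jumping to a `jmp_buf` whose function has already returned, or one that belongs to a different stack, is outside what the standard defines, which is the distinction drawn above.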
 
"Tim" <tim@rockylogic.com.nooospam.com> wrote in message news:<c8jdui$9rg$1$8300dec7@news.demon.co.uk>...
Austin Lesea wrote:

lowest interrupt latency of any soft processor core (and
even better than most hard processors)
Oh, I am asked to say something:)

OK, I have no idea whose interrupt latency is shortest. Probably the CPU
that has the fastest clock rate or the one that's specially designed
for interrupt response handling.

I suspect that the several ASIC MT CPUs that have recently come along
for the wireless set could well have the best interrupt response, especially
one that runs 8 threads at 250MHz (or was it 400MHz), because the threads run
all the time, every 8th cycle. And these CPUs don't have context to
swap since they have N contexts in RAM.

Technically Transputers don't have interrupts; that's too low a level
at which to look at them. But they do service events with an incredibly
quick response for a variety of reasons, though that was at 25MHz and
15 years ago.

Now the R3 CPU, also being a multithreaded (MT) CPU (and also now
running baby code, BTW, in the C model), could designate 1 of its 16 threads
to poll some HW and take the event home. That would mean about 20-50
cycles of computation might pass before Pn noticed it had to do some
work. If Pn can find a way to stay active in the IX engine without
branching (which causes a round-robin process swap) then it could
notice an event in <4 cycles. I don't think I will add support for
an always-stay-active process. Now when the process that services an
interrupt does get its turn, it will have no registers to swap, but it
may have to take some cache misses while its working set is reloaded, but
that's transparent to MT. If it pans out at 250MHz in V2Pro it may or
may not have the fastest interrupt response. It will however have the most
throughput of any FPGA CPU, bordering on 1.3clock Freq from the sim
traces. It loves branches and transfers and swapping; it's the nature
of the MT beastie.

that must be red rag to a bull for john jackson and the other
transputer folk.

and why are there so many transputer people in fpgaland?
Well, I don't remember anyone else here who identifies themselves as
such; most are probably busy elsewhere. And where is Alan C!

Well, the answer to that is real simple. Anything FPGAs do today, especially
DSP and comms and whatever, was once done by Transputers. Look at
Nallatech and a whole load of UK/European companies that were once
Transputer TRAM module houses. Those that survived are all FPGA guys
today and in the top tier of high-performance engineering. What's a good
engineer to do when something runs out of gas? Look for the next
obvious replacement.

Also the FPGA and the Transputer more or less came out at the same
time, 84++. The Transputer peaked a long time ago; the FPGA really
started peaking only a few years ago and wasn't really much use till
4K or later (sorry).

That also brings me to the other point. Occam runs on both. Not C.
Of course Occam had to resurrect itself in C syntax (HandelC) to be
more attractive to the average EE and to be synthesizable for FPGAs. BTW I am
not a fan of HandelC; I just mention that its roots go back to Occam.

I will leave it there

regards

johnjakson_usa_com
 
Jonathan Bromley <jonathan.bromley@doulos.com> wrote in message news:<h8cra09qmgc17ulbvh1t4bk80dr1it2g39@4ax.com>...
On Fri, 21 May 2004 00:09:34 +0100, "Tim"
<tim@rockylogic.com.nooospam.com> wrote:

Austin Lesea wrote:

lowest interrupt latency of any soft processor core (and
even better than most hard processors)

that must be red rag to a bull for john jackson and the other
transputer folk.

Tee hee. Interrupt latency is a joke number. I wrote a
piece about twelve years ago for one of the embedded-system
comics, pointing out how insignificant is the processor's
own interrupt latency - there are many things that are
orders of magnitude more important to interrupt performance.
Here as in many other things, the transputer was on the
right track. Sadly, limitations of design culture and
available technology doomed it to commercial failure.

Just for the record, here's Bromley's First and Second Law
of commercial failure in a technological product:

First Law:
Probability of commercial failure is increased if the
product meets any of the following criteria:
1) It employs concepts and techniques that will become
popular more than a decade later.
2) Its design is based on technically, logically or
mathematically sound principles.
3) Its creators are British.
Perhaps I am doomed to fail on all 3 counts.

Anyway, I may be a US citizen before this thing gets polished and can
deny the last rule, as everything important seems to have to be invented
or reinvented in the US (sadly).

Since my math isn't so great maybe I can deny the 2nd rule too:).

And 20yrs have passed since I left and the Transputer shipped so I can
beat that one too perhaps.

Second Law:
The probability of commercial failure is unity if two
or more of the above criteria are met.

and why are there so many transputer people in fpgaland?

Perhaps because they know a good thing when they see one?
yep

Getting more and more cynical as time rolls by...
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223 mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573 Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
