MISC - Stack Based vs. Register Based

On 02/04/13 19:20, glen herrmannsfeldt wrote:
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote:
On 3/31/2013 2:34 PM, glen herrmannsfeldt wrote:

For Itanium, the different units do different things. There are
instruction formats that divide up the bits in different ways to make
optimal use of the bits. I used to have the manual nearby, but I don't
see it right now.

Yes, the array processor I worked on was coded from scratch, very
laboriously. The Itanium is trying to run existing code as fast as
possible. So they have a number of units to do similar things, but also
different types, all working in parallel as much as possible. Also the
parallelism in the array processor was all controlled by the programmer.
In regular x86 processors the parallelism is controlled by the chip
itself. I'm amazed sometimes at just how much they can get the chip to
do, no wonder there are 100's of millions of transistors on the thing.
I assume parallelism in the Itanium is back to the compiler smarts to
control since it needs to be coded into the VLIW instructions.

Seems to me that the big problem with the original Itanium was the
need to also run x86 code. That delayed the release for some time, and
in that time other processors had advanced. I believe that later
versions run x86 code in software emulation, maybe with some hardware
assist.
x86 compatibility was not the "big" problem with the Itanium (though it
didn't help). There were two far bigger problems. One is that the chip
was targeted at maximising throughput with little regard for power
efficiency, since it was for the server market - so all of the logic was
running all of the time to avoid latencies, and it has massive caches
that run as fast as possible. The result here is that the original
devices had a power density exceeding the core of a nuclear reactor (it
was probably someone from AMD who worked that out...).

The big problem, however, is that the idea with VLIW is that the
compiler does all the work scheduling instructions in a way that lets
them run in parallel. This works in some specialised cases - some DSP's
have this sort of architecture, and some types of mathematical
algorithms suit it well. But when Intel started work on the Itanium,
compilers were not up to the task - Intel simply assumed they would work
well enough by the time the chips were ready. Unfortunately for Intel,
compiler technology never made it - and in fact, it will never work
particularly well for general code. There are too many unpredictable
branches and conditionals to predict parallelism at compile time. So
most real-world Itanium code uses only about a quarter or so of the
processing units in the cpu at any one time (though some types of code
can work far better). Thus Itanium chips run at half the real-world
speed of "normal" processors, while burning through at least twice the
power.
 
"rickman" <gnuarm@gmail.com> wrote in message
news:kjf48e$5qu$1@dont-email.me...

Weren't you the person who brought CISC into this discussion?
Yes.

Why are you asking this question about CISC?
You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.

RISC was about effectively using all the processor clock cycles by
using fast instructions. RISC wasn't concerned about the encoded
size of instructions, how much memory a program consumed, the
cost of memory, or how fast memory needed to be.

CISC was about reducing memory consumed per instruction. CISC
reduced the average size of encoded instructions while also
increasing the amount of work each instruction performs. CISC was
typically little-endian to reduce the space needed for integer
encodings. However, increasing the amount of work per instruction
produces highly specialized instructions that are the
characteristic of CISC. You only need to look at the x86
instruction set to find some, e.g., STOS, LODS, XLAT, etc. They
are also slow to decode and execute as compared to RISC.

So, if memory is cheap and fast, there is no point in improving
code density, i.e., use RISC. If memory is expensive or slow, use
CISC.

Arlet mentioned changes to a processor that appeared to me to have
nothing to do with increasing or decreasing code density. AISI,
the changes he mentioned would only affect what was in the current set
of MISC instructions, i.e., either a set of register-based MISC
instructions or a set of stack-based MISC instructions. This was
stated previously.



Rod Pemberton
 
"rickman" <gnuarm@gmail.com> wrote in message
news:kjeve8$tvm$1@dont-email.me...
On 3/30/2013 5:54 PM, Rod Pemberton wrote:
....

Even if you implement CISC-like instructions, you can't
forgo the MISC instructions you already have in order to
add the CISC-like instructions.

Really? I can't drop the entire instruction set and start over?
Who says so? Am I breaking a law?
It's just a logical outcome, AISI. The design criteria that you
stated was that of producing MISC processors. MISC seems to be
purely about minimizing the quantity of instructions. You've
produced a MISC processor. So, if you now change your mind about
MISC and add additional instructions to your processor's
instruction set, especially non-MISC instructions, you're
effectively going against your own stated design requirement: MISC
or reducing the quantity of instructions. So, it'd be you who
says so, or not... It just seems contradictory with your past
self to change course now with your current self.

So, to do that, you'll need to increase
the size of the instruction set, as well as implement
a more complicated instruction decoder.

Define "increase the size of the instruction set".
You'll have more instructions in your instruction set.

I am using a 9 bit opcode for my stack design and am
using a similar 9 bit opcode for the register design. In what
way is the register design using a larger instruction set?
You haven't added any additional CISC-like instructions, yet. You
just exchanged stack operations for register operations. So,
none for now.

I.e., that means the processor will no longer be MISC,
but MISC+minimal CISC hybrid, or pure
CISC...

Nonsense. "Minimal Instruction Set Computer (MISC) is a
processor architecture with a very small number of basic
operations and corresponding opcodes." from Wikipedia.
BTW, I don't think the term MISC is widely used and is not
well defined. This is the only web page I found that even
tries to define it.

Actually, if you consider only the opcodes and not the operand
combinations, I think the register design may have fewer
instructions than does the stack design. But the register
design still is in work so I'm not done counting yet.
How exactly do fewer instructions contribute to increased code
density for the remaining instructions? The eliminated
instructions are no longer a measured component of code density.
I.e., they no longer consume memory and therefore aren't measured.

There are some interesting corners to be explored. For example
a MOV rx,rx is essentially a NOP. There are eight of these. So
instead, why not make them useful by clearing the register?
So MOV rx,rx is a clear to be given the name CLR rx... unless
the rx is r7 in which case it is indeed a MOV r7,r7 which is now
a NOP to be coded as such. The CLR r7 is not needed because
LIT 0 already does the same job. Even better the opcode for a
NOP is 0x1FF or octal 777. That's very easy to remember
and recognize. It feels good to find convenient features like
this and makes me like the register MISC design.
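
A minimal C sketch of that decode special case, assuming the 9-bit word
carries a 3-bit destination field in bits 5:3 and a 3-bit source field in
bits 2:0 (the field layout is an illustration, not the actual encoding):

    /* Hypothetical decode for the MOV group of a 9-bit register-MISC word.
       Assumed layout: bits 8:6 = opcode group, bits 5:3 = dst, bits 2:0 = src. */
    enum op { OP_MOV, OP_CLR, OP_NOP };

    static enum op decode_mov(unsigned insn)
    {
        unsigned dst = (insn >> 3) & 7;
        unsigned src = insn & 7;

        if (dst != src)
            return OP_MOV;      /* ordinary register-to-register move  */
        if (dst == 7)
            return OP_NOP;      /* MOV r7,r7 kept as the NOP (0x1FF)   */
        return OP_CLR;          /* MOV rx,rx (x != 7) reused as CLR rx */
    }

The appeal is that the special case costs only a comparator on dst == src
while reclaiming seven otherwise useless encodings.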
I can see that you're attempting to minimize the quantity of
implemented instructions. Although similar in nature, that's not
the same as improving the code density. Are you conflating two
different concepts? One of them reduces the encoded size of an
instruction, while the other eliminates instructions ...


How are you going to attempt to increase the code density for your
processor?

1) adding new, additional, more powerful instructions that you
don't already have
2) merging existing instructions into fewer instructions
3) finding a more compact method of instruction encoding
4) use little-endian to reduce encoded sizes of integers
5) none or other

I'd think it should be #1 and #3 and #4, or #2 and #3 and #4, or
"other" and #3 and #4 ...


Rod Pemberton
 
On 4/3/2013 8:34 PM, Rod Pemberton wrote:
"rickman"<gnuarm@gmail.com> wrote in message
news:kjf48e$5qu$1@dont-email.me...

Weren't you the person who brought CISC into this discussion?

Yes.

Why are you asking this question about CISC?

You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.
Ok, so that's how you see it.


RISC was about effectively using all the processor clock cycles by
using fast instructions. RISC wasn't concerned about the encoded
size of instructions, how much memory a program consumed, the
cost of memory, or how fast memory needed to be.
Don't know why you are even mentioning RISC.


CISC was about reducing memory consumed per instruction. CISC
reduced the average size of encoded instructions while also
increasing the amount of work each instruction performs. CISC was
typically little-endian to reduce the space needed for integer
encodings. However, increasing the amount of work per instruction
produces highly specialized instructions that are the
characteristic of CISC. You only need to look at the x86
instruction set to find some, e.g., STOS, LODS, XLAT, etc. They
are also slow to decode and execute as compared to RISC.
I think CISC was not solely about reducing memory used per instruction.
CISC was not an area of work. CISC was not even coined until long
after many CISC machines were designed. Most CISC computers were
designed with very different goals in mind. For example, the x86, a
CISC processor, was initially designed to extend the x86 instruction set
to a 16 bit processor and then to a 32 bit processor. The goal was just
to develop an instruction set that was backwards compatible with
existing processors while adding capabilities that would make 32 bit
processors marketable.


So, if memory is cheap and fast, there is no point in improving
code density, i.e., use RISC. If memory is expensive or slow, use
CISC.
LOL, that is a pretty MINIMAL analysis of computers, so I guess it is
MACA, Minimal Analysis of Computer Architectures.


Arlet mentioned changes to a processor that appeared to me to have
nothing to do with increasing or decreasing code density. AISI,
the changes he mentioned would only affect what was in the current set
of MISC instructions, i.e., either a set of register-based MISC
instructions or a set of stack-based MISC instructions. This was
stated previously.
This is so far out of context I can't comment.

--

Rick
 
In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote:
"rickman" <gnuarm@gmail.com> wrote in message
news:kjf48e$5qu$1@dont-email.me...
Weren't you the person who brought CISC into this discussion?

Yes.

Why are you asking this question about CISC?

You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.
They do go together, but I am not so sure that they are inseparable.

CISC began when much coding was done in pure assembler, and anything
that made that easier was useful. (One should figure out the relative
costs, but at least it was in the right direction.)

That brought instructions like S/360 EDMK and VAX POLY.
(Stories are that on most VAX models, POLY is slower than an
explicit loop.) Now, there is no need to waste bits, and so
instruction formats were defined to use the available bits.

S/360 (and successors) only have three different instruction lengths,
and even then sometimes waste bits.

The VAX's huge number of different instruction lengths, and also IA32's,
does seem to be for code size efficiency. VAX was also defined with
a 512 byte page size, even after S/370 had 2K and 4K pages.
Way too small, but maybe seemed right at the time.

RISC was about effectively using all the processor clock cycles by
using fast instructions. RISC wasn't concerned about the encoded
size of instructions, how much memory a program consumed, the
cost of memory, or how fast memory needed to be.
Yes, but that doesn't mean that CISC is concerned with the size
of instructions.

CISC was about reducing memory consumed per instruction. CISC
reduced the average size of encoded instructions while also
increasing the amount of work each instruction performs.
Even if it is true (and it probably is) that CISC tends to make
efficient use of the bits, that doesn't prove that is what CISC
was about.

As above, CISC was about making coding easier for programmers,
specifically assembly programmers. Now, complex instructions take
less space than a series of simpler instructions, but then again
one could use a subroutine call.

The PDP-10 has the indirect bit, allowing for nested indirection,
which may or may not make efficient use of that bit. S/360 uses
a sequence of L (load) instructions to do indirection.

Instruction usage statistics noting how often L (load) was
executed in S/360 code may have been the beginning of RISC.

CISC was typically little-endian to reduce the space needed
for integer encodings.
This I don't understand at all. They take the same amount
of space. Little endian does make it slightly easier to
do a multiword (usually byte) add, and that may have helped
for the 6502. It allows one to propagate the carry in the
same order one reads bytes from memory.

But once you add multiply and divide, the advantage is pretty
small.
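
The carry-order point in a few lines of C (a toy multi-byte add, not 6502
code): with little-endian storage the bytes are added in the same order they
sit in memory, with the carry rippling toward higher addresses.

    #include <stdint.h>
    #include <stddef.h>

    /* Add two little-endian multi-byte integers, least significant byte first.
       This is the order a 6502-style CPU fetches them, so the carry can be
       propagated as the bytes are read. */
    static void add_le(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
    {
        unsigned carry = 0;
        for (size_t i = 0; i < n; i++) {   /* i = 0 is the lowest address */
            unsigned sum = a[i] + b[i] + carry;
            dst[i] = (uint8_t)sum;
            carry = sum >> 8;
        }
    }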

However, increasing the amount of work per instruction
produces highly specialized instructions that are the
characteristic of CISC. You only need to look at the x86
instruction set to find some, e.g., STOS, LODS, XLAT, etc. They
are also slow to decode and execute as compared to RISC.
Those are not very CISCy compared with some S/360 or VAX
instructions. Now, compare to S/360 TR which will translate
by looking up bytes in a lookup table for 1 to 256 byte
long strings. (Unless I remember wrong, XLAT does one byte.)
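
In C terms both instructions boil down to the same table lookup; the
difference being pointed at is just how many bytes a single instruction
covers (a sketch of the idea, not the exact architected semantics):

    #include <stdint.h>
    #include <stddef.h>

    /* Roughly what one S/360 TR does: replace each of up to 256 bytes with
       table[byte].  An x86 XLAT covers a single byte (AL) per instruction. */
    static void translate(uint8_t *buf, size_t len, const uint8_t table[256])
    {
        for (size_t i = 0; i < len; i++)
            buf[i] = table[buf[i]];
    }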

So, if memory is cheap and fast, there is no point in improving
code density, i.e., use RISC. If memory is expensive or slow, use
CISC.
Well, RISC is more toward using simpler instructions that compilers
actually generate and executing them fast. Having one instruction
size helps things go fast, and tends to be less efficient with bits.

Even so, I believe that you will find that RISC designers try to
make efficient use of the bits available, within the single size
instruction constraint.

Arlet mentioned changes to a processor that appeared to me to have
nothing to do with increasing or decreasing code density. AISI,
the changes he mentioned would only affect what was in the current set
of MISC instructions, i.e., either a set of register-based MISC
instructions or a set of stack-based MISC instructions. This was
stated previously.
Decoding multiple different instruction formats tends to require
complicated demultiplexers which are especially hard to do in
an FPGA. Even so, one can make efficient use of the bits
and still be MISC.

-- glen
 
On 4/3/2013 8:35 PM, Rod Pemberton wrote:
"rickman"<gnuarm@gmail.com> wrote in message
news:kjeve8$tvm$1@dont-email.me...
On 3/30/2013 5:54 PM, Rod Pemberton wrote:
...

Even if you implement CISC-like instructions, you can't
forgo the MISC instructions you already have in order to
add the CISC-like instructions.

Really? I can't drop the entire instruction set and start over?
Who says so? Am I breaking a law?


It's just a logical outcome, AISI. The design criteria that you
stated was that of producing MISC processors. MISC seems to be
purely about minimizing the quantity of instructions. You've
produced a MISC processor. So, if you now change your mind about
MISC and add additional instructions to your processor's
instruction set, especially non-MISC instructions, you're
effectively going against your own stated design requirement: MISC
or reducing the quantity of instructions. So, it'd be you who
says so, or not... It just seems contradictory with your past
self to change course now with your current self.
Your logic seems to be flawed on so many levels. I don't think I stated
that producing a MISC processor was a "design criteria". It doesn't
even make sense to have that as a "design criteria".

I never said I was "adding" instructions to some existing instruction
set. In fact, I think I've said that the instruction set for the
register based MISC processor is so far, *smaller* than the instruction
set for the stack based MISC processor as long as you don't consider
each combination of X and Y in MOV rx,ry to be a separate instruction.
If you feel each combination is a separate instruction then they both
have approximately the same number of instructions since they both have
9 bit instructions and so have 512 possible instructions.


So, to do that, you'll need to increase
the size of the instruction set, as well as implement
a more complicated instruction decoder.

Define "increase the size of the instruction set".

You'll have more instructions in your instruction set.
Sorry, that isn't a good definition because you used part of the term
you are defining in the definition.


I am using a 9 bit opcode for my stack design and am
using a similar 9 bit opcode for the register design. In what
way is the register design using a larger instruction set?


You haven't added any additional CISC-like instructions, yet. You
just exchanged stack operations for register operations. So,
none for now.
Ok, now we are getting somewhere. In fact, if you read my other posts,
you will find that I *won't* be adding any CISC instructions because one
of my stated "design criteria" is that each instruction executes in one
clock cycle. It's pretty hard to design a simple machine that can do
"complex" instructions without executing them in multiple clock cycles.


I.e., that means the processor will no longer be MISC,
but MISC+minimal CISC hybrid, or pure
CISC...

Nonsense. "Minimal Instruction Set Computer (MISC) is a
processor architecture with a very small number of basic
operations and corresponding opcodes." from Wikipedia.
BTW, I don't think the term MISC is widely used and is not
well defined. This is the only web page I found that even
tries to define it.

Actually, if you consider only the opcodes and not the operand
combinations, I think the register design may have fewer
instructions than does the stack design. But the register
design still is in work so I'm not done counting yet.


How exactly do fewer instructions contribute to increased code
density for the remaining instructions? The eliminated
instructions are no longer a measured component of code density.
I.e., they no longer consume memory and therefore aren't measured.
Not sure what you mean here. Code density is how many instructions it
takes to do a given amount of work. I measure this by writing code and
counting the instructions it takes. Right now I have a section of code
I am working on that performs the DDS calculations from a set of control
inputs to the DDS. This is what I was working on when I realized that a
register based design likely could do this without the stack ops, OVER
mainly, but also nearly all the others that just work on the top two
stack items.
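
To make the OVER point concrete, consider a toy phase-accumulator step where
the increment has to survive into the next iteration (the mnemonics in the
comment are illustrative only, not the actual instruction sets):

    /* phase = phase + increment, keeping the increment for next time.
       Stack machine (illustrative):    OVER ADD   ( increment phase -- increment phase' )
       Register machine (illustrative): ADD r0,r1  ; phase in r0, increment in r1
       The register form needs no copy because the add never consumes r1. */
    static unsigned dds_step(unsigned phase, unsigned increment)
    {
        return phase + increment;
    }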

So far it appears the register based instructions are significantly more
compact than the stack based instructions. Just as important, the
implementation appears to be simpler for the register based design. But
that is just *so far*. I am still working on this. The devil is in the
details and I may find some aspects of what I am doing that cause
problems and can't be done in the instruction formats I am planning or
something blows up the hardware to be much bigger than I am picturing at
the moment.


There are some interesting corners to be explored. For example
a MOV rx,rx is essentially a NOP. There are eight of these. So
instead, why not make them useful by clearing the register?
So MOV rx,rx is a clear to be given the name CLR rx... unless
the rx is r7 in which case it is indeed a MOV r7,r7 which is now
a NOP to be coded as such. The CLR r7 is not needed because
LIT 0 already does the same job. Even better the opcode for a
NOP is 0x1FF or octal 777. That's very easy to remember
and recognize. It feels good to find convenient features like
this and makes me like the register MISC design.

I can see that you're attempting to minimize the quantity of
implemented instructions. Although similar in nature, that's not
the same as improving the code density. Are you conflating two
different concepts? One of them reduces the encoded size of an
instruction, while the other eliminates instructions ...
You really don't seem to understand what I am doing. You continually
misinterpret what I explain.


How are you going to attempt to increase the code density for your
processor?

1) adding new, additional, more powerful instructions that you
don't already have
2) merging existing instructions into fewer instructions
3) finding a more compact method of instruction encoding
4) use little-endian to reduce encoded sizes of integers
5) none or other

I'd think it should be #1 and #3 and #4, or #2 and #3 and #4, or
"other" and #3 and #4 ...
Uh, I am designing an instruction set that does as much as possible with
as little hardware as possible. When you say, "new, additional"
instructions, compared to what? When you say "more compact", again,
compared to what exactly?

When you say "little-endian" to reduce encoded integer size, what exactly
is that? Are you referring to specifying an integer in small chunks so
that sign extension allows the specification to be limited in length?
Yes, that is done on both the stack and register based designs.
Koopman's paper lists literals and calls as some of the most frequently
used instructions, so optimizing literals optimizes the most frequently
used instructions.
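
A sketch of what building a literal from small sign-extended chunks can look
like (my own illustration, assuming an 8-bit payload per literal instruction;
the actual scheme may differ): the first chunk is sign extended, and each
further chunk shifts the value up by eight bits, so small constants cost a
single instruction.

    #include <stdint.h>

    /* Hypothetical literal builder: LIT starts a literal by sign-extending an
       8-bit chunk; LIT_H widens an in-progress literal by another 8 bits.
       Values in -128..127 therefore need only one LIT instruction. */
    static int32_t lit(int8_t chunk)
    {
        return (int32_t)chunk;                      /* sign extension */
    }

    static int32_t lit_h(int32_t acc, uint8_t chunk)
    {
        return (int32_t)(((uint32_t)acc << 8) | chunk);
    }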

--

Rick
 
On 29/03/2013 21:00, rickman wrote:
I have been working with stack based MISC designs in FPGAs for some
years. All along I have been comparing my work to the work of others.
These others were the conventional RISC type processors supplied by the
FPGA vendors as well as the many processor designs done by individuals
or groups as open source.
<snip>

Can you achieve as fast interrupt response times on a register-based
machine as a stack machine? OK, shadow registers buy you one fast
interrupt, but that's sort of a one-level 2D stack.

Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
response time.

Cheers
--
Syd
 
On 04/04/2013 10:38 AM, Syd Rumpo wrote:
On 29/03/2013 21:00, rickman wrote:
I have been working with stack based MISC designs in FPGAs for some
years. All along I have been comparing my work to the work of others.
These others were the conventional RISC type processors supplied by the
FPGA vendors as well as the many processor designs done by individuals
or groups as open source.

snip

Can you achieve as fast interrupt response times on a register-based
machine as a stack machine? OK, shadow registers buy you one fast
interrupt, but that's sort of a one-level 2D stack.

Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
response time.
It depends on the implementation.

The easiest thing would be to not save anything at all before jumping to
the interrupt handler. This would make the interrupt response really
fast, but you'd have to save the registers manually before using them.
It would benefit systems that don't need many (or any) registers in the
interrupt handler. And even saving 4 registers at 100 MHz only takes an
additional 40 ns.

If you have parallel access to the stack/program memory, you could, like
the Cortex, save a few (e.g. 4) registers on the stack, while you
fetch the interrupt vector, and refill the execution pipeline at the
same time. This adds a considerable bit of complexity, though.

If you keep the register file in a large memory, like an internal block
RAM, you can easily implement multiple sets of shadow registers.

Of course, an FPGA comes with flexible hardware such as large FIFOs, so
you can generally avoid the need for super fast interrupt response. In
fact, you may not even need interrupts at all.
 
In article <kjin8q$so5$1@speranza.aioe.org>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
In comp.arch.fpga Rod Pemberton <do_not_have@notemailnotq.cpm> wrote:
"rickman" <gnuarm@gmail.com> wrote in message
news:kjf48e$5qu$1@dont-email.me...
Weren't you the person who brought CISC into this discussion?

Yes.

Why are you asking this question about CISC?

You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.

They do go together, but I am not so sure that they are inseparable.

CISC began when much coding was done in pure assembler, and anything
that made that easier was useful. (One should figure out the relative
costs, but at least it was in the right direction.)
But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.

<SNIP>
Decoding multiple different instruction formats tends to require
complicated demultiplexers which are especially hard to do in
an FPGA. Even so, one can make efficient use of the bits
and still be MISC.


-- glen
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
 
On 04/04/2013 01:16 PM, Albert van der Horst wrote:

You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.

They do go together, but I am not so sure that they are inseparable.

CISC began when much coding was done in pure assembler, and anything
that made that easier was useful. (One should figure out the relative
costs, but at least it was in the right direction.)

But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.
Speed is another valid reason.
 
On 04/04/2013 02:49 PM, glen herrmannsfeldt wrote:
In comp.arch.fpga Syd Rumpo <usenet@nononono.co.uk> wrote:

(snip)
Can you achieve as fast interrupt response times on a register-based
machine as a stack machine? OK, shadow registers buy you one fast
interrupt, but that's sort of a one-level 2D stack.

If you disable interrupts so that another one doesn't come along
before you can save enough state for the first one, yes.

S/360 does it with no stack. You have to have some place in the
low (first 4K) address range to save at least one register.
The hardware saves the old PSW at a fixed (for each type of
interrupt) address, which you also have to move somewhere else
before enabling more interrupts of the same type.
ARM7 is similar. PC and PSW are copied to registers, and further
interrupts are disabled. The hardware does not touch the stack. If you
want to make nested interrupts, the programmer is responsible for saving
these registers.

ARM Cortex has changed that, and it saves registers on the stack. This
allows interrupt handlers to be written as regular higher language
functions, and also allows easy nested interrupts.

When dealing with back-to-back interrupts, the Cortex takes a shortcut,
and does not pop/push the registers, but just leaves them on the stack.
 
In comp.arch.fpga Arlet Ottens <usenet+5@c-scape.nl> wrote:
On 04/04/2013 01:16 PM, Albert van der Horst wrote:

You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.

They do go together, but I am not so sure that they are inseparable.

CISC began when much coding was done in pure assembler, and anything
that made that easier was useful. (One should figure out the relative
costs, but at least it was in the right direction.)

But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.

Speed is another valid reason.
Presumably some combination of ease of coding, speed, and also Brooks'
"Second System Effect".

Paraphrasing from "Mythical Man Month" since I haven't read it recently,
the ideas that designers couldn't implement in their first system that
they designed, for cost/efficiency/whatever reasons, come out in the
second system.

Brooks wrote that more for OS/360 (software) than for S/360 (hardware),
but it might still have some effect on the hardware, and maybe also
for VAX.

There are a number of VAX instructions that seem like a good idea, but
as I understand it ended up slower than if done without the special
instructions.

As examples, both the VAX POLY and INDEX instruction. When VAX was new,
compiled languages (Fortran for example) pretty much never did array
bounds testing. It was just too slow. So VAX supplied INDEX, which in
one instruction did the multiply/add needed for a subscript calculation
(you do one INDEX for each subscript) and also checked that the
subscript was in range. Nice idea, but it seems that even with INDEX
it was still too slow.

Then POLY evaluates a whole polynomial, such as is used to approximate
many mathematical functions, but again, as I understand it, too slow.
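
Roughly what those two instructions compute, in C (operand order and trap
behaviour are simplified here, not the exact VAX definitions):

    #include <stdlib.h>

    /* POLY-style evaluation: Horner's rule over a table of coefficients. */
    static double poly(double x, const double *coef, int degree)
    {
        double r = coef[degree];
        for (int i = degree - 1; i >= 0; i--)
            r = r * x + coef[i];
        return r;
    }

    /* INDEX-style step: fold one subscript into a running index, trapping
       (here: aborting) if the subscript is outside its declared bounds. */
    static long index_step(long acc, long subscript, long low, long high, long size)
    {
        if (subscript < low || subscript > high)
            abort();                        /* subscript-range trap */
        return (acc + subscript) * size;
    }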

Both the PDP-10 and S/360 have the option for an index register on
many instructions, where when register 0 is selected no indexing is
done. VAX instead has indexed as a separate address mode selected by
the address mode byte. Is that the most efficient use for those bits?

-- glen
 
In comp.arch.fpga Syd Rumpo <usenet@nononono.co.uk> wrote:

(snip)
Can you achieve as fast interrupt response times on a register-based
machine as a stack machine? OK, shadow registers buy you one fast
interrupt, but that's sort of a one-level 2D stack.
If you disable interrupts so that another one doesn't come along
before you can save enough state for the first one, yes.

S/360 does it with no stack. You have to have some place in the
low (first 4K) address range to save at least one register.
The hardware saves the old PSW at a fixed (for each type of
interrupt) address, which you also have to move somewhere else
before enabling more interrupts of the same type.

Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
response time.
-- glen
 
On 4/3/13 11:31 PM, Arlet Ottens wrote:
On 04/04/2013 10:38 AM, Syd Rumpo wrote:
On 29/03/2013 21:00, rickman wrote:
I have been working with stack based MISC designs in FPGAs for some
years. All along I have been comparing my work to the work of others.
These others were the conventional RISC type processors supplied by the
FPGA vendors as well as the many processor designs done by individuals
or groups as open source.

snip

Can you achieve as fast interrupt response times on a register-based
machine as a stack machine? OK, shadow registers buy you one fast
interrupt, but that's sort of a one-level 2D stack.

Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
response time.

It depends on the implementation.

The easiest thing would be to not save anything at all before jumping to
the interrupt handler. This would make the interrupt response really
fast, but you'd have to save the registers manually before using them.
It would benefit systems that don't need many (or any) registers in the
interrupt handler. And even saving 4 registers at 100 MHz only takes an
additional 40 ns.
The best interrupt implementation just jumps to the handler code. The
implementation knows what registers it has to save and restore, which
may be only one or two. Saving and restoring large register files takes
cycles!

If you have parallel access to the stack/program memory, you could, like
the Cortex, save a few (e.g. 4) registers on the stack, while you
fetch the interrupt vector, and refill the execution pipeline at the
same time. This adds a considerable bit of complexity, though.

If you keep the register file in a large memory, like an internal block
RAM, you can easily implement multiple sets of shadow registers.

Of course, an FPGA comes with flexible hardware such as large FIFOs, so
you can generally avoid the need for super fast interrupt response. In
fact, you may not even need interrupts at all.
Interrupts are good. I don't know why people worry about them so!

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================
 
On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:
In comp.arch.fpga Arlet Ottens<usenet+5@c-scape.nl> wrote:
On 04/04/2013 01:16 PM, Albert van der Horst wrote:

You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.

They do go together, but I am not so sure that they are inseparable.

CISC began when much coding was done in pure assembler, and anything
that made that easier was useful. (One should figure out the relative
costs, but at least it was in the right direction.)

But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.
Albert, do you have a reference about this?


Speed is another valid reason.

Presumably some combination of ease of coding, speed, and also Brooks'
"Second System Effect".

Paraphrasing from "Mythical Man Month" since I haven't read it recently,
the ideas that designers couldn't implement in their first system that
they designed, for cost/efficiency/whatever reasons, come out in the
second system.

Brooks wrote that more for OS/360 (software) than for S/360 (hardware),
but it might still have some effect on the hardware, and maybe also
for VAX.

There are a number of VAX instructions that seem like a good idea, but
as I understand it ended up slower than if done without the special
instructions.

As examples, both the VAX POLY and INDEX instruction. When VAX was new,
compiled languages (Fortran for example) pretty much never did array
bounds testing. It was just too slow. So VAX supplied INDEX, which in
one instruction did the multiply/add needed for a subscript calculation
(you do one INDEX for each subscript) and also checked that the
subscript was in range. Nice idea, but it seems that even with INDEX
it was still too slow.

Then POLY evaluates a whole polynomial, such as is used to approximate
many mathematical functions, but again, as I understand it, too slow.

Both the PDP-10 and S/360 have the option for an index register on
many instructions, where when register 0 is selected no indexing is
done. VAX instead has indexed as a separate address mode selected by
the address mode byte. Is that the most efficient use for those bits?
I think you have just described the CISC instruction development
concept. Build a new machine, add some new instructions. No big
rationale, no "CISC" concept, just "let's make it better, why not add
some instructions?"

I believe if you check you will find the term CISC was not even coined
until after RISC was invented. So CISC really just means, "what we used
to do".

--

Rick
 
On 4/4/2013 4:38 AM, Syd Rumpo wrote:
On 29/03/2013 21:00, rickman wrote:
I have been working with stack based MISC designs in FPGAs for some
years. All along I have been comparing my work to the work of others.
These others were the conventional RISC type processors supplied by the
FPGA vendors as well as the many processor designs done by individuals
or groups as open source.

snip

Can you achieve as fast interrupt response times on a register-based
machine as a stack machine? OK, shadow registers buy you one fast
interrupt, but that's sort of a one-level 2D stack.

Even the venerable RTX2000 had an impressive (IIRC) 200ns interrupt
response time.
That's an interesting question. The short answer is yes, but it
requires that I provide circuitry to do two things, one is to save both
the Processor Status Word (PSW) and the return address to the stack in
one cycle. The stack computer has two stacks and I can save these two
items in one clock cycle. Currently my register machine uses a stack in
memory pointed to by a register, so it would require *two* cycles to
save two words. But the memory is dual ported and I can use a tiny bit
of extra logic to save both words at once and bump the pointer by two.

The other task is to save registers. The stack design doesn't really
need to do that, the stack is available for new work and the interrupt
routine just needs to finish with the stack in the same state as when it
started. I've been thinking about how to handle this in the register
machine.

The registers are really two registers and one bank of registers. R6
and R7 are "special" in that they have a separate incrementer to support
the addressing modes. They need a separate write port so they can be
updated in parallel with the other registers. I have considered
"saving" the registers by just bumping the start address of the
registers in the RAM, but that only saves R0-R5. I could use LUT RAM
for R6 and R7 as well. This would provide two sets of registers for
R0-R5 and up to 16 sets for R6 and R7. The imbalance isn't very useful,
but at least there would be a set for the main program and a set for
interrupts with the caveat that nothing can be retained between
interrupts. This also means interrupts can't be interrupted other than
at specific points where the registers are not used for storage.
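
A C model of the "bump the start address" idea (bank sizes and numbering here
are illustrative): the register number from the instruction is added to a bank
base, so entering an interrupt switches banks by changing the base rather than
copying anything.

    #include <stdint.h>

    #define NUM_BANKS      2     /* main program + interrupt level */
    #define REGS_PER_BANK  6     /* R0..R5 live in the banked RAM  */

    struct regfile {
        uint16_t ram[NUM_BANKS * REGS_PER_BANK];  /* models the LUT/block RAM   */
        unsigned base;                            /* current bank start address */
    };

    static void enter_bank(struct regfile *rf, unsigned bank)
    {
        rf->base = bank * REGS_PER_BANK;          /* no registers are copied */
    }

    static uint16_t read_reg(const struct regfile *rf, unsigned r)  /* r = 0..5 */
    {
        return rf->ram[rf->base + r];
    }

    static void write_reg(struct regfile *rf, unsigned r, uint16_t v)
    {
        rf->ram[rf->base + r] = v;
    }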

I'm also going to look at using a block RAM for the registers. With
only two read and write ports this makes the multiply step cycle longer
though.

Once that issue is resolved the interrupt response then becomes the same
as the stack machine - 1 clock cycle or 20 ns.

--

Rick
 
On 4/4/2013 5:34 PM, glen herrmannsfeldt wrote:
In comp.arch.fpga rickman<gnuarm@gmail.com> wrote:

(snip, then I wrote)

Then POLY evaluates a whole polynomial, such as is used to approximate
many mathematical functions, but again, as I understand it, too slow.

Both the PDP-10 and S/360 have the option for an index register on
many instructions, where when register 0 is selected no indexing is
done. VAX instead has indexed as a separate address mode selected by
the address mode byte. Is that the most efficient use for those bits?

I think you have just described the CISC instruction development
concept. Build a new machine, add some new instructions. No big
rationale, no "CISC" concept, just "let's make it better, why not add
some instructions?"

Yes, but remember that there is competition and each has to have
some reason why someone should buy their product. Adding new instructions
was one way to do that.

I believe if you check you will find the term CISC was not even coined
until after RISC was invented. So CISC really just means, "what we used
to do".

Well, yes, but why did "we used to do that"? For S/360, a lot of
software was still written in pure assembler, for one reason to make
it faster, and for another to make it smaller. And people were just
starting to learn that people (writing software) are more expensive
than machines (hardware). Well, that is about the point that it was
true. For earlier machines you were lucky to get one compiler and
enough system to run it.
Sure, none of this stuff was done without some purpose. My point is
that there was no *common* theme to the various CISC instruction sets.
Everybody was doing their own thing until RISC came along with a basic
philosophy. Someone felt the need to give a name to the previous way of
doing things and CISC seemed appropriate. No special meaning in the
name actually, just a contrast to the "Reduced" in RISC.

I don't think this is a very interesting topic really. It started in
response to a comment by Rod.

--

Rick
 
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote:

(snip, then I wrote)

Then POLY evaluates a whole polynomial, such as is used to approximate
many mathematical functions, but again, as I understand it, too slow.

Both the PDP-10 and S/360 have the option for an index register on
many instructions, where when register 0 is selected no indexing is
done. VAX instead has indexed as a separate address mode selected by
the address mode byte. Is that the most efficient use for those bits?

I think you have just described the CISC instruction development
concept. Build a new machine, add some new instructions. No big
rationale, no "CISC" concept, just "let's make it better, why not add
some instructions?"
Yes, but remember that there is competition and each has to have
some reason why someone should buy their product. Adding new instructions
was one way to do that.

I believe if you check you will find the term CISC was not even coined
until after RISC was invented. So CISC really just means, "what we used
to do".
Well, yes, but why did "we used to do that"? For S/360, a lot of
software was still written in pure assembler, for one reason to make
it faster, and for another to make it smaller. And people were just
starting to learn that people (writing software) are more expensive
than machines (hardware). Well, that is about the point that it was
true. For earlier machines you were lucky to get one compiler and
enough system to run it.

And VAX was enough later and even more CISCy.

-- glen
 
On 4/4/2013 7:16 AM, Albert van der Horst wrote:
In article<kjin8q$so5$1@speranza.aioe.org>,
glen herrmannsfeldt<gah@ugcs.caltech.edu> wrote:
In comp.arch.fpga Rod Pemberton<do_not_have@notemailnotq.cpm> wrote:
"rickman"<gnuarm@gmail.com> wrote in message
news:kjf48e$5qu$1@dont-email.me...
Weren't you the person who brought CISC into this discussion?

Yes.

Why are you asking this question about CISC?

You mentioned code density. AISI, code density is purely a CISC
concept. They go together and are effectively inseparable.

They do go together, but I am not so sure that they are inseparable.

CISC began when much coding was done in pure assembler, and anything
that made that easier was useful. (One should figure out the relative
costs, but at least it was in the right direction.)

But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.
I'm pretty sure that conclusion is not correct. If you have an
instruction that does two or three memory accesses in one instruction
and you replace it with three instructions that do one memory access
each, you end up with two extra memory accesses. How is this faster?
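As a rough count, charging one memory access per instruction fetch: the single
memory-to-memory instruction costs 1 fetch + 3 data accesses = 4, while the
three single-access instructions cost 3 fetches + 3 data accesses = 6; the two
extra accesses are the extra instruction fetches.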

That is one of the reasons why I want to increase code density, in my
machine it automatically improves execution time as well as reducing the
amount of storage needed.

--

Rick
 
On Apr 4, 10:04 pm, rickman <gnu...@gmail.com> wrote:
On 4/4/2013 8:44 AM, glen herrmannsfeldt wrote:

In comp.arch.fpga Arlet Ottens<usene...@c-scape.nl>  wrote:
On 04/04/2013 01:16 PM, Albert van der Horst wrote:

You mentioned code density.  AISI, code density is purely a CISC
concept.  They go together and are effectively inseparable.

They do go together, but I am not so sure that they are inseparable.

CISC began when much coding was done in pure assembler, and anything
that made that easier was useful. (One should figure out the relative
costs, but at least it was in the right direction.)

But, of course, this is a fallacy. The same goal is accomplished by
macros, and better. Code density is the only valid reason.

Albert, do you have a reference about this?

Let's take two commonly used S/360 opcodes as an example of CISC, some
move operations: MVC (move 0 to 255 bytes) and MVCL (move 0 to 16M bytes).
MVC does no padding or truncation. MVCL can pad and truncate, but
unlike MVC will do nothing and report overflow if the operands
overlap. MVC appears to other processors as a single indivisible
operation; every processor (including IO processors) sees storage as
either before the MVC or after it; it's not interruptible. MVCL is
interruptible, and partial products can be observed by other
processors. MVCL requires 4 registers and their contents are updated
after completion of the operation; MVC requires 1 for variable length
moves, 0 for fixed and its contents are preserved. MVCL has a high
code setup cost; MVC has none.

Writing a macro to do multiple MVCs and mimic the behaviour of MVCL?
Why not? It's possible, if a little tricky. And by all accounts, MVC
in a loop is faster than MVCL too. IBM even provided a macro; $MVCL.
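
The MVC-in-a-loop idea sketched in C rather than BAL (only the shape of it,
not IBM's actual $MVCL macro): the bulk goes in 256-byte chunks, with one
short move for the remainder.

    #include <string.h>
    #include <stddef.h>

    /* Cover a long move with repeated short ones, the way looped MVCs stand
       in for a single MVCL; each memcpy below plays the role of one MVC. */
    static void long_move(char *dst, const char *src, size_t len)
    {
        while (len >= 256) {
            memcpy(dst, src, 256);
            dst += 256;
            src += 256;
            len -= 256;
        }
        if (len)
            memcpy(dst, src, len);    /* final short move for the remainder */
    }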

But then, when you look at MVCL usage closely, there are a few
defining characteristics that are very useful. It can zero memory, and
the millicode (IBM's word for microcode) recognizes 4K boundaries for
4K lengths and optimises it; it's faster than 16 MVCs.

There's even a MVPG instruction for moving 4K aligned pages! What are
those crazy instruction set designers thinking?

The answer's a bit more than just code density; it never really was
about that. In all the years I wrote IBM BAL, I never gave code
density a serious thought -- with one exception. That was the 4K base
address limit; a base register could only span 4K, so for code bigger
than that you had to have either fancy register footwork or waste
registers on multiple bases.

It was more about giving assembler programmers choice and variety to
get the best out of the box before the advent of optimising compilers;
a way, if you like, of exposing the potential of the micro/millicode
through the instruction set. "Here I want you to zero memory" meant an
MVCL. "Here I am moving 8 bytes from A to B" meant using MVC. A
knowledgeable assembler programmer could out-perform a compiler.
(Nowadays quality compilers do a much better job of instruction
selection than humans, especially for pipelined processors that
stall.)

Hence CISC instruction sets (at least, IMHO and for IBM). They were
there for people and performance, not for code density.
 
