rickman
I have been working with stack-based MISC designs in FPGAs for some
years. All along I have been comparing my work to the work of others:
the conventional RISC-type processors supplied by the FPGA vendors, as
well as the many open-source processor designs by individuals and
groups.
So far my CPUs have always ranked reasonably well in terms of speed and,
more important to me, very well in terms of size and code density. My
efforts have shown that it is hard to improve code density significantly
while also minimizing the resources the design uses. Careful selection
of the instruction set can improve code density and logic usage when the
two are measured together, but there is always a tradeoff: either one
can be improved at the expense of the other.
Over the last couple of days I was looking at some code I plan to use
and realized that it could be a lot more efficient if I could find a way
to use more parallelism inside the CPU and execute fewer instructions.
So I started looking at defining separate opcodes for the two primary
functional units in the design, the data stack and the return stack.
Each has its own ALU. The data stack has a full complement of
capabilities, while the return stack can only add, subtract and compare;
the return stack is really intended to be an "address" processing unit.
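To make "separate opcodes for the two units" concrete, here is a minimal Python sketch. It assumes a hypothetical 16-bit instruction word whose high byte drives the data-stack unit and whose low byte drives the return-stack (address) unit. The opcode values, the field layout and the Forth-style -1/0 compare flag are illustrative assumptions, not an actual encoding.

```python
# Hypothetical dual-field instruction word (assumption, not a real ISA):
#   bits 15..8 -> opcode for the data-stack unit (full ALU)
#   bits  7..0 -> opcode for the return-stack/address unit (add, sub, compare only)
DS_OPS = {0x00: "NOP", 0x01: "ADD", 0x02: "SUB", 0x03: "AND", 0x04: "XOR"}
RS_OPS = {0x00: "NOP", 0x01: "ADD", 0x02: "SUB", 0x03: "CMP"}


def decode(word):
    """Split a 16-bit word into (data-stack op, return-stack op)."""
    return DS_OPS[(word >> 8) & 0xFF], RS_OPS[word & 0xFF]


def alu(op, stack):
    """Apply a two-operand op to the top of `stack` in place (top = end of list)."""
    if op == "NOP":
        return
    b = stack.pop()
    a = stack.pop()
    if op == "ADD":
        stack.append(a + b)
    elif op == "SUB":
        stack.append(a - b)
    elif op == "AND":
        stack.append(a & b)
    elif op == "XOR":
        stack.append(a ^ b)
    elif op == "CMP":
        stack.append(-1 if a == b else 0)  # Forth-style true/false flag
    else:
        raise ValueError(f"unknown op {op}")


def step(word, ds, rs):
    """Execute one instruction word. The two fields touch disjoint stacks,
    so hardware could run both in the same clock; here they simply run in turn."""
    ds_op, rs_op = decode(word)
    alu(ds_op, ds)
    alu(rs_op, rs)


ds, rs = [5, 6], [0x100, 4]
step(0x0101, ds, rs)  # data unit ADD and address unit ADD in one instruction
print(ds, rs)         # [11] [260]
```

Because neither unit reads the other's stack, there is no hazard between the two fields, which is what would let them share one cycle.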
While trying to figure out how to maximize the parallel capabilities of
these units, I realized that many of the operations were just stack
manipulations. Then I read the thread about the relative "cost" of
stack ops vs. memory accesses and realized that these were what I needed
to optimize: I needed a way to move data around on the stack without
spending an instruction and a clock cycle on it.
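A toy interpreter makes that cost visible. The sketch below uses made-up Forth-style opcode names to evaluate (a - b) * (a + b) with a and b on the data stack, and it counts how many of the executed instructions only shuffle data. The particular word sequence is just one way to write it, chosen for illustration.

```python
# Minimal Forth-style stack machine (hypothetical opcode names, not a real ISA).
# Top of the data stack is the end of the Python list.
STACK_OPS = {"DUP", "DROP", "SWAP", "OVER", "ROT"}


def run_stack(program, stack):
    """Run `program` on a copy of `stack`.
    Returns (final stack, instructions executed, pure stack-manipulation count)."""
    stack = list(stack)
    moves = 0
    for op in program:
        if op in STACK_OPS:
            moves += 1
        if op == "DUP":
            stack.append(stack[-1])
        elif op == "DROP":
            stack.pop()
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "OVER":
            stack.append(stack[-2])
        elif op == "ROT":  # ( x1 x2 x3 -- x2 x3 x1 )
            stack.append(stack.pop(-3))
        elif op in ("ADD", "SUB", "MUL"):
            b = stack.pop()
            a = stack.pop()
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[op])
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack, len(program), moves


# (a - b) * (a + b) with a b on the stack: OVER OVER - ROT ROT + *
prog = ["OVER", "OVER", "SUB", "ROT", "ROT", "ADD", "MUL"]
print(run_stack(prog, [7, 3]))  # ([40], 7, 4)
```

Four of the seven instructions do nothing but move operands. On a CPU that spends a clock per instruction, each of those is a cycle spent on no arithmetic.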
In the thread on stack ops it was pointed out repeatedly that stack
operands would very often be optimized into register operands, meaning
the stack ops wouldn't really need to be done at all. So I took a look
at a register-based MISC design. Guess what? I don't see the disadvantage!
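For comparison, here is a minimal sketch of the expression (a - b) * (a + b) on a hypothetical three-operand register machine. Because every operand is named directly, no instruction is spent moving data. The instruction tuple format and the register numbering are illustrative assumptions.

```python
# Minimal three-operand register machine (hypothetical, for comparison only).
# Each instruction is (op, dst, src1, src2), all register indices.
ALU = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
}


def run_regs(program, regs):
    """Run `program` on a copy of the register file and return the result."""
    regs = list(regs)
    for op, dst, s1, s2 in program:
        regs[dst] = ALU[op](regs[s1], regs[s2])
    return regs


# (a - b) * (a + b) with a in r0 and b in r1; result in r2.
prog = [("SUB", 2, 0, 1),   # r2 = a - b
        ("ADD", 3, 0, 1),   # r3 = a + b
        ("MUL", 2, 2, 3)]   # r2 = r2 * r3
print(run_regs(prog, [7, 3, 0, 0]))  # [7, 3, 40, 10]
```

The price is that each instruction now carries three register fields, so it is wider than a zero-operand stack opcode. That is the same code density tradeoff mentioned earlier in the post.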
I have pushed this around for a couple of days, and although I haven't
done a detailed design, I think I have looked at it enough to realize
that I can design a register-oriented MISC CPU that will run as fast as,
if not faster than, my stack-based design, and that it will use fewer
instructions. I still need to add some features, such as support for a
stack in memory (in other words, pre-increment/post-decrement addressing,
or the other way around...), but I don't see where this is a bad design.
It may end up using *less* logic as well: my stack design provides
access to the stack pointers, which requires logic both for the pointers
themselves and for muxing them into the data stack for reading.
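A stack in memory needs only a general register used as a pointer plus auto-modifying addressing modes. The minimal sketch below assumes a descending stack with pre-decrement on store and post-increment on load (the "other way around" variant). The register count, memory size, choice of r7 as the stack pointer and the PDP-11-style mnemonics in the comments are all illustrative assumptions.

```python
# Hypothetical register MISC model: 8 registers, word-addressed memory,
# and one ordinary register (r7 here, an assumption) used as the stack pointer.
class RegMachine:
    def __init__(self, mem_words=256, sp_reg=7):
        self.regs = [0] * 8
        self.mem = [0] * mem_words
        self.sp = sp_reg
        # Empty descending stack: SP starts one past the last memory word.
        self.regs[sp_reg] = mem_words

    def push(self, src):
        """ST src, -(SP): pre-decrement SP, then store."""
        self.regs[self.sp] -= 1
        self.mem[self.regs[self.sp]] = self.regs[src]

    def pop(self, dst):
        """LD dst, (SP)+: load, then post-increment SP."""
        value = self.mem[self.regs[self.sp]]
        self.regs[self.sp] += 1
        self.regs[dst] = value


m = RegMachine()
m.regs[1], m.regs[2] = 42, 99
m.push(1)
m.push(2)
m.pop(3)
m.pop(4)
print(m.regs[3], m.regs[4], m.regs[7])  # 99 42 256
```

Swapping the two modes (pre-increment on store, post-decrement on load) gives an ascending stack instead. Either way the pointer is an ordinary register, readable like any other.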
I guess looking at other people's designs (such as Chuck's) has changed
my perspective over the years, so that I am now willing and able to
optimize in ways I would not have wanted to in the past. But I am a bit
surprised that there has been so much emphasis on stack-oriented MISC
machines when it may well be that register-based MISC designs are also
very efficient, at least if you aren't building them to serve a C
compiler or trying to match some ideal RISC model.
--
Rick