David Brown
Guest
On 04/09/14 22:50, jim.brakefield@ieee.org wrote:
On Thursday, September 4, 2014 1:20:28 PM UTC-5, rickman wrote:
<snip and reformat - Google groups really is a /terrible/ news client>
I find it interesting that you refer to issues in using C while you
have no intent to work toward having your CPU supported by a C
compiler. Is anything to do with C important then?
I tend to consider C as typical of using a single memory stack to hold
subroutine frames, parameters, results, the previous frame pointer and
the return address. However, some compilers probably manage to keep
some of this in registers. Original Fortran used parameter lists and
globally allocated memory for non-recursive subroutines. On a dual
stack machine, parameters can be moved to the return stack to create a
frame. My intent is that the ISA support any of these memory usages
and others as well.
There is no requirement to have a stack for C - the standards don't even
mention the word. And there are C implementations for machines that
have little or no stack (perhaps just a short hardware return stack).
But the most common arrangement is a single stack for frames,
parameters, and return addresses, with passed data and local variables
in registers where possible and on the stack when necessary.
The key reason for having a single stack is not processor efficiency,
but for simplicity of memory management. Starting from low memory, you
have program code, then statically allocated data, and then the heap
(for dynamic memory) grows up into free space. The stack starts at the
top of memory and grows downwards, until it hits the heap and the system
crashes.
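The layout described above can be modelled in a few lines of C. This is only a sketch: the `flat_memory` type and its functions are hypothetical names invented for illustration, and unlike a real bare-metal system the model reports the heap/stack collision instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a flat address space: the heap grows up from
   the end of static data, the stack grows down from the top of memory. */
typedef struct {
    size_t heap_top;   /* first free address above the heap */
    size_t stack_ptr;  /* lowest address currently used by the stack */
} flat_memory;

/* Start with an empty heap just above static data and an empty stack
   at the top of memory. */
flat_memory flat_memory_init(size_t static_end, size_t mem_size)
{
    assert(static_end <= mem_size);
    flat_memory m = { static_end, mem_size };
    return m;
}

/* Bytes still free between the heap and the stack. */
size_t flat_memory_free(const flat_memory *m)
{
    return m->stack_ptr - m->heap_top;
}

/* Grow the heap upwards; returns 0 on success, -1 on collision. */
int heap_grow(flat_memory *m, size_t bytes)
{
    if (bytes > flat_memory_free(m))
        return -1;
    m->heap_top += bytes;
    return 0;
}

/* Grow the stack downwards; returns 0 on success, -1 on collision. */
int stack_grow(flat_memory *m, size_t bytes)
{
    if (bytes > flat_memory_free(m))
        return -1;
    m->stack_ptr -= bytes;
    return 0;
}
```

Because both regions draw on the same free gap, neither needs a size fixed in advance, which is the memory-management simplicity referred to above.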
If you are happy with a segmented memory, then it may be more efficient
to have multiple stacks for different purposes. This is particularly
true if the hardware can access the stacks simultaneously.
There is quite a bit of information available on the net for multiple
stack systems. Many target Forth rather than C, as Forth is a highly
stack-oriented language.
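As a rough illustration of a two-stack arrangement, here is a minimal C sketch of a Forth-style machine with separate data and return stacks. The type and function names and the fixed depth are assumptions made up for this example, not taken from any particular design. `to_r` and `r_from` mirror the Forth words `>R` and `R>`, and `make_frame` shows one way parameters could be moved to the return stack to form a frame.

```c
#include <assert.h>

#define STACK_DEPTH 16  /* assumed depth, chosen only for this sketch */

typedef struct {
    int cells[STACK_DEPTH];
    int depth;           /* number of cells in use */
} cell_stack;

typedef struct {
    cell_stack data;     /* operands and results */
    cell_stack ret;      /* return addresses and, here, frames */
} dual_stack_machine;

void push(cell_stack *s, int v)
{
    assert(s->depth < STACK_DEPTH);
    s->cells[s->depth++] = v;
}

int pop(cell_stack *s)
{
    assert(s->depth > 0);
    return s->cells[--s->depth];
}

/* Forth ">R": move the top of the data stack to the return stack. */
void to_r(dual_stack_machine *m) { push(&m->ret, pop(&m->data)); }

/* Forth "R>": move the top of the return stack back to the data stack. */
void r_from(dual_stack_machine *m) { push(&m->data, pop(&m->ret)); }

/* Move n parameters to the return stack so they form a frame there. */
void make_frame(dual_stack_machine *m, int n)
{
    for (int i = 0; i < n; i++)
        to_r(m);
}

/* Read frame slot i, where slot 0 is the most recently moved parameter. */
int frame_slot(const dual_stack_machine *m, int i)
{
    assert(i >= 0 && i < m->ret.depth);
    return m->ret.cells[m->ret.depth - 1 - i];
}
```

In hardware, the two `cell_stack` arrays would sit in separate memories, so both could be accessed in the same cycle.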
I would point out that even if you reach 200 MHz operation on an
FPGA, that will be approximately the same as running at 57 MHz if
your instructions used a single clock cycle, which is usually not a
tricky goal. In addition, single-cycle instructions make many aspects
of the machine simpler.
If each instruction does twice as much as a RISC instruction and if
the dual issue does not increase the LUT count significantly, then it
will be equivalent to a 200 MHz RISC. Without pipelining.
Superscalar execution (handling multiple instructions simultaneously)
is usually considered much more advanced and complex to implement than
pipelining, which has been common on CPU cores for decades.
It is also possible to use LUT RAM for the stacks and increase the
execution rate. For now I am content to go with the slower design.
Yes, RISCs are simple. Keeping all the stacks in memory and all the
pointers in LUT RAM is also simple. For this instruction set, the
address ALU is 100% busy and the data ALU is 30% busy. With dual
issue, one needs a second address ALU, and the data ALU is 60% busy.
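The utilization figures follow from simple scaling, sketched here with integer percentages. The helper names are hypothetical, and the sketch assumes that ALU demand scales linearly with issue width.

```c
#include <assert.h>

/* Total ALU demand in percent of one unit, assuming demand scales
   linearly with issue width (an assumption of this sketch). */
int alu_load_pct(int busy_pct_single_issue, int issue_width)
{
    assert(busy_pct_single_issue >= 0 && issue_width >= 1);
    return busy_pct_single_issue * issue_width;
}

/* Number of ALUs needed so that none is more than 100% busy. */
int alu_units_needed(int load_pct)
{
    return (load_pct + 99) / 100;
}
```

For dual issue, the address ALU load becomes 200%, which needs two units, while the data ALU load becomes 60%, which still fits in one.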
I am aiming for a low LUT count, single block RAM design. My figure of
merit is instructions per second per LUT (with adjustment for word
size). It is very easy to add a few features and double the LUT count.
There is an extensive comparison of soft core processors at:
http://opencores.com/project,up_core_list,downloads
(click on the "best of each design" link).
Jim Brakefield