Know any good public FPGA projects to contribute to?

David Brown · Sep 5, 2014

On 04/09/14 22:50, jim.brakefield@ieee.org wrote:

On Thursday, September 4, 2014 1:20:28 PM UTC-5, rickman wrote:

I find it interesting that you refer to issues in using C while you
have no intent to work toward having your CPU supported by a C
compiler. Is anything to do with C important then?

Tend to consider C as typical of using a single memory stack to hold
subroutine frames, parameters, result, previous frame pointer and
return address. However, some compilers probably manage to keep some
of this in registers. Original Fortran used parameter lists and
globally allocated memory for non-recursive subroutines. On a dual
stack machine, parameters can be moved to the return stack to create a
frame. My intent is that the ISA support any of these memory usages
and others as well.

There is no requirement to have a stack for C - the standards don't even
mention the word. And there are C implementations for machines that
have little or no stack (perhaps just a short hardware return stack).
But the most common arrangement is a single stack for frames,
parameters, and return addresses, with passed data and local variables
in registers where possible and on the stack when necessary.

The key reason for having a single stack is not processor efficiency,
but simplicity of memory management. Starting from low memory, you
have program code, then statically allocated data, and then the heap
(for dynamic memory) grows up into free space. The stack starts at the
top of memory and grows downwards, until it hits the heap and the system
crashes.
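
The layout described above can be modelled in a few lines. This is only a toy sketch: the class name `FlatMemory`, its methods, and the sizes used are made up for illustration, and the explicit collision check is something the model adds (a real system typically has no such check, which is why it crashes).

```python
class FlatMemory:
    """Toy model: code and static data at the bottom, heap grows up,
    stack grows down from the top of memory."""

    def __init__(self, size, code_size, static_size):
        self.heap_top = code_size + static_size  # first free address above static data
        self.stack_ptr = size                    # stack starts at the top of memory

    def free(self):
        """Bytes left between the top of the heap and the stack pointer."""
        return self.stack_ptr - self.heap_top

    def malloc(self, n):
        """Grow the heap upwards by n bytes and return the block address."""
        if n > self.free():
            raise MemoryError("heap would run into the stack")
        addr = self.heap_top
        self.heap_top += n
        return addr

    def push_frame(self, n):
        """Grow the stack downwards by n bytes and return the new stack pointer."""
        if n > self.free():
            raise MemoryError("stack would run into the heap")
        self.stack_ptr -= n
        return self.stack_ptr


mem = FlatMemory(size=1024, code_size=256, static_size=128)
block = mem.malloc(200)      # heap now ends at 584
frame = mem.push_frame(300)  # stack pointer now at 724
remaining = mem.free()       # 140 bytes shared by heap and stack
```

The point of the single-stack arrangement shows up in `free()`: heap and stack draw on one shared pool, so neither needs a size fixed in advance.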

If you are happy with a segmented memory, then it may be more efficient
to have multiple stacks for different purposes. This is particularly
true if the hardware can access the stacks simultaneously.

There is quite a bit of information available on the net for multiple
stack systems. Many target Forth rather than C, as Forth is a highly
stack-oriented language.

I would point out that even if you reach 200 MHz operation on an
FPGA that will be approximately the same as running at 57 MHz if
your instructions used a single clock cycle, not a tricky goal
usually. In addition that makes many aspects of the machine
simpler.

If each instruction does twice as much as a RISC instruction and if
the dual issue does not increase the LUT count significantly, then
I will have a 200 MHz RISC equivalent, without pipelining.
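
The arithmetic behind these two figures can be written out as a rough sketch. The cycles-per-instruction value of 3.5 is not stated anywhere in the thread; it is inferred here from the 200 MHz to 57 MHz ratio and should be treated as an assumption, as should the function name `effective_mips`.

```python
def effective_mips(clock_mhz, cycles_per_instr, issue_width=1, work_per_instr=1.0):
    """Idealised throughput in 'RISC-equivalent' millions of instructions per second.

    Assumes every issue slot is always filled and ignores stalls.
    """
    return clock_mhz / cycles_per_instr * issue_width * work_per_instr


CPI = 3.5  # assumed: inferred from 200 MHz being "about the same as" 57 MHz

# Multi-cycle design at 200 MHz, compared with a one-instruction-per-clock machine.
single_issue = effective_mips(200, CPI)

# Dual issue, with each instruction doing twice the work of a RISC instruction.
dual_issue_rich = effective_mips(200, CPI, issue_width=2, work_per_instr=2.0)

print(f"single issue:            {single_issue:.1f} MHz equivalent")
print(f"dual issue, 2x work each: {dual_issue_rich:.1f} MHz equivalent")
```

Under these idealised assumptions the second figure comes out a little above 200, which matches the "200 MHz RISC equivalent" claim only if both factors of two are fully realised.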

Superscalar execution (handling multiple instructions simultaneously) is
usually considered much more advanced and complex to implement than
pipelining, which has been common on CPU cores for decades.

Also possible to use LUT RAM for the stacks and increase the
execution rate. For now content to go with the slower design. Yes
RISCs are simple. Keeping all the stacks in memory and all the
pointers in LUT RAM is also simple. For this instruction set,
address ALU is 100% busy and data ALU is 30% busy. With dual issue
one needs a second address ALU and data ALU is 60% busy.

Am aiming for a low LUT count, single block RAM design. My figure of
merit is instructions per second per LUT (with adjustment for word
size). Very easy to add a few features and double the LUT count.
There is an extensive comparison of soft core processors at:
http://opencores.com/project,up_core_list,downloads
(click on the "best of each design" link).

Jim Brakefield

Sep 5, 2014

On Friday, September 5, 2014 2:20:13 AM UTC-5, David Brown wrote:

On Thursday, September 4, 2014 1:20:28 PM UTC-5, rickman wrote:

I find it interesting that you refer to issues in using C while you
have no intent to work toward having your CPU supported by a C
compiler. Is anything to do with C important then?

Tend to consider C as typical of using a single memory stack to hold
subroutine frames, parameters, result, previous frame pointer and
return address. However, some compilers probably manage to keep some
of this in registers. Original Fortran used parameter lists and
globally allocated memory for non-recursive subroutines. On a dual
stack machine, parameters can be moved to the return stack to create a
frame. My intent is that the ISA support any of these memory usages
and others as well.

There is no requirement to have a stack for C - the standards don't even
mention the word. And there are C implementations for machines that
have little or no stack (perhaps just a short hardware return stack).

IMHO: There seems to be a dichotomy between embedded and GP (general-purpose) computing. GP programming frowns on using global memory, whereas on an embedded chip it is usually very fast. If a subroutine is neither recursive nor re-entrant, or even if it has limited recursion (typical for embedded software), one can lay out the "stack frame" in global memory, one area for each subroutine (not as memory efficient as a stack).
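
A small sketch of that trade-off, assuming a non-recursive call graph. The subroutine names, frame sizes, and helper functions below are hypothetical and only serve to compare one fixed area per subroutine against the worst-case depth of a conventional stack.

```python
def static_layout(frame_sizes):
    """Give every subroutine its own fixed frame area in global memory.

    Returns (base address per subroutine, total words reserved).
    """
    layout = {}
    base = 0
    for name, size in frame_sizes.items():
        layout[name] = base
        base += size
    return layout, base


def max_stack_depth(calls, frame_sizes, root):
    """Worst-case stack usage of the deepest call chain starting at root.

    The call graph must be acyclic (no recursion).
    """
    deepest_callee = max(
        (max_stack_depth(calls, frame_sizes, callee) for callee in calls.get(root, [])),
        default=0,
    )
    return frame_sizes[root] + deepest_callee


# Hypothetical embedded program: frame sizes in words, and who calls whom.
frame_sizes = {"main": 4, "read_adc": 6, "filter": 10, "send": 8}
calls = {"main": ["read_adc", "filter", "send"]}

layout, static_total = static_layout(frame_sizes)
stack_total = max_stack_depth(calls, frame_sizes, "main")
print(layout)                     # fixed base address of each frame
print(static_total, stack_total)  # words reserved: static frames vs. stack
```

With these made-up numbers the static scheme reserves 28 words while a stack never needs more than 14, which is the memory cost being traded for fast, fixed addresses.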

I would point out that even if you reach 200 MHz operation on an
FPGA that will be approximately the same as running at 57 MHz if
your instructions used a single clock cycle, not a tricky goal
usually. In addition that makes many aspects of the machine
simpler.

If each instruction does twice as much as a RISC instruction and if
the dual issue does not increase the LUT count significantly, then
I will have a 200 MHz RISC equivalent, without pipelining.

Superscalar execution (handling multiple instructions simultaneously) is
usually considered much more advanced and complex to implement than
pipelining, which has been common on CPU cores for decades.

There are some tricks that make superscalar execution simple in this situation. Remember way back when there were three-address computers? This machine takes one memory cycle for each operand and one cycle for the result, and fetches the next instruction while doing the data computation. Block RAM is dual ported, so two instructions can run side by side, one on each port. They are staggered to avoid a conflict over the data ALU and to let the 2nd instruction use the condition code results from the 1st. There is a version of the instruction encoding that uses either two 16-bit instructions or one 32-bit instruction per 32-bit word. One could have the compiler make sure the two 16-bit instructions can be properly executed side by side, inserting a NOP if necessary. One reason for four stacks is to make quad issue technically possible, again with the help of the compiler.
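
The compiler pass mentioned above might look roughly like the sketch below. The actual pairing constraints of this ISA are not given in the thread, so the rule in `can_pair` (the second slot must not read or overwrite what the first slot writes) is purely an assumption, as are the `Instr` type and the operand names.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]    # location written, or None
    srcs: Tuple[str, ...]  # locations read


NOP = Instr("nop", None, ())


def can_pair(first, second):
    """Assumed rule: the second slot may not read or overwrite the first slot's result."""
    if first.dest is None:
        return True
    return first.dest not in second.srcs and first.dest != second.dest


def pack(instrs):
    """Greedily pack 16-bit instructions two per 32-bit word, padding with NOP."""
    words = []
    i = 0
    while i < len(instrs):
        first = instrs[i]
        if i + 1 < len(instrs) and can_pair(first, instrs[i + 1]):
            words.append((first, instrs[i + 1]))
            i += 2
        else:
            words.append((first, NOP))  # no safe partner: fill the second slot
            i += 1
    return words


prog = [
    Instr("add", "a", ("b", "c")),
    Instr("sub", "d", ("a", "e")),  # reads "a", so it cannot share a word with the add
    Instr("mul", "f", ("g", "h")),
]
words = pack(prog)
for first, second in words:
    print(first.op, "|", second.op)
```

A greedy pass like this never reorders instructions; a real compiler could also schedule independent instructions next to each other to avoid some of the NOPs.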

Jim Brakefield

pini_kr · Sep 13, 2014

Hi all,

Can you suggest any good FPGA projects I could contribute to? I have some
free time and want to work on something challenging and interesting.
Instead of starting something myself I'm wondering where to find some cool
projects that exist already that need help.

Thanks!

This project implements the lower layers of a standard TCP/IP stack, based
on free code from the University of Queensland: IP stack
My first steps to understand the project, after reading the documents, are:
http://bknpk.ddns.net/my_web/IP_STACK/start_1.html

---------------------------------------
Posted through http://www.FPGARelated.com
