Know any good public FPGA projects to contribute to?


signaltap

Hi all,

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Thanks!
 
Hi,

here, for example, is one.
http://forum.gadgetfactory.net/index.php?/topic/2046-xthundercore-is-taking-shape/

In general, there are many CPUs but a shortage of simple (!) "Hello world"
examples to actually use them without spending a week first.

This blog nails it, more or less:
http://blog.tube42.se/?p=105
(That said, I managed to get the "small" variant of the ZPU in question
working on a Spartan 6, here:
http://forum.gadgetfactory.net/index.php?/topic/1863-bare-metal-zpu-hello-world/
It is slow but fairly small, about 12 % on a Spartan 6 LX9.)

Another interesting project is "minSoC". It appears to be very well
maintained.
A simulation worked right out of the box when I tried yesterday - it even
includes its own iverilog simulator - but I wasn't able to build on Spartan
6 as the JTAG block is not supported.
A minimal OpenRISC "hello world" example could be useful for many - nothing
but processor, on-chip RAM with initial values for program code and an LED.


---------------------------------------
Posted through http://www.FPGARelated.com
 
wrong link: blog.tube42.se/?p=105


 
well... as fascinating as this candy business is, I was trying to link to
"Tubologue | The sad state of OSS hardware (part 1)"
but usenet won't let me... Lost in quotation...


 
On Thursday, July 24, 2014 9:45:53 PM UTC-5, signaltap wrote:
Hi all,

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Have an experimental processor core that needs the VHDL for the control logic to be written.

Jim Brakefield
 
On Wednesday, September 3, 2014 4:34:11 PM UTC-5, rickman wrote:
On Thursday, July 24, 2014 9:45:53 PM UTC-5, signaltap wrote:

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Have an experimental processor core that needs the VHDL for the control logic to be written.

There are a million processor cores out there. What is interesting
about yours?

Hybrid between stack, accumulator and memory oriented instruction sets.
(1 to 4 stack pointers with offset addressing, frame & thread pointers, single block RAM)
(data size orthogonal, single to quad issue capable, fast interrupts)

Intent is that it can be used as an accumulator machine, a stack machine or a C machine. Everything except a RISC machine. All pointer registers (including PC) are in a LUT RAM, stacks are in the block RAM (at least in a minimal implementation).
 
On Wednesday, September 3, 2014 5:47:47 PM UTC-5, Tom Gardner wrote:
On Wednesday, September 3, 2014 4:34:11 PM UTC-5, rickman wrote:
On Thursday, July 24, 2014 9:45:53 PM UTC-5, signaltap wrote:
Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Have an experimental processor core that needs the VHDL for the control logic to be written.

There are a million processor cores out there. What is interesting
about yours?

Intent is that it can be used as an accumulator machine, a stack machine or a C machine. Everything except a RISC machine. All pointer registers (including PC) are in a LUT RAM, stacks are in the block RAM (at least in a minimal implementation).

To be a bit belligerent, those are all internal /features/,
not /benefits/ visible to a user of the (black-box) processor.
Certainly they are all more-or-less useless without tool
support (e.g. compiler, debuggers).

Now if you had said that the processor used minimal power,
or had fixed execution times for all instructions (so that
the compiler/IDE could define the execution time of each
block/loop/subroutine), then that might have been of benefit
to the user of the black box.

It does have fixed execution times. It is intended for the hard real-time embedded market.
Power should be minimal: Estimating 300 Spartan6 LUTs + multiplier for 16-bit version.
 
Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
> On 03/09/14 23:14, jim.brakefield@ieee.org wrote:

(snip)
Hybrid between stack, accumulator and memory oriented instruction sets.
(1 to 4 stack pointers with offset addressing, frame & thread
pointers, single block RAM)
(data size orthogonal, single to quad issue capable, fast interrupts)

Intent is that it can be used as an accumulator machine, a
stack machine or a C machine. Everything except a RISC machine.
All pointer registers (including PC) are in a LUT RAM, stacks
are in the block RAM (at least in a minimal implementation).

To be a bit belligerent, those are all internal /features/,
not /benefits/ visible to a user of the (black-box) processor.
Certainly they are all more-or-less useless without tool
support (e.g. compiler, debuggers).

Well, for a high-level language programmer, I suppose.
But for assembly programmers, those are mostly still visible.

Now, we could all say that it doesn't matter, that Intel won
the world, but I might believe that there is still something
left out there, especially if the goal isn't to get rich.

Also, there might still be some room for new ideas in soft
processors.

Now if you had said that the processor used minimal power,
or had fixed execution times for all instructions (so that
the compiler/IDE could define the execution time of each
block/loop/subroutine), then that might have been of benefit
to the user of the black box.

-- glen
 
On 9/3/2014 4:05 PM, jim.brakefield@ieee.org wrote:
On Thursday, July 24, 2014 9:45:53 PM UTC-5, signaltap wrote:
Hi all,

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Have an experimental processor core that needs the VHDL for the control logic to be written.

There are a million processor cores out there. What is interesting
about yours?

--

Rick
 
I'm not quite understanding. First, I don't really know what an
"accumulator" machine is other than one which has very limited
instructions that don't let you do much other than move stuff through an
accumulator. Is there some advantage to an accumulator CPU?

For me the classic accumulator machine is the CDC 1604.
Each instruction addresses a memory location, and the result of the operation between memory and accumulator is left either in the accumulator or in memory. The CDC 1604 had six index registers.
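
To make the accumulator model described above concrete, here is a minimal Python sketch of a toy single-accumulator machine. The opcode names and the `to_memory` flag are hypothetical and chosen only for illustration; they are not taken from the CDC 1604 instruction set or from the core discussed in this thread.

```python
# Toy single-accumulator machine (illustrative only).
# Each instruction names one memory address; the result of combining the
# accumulator with that memory word goes either to the accumulator or
# back to memory. Opcodes and the to_memory flag are hypothetical.
def run(program, memory):
    acc = 0
    for op, addr, to_memory in program:
        if op == "load":
            result = memory[addr]
        elif op == "add":
            result = acc + memory[addr]
        elif op == "sub":
            result = acc - memory[addr]
        else:
            raise ValueError(f"unknown opcode: {op}")
        if to_memory:
            memory[addr] = result  # result replaces the memory operand
        else:
            acc = result           # result stays in the accumulator
    return acc, memory

memory = [5, 7, 0]
program = [
    ("load", 0, False),  # acc = mem[0]
    ("add", 1, False),   # acc = acc + mem[1]
    ("add", 2, True),    # mem[2] = acc + mem[2]
]
acc, memory = run(program, memory)
print(acc, memory)
```

The point of the sketch is only that every instruction carries a single memory address and implicitly uses the accumulator as the other operand.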

So it is a stack machine with stack pointers into on chip memory. You
feel it is useful as a platform for C. Do you have any plans to provide
a C compiler for this?

The stack pointers can be used either as stack pointers or as index registers.
The 2nd operand uses a pointer + offset address. So the second operand can be somewhere on any of the stacks, at an absolute address, at a relative address, or an immediate.

Would like to have a C compiler. Probably beyond my ability.
Am comfortable with assembler.
Consider the programming model up for grabs. I.e., this can be considered a research machine.

I would find it interesting if you could compare this to the ZPU, a 32
bit soft core designed for C which can be quite small. I believe the
small version fits in around 500 LUT4s. I'm not sure how to compare
LUT4s to the LUT6s found in the Spartan 6. But the ZPU is quite slow
when running C code and possibly any code. It takes a lot of CPU cycles
to do nearly anything. Do you have any timing info on your design?

ZPU has a limited instruction set. Here, I have tried to put as much functionality as possible into each instruction, so that each instruction does the work of several RISC instructions while keeping code density high.

Typically it takes about 1.5 4LUTs to equal a 6LUT or an Altera ALUT.
http://opencores.com/project,up_core_list,downloads Click on family comparison link.
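
As a rough sketch, the 1.5:1 rule of thumb above can be applied to the "around 500 LUT4s" figure quoted for the small ZPU. The function name is hypothetical, and the ratio is only a typical value, so the result is an order-of-magnitude estimate rather than a synthesis result.

```python
# Rule of thumb from the post above: about 1.5 LUT4s per LUT6 (or Altera ALUT).
LUT4_PER_LUT6 = 1.5

def lut4_to_lut6(lut4_count, ratio=LUT4_PER_LUT6):
    """Estimate how many LUT6s a design measured in LUT4s would need."""
    return lut4_count / ratio

zpu_small_lut4 = 500  # "around 500 LUT4s" for the small ZPU, as quoted
zpu_small_lut6 = lut4_to_lut6(zpu_small_lut4)
print(f"about {zpu_small_lut6:.0f} LUT6 equivalents")
```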

As currently designed, instructions take 2, 3 or 4 clock cycles, with a weighted average of 3.25 clock cycles (branches take 2, arithmetic 4). Am aiming to get a 200 MHz clock frequency on a Kintex-7 part. Straightforward to double this by executing two instructions one clock apart using dual-port block RAM.
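
The arithmetic behind these numbers can be sketched as follows. The 200 MHz clock and the 3.25-cycle average are the figures stated above; treating "double this" as exactly doubling the instruction rate is an assumption, and the function name is hypothetical.

```python
def instructions_per_second(clock_hz, cycles_per_instruction, issue_width=1):
    """Average instruction rate for a given clock and cycles per instruction."""
    return clock_hz * issue_width / cycles_per_instruction

clock_hz = 200e6  # target clock on a Kintex-7 part (a goal, not a measurement)
avg_cpi = 3.25    # weighted average cycles per instruction, as stated

single = instructions_per_second(clock_hz, avg_cpi)
# Assumption: interleaving two instructions exactly doubles the rate.
dual = instructions_per_second(clock_hz, avg_cpi, issue_width=2)

print(f"single issue: {single / 1e6:.1f} M instructions/s")
print(f"dual issue:   {dual / 1e6:.1f} M instructions/s")
```

Under these assumptions the single-issue design executes roughly 60 million instructions per second, which is the figure to compare against a single-cycle core at a lower clock.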
 
On 03/09/14 23:14, jim.brakefield@ieee.org wrote:
On Wednesday, September 3, 2014 4:34:11 PM UTC-5, rickman wrote:

On Thursday, July 24, 2014 9:45:53 PM UTC-5, signaltap wrote:

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Have an experimental processor core that needs the VHDL for the control logic to be written.

There are a million processor cores out there. What is interesting
about yours?

Hybrid between stack, accumulator and memory oriented instruction sets.
(1 to 4 stack pointers with offset addressing, frame & thread pointers, single block RAM)
(data size orthogonal, single to quad issue capable, fast interrupts)

Intent is that it can be used as an accumulator machine, a stack machine or a C machine. Everything except a RISC machine. All pointer registers (including PC) are in a LUT RAM, stacks are in the block RAM (at least in a minimal implementation).

To be a bit belligerent, those are all internal /features/,
not /benefits/ visible to a user of the (black-box) processor.
Certainly they are all more-or-less useless without tool
support (e.g. compiler, debuggers).

Now if you had said that the processor used minimal power,
or had fixed execution times for all instructions (so that
the compiler/IDE could define the execution time of each
block/loop/subroutine), then that might have been of benefit
to the user of the black box.
 
On 04/09/14 00:31, jim.brakefield@ieee.org wrote:
On Wednesday, September 3, 2014 5:47:47 PM UTC-5, Tom Gardner wrote:
On Wednesday, September 3, 2014 4:34:11 PM UTC-5, rickman wrote:
On Thursday, July 24, 2014 9:45:53 PM UTC-5, signaltap wrote:
Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Have an experimental processor core that needs the VHDL for the control logic to be written.

There are a million processor cores out there. What is interesting
about yours?

Intent is that it can be used as an accumulator machine, a stack machine or a C machine. Everything except a RISC machine. All pointer registers (including PC) are in a LUT RAM, stacks are in the block RAM (at least in a minimal implementation).

To be a bit belligerent, those are all internal /features/,
not /benefits/ visible to a user of the (black-box) processor.
Certainly they are all more-or-less useless without tool
support (e.g. compiler, debuggers).

Now if you had said that the processor used minimal power,
or had fixed execution times for all instructions (so that
the compiler/IDE could define the execution time of each
block/loop/subroutine), then that might have been of benefit
to the user of the black box.

It does have fixed execution times. It is intended for the hard real-time embedded market.
Power should be minimal: Estimating 300 Spartan6 LUTs + multiplier for 16-bit version.

OK, that's a benefit in some situations. I wonder
how it compares to the XMOS processors, which claim the
same advantage and are commercially available at Digikey
http://www.xmos.com/

They have good tool support.
 
On 04/09/14 00:43, glen herrmannsfeldt wrote:
Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
On 03/09/14 23:14, jim.brakefield@ieee.org wrote:

(snip)
Hybrid between stack, accumulator and memory oriented instruction sets.
(1 to 4 stack pointers with offset addressing, frame & thread
pointers, single block RAM)
(data size orthogonal, single to quad issue capable, fast interrupts)

Intent is that it can be used as an accumulator machine, a
stack machine or a C machine. Everything except a RISC machine.
All pointer registers (including PC) are in a LUT RAM, stacks
are in the block RAM (at least in a minimal implementation).

To be a bit belligerent, those are all internal /features/,
not /benefits/ visible to a user of the (black-box) processor.
Certainly they are all more-or-less useless without tool
support (e.g. compiler, debuggers).

Well, for a high-level language programmer, I suppose.
But for assembly programmers, those are mostly still visible.

True, but it doesn't really invalidate my point.


Now, we could all say that it doesn't matter, that Intel won
the world,

No, they've carved themselves out a large lucrative niche :)


but I might believe that there is still something
left out there,

Very definitely. But then I've worked a few miles from the
origin of other commercial processors. e.g ARM in Cambridge,
and XMOS in Bristol.


> especially if the goal isn't to get rich.

A very valid goal, but please be explicit about that
so that other people can quickly assess its viability.


Also, there might still be some room for new ideas in soft
processors.

Very definitely. Over the past half decade there's been
an explosion of new commercial processor families. Most
will fall by the wayside, but some will succeed.
 
On 9/3/2014 6:14 PM, jim.brakefield@ieee.org wrote:
On Wednesday, September 3, 2014 4:34:11 PM UTC-5, rickman wrote:

On Thursday, July 24, 2014 9:45:53 PM UTC-5, signaltap wrote:

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

Have an experimental processor core that needs the VHDL for the control logic to be written.

There are a million processor cores out there. What is interesting
about yours?

Hybrid between stack, accumulator and memory oriented instruction sets.
(1 to 4 stack pointers with offset addressing, frame & thread pointers, single block RAM)
(data size orthogonal, single to quad issue capable, fast interrupts)

Intent is that it can be used as an accumulator machine, a stack machine or a C machine. Everything except a RISC machine. All pointer registers (including PC) are in a LUT RAM, stacks are in the block RAM (at least in a minimal implementation).

I'm not quite understanding. First, I don't really know what an
"accumulator" machine is other than one which has very limited
instructions that don't let you do much other than move stuff through an
accumulator. Is there some advantage to an accumulator CPU?

So it is a stack machine with stack pointers into on chip memory. You
feel it is useful as a platform for C. Do you have any plans to provide
a C compiler for this?

I would find it interesting if you could compare this to the ZPU, a 32
bit soft core designed for C which can be quite small. I believe the
small version fits in around 500 LUT4s. I'm not sure how to compare
LUT4s to the LUT6s found in the Spartan 6. But the ZPU is quite slow
when running C code and possibly any code. It takes a lot of CPU cycles
to do nearly anything. Do you have any timing info on your design?

--

Rick
 
On 9/3/2014 10:17 PM, jim.brakefield@ieee.org wrote:
I'm not quite understanding. First, I don't really know what an
"accumulator" machine is other than one which has very limited
instructions that don't let you do much other than move stuff through an
accumulator. Is there some advantage to an accumulator CPU?

For me the classic accumulator machine is the CDC 1604.
Instruction addresses a memory location and result between memory and accumulator is left either in the accumulator or in memory. CDC 1604 had six index registers.

So it is a stack machine with stack pointers into on chip memory. You
feel it is useful as a platform for C. Do you have any plans to provide
a C compiler for this?

The stack pointers can be used either as stack pointers or as index registers.
The 2nd operand uses a pointer + offset address. So second operand can be somewhere on any of the stacks or at absolute adr, relative adr or an immediate.

Would like to have a C compiler. Probably beyond my ability.
Am comfortable with assembler.
Consider the programming model up for grabs. E.g., this can be considered a research machine.

I would find it interesting if you could compare this to the ZPU, a 32
bit soft core designed for C which can be quite small. I believe the
small version fits in around 500 LUT4s. I'm not sure how to compare
LUT4s to the LUT6s found in the Spartan 6. But the ZPU is quite slow
when running C code and possibly any code. It takes a lot of CPU cycles
to do nearly anything. Do you have any timing info on your design?

ZPU has a limited instruction set. Here, have tried to put as much functionality into each instruction so each instruction does the work of several RISC instructions. While keeping code density high.

Typically it takes about 1.5 4LUTs to equal a 6LUT or an Altera ALUT.
http://opencores.com/project,up_core_list,downloads Click on family comparison link.

As currently designed instructions take 2, 3 or 4 clock cycles with a weighted average of 3.25 clock cycles (branches take 2, arithmetic 4). Am aiming to get 200MHz clock frequency on a Kintex-7 part. Straight forward to double this by executing two instructions one clock apart using dual port Block RAM.

Ok, so you are shooting for high code density. Have you done any
comparisons with other machines? Saying "each instruction does the work
of several RISC instructions" is just shooting from the hip.

--

Rick
 
On Wednesday, September 3, 2014 10:11:39 PM UTC-5, rickman wrote:
(snip)

Ok, so you are shooting for high code density. Have you done any
comparisons with other machines? Saying "each instruction does the work
of several RISC instructions" is just shooting from the hip.

There are several sources of code inefficiency in the standard RISC instruction set:
A) 16-bit immediates and displacements when they are mostly under 8-bits.
B) 15-bits per instruction for register locations (3x5)
C) Load and store instructions in addition to calculation instructions
D) Separate address modification instructions
and worst of all:
E) Subroutine overhead

A thru D: Besides the normal code density advantage of non-RISC and compacted RISC versus standard RISC, the architecture supports instruction byte granularity with the single byte instructions being stack instructions. Don't have any statistics.
E: The standard C model for subroutines has the effect of discouraging short subroutines. This is where stack machines gain a big advantage.
 
On Thursday, September 4, 2014 1:20:28 PM UTC-5, rickman wrote:
On Wednesday, September 3, 2014 10:11:39 PM UTC-5, rickman wrote:

(snip)

There are several sources of code inefficiency in the standard RISC instruction set:
A) 16-bit immediates and displacements when they are mostly under 8-bits.
B) 15-bits per instruction for register locations (3x5)
C) Load and store instructions in addition to calculation instructions
D) Separate address modification instructions
and worst of all:
E) Subroutine overhead

A thru D: Besides the normal code density advantage of non-RISC and compacted RISC versus standard RISC, the architecture supports instruction byte granularity with the single byte instructions being stack instructions. Don't have any statistics.
E: The standard C model for subroutines has the effect of discouraging short subroutines. This is where stack machines gain a big advantage.

Ok, that is compared to RISC in a subjective manner. How about other
ISA types? MISC? Other CISC designs?



I find it interesting that you refer to issues in using C while you have
no intent to work toward having your CPU supported by a C compiler. Is
anything to do with C important then?

I tend to consider C as typical of using a single memory stack to hold subroutine frames, parameters, result, previous frame pointer and return address. However, some compilers probably manage to keep some of this in registers.
Original Fortran used parameter lists and globally allocated memory for non-recursive subroutines. On a dual stack machine, parameters can be moved to the return stack to create a frame. My intent is that the ISA support any of these memory usages and others as well.

I would point out that even if you reach 200 MHz operation on an FPGA
that will be approximately the same as running at 57 MHz if your
instructions used a single clock cycle, not a tricky goal usually. In
addition that makes many aspects of the machine simpler.

If each instruction does twice as much as a RISC instruction and if the dual issue does not increase the LUT count significantly, then will have a 200MHz RISC equivalent. Without pipelining.

Also possible to use LUT RAM for the stacks and increase the execution rate.
For now content to go with the slower design. Yes RISCs are simple. Keeping all the stacks in memory and all the pointers in LUT RAM is also simple. For this instruction set, address ALU is 100% busy and data ALU is 30% busy. With dual issue one needs a second address ALU and data ALU is 60% busy.

Am aiming for a low LUT count, single block RAM design. My figure of merit is instructions per second per LUT (with adjustment for word size). Very easy to add a few features and double the LUT count. There is an extensive comparison of soft core processors at:
http://opencores.com/project,up_core_list,downloads click on best of each design link
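
A minimal sketch of the figure of merit described above, instructions per second per LUT with an adjustment for word size. The thread does not say how the word-size adjustment is defined, so the linear scaling against a 32-bit reference used here is purely an assumption, as is the function name. The inputs are the estimates quoted in this thread (200 MHz target, 3.25 cycles per instruction, about 300 LUTs for a 16-bit version); they come from different device families and are not measurements.

```python
def figure_of_merit(clock_hz, cpi, luts, word_bits, reference_bits=32):
    """Instructions per second per LUT, scaled by word size.

    The linear word-size scaling (word_bits / reference_bits) is an
    assumption made for this sketch; the thread does not define it.
    """
    instructions_per_second = clock_hz / cpi
    return instructions_per_second / luts * (word_bits / reference_bits)

# Estimates quoted in the thread, not measured results.
fom = figure_of_merit(clock_hz=200e6, cpi=3.25, luts=300, word_bits=16)
print(f"{fom:,.0f} word-size-adjusted instructions/s per LUT")
```

Because adding a few features can double the LUT count, a metric of this shape penalizes feature growth directly, which matches the stated design goal.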

Jim Brakefield
 
On 9/4/2014 11:17 AM, jim.brakefield@ieee.org wrote:
On Wednesday, September 3, 2014 10:11:39 PM UTC-5, rickman wrote:

(snip)

There are several sources of code inefficiency in the standard RISC instruction set:
A) 16-bit immediates and displacements when they are mostly under 8-bits.
B) 15-bits per instruction for register locations (3x5)
C) Load and store instructions in addition to calculation instructions
D) Separate address modification instructions
and worst of all:
E) Subroutine overhead

A thru D: Besides the normal code density advantage of non-RISC and compacted RISC versus standard RISC, the architecture supports instruction byte granularity with the single byte instructions being stack instructions. Don't have any statistics.
E: The standard C model for subroutines has the effect of discouraging short subroutines. This is where stack machines gain a big advantage.

Ok, that is compared to RISC in a subjective manner. How about other
ISA types? MISC? Other CISC designs?

I find it interesting that you refer to issues in using C while you have
no intent to work toward having your CPU supported by a C compiler. Is
anything to do with C important then?

I would point out that even if you reach 200 MHz operation on an FPGA
that will be approximately the same as running at 57 MHz if your
instructions used a single clock cycle, not a tricky goal usually. In
addition that makes many aspects of the machine simpler.

--

Rick
 
On 9/4/2014 4:50 PM, jim.brakefield@ieee.org wrote:
On Thursday, September 4, 2014 1:20:28 PM UTC-5, rickman wrote:

On Wednesday, September 3, 2014 10:11:39 PM UTC-5, rickman wrote:







I'm not quite understanding. First, I don't really know what an



"accumulator" machine is other than one which has very limited



instructions that don't let you do much other than move stuff through an



accumulator. Is there some advantage to an accumulator CPU?







For me the classic accumulator machine is the CDC 1604.



Instruction addresses a memory location and result between memory and accumulator is left either in the accumulator or in memory. CDC 1604 had six index registers.







So it is a stack machine with stack pointers into on chip memory. You



feel it is useful as a platform for C. Do you have any plans to provide



a C compiler for this?







The stack pointers can be used either as stack pointers or as index registers.
The 2nd operand uses a pointer + offset address. So second operand can be somewhere on any of the stacks or at absolute adr, relative adr or an immediate.

Would like to have a C compiler. Probably beyond my ability.
Am comfortable with assembler.
Consider the programming model up for grabs. E.g., this can be considered a research machine.

I would find it interesting if you could compare this to the ZPU, a 32 bit soft core designed for C which can be quite small. I believe the small version fits in around 500 LUT4s. I'm not sure how to compare LUT4s to the LUT6s found in the Spartan 6. But the ZPU is quite slow when running C code and possibly any code. It takes a lot of CPU cycles to do nearly anything. Do you have any timing info on your design?

ZPU has a limited instruction set. Here, have tried to put as much functionality into each instruction so each instruction does the work of several RISC instructions, while keeping code density high.

Typically it takes about 1.5 4LUTs to equal a 6LUT or an Altera ALUT.
http://opencores.com/project,up_core_list,downloads Click on family comparison link.
As currently designed instructions take 2, 3 or 4 clock cycles with a weighted average of 3.25 clock cycles (branches take 2, arithmetic 4). Am aiming to get 200MHz clock frequency on a Kintex-7 part. Straightforward to double this by executing two instructions one clock apart using dual port Block RAM.

Ok, so you are shooting for high code density. Have you done any comparisons with other machines? Saying "each instruction does the work of several RISC instructions" is just shooting from the hip.

There are several sources of code inefficiency in the standard RISC instruction set:
A) 16-bit immediates and displacements when they are mostly under 8-bits.
B) 15-bits per instruction for register locations (3x5)
C) Load and store instructions in addition to calculation instructions
D) Separate address modification instructions
and worst of all:
E) Subroutine overhead

A thru D: Besides the normal code density advantage of non-RISC and compacted RISC versus standard RISC, the architecture supports instruction byte granularity with the single byte instructions being stack instructions. Don't have any statistics.

E: The standard C model for subroutines has the effect of discouraging short subroutines. This is where stack machines gain a big advantage.

Ok, that is compared to RISC in a subjective manner. How about other
ISA types? MISC? Other CISC designs?

I find it interesting that you refer to issues in using C while you have
no intent to work toward having your CPU supported by a C compiler. Is
anything to do with C important then?

Tend to consider C as typical of using a single memory stack to hold subroutine frames, parameters, result, previous frame pointer and return address. However, some compilers probably manage to keep some of this in registers.
Original Fortran used parameter lists and globally allocated memory for non-recursive subroutines. On a dual stack machine, parameters can be moved to the return stack to create a frame. My intent is that the ISA support any of these memory usages and others as well.

I would point out that even if you reach 200 MHz operation on an FPGA
that will be approximately the same as running at 57 MHz if your
instructions used a single clock cycle, not a tricky goal usually. In
addition that makes many aspects of the machine simpler.

If each instruction does twice as much as a RISC instruction and if the dual issue does not increase the LUT count significantly, then this will be the equivalent of a 200MHz RISC, without pipelining.
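
As a back-of-the-envelope check of that estimate, here is a minimal Python sketch. It uses only figures quoted in this thread (200 MHz target clock, 3.25 clocks per instruction on average, 4 clocks for arithmetic). The factor of two for "work per instruction" is the unmeasured assumption stated above, and the function name `risc_equivalent_mips` is made up for illustration.

```python
def risc_equivalent_mips(clock_mhz, clocks_per_instr, issue_streams=1, work_per_instr=1.0):
    """Rough estimate: millions of RISC-equivalent instructions per second.

    work_per_instr is an ASSUMED ratio (how many RISC instructions one
    instruction of this ISA replaces); nothing in the thread measures it.
    """
    if clocks_per_instr <= 0:
        raise ValueError("clocks_per_instr must be positive")
    return clock_mhz / clocks_per_instr * issue_streams * work_per_instr


CLOCK_MHZ = 200.0     # target clock on Kintex-7, from the thread
AVG_CLOCKS = 3.25     # weighted average clocks per instruction, from the thread
WORST_CLOCKS = 4.0    # arithmetic instructions, from the thread

single = risc_equivalent_mips(CLOCK_MHZ, AVG_CLOCKS)
dual = risc_equivalent_mips(CLOCK_MHZ, AVG_CLOCKS, issue_streams=2)
dual_2x = risc_equivalent_mips(CLOCK_MHZ, AVG_CLOCKS, issue_streams=2, work_per_instr=2.0)
dual_2x_worst = risc_equivalent_mips(CLOCK_MHZ, WORST_CLOCKS, issue_streams=2, work_per_instr=2.0)

print(f"single issue:                 {single:6.1f}")
print(f"dual issue:                   {dual:6.1f}")
print(f"dual issue, 2x work (avg):    {dual_2x:6.1f}")
print(f"dual issue, 2x work (4 clks): {dual_2x_worst:6.1f}")
```

With the 3.25 average, single issue comes out near 61.5, in the same ballpark as the 57 MHz single-cycle equivalent mentioned above. The 200 MHz RISC-equivalent figure needs both the dual issue and the assumed 2x work per instruction to hold.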

Also possible to use LUT RAM for the stacks and increase the execution rate.
For now, content to go with the slower design. Yes, RISCs are simple. Keeping all the stacks in memory and all the pointers in LUT RAM is also simple. For this instruction set, address ALU is 100% busy and data ALU is 30% busy. With dual issue one needs a second address ALU and data ALU is 60% busy.
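
The ALU reasoning can be sketched as simple arithmetic. This Python snippet assumes both instruction streams have the same mix, so demand on each ALU type just adds up. That is a simplification, and the helper name `alu_plan` is hypothetical.

```python
def alu_plan(busy_pct_per_stream, issue_streams):
    """Return (units_needed, busy_pct_per_unit) for one ALU type.

    ASSUMPTION: every issue stream loads the ALU equally and the load
    adds linearly; real contention depends on instruction scheduling.
    """
    total_pct = busy_pct_per_stream * issue_streams
    units = max(1, -(-total_pct // 100))   # ceiling division, at least one unit
    return units, total_pct / units


# Figures from the post: address ALU 100% busy, data ALU 30% busy (single issue).
addr_units, addr_busy = alu_plan(100, issue_streams=2)
data_units, data_busy = alu_plan(30, issue_streams=2)

print(f"address ALUs: {addr_units} at {addr_busy:.0f}% busy each")
print(f"data ALUs:    {data_units} at {data_busy:.0f}% busy")
```

Under that assumption, dual issue needs two address ALUs while a single data ALU still has headroom at 60%.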

Am aiming for a low LUT count, single block RAM design. My figure of merit is instructions per second per LUT (with adjustment for word size). Very easy to add a few features and double the LUT count. There is an extensive comparison of soft core processors at:
http://opencores.com/project,up_core_list,downloads click on best of each design link
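
A minimal Python sketch of such a figure of merit follows. The post does not say how the word-size adjustment is done, so the linear scaling against a 32-bit reference below is an assumption. The 600-LUT core in the example is a made-up number, and the function names are hypothetical. The LUT4-to-LUT6 conversion uses the "about 1.5 4LUTs per 6LUT" rule of thumb quoted earlier in the thread.

```python
LUT4_PER_LUT6 = 1.5   # rule of thumb quoted in the thread


def lut6_equivalent(lut_count, lut_inputs):
    """Normalize a LUT count to 6-input LUT equivalents."""
    if lut_inputs == 6:
        return float(lut_count)
    if lut_inputs == 4:
        return lut_count / LUT4_PER_LUT6
    raise ValueError("only 4-input and 6-input LUTs are handled here")


def figure_of_merit(clock_mhz, clocks_per_instr, lut_count,
                    lut_inputs=6, word_bits=32, ref_bits=32):
    """Instructions per second per LUT6-equivalent, scaled by word size.

    ASSUMPTION: the word-size adjustment is linear (word_bits / ref_bits);
    the thread does not define the adjustment actually used.
    """
    instr_per_sec = clock_mhz * 1e6 / clocks_per_instr
    return instr_per_sec * (word_bits / ref_bits) / lut6_equivalent(lut_count, lut_inputs)


# ~500 LUT4s (the small ZPU figure mentioned in the thread) in LUT6 terms.
zpu_luts = lut6_equivalent(500, lut_inputs=4)

# HYPOTHETICAL core: 200 MHz, 3.25 clocks/instruction, 600 LUT6s, 32-bit words.
fom = figure_of_merit(200.0, 3.25, lut_count=600)

print(f"500 LUT4s is roughly {zpu_luts:.0f} LUT6s")
print(f"figure of merit: {fom:,.0f} instructions/s per LUT")
```

Doubling the LUT count halves the result, which is why adding "a few features" is so costly by this measure.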

Jim Brakefield

Ok, let us know how it shakes out.

--

Rick
 
On Friday, July 25, 2014 04:45:53 UTC+2, signaltap wrote:
Hi all,

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself I'm wondering where to find some cool projects that exist already that need help.

What about NetFPGA.org?

Andreas
 
