The littlest CPU

rickman
I may need to add a CPU to a design I am doing. I had rolled my own
core once with a 16-bit data path and it worked out fairly well. But
it was 600 LUT/FFs, and I would like to use something smaller if
possible. The target is a Lattice XP3 with about 3100 LUT/FFs, of
which about 2000 are currently used. I believe that once I add the
CPU core, I can take out a lot of the logic, since it all runs so
slowly. The fastest parallel data rate is 8 kHz, with some at 1 kHz
and the rest at 100 Hz. I probably would have used a CPU to start
with instead of the FPGA, but there was a possible need to handle
higher-speed signals, which seems to have gone away.

I recall that someone started a thread about serial implementations
of processors that were supported by a C compiler. I don't think any
ever turned up, but the OP had some other requirements that may have
excluded a few very small designs. Are there any CPU cores, serial or
parallel, that are significantly smaller than 600 LUT/FFs? The
Lattice part has LUT memory, even dual-port, so that is not a
constraint; the LUTs can be used as registers.

Rick
 
On Jul 18, 10:07 pm, rickman <gnu...@gmail.com> wrote:

The Xilinx PicoBlaze is less than 100 LUTs plus one block ram.
Someone has been working on a simple C compiler for the PicoBlaze, but
I have not tried it yet. I have used the PicoBlaze in many projects
and I am quite happy with it.

I have not used it, but Lattice has the Mico8. Have you looked at
it? It has been mentioned here as the Lattice equivalent of the
PicoBlaze.

Regards,

John McCaskill
www.FasterTechnology.com
 
On Jul 18, 11:09 pm, John McCaskill <jhmccask...@gmail.com> wrote:

The Xilinx PicoBlaze is less than 100 LUTs plus one block ram.
That should be less than 100 slices.

Regards,

John McCaskill
 
If an 8-bit CPU is fine, you may want to see my site; there are VHDL
and Verilog designs. For this CPU it is easy to find free or non-free
tools. All is discussed in detail at:
http://bknpk.no-ip.biz/usb_1.html

"I used the 8051 from http://www.cs.ucr.edu/~dalton/i8051/i8051syn. The
VHDL code has been translated to Verilog to avoid mixed-language
simulation. The CPU is also slightly modified to be able to use Xilinx
memories: for ROM I use..."



 
On 19 July, 06:07, rickman <gnu...@gmail.com> wrote:
I'm the OP of that thread.

Hi, I may have different interests, but yes, the smallest non-serialized
CPU for a task like your current one is one of the wishes, and here too
there is no one definitive winner.

PicoBlaze, PacoBlaze and Mico8 are out of the question; most others
are too large.

I have used a cut-down AVR core in an XP3, but I don't recall the LUT
count.

Antti
 
On Jul 19, 2:57 am, Antti <Antti.Luk...@googlemail.com> wrote:
Have you tabulated your findings anywhere? The last time I did a
survey of ARM7 processors, I put it all into a spreadsheet and posted
it on the web. I think it was useful for a while, but the market
overtook it and I couldn't keep up!

I read your thread about the serial processor and it was interesting.
I think my project actually has the time to use such a processor, but
I think you never found one that met your requirements.

I am not looking for a large address space, but I would like for it to
be able to read data from an SD card. My design uses FPGAs on both
the application board and the test fixture. Ultimately I want the
test fixture to be able to read a programming file from an SD card and
configure the target FPGA without a programming cable.
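For what it's worth, that SD-to-configuration path can be sketched as a small program. Everything named here is a hypothetical stand-in, not part of Rick's design: sd_read_block() plays the SD driver, and the DIN/CCLK pins of a slave-serial configuration port are simulated by capturing bits in memory, so the shifting logic can be checked on a PC. A real slave-serial port shifts each configuration byte in MSB first, one bit per CCLK edge.

```python
# Hypothetical sketch: stream a "programming file" from an SD card
# into a slave-serial configuration port. The SD driver and the pins
# are simulated so the logic is checkable standalone.

BITSTREAM = bytes([0x0F, 0xA5, 0x3C, 0xFF, 0x00, 0x5A])  # toy programming file

def sd_read_block(lba, size):
    """Stand-in for a real SD/SPI driver: return `size` bytes at `lba`."""
    return BITSTREAM[lba:lba + size]

captured = []   # bits the simulated FPGA saw on DIN, one per CCLK pulse

def cclk_bit(din):
    """One CCLK cycle: present DIN, pulse the clock (simulated)."""
    captured.append(din)

def config_shift(buf):
    """Shift a configuration block into the device, MSB first."""
    for byte in buf:
        for b in range(7, -1, -1):
            cclk_bit((byte >> b) & 1)

def captured_bytes():
    """Reassemble the captured bit stream into bytes for checking."""
    out = bytearray()
    for i in range(0, len(captured), 8):
        v = 0
        for bit in captured[i:i + 8]:
            v = (v << 1) | bit
        out.append(v)
    return bytes(out)

config_shift(sd_read_block(0, len(BITSTREAM)))
print(captured_bytes() == BITSTREAM)   # the device saw the file intact
```

On real hardware the two helper stubs would be replaced by the SD driver and GPIO writes; the bit-ordering loop is the part that carries over.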

Of all the suggestions, so far the PIC sounds like the best one. I
couldn't find a C compiler for the PicoBlaze or the PacoBlaze. There
is mention of someone creating one, but the web site is no longer
accessible.

Rick
 
On Jul 19, 12:23 am, John McCaskill <jhmccask...@gmail.com> wrote:
On Jul 18, 11:09 pm, John McCaskill <jhmccask...@gmail.com> wrote:



The Xilinx PicoBlaze is less than 100 LUTs plus one block ram.

That should be less than 100 slices.
Still, that's at most 200 LUTs, which is very small. But I can't find
a C compiler for it.

Rick
 
On 20 July, 06:58, rickman <gnu...@gmail.com> wrote:
Hi Rick, here is a reply to your post :)
http://antti-lukats.blogspot.com/2008/07/rules-of-life.html

In short, I am doing almost the same as you intend to at the moment.

Antti
 
On 19.7.2008 6:07, rickman wrote:
Maybe something worth checking:

http://www.zylin.com/zpu.htm

From the above website:

1. The ZPU is now open source. See ZPU mailing list for more details.
2. BSD license for HDL implementations--no hiccups when using in
proprietary commercial products. Under the open source royalty free
license, there are no limits on what type of technology (FPGA,
anti-fuse, or ASIC) in which the ZPU can be implemented.
3. GPL license for architecture, documentation and tools
4. Completely FPGA brand and type neutral implementation
5. 298 LUT @ 125 MHz after P&R with 16 bit datapath and 4kBytes BRAM
6. 442 LUT @ 95 MHz after P&R with 32 bit datapath and 32kBytes BRAM
7. Codesize 80% of ARM thumb
8. Configurable 16/32 bit datapath
9. GCC toolchain (GDB, newlib, libstdc++)
10. Debugging via simulator or GDB stubs
11. HDL simulation feedback to simulator for powerful profiling
capabilities
12. Eclipse ZPU plug-in
13. eCos embedded operating system support.



Henri
 
Of all the suggestions, so far the PIC sounds like the best one. I
couldn't find a C compiler for the picoblaze or the pacoblaze. There
is mention of someone creating one, but the web site is no longer
accessible.

Rick
You can find a download link here:

http://www.asm.ro/fpga/

Disclaimer: I never used it myself.


Josep
 
On 20 July, 15:21, Henri <h...@s.fi> wrote:
Eh, this is still on my MUST-evaluate plan :)

80% of Thumb? That is nice too; I just wrote my first Thumb assembly
program, an Atmel DataFlash bootstrap loader. It's about 60 bytes of
code (Thumb). It would be fun to see whether code optimized to the
maximum in Thumb still compacts on the ZPU :)

My code is really funky: it loads one 32-bit constant and constructs
all the other constants from it, also uses the low part of an IO
address as a mask constant, etc.

Antti
 
On Jul 20, 8:21 am, Henri <h...@s.fi> wrote:
I'm pretty impressed. Small, fast and with GCC support!

Rick
 
The '16 Bit Microcontroller' at Opencores by Dr. Juergen Sauermann is
also an impressive piece of work.

 
What impresses me about this design is the approach -- determine what
kind of architecture a 'clean' compiler would like to see, and implement
the corresponding hardware and compiler. Throwing in an RTOS is a nice
bonus too.

I agree that your design is very impressive, both in resource usage and
performance. I like some of the architectural details too, especially
those borrowed from the transputer (looking back to the transputer for
ideas is a good idea in my opinion). Having GCC support is a big plus
too. What I do not have a feeling for is the relative performance of
the two designs -- do you have any feeling for this?

(Note to rickman: my initial reply was directly to you, not the
newsgroup. Sorry. This reply is very similar to the one I sent you
directly)


 
On Jul 23, 5:26 pm, "Robert F. Jarnot" <Robert.F.Jar...@jpl.nasa.gov>
wrote:
The '16 Bit Microcontroller' at Opencores by Dr. Juergen Sauermann is
also an impressive piece of work.
Can you tell us what you find impressive about it? I took a look, and
it is listed as 800 slices, which means it can be as big as 1600 LUTs.
That is over three times the size of my CPU, and an even larger ratio
compared to the ZPU and others.

Is it the fact that it has a C compiler and a simulator?

Rick
 
rickman wrote:
You like the variable length literal instructions ala the Transputer?
They are used to set up the immediate addresses for jumps and calls
too. Unfortunately this makes for some trouble with defining
addresses in the assembler. I never did get that to work correctly.
Every time a byte was added or subtracted from the opcodes, it would
move all of the other labels and you had to start over with the
calculations. I think you could have situations that never
converged.
Yes, I like the idea of prefix instructions -- I am a believer in
compact instruction sets, even if they make the CPU slightly more
complex. The transputer linker had the same issues you allude to with
yours -- the linker would sometimes have to make many tens, or even a
few hundred, passes (for a large program) to make all of the
variable-length prefix instructions as short as possible. That is
probably one of the reasons that the successor to the transputer from
www.xmos.com looks much more like a modern register-based architecture,
with a lot of other clever transputer features retained or extended.
Sauermann started with the 8080/Z80, only to come across its poor match
to a C compiler. Since this was his starting point, I am not surprised
that his final design shows some heritage from those designs. I would
be very interested in knowing how your design fares with a C compiler
(if someone smarter than me has the strength to do the port).
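The convergence problem both posters describe can be sketched as a fixed-point loop. The encoding here (4-bit prefix chunks, absolute jump targets) and the program format are simplified assumptions for illustration, not the transputer's actual instruction set: jump lengths depend on label addresses, label addresses depend on jump lengths, so the assembler iterates until nothing moves, and lengths are only ever allowed to grow so the loop is guaranteed to terminate.

```python
# Sketch of variable-length prefix-instruction relaxation. A program
# is a list of ("op",) one-byte instructions and ("jump", label)
# instructions whose encoded length depends on the target address.

def prefix_len(value):
    """Bytes needed to build `value` from 4-bit prefix chunks plus
    the final opcode byte (transputer-style encoding, simplified)."""
    n = 1
    while value >= 16:
        value >>= 4
        n += 1
    return n

def relax(program, labels):
    """labels maps a label name to the instruction index it marks.
    Returns the final byte address of each instruction."""
    lengths = [1] * len(program)   # optimistic minimum: 1 byte each
    while True:
        # Recompute addresses from the current length estimates.
        addrs, addr = [], 0
        for n in lengths:
            addrs.append(addr)
            addr += n
        changed = False
        for i, ins in enumerate(program):
            if ins[0] == "jump":
                need = prefix_len(addrs[labels[ins[1]]])
                if need > lengths[i]:   # grow only, so the loop terminates
                    lengths[i] = need
                    changed = True
        if not changed:
            return addrs

# A jump whose own growth moves its target, forcing a second pass.
program = [("op",)] * 20 + [("jump", "end")] + [("op",)] * 5
labels = {"end": 24}
print(relax(program, labels)[-1])   # prints 26
```

Real assemblers and linkers work with relative offsets and many labels at once, which is where the tens or hundreds of passes come from, but the fixed point is reached the same way. The never-converging situations Rick mentions arise if lengths were allowed to shrink again; grow-only estimates avoid that at the cost of occasionally over-long encodings.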
 
On Jul 23, 6:57 pm, "Robert F. Jarnot" <Robert.F.Jar...@jpl.nasa.gov>
wrote:
No problem. I was waiting for this one to appear so I could respond
in public. I think there is some interest in the discussion.

Yes, once I had a chance to look a bit more at the docs, I see the
history, and I also like the idea. I'm not sure why it is so large,
though. His design sounds simple, with few registers and not even an
internal stack, if I understand correctly. The various Forth-like CPUs
all have one if not two internal stacks, which in effect are local
memories (in FPGA implementations). I expect (without looking at the
design in detail) that this design suffers somewhat in speed, in that
things are done sequentially that can be done in parallel in other
processors. But then those "other" processors are not built to run
C, so I expect any fair comparison needs to take that into account.

I can't say my design is impressive really. It is not complete in
that there are no tools of any sort. I made a crude assembler but
mostly hand coded in machine language. So I don't really have any
idea of how fast it would run an application written in a high level
language. I like to think that it would handle Forth pretty well, but
I have not spent the time to really get that underway.

I did see that the C16 (that is Dr. Juergen Sauermann's CPU name) is
constructed somewhat like the 8080. That processor had a three
machine cycle instruction timing and may have also used two input
clocks for each machine cycle (this is really stretching my wayback
machine). I remember this partly because I have an 8008 computer
which was the predecessor to the 8080. It used the three machine
cycles because it only had an 8 bit multiplexed bus. It used two
cycles to output a 14 bit address (IIRC) and the third cycle was for
the 8 bits of data. Every instruction was built of these three
machine cycle memory ops (even if it was a register transfer).

His machine seems to have emulated that and so uses up to 6 clock
cycles for a basic instruction. I don't know much about the ZPU, but
my CPU uses one clock cycle for any instruction other than program
memory reads which require three cycles.

You like the variable length literal instructions ala the Transputer?
They are used to set up the immediate addresses for jumps and calls
too. Unfortunately this makes for some trouble with defining
addresses in the assembler. I never did get that to work correctly.
Every time a byte was added or subtracted from the opcodes, it would
move all of the other labels and you had to start over with the
calculations. I think you could have situations that never
converged.

Otherwise I was pretty happy with my CPU, but I don't want to
continue using it if there are better CPUs available. It will be a
couple of weeks before I can really spend any time on this, though.

Rick
 
On Jul 23, 7:58 pm, Eric Smith <e...@brouhaha.com> wrote:
rickman wrote:
That is over three times the size of my CPU

Which one is your CPU? Is it open source?
Mine was done some 6 or 7 years ago for a simple application and I
never released it. I have called it "Bonus" for no special reason.
If I decide to open source it I will try to come up with a better
name.

I don't know that it is anything special at this point. There are a
*huge* number of CPUs available at opencores.org and other places. A
quick count at opencores gives 93 processors, not counting the special
purpose ones! Does the world really need another one???

The only problem is that most of them are not really very well
documented. Very few of them even tell you how large they are in an
FPGA or how fast they run. Heck, there are three just called,
"Microprocessor" and one of those doesn't even have a page! We seem
to have quantity, but quality only in a few.

Rick
 
On Jul 23, 8:42 pm, "Robert F. Jarnot" <Robert.F.Jar...@jpl.nasa.gov>
wrote:
rickman wrote:
On Jul 23, 6:57 pm, "Robert F. Jarnot" <Robert.F.Jar...@jpl.nasa.gov
wrote:
What impresses me about this design is the approach -- determine what
kind of architecture a 'clean' compiler would like to see, and implement
the corresponding hardware and compiler. Throwing in an RTOS is a nice
bonus too.

I agree that your design is very impressive, both in resource usage and
performance. I like some of the architectural details too, especially
those borrowed from the transputer (looking back to the transputer for
ideas is a good idea in my opinion). Having GCC support is a big plus
too. What I do not have a feeling for is the relative performance of
the two designs -- do you have any feeling for this?

(Note to rickman: my initial reply was directly to you, not the
newsgroup. Sorry. This reply is very similar to the one I sent you
directly)

No problem. I was waiting for this one to appear so I could respond
in public. I think there is some interest in the discussion.

Yes, once I had a chance to look a bit more at the docs, I see the
history and I also like the idea. I'm not sure why it is so large
though. His design sounds simple with few registers and not even an
internal stack if I understand correctly. The various Forth-like CPUs
all have one if not two internal stacks which in effect are local
memories (in FPGA implementations). I expect (without looking at the
design in detail) that this design suffers somewhat in speed in that
things are done sequentially that can be done in parallel in other
processors. But then those "other" processors are not built to run
C. So I expect any fair comparison needs to take that into account.

I can't say my design is impressive really. It is not complete in
that there are no tools of any sort. I made a crude assembler but
mostly hand coded in machine language. So I don't really have any
idea of how fast it would run an application written in a high level
language. I like to think that it would handle Forth pretty well, but
I have not spent the time to really get that underway.

I did see that the C16 (that is Dr. Juergen Sauermann's CPU name) is
constructed somewhat like the 8080. That processor had a three
machine cycle instruction timing and may have also used two input
clocks for each machine cycle (this is really stretching my wayback
machine). I remember this partly because I have an 8008 computer
which was the predecessor to the 8080. It used the three machine
cycles because it only had an 8 bit multiplexed bus. It used two
cycles to output a 14 bit address (IIRC) and the third cycle was for
the 8 bits of data. Every instruction was built of these three
machine cycle memory ops (even if it was a register transfer).

His machine seems to have emulated that and so uses up to 6 clock
cycles for a basic instruction. I don't know much about the ZPU, but
my CPU uses one clock cycle for any instruction other than program
memory reads which require three cycles.

You like the variable length literal instructions ala the Transputer?
They are used to set up the immediate addresses for jumps and calls
too. Unfortunately this makes for some trouble with defining
addresses in the assembler. I never did get that to work correctly.
Every time a byte was added or subtracted from the opcodes, it would
move all of the other labels and you had to start over with the
calculations. I think you could have situations that never
converged.

Otherwise I was pretty happy with my CPU. But I don't want to
continue using it if there are better CPUs available. But it will be
a couple of weeks before I can really spend any time on this.

Rick

Yes, I like the idea of prefix instructions -- I am a believer in
compact instruction sets, even if it makes the CPU slightly more
complex. The transputer linker had the same issues you allude to with
yours -- the linker would sometimes have to make many 10's, or even a
few hundred passes (for a large program) to make all of the variable
length prefix instructions as short as possible. That is probably one
of the reasons that the successor to the transputer from www.xmos.com
looks much more like a modern register-based architecture with a lot of
other clever transputer features retained or extended. Sauermann
started with the 8080/Z80 only to come across the poor match to a C
compiler. Since this was his starting point, I am not surprised that
his final design shows some heritage from these designs. I would be
very interested in knowing how your design fares with a C compiler (if
someone smarter than me has the strength to do the port).
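The convergence trouble described above (label addresses moving every time a prefix-encoded jump grows or shrinks) can be sketched as an iterate-to-fixpoint pass. This is a hypothetical toy model, not rickman's actual assembler or the transputer linker; the 4-bit-per-prefix-byte encoding is borrowed loosely from the transputer's pfix scheme, and the helper names (`prefix_len`, `assemble`) are made up for illustration:

```python
# Toy model of multi-pass label fixup with variable-length prefix
# instructions: the encoded size of a jump depends on the distance to
# its target, but changing any jump's size moves the labels, so we
# re-lay-out the program until the sizes stop changing.

def prefix_len(value):
    """Bytes needed to encode value, 4 operand bits per prefix byte."""
    n = 1
    while value >= 16:        # each extra prefix byte adds 4 operand bits
        value >>= 4
        n += 1
    return n

def assemble(items):
    """items: list of ('label', name), ('bytes', n) fixed chunks, or
    ('jump', label).  Returns {label: address} once sizes converge,
    or None if they never settle."""
    sizes = {i: 1 for i, it in enumerate(items) if it[0] == 'jump'}
    for _ in range(100):                      # bail out if no convergence
        addr, labels, pos = 0, {}, {}
        for i, it in enumerate(items):
            if it[0] == 'label':
                labels[it[1]] = addr
            elif it[0] == 'bytes':
                addr += it[1]
            else:                             # jump, at current size guess
                pos[i] = addr
                addr += sizes[i]
        # distance is measured from the end of the jump instruction
        new = {i: prefix_len(abs(labels[items[i][1]] - (pos[i] + sizes[i])))
               for i in sizes}
        if new == sizes:
            return labels                     # stable: layout has converged
        sizes = new
    return None

prog = [('label', 'top'), ('bytes', 20), ('jump', 'top'), ('bytes', 5)]
print(assemble(prog))  # → {'top': 0}
```

Here the backward jump spans 22 bytes once its own two-byte encoding is counted, so the first pass grows it from one byte to two and the second pass confirms nothing else moved; a pathological program could keep oscillating, which is the non-convergence rickman suspected.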
I want to say the ZPU is a stack-oriented processor, two stacks in
fact, like you would use for Forth; one is for data and the other for
addresses, but I don't recall and I can't seem to find the docs on my
hard drive. But ZPU has a C compiler, so you could compare this one
to other, non-stack processors.

Rick
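The two-stack arrangement described above (one stack for data, one for return addresses, as in Forth) can be illustrated with a toy interpreter. This is a hypothetical sketch, not the actual ZPU instruction set or rickman's Bonus CPU; the opcodes and layout are invented for the example:

```python
# Toy two-stack machine in the Forth style: literals and arithmetic use
# the data stack, while 'call'/'ret' use a separate return stack, so
# subroutine linkage never disturbs the operands.

def run(program):
    data, rstack, pc = [], [], 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if isinstance(op, int):          # literal: push onto data stack
            data.append(op)
        elif op == '+':
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif op == 'call':               # next cell holds the target address
            rstack.append(pc + 1)        # return to the cell after the operand
            pc = program[pc]
        elif op == 'ret':
            pc = rstack.pop()
        elif op == 'halt':
            break
    return data

# main: push 2, call the add-three subroutine at cell 4, then halt
prog = [2, 'call', 4, 'halt',
        3, '+', 'ret']                   # subroutine: push 3, add, return
print(run(prog))  # → [5]
```

Because the return address lives on its own stack, the subroutine finds its argument on top of the data stack with nothing in the way, which is much of why stack machines of this shape map so naturally onto Forth.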
 
