New soft processor core paper publisher?

Eric Wallin <tammie.eric@gmail.com> wrote:

(snip)
If the programmer writes the individual thread programs so that
two threads never write to the same address then by definition
it can't happen (unless there is a bug in the code).
The question is related to communication between threads.
If they are independent, processing independent data, then no
problem. Usually they at least need to communicate with the OS
(or outside world, in general), which often needs semaphores.

I probably haven't thought about this as much as you have,
but I don't see the fundamental need for more hardware if
the programmer does his/her job.
-- glen
 
On Sunday, June 23, 2013 6:30:28 PM UTC-4, rickman wrote:

I'm not sure why you need to insert so much opinion of stack machines in
the discussions of the paper. Some of what I have read so far is not
very clear exactly what your point is and just comes off as a general
bias about stack machines including those who promote them. I don't
mind at all when technical shortcomings are pointed out, but I'm not
excited about reading the sort of opinion shown...
Point taken. I suppose I'm trying to spare others from wasting too much time and energy on canonical one and two stack machines. There just aren't enough stacks, so unless you want to deal with the top entry or two right now you'll be digging around, wasting both programming and real time, and getting confused. And they automatically toss data away that you often very much need, so you waste more time copying it or reloading it or whatever. I spent years trying to like them, thinking the problem was me. The J processor really helped break the spell.

Not saying I have all the answers, I hope the paper doesn't come across that way, but I do have to sell it to some degree (the paper ends with the down sides that I'm aware of, I'm sure there are more).

"Stack machines are (perhaps somewhat inadvertently) portrayed as a
panacea for all computing ills" I don't recall ever hearing anyone
saying that. Certainly there are a lot of claims for stack machines,
but the above is almost hyperbole.
Defense exhibit A:

http://www.ultratechnology.com/cowboys.html

Maybe I'm seeing things that aren't there, but almost every web site, paper, and book on stack machines and Forth that I've encountered has a vibe of "look at this revolutionary idea that the man has managed to keep down!" Absolutely no down sides mentioned, so the hapless noob is left with much too flattering of an impression. In my case this false impression was quite lasting, so I guess I've got something of an axe to grind. Perhaps I'll moderate this in future releases of the design document.
 
rickman <gnuarm@gmail.com> wrote:

(snip, I wrote)
I suppose. Stack machines are pretty much out of style now.
One reason is that current compiler technology has a hard
time generating good code for them.

I think you might be referring to the sort of stack machines used in
minicomputers 30 years ago. For FPGA implementations stack CPUs are
alive and kicking. Forth seems to do a pretty good job with them. What
is the problem with other languages?
The code generators designed for register machines, such as that
used by GCC or LCC, don't adapt to stack machines well.

As users of HP calculators know, given an expression with unrelated
arguments, it isn't hard to evaluate using a stack. But consider that
the expression might have some common subexpressions? You want to
evaluate the expression, evaluating the common subexpressions only
once. It is not so easy to get things into the right place on
the stack, such that they are at the top at the right time.
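
A minimal sketch of the common-subexpression point, assuming a toy stack machine with only `+`, `*`, `dup` and `swap` (the instruction names and the expression `(a+b)*c + (a+b)*d` are illustrative, not taken from any real compiler or CPU). The naive program recomputes `a+b`; the shared one computes it once but has to spend instructions shuffling the stack:

```python
def run(program, env):
    """Run a tiny stack program; return (result, instruction count)."""
    stack = []
    for op in program:
        if op == "dup":
            stack.append(stack[-1])
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op in ("+", "*"):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == "+" else lhs * rhs)
        else:
            stack.append(env[op])  # anything else is a variable push
    assert len(stack) == 1
    return stack[0], len(program)

ENV = {"a": 2, "b": 3, "c": 4, "d": 5}
NAIVE = "a b + c * a b + d * +".split()      # evaluates a+b twice
SHARED = "a b + dup c * swap d * +".split()  # evaluates a+b once

print(run(NAIVE, ENV), run(SHARED, ENV))
```

With one shared value a `dup` and a `swap` are enough. The difficulty described above grows with the number of values that must stay live at once, since each one has to be brought back to the top at the right moment.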

-- glen
 
Les Cargill <lcargill99@comcast.com> wrote:
Rob Doyle wrote:
(snip)

Lack of atomic operations.

No. The only requirement for semaphores
to work is to be able to turn off interrupts briefly.
What about other processors or I/O using the same memory?

-- glen
 
Eric Wallin wrote:
On Sunday, June 23, 2013 6:43:06 PM UTC-4, Tom Gardner wrote:

Do you have interrupts?

Yes, one per thread.

If so you need semaphores.

Not sure I follow, but I'm not sure you've read the paper.
Nope. I've too many other things to understand in detail.
I have no bandwidth to debug your design.


Can more than one "source" cause a memory location
to be read or written within one processor instruction
cycle? If so you need semaphores.

If the programmer writes the individual thread programs so that two threads never write to the same address then by definition it can't happen (unless there is a bug in the code). I probably haven't thought about this as much as you have, but I don't see the fundamental need for more hardware if the programmer does his/her job.
The problems that arise with the lack of atomic operations
and/or semaphores are a known problem. Any respectable
university-level software course will cover the problems
and various solutions.

Consider trying to pass a message consisting of one
integer from one thread to another such that the
receiving thread is guaranteed to be able to pick
it up exactly once.
 
Eric Wallin wrote:
On Monday, June 24, 2013 3:24:44 AM UTC-4, Tom Gardner wrote:

Consider trying to pass a message consisting of one
integer from one thread to another such that the
receiving thread is guaranteed to be able to pick
it up exactly once.

Thread A works on the integer value and when it is done it writes it to location Z. It then reads a value at location X, increments it, and writes it back to location X.

Thread B has been repeatedly reading location X and notices it has been incremented. It reads the integer value at Z, performs some function on it, and writes it back to location Z. It then reads a value at Y, increments it, and writes it back to location Y to let thread A know it took, worked on, and replaced the integer at Z.

The above seems airtight to me if reads and writes to memory are not cached or otherwise delayed, and I don't see how interrupts are germane, but perhaps I haven't taken everything into account.
Consider what happens if an interrupt occurs at an inopportune moment in the above sequence, and the other thread runs. You can get double or missed updates.

Do some research to find why "test and set" and "compare and swap" instructions exist.
 
glen herrmannsfeldt wrote:
Les Cargill <lcargill99@comcast.com> wrote:
Rob Doyle wrote:

(snip)

Lack of atomic operations.

No. The only requirement for semaphores
to work is to be able to turn off interrupts briefly.

What about other processors or I/O using the same memory?

-- glen
Then it's no longer what I would call a semaphore. A
semaphore is, SFAIK, only a Dijkstra P() or V() operation.

It's not that things like this don't exist but rather that
they should be called something else, like "bus arbitration
scheme."

--
Les Cargill
 
Eric Wallin wrote:
On Monday, June 24, 2013 3:24:44 AM UTC-4, Tom Gardner wrote:

Consider trying to pass a message consisting of one integer from
one thread to another such that the receiving thread is guaranteed
to be able to pick it up exactly once.

Thread A works on the integer value and when it is done it writes it
to location Z. It then reads a value at location X, increments it,
and writes it back to location X.

Thread B has been repeatedly reading location X and notices it has
been incremented. It reads the integer value at Z, performs some
function on it, and writes it back to location Z. It then reads a
value at Y, increments it, and writes it back to location Y to let
thread A know it took, worked on, and replaced the integer at Z.

The above seems airtight to me if reads and writes to memory are not
cached or otherwise delayed, and I don't see how interrupts are
germane, but perhaps I haven't taken everything into account.
http://www.acm.uiuc.edu/sigops/roll_your_own/6.a.html

--
Les Cargill
 
Eric Wallin wrote:
On Monday, June 24, 2013 8:30:46 AM UTC-4, Tom Gardner wrote:

Consider what happens if an interrupt occurs at an inopportune moment in the above sequence, and the other thread runs. You can get double or missed updates.

I think you're missing the point that in my processor the threads run concurrently, not sequentially.
Nope. That usually exacerbates problems, plus having 8-port memory (one for each thread) is not cheap!

Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?
 
On Monday, June 24, 2013 3:24:44 AM UTC-4, Tom Gardner wrote:

Consider trying to pass a message consisting of one
integer from one thread to another such that the
receiving thread is guaranteed to be able to pick
it up exactly once.
Thread A works on the integer value and when it is done it writes it to location Z. It then reads a value at location X, increments it, and writes it back to location X.

Thread B has been repeatedly reading location X and notices it has been incremented. It reads the integer value at Z, performs some function on it, and writes it back to location Z. It then reads a value at Y, increments it, and writes it back to location Y to let thread A know it took, worked on, and replaced the integer at Z.

The above seems airtight to me if reads and writes to memory are not cached or otherwise delayed, and I don't see how interrupts are germane, but perhaps I haven't taken everything into account.
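
The X/Y/Z handshake described above can be sketched as follows. This is only an illustration under the stated assumptions: each counter has exactly one writer (A for X, B for Y), and memory accesses are seen in program order (CPython's interpreter lock stands in for "not cached or otherwise delayed"). The function name and the doubling step are made up for the example:

```python
import threading
import time

def handshake(values):
    """Pass each integer from A to B and back using single-writer counters."""
    mem = {"X": 0, "Y": 0, "Z": None}  # X: written only by A; Y: only by B
    replies = []

    def thread_a():
        for v in values:
            seen_y = mem["Y"]
            mem["Z"] = v                 # publish the integer at Z
            mem["X"] = mem["X"] + 1      # signal B (A is X's only writer)
            while mem["Y"] == seen_y:    # poll until B acknowledges
                time.sleep(0)            # yield so the other thread can run
            replies.append(mem["Z"])     # collect B's result

    def thread_b():
        seen_x = 0
        for _ in values:
            while mem["X"] == seen_x:    # poll until A signals
                time.sleep(0)
            seen_x = mem["X"]
            mem["Z"] = mem["Z"] * 2      # "some function" on the integer
            mem["Y"] = mem["Y"] + 1      # acknowledge (B is Y's only writer)

    workers = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return replies

print(handshake([1, 2, 3]))
```

The scheme holds only while the single-writer rule holds. If a second producer also incremented X, the read-increment-write on X would no longer be safe without some atomic primitive or other exclusion mechanism.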
 
On Monday, June 24, 2013 8:30:46 AM UTC-4, Tom Gardner wrote:

Consider what happens if an interrupt occurs at an inopportune moment in the above sequence, and the other thread runs. You can get double or missed updates.
I think you're missing the point that in my processor the threads run concurrently, not sequentially.
 
Eric Wallin wrote:
On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:

Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?

I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions.
Please point me to the section which discusses the primitive operations/attributes/properties that you have provided to enable inter-thread communication.


If it doesn't then perhaps you could tell me what I left out, and I might include that info in the next rev.
Sorry, I don't have time to poorly recapitulate subjects that have been known about and solved for decades.


Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a footing as possible.
Ditto, and I don't have time to find the flaws in your arguments.

If you want people to use your processor, it might be wise to give them the information they need to have confidence in its design.
 
On 6/24/2013 11:57 AM, Eric Wallin wrote:
On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:

Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?

I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions. If it doesn't then perhaps you could tell me what I left out, and I might include that info in the next rev. Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a footing as possible.
Eric, I think you have explained properly how your design will deal with
synchronization. I'm not sure what Tom is going on about. Clearly he
doesn't understand your design.

If it is of any help, Eric's design is more like 8 cores running in
parallel, time sharing memory and in fact, the same processor hardware
on a machine cycle basis (so no 8 ported memory required). If an
interrupt occurs it doesn't cause one of the other 7 tasks to run, they
are already running, it simply invokes the interrupt handler. I believe
Eric is not envisioning multiple tasks on a single processor.
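
A minimal sketch of that time-sharing arrangement, assuming strict round-robin slots (the thread count of 8 comes from the description above; the `owner` function and the fixed slot order are assumptions, not details confirmed by the design document):

```python
N_THREADS = 8

def owner(cycle):
    """Thread that owns the shared hardware and memory port on this cycle."""
    return cycle % N_THREADS

def accesses(cycles):
    """List (cycle, thread) pairs; each cycle has exactly one owner."""
    return [(c, owner(c)) for c in range(cycles)]

print(accesses(10))
```

Under this assumption two threads can never touch memory in the same machine cycle, which is why a single-ported memory suffices. It does not by itself make a multi-cycle read-modify-write sequence indivisible.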

As others have pointed out, test and set instructions are not required
to support concurrency and communications. They are certainly nice to
have, but are not essential. In your case they would be superfluous.

--

Rick
 
rickman wrote:
On 6/24/2013 11:57 AM, Eric Wallin wrote:
On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:

Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?

I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions. If it doesn't then
perhaps you could tell me what I left out, and I might include that info in the next rev. Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a
footing as possible.

Eric, I think you have explained properly how your design will deal with synchronization. I'm not sure what Tom is going on about. Clearly he doesn't understand your design.
Correct.

If it is of any help, Eric's design is more like 8 cores running in parallel, time sharing memory and in fact, the same processor hardware on a machine cycle basis
(so no 8 ported memory required).
Fair enough; sounds like it is in the same area as the Propeller chip.

Is there anything to prevent multiple cores reading/writing the
same memory location in the same machine cycle? What is the
result when that happens?


If an interrupt occurs it doesn't cause one of the other 7 tasks to run, they are already running, it simply invokes the interrupt handler. I believe Eric is not envisioning multiple tasks on a
single processor.
Such presumptions would be useful to have in the white paper.


As others have pointed out, test and set instructions are not required to support concurrency and communications. They are certainly nice to have, but are not essential.
Agreed. I'm perfectly prepared to accept alternative techniques,
e.g. disable interrupts.


In your case they would be superfluous.
Not proven to me.

The trouble is I've seen too many hardware designs that
leave the awkward problems to software - especially first
efforts by small teams.

And too often those problems can be very difficult to solve
in software. Nowadays it is hard to find people that have
sufficient experience across the whole hardware/firmware/system
software spectrum to enable them to avoid such traps.

I don't know whether Eric is such a person, but I'm afraid
his answers have raised orange flags in my mind.

As a point of reference, I had similar misgivings when I
first heard about the Itanium's architecture in, IIRC,
1994. I suppressed them because the people involved were
undoubtedly more skilled in the area than I, and had been
working for 5 years. Much later I regrettably came to the
conclusion the orange flags were too optimistic.
 
On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:

Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?
I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions. If it doesn't then perhaps you could tell me what I left out, and I might include that info in the next rev. Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a footing as possible.
 
glen herrmannsfeldt wrote:
Tom Gardner <spamjunk@blueyonder.co.uk> wrote:

(snip)
Is there anything to prevent multiple cores reading/writing the
same memory location in the same machine cycle? What is the
result when that happens?

Yes. Even if the threads don't communicate with each other, they
might share I/O devices which need some communication.

(snip)

The trouble is I've seen too many hardware designs that
leave the awkward problems to software - especially first
efforts by small teams.

The 8087 was originally designed to have a virtual stack, where
on stack overflow an interrupt would trigger a software routine
to spill some stack registers to memory, and on underflow bring
them back again. But no-one tried to write the interrupt routine
until the hardware was done, and it turned out that it wasn't
possible. Not all the required state was available or settable.
Oh, inaccessible state is a problem that has been repeated
many times in many companies! Often to be found near to
virtual memory tables, exceptions, interrupts, and
debuggers - making Heisenbugs the norm not the exception :(


They might have fixed it in the 80287, but then they had to be
compatible with the 8087. Actually, I don't know why they didn't
fix it, but it still isn't fixed.
I strongly suspect compatibility is the (fully justifiable)
reason; it is the reason for all sorts of hardware and
software cruft.
 
Tom Gardner <spamjunk@blueyonder.co.uk> wrote:

(snip)
Is there anything to prevent multiple cores reading/writing the
same memory location in the same machine cycle? What is the
result when that happens?
Yes. Even if the threads don't communicate with each other, they
might share I/O devices which need some communication.

(snip)

The trouble is I've seen too many hardware designs that
leave the awkward problems to software - especially first
efforts by small teams.
The 8087 was originally designed to have a virtual stack, where
on stack overflow an interrupt would trigger a software routine
to spill some stack registers to memory, and on underflow bring
them back again. But no-one tried to write the interrupt routine
until the hardware was done, and it turned out that it wasn't
possible. Not all the required state was available or settable.

They might have fixed it in the 80287, but then they had to be
compatible with the 8087. Actually, I don't know why they didn't
fix it, but it still isn't fixed.

-- glen
 
Eric Wallin wrote:
On Monday, June 24, 2013 12:56:05 PM UTC-4, Tom Gardner wrote:

The trouble is I've seen too many hardware designs that
leave the awkward problems to software - especially first
efforts by small teams.

I know what you mean. Maybe I'm too picky, or perhaps too rigid to conform to others' coding styles, but 90% of the HDL code I've encountered both vocationally and avocationally has been quite poor. I tend to rewrite everything, including my slightly older code if I'm using it again, because my style keeps evolving, and it never hurts to take another look at things or at least give them some more polish.

And too often those problems can be very difficult to solve
in software. Nowadays it is hard to find people that have
sufficient experience across the whole hardware/firmware/system
software spectrum to enable them to avoid such traps.

I don't know whether Eric is such a person, but I'm afraid
his answers have raised orange flags in my mind.

Processors aren't my main thing, but I do have need of them in my main thing so I've been quite interested in them for ~15 years now. My EE graduate adviser was a professor of computer engineering and I took a couple of his courses. I own and have read the two Hennesey & Patterson (sp?) texts, though I must admit my eyes glazed over when TLBs and pipeline hazards were being discussed.

Discovering stack machines was a transforming experience for me, showing that it was possible to have much simpler and tractable HW underlying it all. But it's hard to beat indexed registers. This hybrid is something of a middle ground, and so far I'm not finding it too revolting to program by hand - then again it might be something only a mother can love.
Doing something "just because I want to" is an _excellent_
reason. For me probably the next such project will be to
make a cheapskate 1GS/s oscilloscope & TDR. But there's a
heck of a learning curve w.r.t. FPGA clocking, i/o
structures, floorplanning and dev kits nowadays!

Whenever I've come across people that say "if you have
problem X then my product will solve it provided Y applies",
I give them more credence if they also say "but my product
doesn't do Z, you have to do that some other way".
I'm sure you've had similar experiences.
 
On Monday, June 24, 2013 12:07:23 AM UTC-4, rickman wrote:

I'm glad you can take (hopefully) constructive criticism. I was
concerned when I wrote the above that it might be a bit too blunt.
I apologize to everyone here, I kind of barged in and have behaved somewhat brashly.

... part of the utility
of a design is the ease of programming efficiently. I haven't looked at
yours yet, but just picturing the four stacks makes it seem pretty
simple... so far. :^)
Writing a conventional stack machine in an HDL isn't too daunting, but programming it afterward, for me anyway, was just too much.

I have to say I'm not crazy about the large instruction word. That is
one of the appealing things about MISC to me. I work in very small
FPGAs and 16 bit instructions are better avoided if possible, but that
may be a red herring. What matters is how many bytes a given program
uses, not how many bits are in an instruction.
Yes. Opcode space obviously expands exponentially with bit count, so one can get a lot more with a small size increase. I think a 32 bit opcode is pushing it for a small FPGA implementation, but a 16 bit opcode gives one a couple of small operand indices, and some reasonably sized immediate instructions (data, conditional jumps, shifts, add) that I find I'm using quite a bit during the testing and verification phase. Data plus operation in a single opcode is hard to beat for efficiency but it has to earn its keep in the expanded opcode space. With the operand indices you get a free copy/move with most single operand operations which is another efficiency.
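
As a rough illustration of the opcode budget, here is a sketch of one hypothetical 16-bit layout: a 12-bit operation field plus two 2-bit operand indices, each selecting one of four stacks. The field widths and positions are assumptions made for the example, not the actual encoding in the design document:

```python
def encode(op, a, b):
    """Pack a 12-bit operation and two 2-bit stack indices into 16 bits."""
    assert 0 <= op < (1 << 12) and 0 <= a < 4 and 0 <= b < 4
    return (op << 4) | (b << 2) | a

def decode(word):
    """Unpack a 16-bit word into (op, a, b)."""
    return (word >> 4) & 0xFFF, word & 0x3, (word >> 2) & 0x3

word = encode(0x123, a=1, b=2)
print(hex(word), decode(word))
print(2 ** 16, 2 ** 32)  # encodings available at 16 and 32 bits
```

Each added bit doubles the number of encodings, so the two index fields cost a factor of 16 in opcode space. That is the sense in which every field has to earn its keep.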

I am supposed to present to the SVFIG and I think your design would be a
very interesting part of the presentation unless you think you would
rather present yourself. I'm sure they would like to hear about it and
they likely would be interested in your opinions on MISC. I know I am.
I'm on the other coast so I most likely can't attend, but I would be most honored if you were to present it to SVFIG.
 
On Monday, June 24, 2013 12:56:05 PM UTC-4, Tom Gardner wrote:

The trouble is I've seen too many hardware designs that
leave the awkward problems to software - especially first
efforts by small teams.
I know what you mean. Maybe I'm too picky, or perhaps too rigid to conform to others' coding styles, but 90% of the HDL code I've encountered both vocationally and avocationally has been quite poor. I tend to rewrite everything, including my slightly older code if I'm using it again, because my style keeps evolving, and it never hurts to take another look at things or at least give them some more polish.

And too often those problems can be very difficult to solve
in software. Nowadays it is hard to find people that have
sufficient experience across the whole hardware/firmware/system
software spectrum to enable them to avoid such traps.

I don't know whether Eric is such a person, but I'm afraid
his answers have raised orange flags in my mind.
Processors aren't my main thing, but I do have need of them in my main thing so I've been quite interested in them for ~15 years now. My EE graduate adviser was a professor of computer engineering and I took a couple of his courses. I own and have read the two Hennesey & Patterson (sp?) texts, though I must admit my eyes glazed over when TLBs and pipeline hazards were being discussed.

Discovering stack machines was a transforming experience for me, showing that it was possible to have much simpler and tractable HW underlying it all. But it's hard to beat indexed registers. This hybrid is something of a middle ground, and so far I'm not finding it too revolting to program by hand - then again it might be something only a mother can love.
 
