New soft processor core paper publisher?

On 6/28/13 2:04 PM, Tom Gardner wrote:
On 28/06/13 20:55, Bakul Shah wrote:

Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany
64 core processors?

No I haven't.
FYI the Epiphany III processor is used in the $99
Parallella "supercomputer". It should be available by
the end of August, according to
http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone/posts

What is more interestingly tractable are "embarrassingly
parallel" problems (e.g. massive event processing systems),
and completely new approaches (currently typified by
big data and map-reduce, but that's just the beginning).
And yet these run on traditional computers. Parallelism
is at the node level.
 
On 28/06/13 22:22, Bakul Shah wrote:
On 6/28/13 2:04 PM, Tom Gardner wrote:
What is more interestingly tractable are "embarrassingly
parallel" problems (e.g. massive event processing systems),
and completely new approaches (currently typified by
big data and map-reduce, but that's just the beginning).

And yet these run on traditional computers. Parallelism
is at the node level.
Just so, but even such nodes can be the subject of innovation.
A recent good example is Sun's Niagara/Rock T series sparcs,
which forego OOO and caches in favour of a medium number of
cores each operating at the speed of main memory.
 
On 6/28/2013 5:11 PM, Tom Gardner wrote:
On 28/06/13 20:06, rickman wrote:
On 6/28/2013 12:23 PM, Tom Gardner wrote:
On 28/06/13 15:52, rickman wrote:
On 6/28/2013 5:33 AM, Tom Gardner wrote:
On 28/06/13 10:09, RCIngham wrote:
snip

Mind you, I'd *love* to see a radical overhaul of traditional
multicore processors so they took the form of
- a large number of processors
- each with completely independent memory
- connected by message passing fifos
In the long term that'll be the only way we can continue
to scale individual machines: SMP scales for a while, but
then cache coherence requirements kill performance.


Transputer?
http://en.wikipedia.org/wiki/Transputer

It had a lot going for it, but was too dogmatic about
the development environment.

You mean 'C'? I worked on a large transputer oriented project and they
used ANSI 'C' rather than Occam. It got the job done... or should I
say "jobs"?

I only looked at the Transputer when it was Occam only.
I liked Occam as an academic language, but at that time
it would have been a bit of a pain to do any serious
engineering; ISTR anything other than primitive types
wasn't supported in the language. IIRC that was
ameliorated later, but by then the opportunity for
me (and Inmos) had passed.

I don't know how C fitted onto the Transputer, but
I'd only have been interested if "multithreaded"
(to use the term loosely) code could have been
expressed reasonably easily.

Shame, I'd have loved to use it.

At the time it was respectably
fast, but that wasn't sufficient -- particularly since there
was so much scope for increasing speed of uniprocessor
machines.

Given that uniprocessors have hit a wall, transputer
*concepts* embodied in a completely different form
might begin to be fashionable again.

You mean like 144 transputers on a single chip?

Or Intel's 80 cored chip :)

I'm not sure where processing is headed.

Not that way! Memory bandwidth and latency are
key issues - but you knew that!

Yeah, but I think the current programming paradigm is the problem. I
think something else needs to come along. The current methods are all
based on one, massive von Neumann design and that is what
has hit the wall... duh!

Time to think in terms of much smaller entities not totally different
from what is found in FPGAs, just processors rather than logic.

An 80 core chip will just be a starting point, but the hard part will
*be* getting started.


I actually just see confusion ahead as all of the existing methods
seem to have come to a steep incline if
not a brick wall. It may be time for something completely different.

Precisely. My bet is that message passing between
independent processor+memory systems has the
biggest potential. It matches nicely onto many
forms of event-driven industrial and financial
applications and, I am told, onto significant
parts of HPC. It is also relatively easy to
comprehend and debug.
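To make that concrete, here is a minimal Go sketch of the style
(names and message format invented purely for illustration): each
"core" owns its memory outright, and bounded channels play the
role of the message-passing FIFOs.

    // Minimal sketch: independent "cores" with private state,
    // connected only by bounded FIFOs (Go channels).
    package main

    import "fmt"

    type msg struct {
        from, val int
    }

    // worker owns its state outright; the channels are the only way in or out.
    func worker(id int, in <-chan msg, out chan<- msg) {
        local := 0 // "private memory": never shared, never locked
        for m := range in {
            local += m.val
            out <- msg{from: id, val: local}
        }
        close(out)
    }

    func main() {
        in := make(chan msg, 8)  // FIFO into the worker
        out := make(chan msg, 8) // FIFO out of the worker
        go worker(1, in, out)

        for i := 1; i <= 3; i++ {
            in <- msg{val: i}
        }
        close(in)

        for m := range out {
            fmt.Printf("worker %d running total %d\n", m.from, m.val)
        }
    }

Nothing is shared, so there is no cache coherence to fight, and
debugging reduces to looking at what went into and came out of
the FIFOs.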

The trick will be to get the sizes of the
processor + memory + computation "just right".
And desktop/GUI doesn't match that.

I think the trick will be in finding ways of dividing up the programs
so they can meld to the hardware rather than trying to optimize
everything.

My suspicion is that, except for compute-bound
problems that only require "local" data, that
granularity will be too small.

Examples where it will work, e.g. protein folding,
will rapidly migrate to CUDA and graphics processors.
You are still thinking von Neumann. Any application can be broken down
into small units and parceled out to small processors. But you have to
think in those terms rather than just saying, "it doesn't fit". Of
course it can fit!


Consider a chip where you have literally a trillion operations per
second available all the time. Do you really care if half go to waste?
I don't! I design FPGAs and I have never felt obliged (not
since the early days anyway) to optimize the utility of each LUT and
FF. No, it turns out the precious resource in FPGAs is routing and you
can't do much but let the tools manage that anyway.

Those internal FPGA constraints also have analogues at
a larger scale, e.g. ic pinout, backplanes, networks...


So a fine grained processor array could be very effective if the
programming can be divided down to suit. Maybe it takes 10 of these
cores to handle 100 Mbps Ethernet, so what? Something like a
browser might need to harness a couple of dozen. If the load slacks
off and they are idling, so what?

The fundamental problem is that in general as you make the
granularity smaller, the communications requirements
get larger. And vice versa :(
Actually not. The aggregate comms requirements may increase, but we
aren't sharing an Ethernet bus. All of the local processors talk to
each other and less often have to talk to non-local processors. I think
the phone company knows something about that.

If you apply your line of reasoning to FPGAs with the lowly 4 input LUT
it would seem like they would be doomed to eternal comms congestion.
Look at the routing in FPGAs and other PLDs sometime. They are
hierarchical, and it works pretty well. The trade-off is between
providing enough comms to let all of the logic be used in every
design, or just not worrying about it and "making do", which works fine if the
designers just chill about utilization.
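As a rough sketch of that hierarchy (a toy in Go; the topology and
names are invented): traffic between neighbours stays on a cheap
per-cluster channel, and only the occasional message pays for the
shared inter-cluster link, so aggregate traffic can grow without
every node contending for one bus.

    // Toy hierarchy: per-cluster local channels plus one shared
    // inter-cluster link with a router in between.
    package main

    import "fmt"

    type packet struct {
        cluster int    // destination cluster
        payload string
    }

    func main() {
        const clusters = 2
        local := make([]chan packet, clusters) // cheap, plentiful local routing
        for i := range local {
            local[i] = make(chan packet, 8)
        }
        global := make(chan packet, 8) // the scarce shared link

        // router: the only path between clusters
        go func() {
            for p := range global {
                local[p.cluster] <- p
            }
        }()

        local[0] <- packet{cluster: 0, payload: "neighbour ping"} // stays local
        global <- packet{cluster: 1, payload: "remote ping"}      // crosses the link

        fmt.Println("cluster 0 got:", (<-local[0]).payload)
        fmt.Println("cluster 1 got:", (<-local[1]).payload)
    }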


It would also help if people can decide that reliability
is important, and that bucketfuls of salt should be
on hand when listening to a salesman's protestations that
"the software/hardware framework takes care of all of
that so you don't have to worry".

What? Since when did engineers listen to salesmen?

Since their PHBs get taken out to the golf course
to chat about sport by the salesmen :(

It's a bit different with me. I am my own PHB and I kayak, not golf. I
have one disti person who I really enjoy talking to. She tries to help
me from time to time, but often she can't do a lot
because I'm not buying 1000's of chips. But my quantities have gone up
a bit lately, we'll see where it goes.

I'm sort-of retired (I got sick of corporate in-fighting,
and I have my "drop dead money", so...)
That's me too, but I found some work that is paying off very well now.
So I've got a foot in both camps, retired, not retired... both are fun
in their own way. But dealing with international shipping is a PITA.


I regard golf as silly, despite having two courses in
walking distance. My equivalent of kayaking is flying
gliders.
That has got to be fun! I've never worked up the whatever to learn to
fly. It seems like a big investment and not so cheap overall. But
there is clearly a great thrill there.

--

Rick
 
Eric Wallin wrote:
On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote:

You are still thinking von Neumann. Any application can be broken
down into small units and parceled out to small processors. But
you have to think in those terms rather than just saying, "it
doesn't fit". Of course it can fit!

Intra brain communications are hierarchical as well.

I'm nobody, but one of the reasons for designing Hive was because I
feel processors in general are much too complex, to the point where
I'm repelled by them. I believe one of the drivers for this
over-complexity is the fact that main memory is external. I've been
assembling PCs since the 286 days, and I've never understood why main
memory wasn't tightly integrated onto the uP die.
RAM was both large and expensive until recently. Different people
made RAM than made processors and it would have been challenging to get
the business arrangements such that they'd glue up.

Plus, beginning not long ago, you're really dealing with cache directly,
not RAM. Throw in that main memory is DRAM, and it gets a lot more
complicated.

Building a BSP for a new board from scratch with a DRAM controller
is a lot of work.

Everyone pretty
much gets the same ballpark memory size when putting a PC together,
and I can remember only once or twice upgrading memory after the
initial build (for someone else's Dell or similar where the initial
build was anemically low-balled for "value" reasons). Here we are in
2013, the memory is several light cm away from the processor on the
MB, talking in cache lines, and I still don't get why we have this
gross inefficiency.
That's not generally the bottleneck, though.

My dual core multi-GHz PC with SSD often just sits there for many
seconds after I click on something, and malware is now taking me
sometimes days to fix.
Geez. Ever use virtual machines? If you break/infect one,
just roll it back.

Windows 7 is a dog to install, with
relentless updates that often completely hose it rather than improve
it. The future isn't looking too bright for the desktop with the way
we're going.
--
Les Cargill
 
Bakul Shah wrote:
On 6/28/13 2:33 AM, Tom Gardner wrote:
On 28/06/13 10:09, RCIngham wrote:
snip

Mind you, I'd *love* to see a radical overhaul of traditional
multicore processors so they took the form of
- a large number of processors
- each with completely independent memory
- connected by message passing fifos

In the long term that'll be the only way we can continue
to scale individual machines: SMP scales for a while, but
then cache coherence requirements kill performance.


Transputer?
http://en.wikipedia.org/wiki/Transputer

It had a lot going for it, but was too dogmatic about
the development environment. At the time it was respectably
fast, but that wasn't sufficient -- particularly since there
was so much scope for increasing speed of uniprocessor
machines.

Have you looked at Tilera's TILEpro64 or Adapteva's Epiphany
64 core processors?

Given that uniprocessors have hit a wall, transputer
*concepts* embodied in a completely different form
might begin to be fashionable again.

Languages like Erlang and Go use similar concepts (as
did Occam on the transputer). But I think the problem
is that /in general/ we still don't know how to write
parallel or distributed programs.
I do - I've been doing it for a long time, too. It's not
all that hard if you have no libraries getting in the way.

This, by the way, is absolutely nothing fancy. It's
precisely the same concepts as when we linked stuff
together with serial ports in the Stone Age.


Most of the concepts
are from ~40 years back (CSP, guarded commands etc.).
Most *all* concepts in computers are from that long ago or longer.
The "new stuff" is more about arbitraging market forces than getting
real work done.

We still don't have decent tools.
I respectfully disagree. But my standard for "decency" is
probably different from your'n. My idea of an IDE is an
editor and a shell prompt...

Turning serial programs
into parallel versions is manual, laborious, error prone
and not very successful.

So don't do that. Write them to be parallel from the
git-go. Write them to be event-driven. It's better in
all dimensions.


After all, we're all really clockmakers. Events regulate our
"wheels" just like the escapement on a pendulum clock.
When you get that happening, things get to be a lot more
deterministic and that is what parallelism needs the most.
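A small illustration of "event-driven from the git-go" (a toy in Go;
the event sources and timings are made up): one loop, one select,
state touched only inside that loop, with a periodic tick playing
the escapement.

    // Toy event loop: all state lives here and is driven only by events.
    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        requests := make(chan string, 4)
        tick := time.NewTicker(200 * time.Millisecond) // the "escapement"
        defer tick.Stop()

        go func() { // an invented event source
            for _, r := range []string{"open", "read", "close"} {
                requests <- r
            }
            close(requests)
        }()

        count := 0 // no locks, no sharing: only this loop touches it
        for {
            select {
            case r, ok := <-requests:
                if !ok {
                    fmt.Println("handled", count, "requests")
                    return
                }
                count++
                fmt.Println("event:", r)
            case t := <-tick.C:
                fmt.Println("tick at", t.Format("15:04:05.000"))
            }
        }
    }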

--
Les Cargill
 
On 6/28/2013 10:44 PM, Les Cargill wrote:
Eric Wallin wrote:
On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote:

You are still thinking von Neumann. Any application can be broken
down into small units and parceled out to small processors. But
you have to think in those terms rather than just saying, "it
doesn't fit". Of course it can fit!

Intra brain communications are hierarchical as well.

I'm nobody, but one of the reasons for designing Hive was because I
feel processors in general are much too complex, to the point where
I'm repelled by them. I believe one of the drivers for this
over-complexity is the fact that main memory is external. I've been
assembling PCs since the 286 days, and I've never understood why main
memory wasn't tightly integrated onto the uP die.

RAM was both large and expensive until recently. Different people
made RAM than made processors and it would have been challenging to get
the business arrangements such that they'd glue up.
That's not the reason. Intel could buy any of the DRAM makers any day
of the week. At several points DRAM divisions of companies were sold
off to form merged DRAM specialty companies primarily so the risk was
shared by several companies and they didn't have to take such a large
hit to their bottom line when DRAM was in the down phase of the business
cycle. Commodity parts like DRAMs are difficult to make money on and
there is always one of the makers who could be bought easily.

In fact, Intel started out making DRAMs! The main reason main
memory isn't on the CPU chip is that there are lots of variations in
size *and* that it just wouldn't fit! You don't put one DRAM chip in a
computer: IIRC a module used to need a minimum of four chips, often
eight, and sometimes double-sided with 16 chips.

The next bigger problem is that CPUs and DRAMs use highly optimized
processes and are not very compatible. A combined chip would likely not
have as fast a CPU and would have poor DRAM on board.


Plus, beginning not long ago, you're really dealing with cache directly,
not RAM. Throw in that main memory is DRAM, and it gets a lot more
complicated.

Building a BSP for a new board from scratch with a DRAM controller
is a lot of work.

Everyone pretty
much gets the same ballpark memory size when putting a PC together,
and I can remember only once or twice upgrading memory after the
initial build (for someone else's Dell or similar where the initial
build was anemically low-balled for "value" reasons). Here we are in
2013, the memory is several light cm away from the processor on the
MB, talking in cache lines, and I still don't get why we have this
gross inefficiency.


That's not generally the bottleneck, though.
I'm not so sure. With the multicore processors my understanding is that
memory bandwidth *is* the main bottleneck. If you could move the DRAM
on chip it could run faster but more importantly it could be split into
a bank for each processor giving each one all the bandwidth it could want.

I think a large part of the problem is that we have been designing more
and more complex machines so that the majority of the CPU cycles are
spent supporting the framework rather than doing the work the user
actually cares about. It is a bit like the amount of fuel needed to go
into space. Add one pound of payload and you need some hundred or
thousand more pounds of fuel to launch it. If you want to travel
further out into space, the amount of fuel goes up exponentially. We
seem to be reaching the point that the improvements in processor speed
are all being consumed by the support software rather than getting to
the apps.

--

Rick
 
glen herrmannsfeldt wrote:
Les Cargill <lcargill99@comcast.com> wrote:

(snip)
RAM was both large and expensive until recently. Different people
made RAM than made processors and it would have been challenging to get
the business arrangements such that they'd glue up.

Not so long ago, some used separate chips for cache in the same
package as the CPU die.
Yes. Or even farther back, the chips were remote
from the CPU package.

I have wondered if processors with large enough cache can run
with no external RAM, as long as you stay within the cache.
I am sure there are drawers full of papers on the subject :)
If you were clever about managing cache misses* and they were
sufficiently infrequent, it might work out to be a fraction
of the same thing.

*as in "oops, your process gets queued if you have a cache miss".

Then you could call your L3 (I believe) cache the system
main memory, either on chip or in the same package.
I am not sure what prevents massive cache from being universal in
the first place. I expect it's pricing.

Plus, beginning not long ago, you're really dealing with cache directly,
not RAM. Throw in that main memory is DRAM, and it gets a lot more
complicated.

-- glen
--
Les Cargill
 
On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote:

You are still thinking von Neumann. Any application can be broken down
into small units and parceled out to small processors. But you have to
think in those terms rather than just saying, "it doesn't fit". Of
course it can fit!
Intra brain communications are hierarchical as well.

I'm nobody, but one of the reasons for designing Hive was because I feel processors in general are much too complex, to the point where I'm repelled by them. I believe one of the drivers for this over-complexity is the fact that main memory is external. I've been assembling PCs since the 286 days, and I've never understood why main memory wasn't tightly integrated onto the uP die. Everyone pretty much gets the same ballpark memory size when putting a PC together, and I can remember only once or twice upgrading memory after the initial build (for someone else's Dell or similar where the initial build was anemically low-balled for "value" reasons). Here we are in 2013, the memory is several light cm away from the processor on the MB, talking in cache lines, and I still don't get why we have this gross inefficiency.

My dual core multi-GHz PC with SSD often just sits there for many seconds after I click on something, and malware is now taking me sometimes days to fix. Windows 7 is a dog to install, with relentless updates that often completely hose it rather than improve it. The future isn't looking too bright for the desktop with the way we're going.
 
Les Cargill <lcargill99@comcast.com> wrote:

(snip)
RAM was both large and expensive until recently. Different people
made RAM than made processors and it would have been challenging to get
the business arrangements such that they'd glue up.
Not so long ago, some used separate chips for cache in the same
package as the CPU die.

I have wondered if processors with large enough cache can run
with no external RAM, as long as you stay within the cache.

Then you could call your L3 (I believe) cache the system
main memory, either on chip or in the same package.

Plus, beginning not long ago, you're really dealing with cache directly,
not RAM. Throw in that main memory is DRAM, and it gets a lot more
complicated.
-- glen
 
On 29/06/13 02:02, rickman wrote:
On 6/28/2013 5:11 PM, Tom Gardner wrote:
On 28/06/13 20:06, rickman wrote:
I think the trick will be in finding ways of dividing up the programs
so they can meld to the hardware rather than trying to optimize
everything.

My suspicion is that, except for compute-bound
problems that only require "local" data, that
granularity will be too small.

Examples where it will work, e.g. protein folding,
will rapidly migrate to CUDA and graphics processors.

You are still thinking von Neumann. Any application can be broken down into small units and parceled out to small processors. But you have to think in those terms rather than just saying, "it
doesn't fit". Of course it can fit!
Regrettably not. People have been trying different
techniques for ~50 years, with varying degrees of
success as technology bottlenecks change.
The people working in those areas are highly
intelligent and motivated (e.g. high performance
computing research) and there is serious money
available (e.g. life sciences, big energy).

As a good rule of thumb, if you can think of it,
they've already tried it and found where it does
and doesn't work.


Consider a chip where you have literally a trillion operations per
second available all the time. Do you really care if half go to waste?
I don't! I design FPGAs and I have never felt obliged (not
since the early days anyway) to optimize the utility of each LUT and
FF. No, it turns out the precious resource in FPGAs is routing and you
can't do much but let the tools manage that anyway.

Those internal FPGA constraints also have analogues at
a larger scale, e.g. ic pinout, backplanes, networks...


So a fine grained processor array could be very effective if the
programming can be divided down to suit. Maybe it takes 10 of these
cores to handle 100 Mbps Ethernet, so what? Something like a
browser might need to harness a couple of dozen. If the load slacks
off and they are idling, so what?

The fundamental problem is that in general as you make the
granularity smaller, the communications requirements
get larger. And vice versa :(

Actually not. The aggregate comms requirements may increase, but we aren't sharing an Ethernet bus. All of the local processors talk to each other and less often have to talk to non-local
processors. I think the phone company knows something about that.
That works to an extent, particularly in "embarrassingly parallel"
problems such as telco systems. I know: I've architected and
implemented some :)

It still has its limits in most interesting computing systems.


I'm sort-of retired (I got sick of corporate in-fighting,
and I have my "drop dead money", so...)

That's me too, but I found some work that is paying off very well now. So I've got a foot in both camps, retired, not retired... both are fun in their own way. But dealing with international shipping
is a PITA.
Or even sourcing some components, e.g. a MAX9979KCTK+D or +TD :(


I regard golf as silly, despite having two courses in
walking distance. My equivalent of kayaking is flying
gliders.

That has got to be fun!
Probably better than you imagine (and that's recursive
without a terminating condition). I know instructors
that still have pleasant surprises after 50 years :)

I did a tiny bit of kayaking on flat water, but now
I wear hearing aids :(


I've never worked up the whatever to learn to fly.
Going solo is about as difficult as learning to drive
a car. And then the learning really starts :)


It seems like a big investment and not so cheap overall.
Not in money. In the UK club membership is $500/year,
a launch + 10 mins of instruction is $10, and an hour of
instruction in the air is $30. The real cost is time:
club members help you get airborne, and you help them
in return. Very sociable, unlike aircraft with air
conditioning fans up front or scythes above.


But there is clearly a great thrill there.
0-40kt in 3s, 0-50kt in 5s, climb with your feet
above your head, fly in close formation with raptors,
eyeball sheep on a hillside as you whizz past
below them at 60kt, 10-20kft, 40kt-150kt, hundreds
and thousands of km range, pre-solo spinning at
altitudes that make power pilots blanch, and
pre-solo flying in loose formation with other
aircraft.

Let me know if you want pointers to youtube vids.
 
On 29/06/13 03:15, Eric Wallin wrote:
On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote:

You are still thinking von Neumann. Any application can be broken down
into small units and parceled out to small processors. But you have to
think in those terms rather than just saying, "it doesn't fit". Of
course it can fit!

Intra brain communications are hierarchical as well.

I'm nobody, but one of the reasons for designing Hive was because I feel processors in general are much too complex, to the point where I'm repelled by them.

I believe one of the drivers for this over-complexity is the fact that main memory is external.
Partly. The random-access latency is also a killer (Sun's
Niagara series sparcs have an interesting work-around.)

I've never understood why main memory wasn't tightly integrated onto the uP die.
1) Yield => cost, which is highly non-linear with
increasing area. Tolerable for big iron where the
processor cost is a small fraction of the total

2) Different semiconductor structures, which
can't be used on the same die.

My dual core multi-GHz PC with SSD often just sits there for many seconds after I click on something, and malware is now taking me sometimes days to fix. Windows 7 is a dog to install, with relentless updates that often completely hose it rather than improve it. The future isn't looking too bright for the desktop with the way we're going.
Download xubuntu, blow it onto a cd/dvd or usb.
Reboot and try that "live cd" version without
touching your disk.

Live cd has slow disk accesses since everything
has to be fetched from cd. But the desktop is
blindingly fast, even on notebooks.

No resident malware to worry about (just spearphishing
and man-in-the-browser attacks).

No endless re-booting whenever updates arrive -
and they arrive daily. The only reboots are for
kernel upgrades.

Speedy installation: I get a fully-patched installed
system in well under an hour. Last time MS would
let me (!) install XP, it took me well over a day
because of all the reboots.

Speedy re-installation once every 3 years: your
files are untouched so you just upgrade the o/s
(trivial precondition: put /home on a separate
disk partition). Takes < 1 hour.
 
On 29/06/13 03:56, glen herrmannsfeldt wrote:
Les Cargill <lcargill99@comcast.com> wrote:

(snip)
RAM was both large and expensive until recently. Different people
made RAM than made processors and it would have been challenging to get
the business arrangements such that they'd glue up.

Not so long ago, some used separate chips for cache in the same
package as the CPU die.

I have wondered if processors with large enough cache can run
with no external RAM, as long as you stay within the cache.
From the point of view of speed, yes. The DRAM becomes the
equivalent of paging to disk. But you still need DRAM to boot :)

Then you could call your L3 (I believe) cache the system
main memory, either on chip or in the same package.
Yup, that's the effect. Doesn't work with bloatware :(
 
On 29/06/13 04:57, Les Cargill wrote:
*as in "oops, your process gets queued if you have a cache miss".
Almost, but see how Sun's Niagara architecture circumvented
that easily and cheaply.
 
On 29/06/13 04:55, rickman wrote:
With the multicore processors my understanding is that memory bandwidth *is* the main bottleneck.
Bandwidth and latency, particularly the mismatch
between processor cycle speed and DRAM random-access cycle time.

If you could move the DRAM on chip it could run faster but more importantly it
could be split into a bank for each processor giving each one all the bandwidth it could want.
You've just re-invented the L3 cache on AMD's
Opteron processors :)

I think a large part of the problem is that we have been designing more and more complex machines so that the majority of the CPU cycles are spent supporting the framework rather than doing the work
the user actually cares about. It is a bit like the amount of fuel needed to go into space. Add one pound of payload and you need some hundred or thousand more pounds of fuel to launch it. If you
want to travel further out into space, the amount of fuel goes up exponentially. We seem to be reaching the point that the improvements in processor speed are all being consumed by the support
software rather than getting to the apps.
In general purpose desktop work, that's true.

For high-performance and industrial computing,
it is less clear cut.
 
On 29/06/13 03:55, Les Cargill wrote:
Bakul Shah wrote:
Most of the concepts
are from ~40 years back (CSP, guarded commands etc.).

Most *all* concepts in computers are from that long ago or longer.
The "new stuff" is more about arbitraging market forces than getting real work done.
There's some truth in that. For most people
re-inventing the wheel is completely unprofitable.

Who wants to learn iron-mining, smelting, forging, and
finishing when all you need to do is cut up this
evening's meal?



Turning serial programs
into parallel versions is manual, laborious, error prone
and not very successful.

So don't do that. Write them to be parallel from the
git-go. Write them to be event-driven. It's better in
all dimensions.
But not at all scales; there's a reason fine-grained
dataflow failed.
And not with all timescales :)

After all, we're all really clockmakers. Events regulate our
"wheels" just like the escapement on a pendulum clock.
When you get that happening, things get to be a lot more
deterministic and that is what parallelism needs the most.
Don't get me wrong, I really like event-driven programming,
and some programming types are triumphantly re-inventing
it yet again, many-many layers up the software stack!

For example, and to torture the nomenclature:
- unreliable photons/electrons at the PMD level
- unreliable bits at the PHY level
- reliable bits, unreliable frames at MAC level
- reliable frames, unreliable packets at the IP level
- reliable packets, unreliable streams at the TCP level
- reliable streams, unreliable conversations at the app level
- app protocols to make conversations reliable
- reliable conversations within apps:
- protocols to make apps reliable
- streams to send unreliable message events
- frameworks to make message events reliable
where some of the app and framework stuff looks *very* like
some of the networking stuff.

But I'm not going to throw the baby out with the
bathwater: there are *very* good reasons why most
(not all) of those levels are there.
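To show what one of those layers amounts to in miniature (a toy
sketch in Go; the loss rate, timeout and framing are all invented):
plain stop-and-wait, retransmitting over a lossy hop until the
matching acknowledgement comes back, which is roughly what every
"reliable X over unreliable Y" level above does in fancier clothing.

    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    type frame struct {
        seq  int
        data string
    }

    // lossyLink models the unreliable layer below: roughly half the
    // frames handed to it never arrive.
    func lossyLink(in <-chan frame, out chan<- frame) {
        for f := range in {
            if rand.Intn(2) == 0 {
                out <- f
            }
        }
    }

    // receiver delivers each sequence number once and acknowledges it.
    func receiver(in <-chan frame, acks chan<- int) {
        seen := map[int]bool{}
        for f := range in {
            if !seen[f.seq] {
                seen[f.seq] = true
                fmt.Println("delivered:", f.data)
                acks <- f.seq
            }
        }
    }

    // reliableSend is plain stop-and-wait: retransmit until the
    // matching acknowledgement comes back.
    func reliableSend(f frame, link chan<- frame, acks <-chan int) {
        for {
            link <- f
            select {
            case seq := <-acks:
                if seq == f.seq {
                    return
                }
            case <-time.After(50 * time.Millisecond):
                // lost: try again
            }
        }
    }

    func main() {
        link := make(chan frame)
        wire := make(chan frame)
        acks := make(chan int, 3) // one ack per unique frame below
        go lossyLink(link, wire)
        go receiver(wire, acks)

        for i, d := range []string{"alpha", "beta", "gamma"} {
            reliableSend(frame{seq: i, data: d}, link, acks)
        }
        fmt.Println("all frames acknowledged")
    }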
 
On 29/06/13 14:10, Eric Wallin wrote:
I build PCs and do tech support on the side for family and friends, so their problems become my problems
I refuse, unless they let me install some version of Linux; so far the people that have accepted have been happy.


My last Win7 build took several days just to stabilize with all the updates.
What?! That's ridiculous. I can just about understand that for the decade old XP (kudos to MS for supporting it for that long).
But I sure can't see why that should be true for Win7.

One update put it in a permanent update loop and that took me a while to even notice.
And I've got a Win7 laptop sitting here that for the life of me I can't get to run outside of safe mode.
I did a repair install (malware knocked out system restore rollback)
Why not just do a full re-install from CD?

I was thinking of getting Win7 to replace XP when MS withdraw support next year. Now I'm in doubt.
As for Win8: "just say no" (everyone else does).


and it works fine until the .net v4 updates hit after which it stutters and the HD disappears.

I don't get all the accolades for Win7, it's a dog.
Yes, but it is better than Vista, and the hacks don't feel so guilty about supporting it.

Good things about linux: fanbois are vocally and acerbically critical when
things don't work smoothly, and then point you towards the many alternatives
that /do/ work smoothly.
 
On Friday, June 28, 2013 10:44:37 PM UTC-4, Les Cargill wrote:

Geez. Ever use virtual machines? If you break/infect one,
just roll it back.
My XP PC doesn't get infected too often, but I build PCs and do tech support on the side for family and friends, so their problems become my problems. My last Win7 build took several days just to stabilize with all the updates. One update put it in a permanent update loop and that took me a while to even notice. And I've got a Win7 laptop sitting here that for the life of me I can't get to run outside of safe mode. I did a repair install (malware knocked out system restore rollback) and it works fine until the .net v4 updates hit after which it stutters and the HD disappears. I don't get all the accolades for Win7, it's a dog.
 
rickman wrote:
On 6/28/2013 10:44 PM, Les Cargill wrote:
Eric Wallin wrote:
On Friday, June 28, 2013 9:02:10 PM UTC-4, rickman wrote:

You are still thinking von Neumann. Any application can be broken
down into small units and parceled out to small processors. But
you have to think in those terms rather than just saying, "it
doesn't fit". Of course it can fit!

Intra brain communications are hierarchical as well.

I'm nobody, but one of the reasons for designing Hive was because I
feel processors in general are much too complex, to the point where
I'm repelled by them. I believe one of the drivers for this
over-complexity is the fact that main memory is external. I've been
assembling PCs since the 286 days, and I've never understood why main
memory wasn't tightly integrated onto the uP die.

RAM was both large and expensive until recently. Different people
made RAM than made processors and it would have been challenging to get
the business arrangements such that they'd glue up.

That's not the reason. Intel could buy any of the DRAM makers any day
of the week.

I have to presume they "couldn't", because they didn't. But memory
architectures did evolve over time - I believe 286 machines still
used DIP packages for DRAM. And the target computers I used
from the mid-80s to the mid-90s may well have still used SRAM.

At that point, by the time SIP/SIMM/DIMM modules were available,
the culture expected things to be separate. We were also arbitraging
RAM prices - we'd buy less-quantity, more-expensive DRAM now, then buy
bigger later when the price dropped.

Some of that was doubtless retail behavioral stuff.

At several points DRAM divisions of companies were sold
off to form merged DRAM specialty companies primarily so the risk was
shared by several companies and they didn't have to take such a large
hit to their bottom line when DRAM was in the down phase of the business
cycle. Commodity parts like DRAMs are difficult to make money on and
there is always one of the makers who could be bought easily.
Right. So if you integrate it into the main core package, you no
longer have to suffer as a commodity vendor. It's a captive market.

I'm sure it's not that simple.

In fact, Intel started out making DRAMs!
Precisely! They did one of the first "bet the company" moves
that resulted in the 4004.

The main reason main
memory isn't on the CPU chip is that there are lots of variations in
size *and* that it just wouldn't fit! You don't put one DRAM chip in a
computer: IIRC a module used to need a minimum of four chips, often
eight, and sometimes double-sided with 16 chips.
If you were integrating inside the package, you could use any
physical configuration you wanted. But the thing would still have been
too big.

The next bigger problem is that CPUs and DRAMs use highly optimized
processes and are not very compatible. A combined chip would likely not
have as fast a CPU and would have poor DRAM on board.
I also have to wonder if the ability to cool things was involved.

Plus, beginning not long ago, you're really dealing with cache directly,
not RAM. Throw in that main memory is DRAM, and it gets a lot more
complicated.

Building a BSP for a new board from scratch with a DRAM controller
is a lot of work.

Everyone pretty
much gets the same ballpark memory size when putting a PC together,
and I can remember only once or twice upgrading memory after the
initial build (for someone else's Dell or similar where the initial
build was anemically low-balled for "value" reasons). Here we are in
2013, the memory is several light cm away from the processor on the
MB, talking in cache lines, and I still don't get why we have this
gross inefficiency.


That's not generally the bottleneck, though.

I'm not so sure. With the multicore processors my understanding is that
memory bandwidth *is* the main bottleneck. If you could move the DRAM
on chip it could run faster but more importantly it could be split into
a bank for each processor giving each one all the bandwidth it could want.
That's consistent with my understanding as well. The big thing on
transputers in the '80s was the 100MBit links between them. As
we used to say - "the bus is usually the bottleneck". Er,
at least once you got past 10MHz clock speeds...


I think a large part of the problem is that we have been designing more
and more complex machines so that the majority of the CPU cycles are
spent supporting the framework rather than doing the work the user
actually cares about.
Yep - although it's eminently possible to avoid this problem. I use
a lot of old programs - some going back to Win 3.1.

Really, pure 64-bit computers would have completely failed
had there not been the ability to run a legacy O/S in a VM
or run 32 bit progs through the main O/S.

It is a bit like the amount of fuel needed to go
into space. Add one pound of payload and you need some hundred or
thousand more pounds of fuel to launch it. If you want to travel
further out into space, the amount of fuel goes up exponentially.
So Project X is trying to do something about that. There is something
about engineering culture that "wants scale" - a Saturn V is a really
impressive thing to watch, I am sure.

We
seem to be reaching the point that the improvements in processor speed
are all being consumed by the support software rather than getting to
the apps.
But BeOS and the like have been available, and remain
widely unused. There is some massive culture fail in play; either
that or things are just good enough.

Heck, pad/phone computers do much much *less* than desktops and
have the bullet in the market. You can't even type on them but people
still try...

--
Les Cargill
 
Eric Wallin wrote:
On Friday, June 28, 2013 10:44:37 PM UTC-4, Les Cargill wrote:

Geez. Ever use virtual machines? If you break/infect one, just roll
it back.

My XP PC doesn't get infected too often, but I build PCs and do tech
support on the side for family and friends, so their problems become
my problems.
Ah! Right.

My last Win7 build took several days just to stabilize
with all the updates.
Holy cow. I would be tempted to just not do that, then.

One update put it in a permanent update loop
and that took me a while to even notice. And I've got a Win7 laptop
sitting here that for the life of me I can't get to run outside of
safe mode. I did a repair install (malware knocked out system
restore rollback) and it works fine until the .net v4 updates hit
after which it stutters and the HD disappears. I don't get all the
accolades for Win7, it's a dog.
Yeah, that's ugly. Although that's more the update infrastructure
that's ugly rather than Win7 itself.

--
Les Cargill
 
Tom Gardner wrote:
On 29/06/13 03:15, Eric Wallin wrote:
snip

Speedy installation: I get a fully-patched installed
system in well under an hour. Last time MS would
let me (!) install XP, it took me well over a day
because of all the reboots.

Speedy re-installation once every 3 years: your
files are untouched so you just upgrade the o/s
(trivial precondition: put /home on a separate
disk partition). Takes < 1 hour.
And things like virtualbox make running a
Windows guest pretty simple. I'm stuck with a
Win7 host for now because of one PCI card, but
virtualbox claims to be able to publish PCI cards to
guests presently but only on a Linux host.


--
Les Cargill
 
