Martin Brown
On 23/02/2023 16:43, John Larkin wrote:
> On Wed, 22 Feb 2023 20:33:57 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>> On 2/22/2023 8:15 PM, Sylvia Else wrote:
>>> But can you afford the memory and time overheads inherent in run-time
>>> range checks of things like array accesses?
>> That's a small cost. Modern tools can often make (some) of those
>> tests at compile time.
>> The bigger problem with many newer languages is that they rely heavily
>> on dynamic memory allocation, garbage collection, etc.
>> And, most of the folks I've met can't look at a line of arbitrary
>> code and tell you -- with *confidence* -- that they know what it
>> costs to execute, regardless of how comfortable they are with
>> the language in question.
> Programmers typically can't estimate run times for chunks of their
> code. They typically guess pessimistically, by roughly 10:1.
Anyone who is serious about timing code knows how to read the free
running system clock. RDTSC on Intel CPUs is very handy (even if Intel
warns against using it for this purpose, it works very well).
Other CPUs have equivalent performance monitoring registers, although
they may be hidden in some fine print in the dark recesses of the manual.
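As a minimal sketch (assuming GCC/Clang and the __rdtsc() intrinsic
from x86intrin.h; the function pointer, rep count and volatile sink are
scaffolding of mine, and serialising instructions such as lfence are
deliberately left out):

#include <stdint.h>
#include <x86intrin.h>

/* time an arbitrary double->double routine with the free-running TSC */
uint64_t time_it(double (*f)(double), double x, int reps)
{
    volatile double sink;       /* keeps the calls from being optimised away */
    uint64_t t0 = __rdtsc();
    for (int i = 0; i < reps; i++)
        sink = f(x);
    uint64_t dt = __rdtsc() - t0;
    (void)sink;
    return dt / reps;           /* rough cycles per call */
}
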
> There's nothing unreasonable about an IRQ doing a decent amount of i/o
> and signal processing on a small ARM at 100 KHz, if programmed in bare
> C.
These days most binary operations are single cycle, and potentially less
if there are sub-expressions with no interdependencies. Divides are
still a lot slower. This makes a Pade [5,4] rational approximation a
good choice on current Intel CPUs: the numerator and denominator
evaluate in parallel (the physical hardware is that good) and some of
the divide's latency is hidden along the way.
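To illustrate the shape of such an evaluation (a sketch only - the
coefficient arrays are placeholders for whatever fit is in use): the
two Horner chains share nothing but x, so an out-of-order core runs
them side by side, and only the final divide serialises.

/* Pade [5,4]: degree-5 numerator over degree-4 denominator */
double pade54(double x, const double n[6], const double d[5])
{
    double num = ((((n[5]*x + n[4])*x + n[3])*x + n[2])*x + n[1])*x + n[0];
    double den = (((d[4]*x + d[3])*x + d[2])*x + d[1])*x + d[0];
    return num / den;           /* the one divide, at the very end */
}
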
In the old days we were warned to avoid conditional branches, but today
you can get away with them, and sometimes aggressive loop unrolling will
work against you. If it has to be as fast as possible then you need to
understand every last quirk of the target CPU.
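As a toy illustration of the trade-off (my own example, not from any
particular codebase): a clamp can be written with branches, or in a
form most compilers turn into branch-free min/max instructions. Which
wins depends on how predictable the data is - which is exactly why you
measure on the target.

double clamp_branchy(double x, double lo, double hi)
{
    if (x < lo) return lo;      /* cheap when well predicted */
    if (x > hi) return hi;
    return x;
}

double clamp_branchless(double x, double lo, double hi)
{
    double t = (x < lo) ? lo : x;   /* typically compiles to maxsd */
    return (t > hi) ? hi : t;       /* typically compiles to minsd */
}
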
Speculative execution is a bit like quantum mechanics in that it
explores both paths simultaneously and then the wavefunction collapses
onto the path that was taken once the outcome of the comparison is
known. This significantly alters the viability of some algorithms.
The only caveat is that everything stops dead if two divides or
comparisons occur before the previous one has completely finished
executing, and you get a hard pipeline stall (very bad for speed).
How fast things will go in practice can only be determined today by
putting all of the pieces together and seeing how fast they run. On
some CPUs, Halley's more complicated third-order method runs as fast as
or sometimes faster than the more commonly used Newton-Raphson.
This has changed quite recently. On an i5-12600 they benchmark at 25
cycles and 26 cycles respectively, but Halley has cubic convergence
whereas NR is only quadratic (and intrinsically less stable).
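For concreteness, one refinement step of each method applied to
f(y) = y^3 - x, the cube-root equation used below. Each step contains
exactly one divide, which is why the cycle counts come out so close:

/* Newton-Raphson: y - (y^3 - x)/(3y^2), quadratic convergence */
double newton_step(double y, double x)
{
    return (2.0*y + x/(y*y)) / 3.0;
}

/* Halley: y*(2x + t)/(x + 2t) with t = y^3, cubic convergence */
double halley_step(double y, double x)
{
    double t = y*y*y;
    return y + y*(x - t)/(x + 2.0*t);
}
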
Benchmarks can be misleading too. They don't tell you how the component
will behave in the environment where you will actually use it.
>> Even C is becoming difficult, in some cases, to 'second guess'.
>> And, ASM isn't immune as the hardware is evolving to provide
>> performance enhancing features that can't often be quantified,
>> at design/compile time.
I have a little puzzle on that one too. I have some verified correct
code for cube root running on x87 and SSE/AVX hardware which, when
benchmarked aggressively in blocks of 10^7 cycles, gets progressively
faster - by as much as a factor of two. I know that others have seen
this effect too, but it only happens sometimes - usually on dense,
frequently executed x87 code. These are cube root benchmarks:
             cycles per call, successive 10^7-cycle blocks
ASM_log2      67  64  64  64  64 ...
ASM_2^x      107  90  90  90  90 ...
ASM_cbrt     122 123  96  64  64 ...
ASM_cbrt2    152 153 122  85  85 ...
Note how all the assembler-based code speeds up a bit, or even a lot,
after the first few measurements and then stabilises.
ASM_log2 and ASM_2^x are very short routines.
ASM_cbrt contains one FP divide; ASM_cbrt2 has no division.
Compare these with timings of pure SSE2 or AVX SIMD code:
Pow(x,1/3)    136 136 137 136 138
exp(log(x)/3) 168 164 147 147 171
N_cbrt         69  69  69  69  69
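For what it's worth, the harness that produces numbers like these is
roughly the following sketch (BLOCK matches the 10^7-cycle figure
above; the rest is scaffolding of mine):

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

#define BLOCK 10000000ull       /* 10^7 reference cycles per block */

void bench_blocks(double (*f)(double), double x, int nblocks)
{
    volatile double sink;
    for (int b = 0; b < nblocks; b++) {
        uint64_t t0 = __rdtsc(), calls = 0;
        while (__rdtsc() - t0 < BLOCK) {  /* run for one block */
            sink = f(x);
            calls++;
        }
        printf("block %d: ~%llu cycles/call\n",
               b, (unsigned long long)(BLOCK / calls));
    }
    (void)sink;
}
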
I also have to mention how impressed I am with the latest 2023 Intel
compiler - their system cbrt is more accurate than is possible with a
strict 64-bit-only FP representation, and the fastest too!
Sneaky use of fused multiply-add allows very precise refinement of the
answer using more than the 53-bit mantissa of 64-bit FP.
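A sketch of the underlying trick - not Intel's actual code, just the
standard C99 fma() from math.h: fma(a,b,c) rounds a*b + c only once,
so the residual of an approximate root keeps bits that an ordinary
multiply-then-subtract would cancel away, and those bits drive a final
correction.

#include <math.h>

/* ~= y^3 - x with far less cancellation than (y*y*y) - x */
double cbrt_residual(double y, double x)
{
    double y2 = y * y;
    return fma(y, y2, -x);      /* y*y2 - x, rounded once */
}
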
The system cbrt on GNU and MS is best avoided entirely - both are
amongst the slowest and least accurate of all those I have ever tested.
The best public algorithm for general use is by Sun for their Unix
library; it is a variant of Kahan's magic-constant hack. BSD's is
slower but OK.
The routine below is, amazingly, a best buy thanks to remarkable recent
speed improvements in the exp and log library functions (and also sqrt).
#include <math.h>

/* exp/log-based cube root with one Halley refinement step
   (named cbrt_halley here so it doesn't clash with the library cbrt) */
double cbrt_halley(double x)
{
    double y, t;
    if (x == 0) return x;       /* preserves signed zero */
    if (x > 0) y = exp(log(x)/3); else y = -exp(log(-x)/3);
    t = y*y*y;
    return y + y*(x - t)/(x + 2*t);  /* Halley refinement, = y*(2x+t)/(x+2t) */
}
It is faster than most other algorithms, at least on Intel CPUs, and as
accurate as all but the very cunning Intel C++ library routine.
PS: don't be tempted to rearrange the refinement algebraically into
y*(2x+t)/(x+2t). It might look neater, but you will lose (at least) a
factor of two in numerical precision by using that form of the
expression.
>> And, while the interface looks like a traditional function call,
>> the developer now has to consider the possibility that the
>> service may be unavailable -- even if it WAS available on the
>> previous line of code! (developers have a tough time thinking
>> in pseudo-parallel, let alone *true* parallelism)
> Lots of CPUs help.
... to make a horrible mess. Few problems scale well on multiple CPUs.
>> So, the language becomes less of an issue but the system design
>> and OS features/mechanisms (the days of toy RTOSs are rapidly coming
>> to an end)
> We don't need no stinkin' OS!
You may not think you need one, but you do need a way to share the
physical resources between the tasks that want to use them.
Cooperative multitasking can be done with interrupts for the realtime
I/O, but life is a lot easier with a bit of help from an RTOS.
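As a minimal sketch of that split (generic, not any particular RTOS;
read_sample, process_sample and background_work are hypothetical
hooks): the interrupt does only the time-critical capture, and
everything else cooperates in a run-to-completion super-loop.

#include <stdbool.h>
#include <stdint.h>

extern uint16_t read_sample(void);        /* hypothetical device read */
extern void process_sample(uint16_t s);   /* hypothetical realtime task */
extern void background_work(void);        /* hypothetical housekeeping */

static volatile bool sample_ready = false;
static volatile uint16_t sample;

void io_isr(void)                   /* ISR: only the urgent part */
{
    sample = read_sample();
    sample_ready = true;
}

int main(void)
{
    for (;;) {                      /* cooperative super-loop */
        if (sample_ready) {
            sample_ready = false;
            process_sample(sample); /* runs to completion */
        }
        background_work();
    }
}
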
--
Martin Brown