highest frequency periodic interrupt?...

On Saturday, 14 January 2023 at 22:33:29 UTC+1, John Larkin wrote:
On Sat, 14 Jan 2023 12:20:24 -0800 (PST), Lasse Langwadt Christensen
<lang...@fonz.dk> wrote:

On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote:
On Sat, 14 Jan 2023 12:20:07 -0700, Don Y
<blocked...@foo.invalid> wrote:

On 1/14/2023 8:52 AM, Martin Brown wrote:
ISR code is generally very short and best done in assembler if you want it as
quick as possible. Examining the code generation of GCC is worthwhile since it
sucks compared to Intel (better) and MS (best).

I always code ISRs in a HLL -- if only to act as pseudo-code
illustrating what the (ASM) code is actually doing. IME, people
miss details in ASM so having those expressed in a HLL makes
it easier for them to understand the *goal* of the code.

Looking at a .S is a great starting point *if* you have to
hand-tweak the code. Remembering that the code that gets
executed will change as the compiler is revised; ASM won't
(which can be A Good Thing as well as A Bad Thing).

In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++ when
generating Intel CPU specific SIMD code with maximum optimisation.

I'd be less worried about quality of code generator (compiler vs. human ASM)
than the effects of cache, core affinity, *which* bus(es) are called on
for each instruction, other contenders for those resources, etc.
The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16
Kbyte cache. Cache misses will be *very* slow. So code will need to be
very tight bare-metal. The entire ISR should fit in cache.

you can copy some (or all) of the code to ram instead of using execute-in-place from flash
That's a good idea. A typical ISR could be pretty small, and let the
mainline program thrash all it likes.

I think you can even turn off the cache to get an additional 16k ram
Yikes, execute out of SPI flash?

no, copy all the code to ram on boot
 
On Sat, 14 Jan 2023 13:42:54 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

On Saturday, 14 January 2023 at 22:33:29 UTC+1, John Larkin wrote:
On Sat, 14 Jan 2023 12:20:24 -0800 (PST), Lasse Langwadt Christensen
<lang...@fonz.dk> wrote:

On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote:
On Sat, 14 Jan 2023 12:20:07 -0700, Don Y
<blocked...@foo.invalid> wrote:

On 1/14/2023 8:52 AM, Martin Brown wrote:
ISR code is generally very short and best done in assembler if you want it as
quick as possible. Examining the code generation of GCC is worthwhile since it
sucks compared to Intel (better) and MS (best).

I always code ISRs in a HLL -- if only to act as pseudo-code
illustrating what the (ASM) code is actually doing. IME, people
miss details in ASM so having those expressed in a HLL makes
it easier for them to understand the *goal* of the code.

Looking at a .S is a great starting point *if* you have to
hand-tweak the code. Remembering that the code that gets
executed will change as the compiler is revised; ASM won't
(which can be A Good Thing as well as A Bad Thing).

In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++ when
generating Intel CPU specific SIMD code with maximum optimisation.

I'd be less worried about quality of code generator (compiler vs. human ASM)
than the effects of cache, core affinity, *which* bus(es) are called on
for each instruction, other contenders for those resources, etc.
The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16
Kbyte cache. Cache misses will be *very* slow. So code will need to be
very tight bare-metal. The entire ISR should fit in cache.

you can copy some (or all) of the code to ram instead of using execute-in-place from flash
That's a good idea. A typical ISR could be pretty small, and let the
mainline program thrash all it likes.

I think you can even turn off the cache to get an additional 16k ram
Yikes, execute out of SPI flash?

no, copy all the code to ram on boot

Ok, OK, the entire app and variables and stacks and buffers would have
to fit in 256K. Might work.
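On the RP2040 this is usually done per-function rather than wholesale; the Pico SDK provides `__not_in_flash_func()` for it, which boils down to a GCC section attribute like the sketch below. The section name here is made up, and a real target needs matching linker-script support to place the section in RAM and copy it at boot; the filter body is only a placeholder.

```c
#include <stdint.h>

/* Hypothetical section name: a real target's linker script must map
 * ".time_critical" into RAM and copy it from flash at boot (the Pico SDK's
 * __not_in_flash_func() macro expands to essentially this attribute). */
#define RAM_FUNC __attribute__((section(".time_critical"), noinline))

/* Keep the ISR body here so it never faults into slow SPI-flash fetches. */
RAM_FUNC int32_t filter_step(int32_t x, int32_t *state) {
    *state += (x - *state) / 8;   /* simple one-pole IIR, integer only */
    return *state;
}
```

The main loop can then thrash the XIP cache all it likes without stretching the ISR's worst case.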
 
On 13/01/2023 23:46, John Larkin wrote:
> What's the fastest periodic IRQ that you have ever run?

<snip>

Got a 3us interrupt servicing an ADC, assembler of course. Only a
40 MIPS processor, and the 3us has a scattering of 3.025us and 2.975us
intervals as needed to maintain synchronisation to a remote transmitter
with no possibility of a common clock.

Works fine at Gas Mark 4, aka 180°C.

--
Cheers
Clive
 
On a sunny day (Sat, 14 Jan 2023 10:50:30 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<nmt5shp49hqugndric000iqtgprikikub9@4ax.com>:

On Sat, 14 Jan 2023 17:57:08 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

On a sunny day (Sat, 14 Jan 2023 08:31:33 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<tol5shtb7chchpkq63hnb1mfsveolk1tib@4ax.com>:

On Sat, 14 Jan 2023 06:27:45 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

Paid about 100 USD for my Pi4 4 GB and my Pi4 8 GB just 2 years ago, December 2020,
including SD card, Raspberry Pi OS, plastic housing, cables, cooling fins and supply.

No fan, it does run hot, about 70 C.
But I use that one for web browsing.
The older one with 4 GB memory has an ebay metal housing and a fan.
After lubricating that fan with vaseline it now has run quiet for 4 years?
The metal housing also stops any WiFi, as that one is part of the security
system and no WiFi allowed there.
It runs 24/7 recording 6 cameras, 2 audio channels, weather sensors (temp, air pressure, humidity),
air traffic, ship traffic, radiation etc. (from an even older Raspberry Pi that works as server) ..
http://panteltje.com/panteltje/xgpspc/index.html
Each Pi4 has a 4 TB Toshiba USB harddisk connected to it.




The enclosure is a nightmare so I threw that away. Just run the board.
It doesn't seem to need the fan.

Type this in a terminal to see the current temperature:
vcgencmd measure_temp


Fingers are easier.

This is from google:
For Raspberry Pi 3+, a 'soft' temperature limit of 60°C has been introduced.
This means that even before reaching the hard limit at 85°C, the clock speed is reduced from 1.4GHz to lower frequencies,
reducing the temperature.
and
That is the so-called throttling. The Raspberry Pi monitors the temperature continuously.
Above 82°C (180°F), the clock frequency is automatically lowered, regardless of which flag is set. This action will reduce
heat.

So better use vcgencmd and it saves your finger too from getting fried.

It might get hotter when it's compiling or something, but it's not
very warm. It would be easy to add the fan if it got necessary. The
kit did come with three stick-on heat sinks.

There are also LCD monitor things that the 4B mounts on the back of.
They have a fan.


I should actually get a better housing with a fan for my Pi4 8 GB, like the one on my Pi4 4 GB that runs at about 46°C.
Of course, maybe bringing your own fried finger to a restaurant gets a discount?

My finger is calibrated. I can touch 50C forever and 60C for about
half a second. Touching 100C briefly hurts but does no harm. Baking a
real pie is more dangerous.

I've had interns that refused to touch chips to see if they are hot.
They were afraid of being electrocuted by 3.3 volts.

Yea, OK,
The thing about vcgencmd is that you can use it from a script or program,
for example to reduce priority, or to temporarily slow down or halt some non-essential code
to prevent the Pi from lowering the clock on your important things, give alarms, etc.
Change processor use too.
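On Linux the same reading is also exposed in sysfs as millidegrees, so a program can watch it without shelling out to vcgencmd. A hedged C sketch: the path used is the standard Linux thermal-zone node; whether it exists on a given Pi image is an assumption.

```c
#include <stdio.h>

/* Convert a sysfs thermal_zone reading (millidegrees C) to degrees C.
 * On a Pi the raw value comes from /sys/class/thermal/thermal_zone0/temp. */
double millideg_to_c(long millideg) {
    return millideg / 1000.0;
}

/* Returns the temperature in degrees C, or -1.0 if the node is absent
 * or unreadable. */
double read_cpu_temp(const char *path) {
    FILE *f = fopen(path, "r");
    long md;
    if (!f || fscanf(f, "%ld", &md) != 1) { if (f) fclose(f); return -1.0; }
    fclose(f);
    return millideg_to_c(md);
}
```

A monitoring loop could poll this and lower the priority of non-essential work as the value approaches the throttle point.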
 
On a sunny day (Sat, 14 Jan 2023 10:21:59 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<epr5sh59k5q62qkapubhkfk8ubf9r0vnng@4ax.com>:

On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

On 13/01/2023 23:46, John Larkin wrote:
What's the fastest periodic IRQ that you have ever run?

Usually try to avoid having fast periodic IRQs in favour of offloading
them onto some dedicated hardware. But CPUs were slower then than now.

We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
down some to save power, so the ISR runs for about 7 usec max.

I ask because if I use a Pi Pico on some new projects, it has a
dual-core 133 MHz CPU, and one core may have enough compute power that
we wouldn't need an FPGA in a lot of cases. Might even do DDS in
software.

RP2040 floating point is tempting but probably too slow for control
use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
I guess.

It might be worth benchmarking how fast the FPU really is on that device
(for representative sample code). The Intel i5 & i7 can do all except
divide in a single cycle these days - I don't know what Arm is like in
this respect. You get some +*- for free close to every divide too.

The RP2040 chip has FP routines in the ROM, apparently code with some
sorts of hardware assist, but it's callable subroutines and not native
instructions to a hardware FP engine. When it returns it's done.

Various web sites seem to confuse microseconds and nanoseconds. 150 us
does seem slow for a "fast" fp operation. We'll have to do
experiments.

I wrote one math package for the 68K, with the format signed 32.32.
That behaved just like floating point in real life, but was small and
fast and avoided drecky scaled integers.
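The 68K package itself isn't shown, but a signed 32.32 format can be sketched in portable C. This is an illustrative reconstruction, not John's code: it leans on the GCC/Clang `__int128` extension for the widening multiply, where the 68K original would have used partial-product adds.

```c
#include <stdint.h>

typedef int64_t fx32_32;               /* signed 32.32 fixed point */

#define FX_ONE ((fx32_32)1 << 32)      /* the value 1.0 */

static fx32_32 fx_from_int(int32_t n)       { return (fx32_32)n << 32; }
static fx32_32 fx_add(fx32_32 a, fx32_32 b) { return a + b; }

/* 64x64 -> 128-bit product, then drop the extra 32 fraction bits.
 * __int128 is a GCC/Clang extension; a 68K would build this from
 * 16x16 partial products. */
static fx32_32 fx_mul(fx32_32 a, fx32_32 b) {
    return (fx32_32)(((__int128)a * b) >> 32);
}

static double fx_to_double(fx32_32 a) { return (double)a / FX_ONE; }
```

Add and subtract are just integer ops, which is why the format "behaves like floating point" at integer speed, with ~9 decimal digits on each side of the point.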


*BIG* time penalty for having two divides or branches too close
together. Worth playing around to find patterns the CPU does well.

Without true hardware FP, call locations probably don't matter.


Beware that what you measure gets controlled, but for polynomials up to 5
terms or rationals up to about 5,2 the call overhead may dominate the
execution time (particularly if the stupid compiler puts a 16-byte
structure across a cache line boundary on the stack).

We occasionally use polynomials, but 2nd order and rarely 3rd is
enough to get analog i/o close enough.
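A 2nd-order correction evaluated by Horner's rule costs just two multiplies and two adds per sample, which is why call overhead can dominate. A minimal sketch; the coefficients in the test are placeholders, not anyone's real calibration.

```c
/* Evaluate c2*x^2 + c1*x + c0 by Horner's rule: two mul, two add.
 * Typical use: linearizing an analog input channel. Force inlining if the
 * call overhead would otherwise swamp the arithmetic. */
static inline double poly2(double x, double c2, double c1, double c0) {
    return (c2 * x + c1) * x + c0;
}
```

A 3rd-order version is one more multiply-add: `((c3*x + c2)*x + c1)*x + c0`.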


Forcing inlining of small code sections can help. Do it to excess and it
will slow things down - there is a sweet spot. Loop unrolling is much
less useful these days now that branch prediction is so good.

I was also thinking that we could make a 2 or 3-bit DAC with a few
resistors. The IRQ could load that at various places and a scope would
trace execution. That would look cool. On the 1758 thing we brought
out a single bit to a test point and raised that during the ISR so we
could see ISR execution time on a scope. My C guy didn't believe that
a useful ISR could run at 100K and had no idea what execution time
might be.
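A host-testable sketch of that few-bit-DAC marker scheme: `write_port` is a hypothetical stand-in for the single register store that would drive the resistor DAC, and it records values so the idea can be exercised off-target. All names are made up.

```c
#include <stdint.h>

/* Stand-in for the GPIO port write driving the 3-bit resistor DAC; on real
 * hardware this is one register store. Recording the values lets the
 * marker scheme be tested without a scope. */
static uint8_t trace[8];
static int trace_len;
static void write_port(uint8_t v) { if (trace_len < 8) trace[trace_len++] = v & 0x7; }

enum { MARK_IDLE = 0, MARK_ENTER = 1, MARK_FILTER = 2, MARK_PID = 3 };

/* The ISR writes a new level at each phase; on a scope the DAC output is a
 * staircase whose step widths are the per-phase execution times. */
static void isr_body(void) {
    write_port(MARK_ENTER);
    /* ... read ADC, run filter ... */
    write_port(MARK_FILTER);
    /* ... PID update, write DAC ... */
    write_port(MARK_PID);
    write_port(MARK_IDLE);   /* back to 0: total ISR time visible */
}
```

With 3 bits you get 8 distinct levels, so the single-bit "raise a test point" trick generalizes to tracing which section of the ISR is running.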

ISR code is generally very short and best done in assembler if you want
it as quick as possible. Examining the code generation of GCC is
worthwhile since it sucks compared to Intel (better) and MS (best).

In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
when generating Intel CPU specific SIMD code with maximum optimisation.

MS compiler still does pretty stupid things, like internal compiler
generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 bytes) and
having them cross a cache line boundary.

Nobody has answered my question. Generalizations about software timing
abound but hard numbers are rare. Programmers don't seem to use
oscilloscopes much.

That is silly
http://panteltje.com/panteltje/pic/scope_pic/index.html

Try reading the asm, it is well commented.
:)

And if you are talking Linux or other multi-taskers there is a lot more involved.

I was amazed about the other thread about logic analyzers.
Why did I never need one for my code / projects?
All you need is a scope... especially an analog one, digital ones are liars!

If you have no clue then having a hall full of equipment does not give you one!

mm
did I use a scope for any of this?
http://panteltje.com/panteltje/newsflex/download.html
I only have a 10 MHz analog dual trace one!
Well, it shows 25 MHz too, but attenuated.
But I DO have rtl_sdr sticks that show spectrum from 25 MHz to 1.5 GHz
It is so simple, all of it...
maybe not for a mathematician, but then...
/
 
On Sat, 14 Jan 2023 04:47:22 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

On a sunny day (Fri, 13 Jan 2023 15:46:16 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<q5p3shh8f34tt34ka767750oc2ou8p7vl8@4ax.com>:

What's the fastest periodic IRQ that you have ever run?

We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
down some to save power, so the ISR runs for about 7 usec max.

I ask because if I use a Pi Pico on some new projects, it has a
dual-core 133 MHz CPU, and one core may have enough compute power that
we wouldn't need an FPGA in a lot of cases. Might even do DDS in
software.

RP2040 floating point is tempting but probably too slow for control
use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
I guess.

I was also thinking that we could make a 2 or 3-bit DAC with a few
resistors. The IRQ could load that at various places and a scope would
trace execution. That would look cool. On the 1758 thing we brought
out a single bit to a test point and raised that during the ISR so we
could see ISR execution time on a scope. My C guy didn't believe that
a useful ISR could run at 100K and had no idea what execution time
might be.

Well in that sort of thing you need to think in asm, instruction times,
but I have no experience with the RP2040, and little with ASM on ARM.
Should be simple to test how long the C code takes, do you have an RP2040?
Playing with one would be a good starting point.
Should I get one? Was thinking just for fun...

In the past coding ISRs in assembly was the way to go, but the
complexity of current processors (cache, pipelining) makes it hard to
beat a _good_ compiler.

The main principle still is to minimize the number of registers saved
at interrupt entry (and restored at exit). On a primitive processor only
the processor status word and program counter need to be saved (and
restored). Additional registers may need to be saved (and restored) if
the ISR uses them.

If the processor has separate FP registers and/or separate FP status
words, avoid using FP registers in ISRs.

Some compilers may have "interrupt" keywords or similar extensions, and
the compiler knows which registers need to be saved in the ISR. To
help the compiler, include all functions that are called by the ISR in
the same module (preferably in-lined) prior to the ISR, so that the
compiler knows what needs to be saved. Do not call external library
routines from an ISR, since the compiler doesn't know which registers
need to be saved and saves all of them.
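A common way to keep the saved-register set small is to make the ISR do almost nothing: grab the sample, set a flag, return, and let the register-hungry work run in the main loop. A generic C sketch; the `adc_read` stub is hypothetical, and the target-specific interrupt attribute spelling is omitted since it varies by compiler.

```c
#include <stdint.h>

/* Shared with the main loop; volatile because the ISR writes them
 * asynchronously. */
static volatile uint16_t latest_sample;
static volatile int sample_ready;

/* Hypothetical ADC read; on a real part this is a register load. */
static uint16_t adc_read(void) { return 0x123; }

/* The handler touches only a couple of registers: read, store, flag,
 * return. No FP, no library calls, so even a compiler-generated
 * interrupt prologue stays short. */
void adc_isr(void) {
    latest_sample = adc_read();
    sample_ready = 1;
}
```

The main loop polls `sample_ready`, clears it, and does the filtering/PID work at leisure; only when the work must finish before the next sample does it belong inside the ISR itself.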
 
On 1/14/2023 10:10 PM, upsidedown@downunder.com wrote:
In the past coding ISRs in assembly was the way to go, but the
complexity of current processors (cache, pipelining) makes it hard to
beat a _good_ compiler.

Exactly. And, it's usually easier to see what you are trying
to do in a HLL vs. ASM (and heaven forbid you want to port
the application to a different processor!)

The problem with using an HLL is making sure you actually
understand what some "line of code" translates into when it comes
to actual opcode/memory accesses (not just which instructions
but, rather, the *cost* of those instructions).

And, this can change based on *how* the compiler is invoked
(how aggressive the code generator is).

The main principle still is to minimize the number of registers saved
at interrupt entry (and restored at exit). On a primitive processor only
the processor status word and program counter need to be saved (and
restored). Additional registers may need to be saved (and restored) if
the ISR uses them.

Some \"advanced\" processors still support a \"Fast IRQ\" that saves
just an accumulator and PSW. A tacit acknowledgement that you
don\'t want to have to save the *entire* processor state (as
you likely don\'t know what portions of it the compiler *might*
call on).

If the processor has separate FP registers and/or separate FP status
words, avoid using FP registers in ISRs.

As with everything, *how* you use them can make a difference.
E.g., if your ISR reenables interrupts (prior to completion), it
can make sense to use "expensive" instruction sequences (assuming
the ISR doesn't interrupt itself).

[Degenerate example: the scheduler being invoked!]

Some compilers may have "interrupt" keywords or similar extensions, and
the compiler knows which registers need to be saved in the ISR. To
help the compiler, include all functions that are called by the ISR in
the same module (preferably in-lined) prior to the ISR, so that the
compiler knows what needs to be saved. Do not call external library
routines from an ISR, since the compiler doesn't know which registers
need to be saved and saves all of them.
 
On 14/01/2023 18:21, John Larkin wrote:
On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
when generating Intel CPU specific SIMD code with maximum optimisation.

MS compiler still does pretty stupid things, like internal compiler
generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 bytes) and
having them cross a cache line boundary.

Nobody has answered my question. Generalizations about software timing
abound but hard numbers are rare. Programmers don't seem to use
oscilloscopes much.

The guardians of big iron won't let you poke around its internals with a
scope although I do recall them having an AM radio on the console so
that you could listen in on the RFI to see if it was stuck in a loop.

I prefer to use RDTSC for my Intel timings anyway.

On many of the modern CPUs there is a freerunning 64 bit counter clocked
at once per cycle. Intel deprecates using it for such purposes but I
have never found it a problem provided that you bracket it before and
after with CPUID to force all the pipelines into an empty state.

The equivalent DWT_CYCCNT on the Arm CPUs that support it is described here:

https://stackoverflow.com/questions/32610019/arm-m4-instructions-per-cycle-ipc-counters
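Martin's RDTSC-with-CPUID bracketing looks roughly like this on x86-64 with GCC/Clang inline asm. A sketch only, not his driver-assisted MSR setup: CPUID is used purely as a serializing instruction so the pipeline drains before the counter is read.

```c
#include <stdint.h>

/* Serialize the pipeline with CPUID, then read the time-stamp counter.
 * Call once before and once after the region of interest; the difference
 * is elapsed reference cycles (x86-64, GCC/Clang). */
static uint64_t serialized_rdtsc(void) {
    uint32_t lo, hi;
    __asm__ volatile("cpuid\n\t"
                     "rdtsc"
                     : "=a"(lo), "=d"(hi)
                     : "a"(0)
                     : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Usage is simply `t0 = serialized_rdtsc(); /* code under test */ t1 = serialized_rdtsc();` and report `t1 - t0`, remembering the bracketing itself costs on the order of a hundred cycles.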

I prefer hard numbers to a vague scope trace.

If I'm really serious about finding out why something is unusually slow
I run a dangerous system level driver that allows me full access to the
model specific registers to monitor cache misses and pipeline stalls.

One of my recent tests shows that in the MS SSE2 library, whilst sin and
cos are both properly rounded to acceptable machine precision tolerance,
the results from the combined sincos have worst case behaviour of 4x eps.

This makes answers change when the optimisation level is increased to
maximum in code which uses both sin(x) and cos(x) and mine does.

--
Regards,
Martin Brown
 
On 1/15/2023 2:48 AM, Martin Brown wrote:
I prefer to use RDTSC for my Intel timings anyway.

On many of the modern CPUs there is a freerunning 64 bit counter clocked at
once per cycle. Intel deprecates using it for such purposes but I have never
found it a problem provided that you bracket it before and after with CPUID to
force all the pipelines into an empty state.

The equivalent DWT_CYCCNT on the Arm CPUs that support it is described here:

https://stackoverflow.com/questions/32610019/arm-m4-instructions-per-cycle-ipc-counters

I prefer hard numbers to a vague scope trace.

Two downsides:
- you have to instrument your code (but, if you're concerned with performance,
you've already done this as a matter of course)
- it doesn't tell you about anything that happens *before* the code runs
(e.g., latency between event and recognition thereof)

If I'm really serious about finding out why something is unusually slow I run a
dangerous system level driver that allows me full access to the model specific
registers to monitor cache misses and pipeline stalls.

But, those results can change from instance to instance (as can latency,
execution time, etc.). So, you need to look at the *distribution* of
values and then think about whether that truly represents \"typical\"
and/or *worst* case.

Relying on exact timings is sort of naive; it ignores how much
things can vary with the running system (is the software in a
critical region when the ISR is invoked?) and the running
*hardware* (multilevel caches, etc.)
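Don's point about distributions can be made concrete: collect many samples of the measured duration and report min/median/max rather than a single number. A host-side sketch; the function and variable names are made up.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Reduce n raw timing samples (cycles, ns, whatever) to min/median/max.
 * For ISR latency reasoning the max (worst case) matters at least as much
 * as the center of the distribution. Sorts the samples in place. */
void timing_summary(uint64_t *samples, size_t n,
                    uint64_t *min, uint64_t *med, uint64_t *max) {
    qsort(samples, n, sizeof *samples, cmp_u64);
    *min = samples[0];
    *med = samples[n / 2];
    *max = samples[n - 1];
}
```

Feeding this the output of a cycle counter over thousands of runs shows at a glance whether the "typical" figure hides rare, much slower outliers from cache misses or contention.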

Do you have a way of KNOWING when your expectations (which you
have now decided are REQUIREMENTS!) are NOT being met? And, if so,
what do you do (at runtime) with that information? ("I'm sorry,
one of my basic assumptions is proving to be false and I am not
equipped to deal with that...")

Esp given that your implementation will likely evolve and
folks doing that work may not be as focused as you were on
this specific issue...

One of my recent tests shows that in the MS SSE2 library, whilst sin and cos are
both properly rounded to acceptable machine precision tolerance, the results
from the combined sincos have worst case behaviour of 4x eps.

This makes answers change when the optimisation level is increased to maximum
in code which uses both sin(x) and cos(x) and mine does.
 
On Sunday, 15 January 2023 at 06:10:24 UTC+1, upsid...@downunder.com wrote:
On Sat, 14 Jan 2023 04:47:22 GMT, Jan Panteltje
<pNaonSt...@yahoo.com> wrote:

On a sunny day (Fri, 13 Jan 2023 15:46:16 -0800) it happened John Larkin
<jla...@highlandSNIPMEtechnology.com> wrote in
<q5p3shh8f34tt34ka...@4ax.com>:

What's the fastest periodic IRQ that you have ever run?

We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
down some to save power, so the ISR runs for about 7 usec max.

I ask because if I use a Pi Pico on some new projects, it has a
dual-core 133 MHz CPU, and one core may have enough compute power that
we wouldn't need an FPGA in a lot of cases. Might even do DDS in
software.

RP2040 floating point is tempting but probably too slow for control
use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
I guess.

I was also thinking that we could make a 2 or 3-bit DAC with a few
resistors. The IRQ could load that at various places and a scope would
trace execution. That would look cool. On the 1758 thing we brought
out a single bit to a test point and raised that during the ISR so we
could see ISR execution time on a scope. My C guy didn't believe that
a useful ISR could run at 100K and had no idea what execution time
might be.

Well in that sort of thing you need to think in asm, instruction times,
but I have no experience with the RP2040, and little with ASM on ARM.
Should be simple to test how long the C code takes, do you have an RP2040?
Playing with one would be a good starting point.
Should I get one? Was thinking just for fun...
In the past coding ISRs in assembly was the way to go, but the
complexity of current processors (cache, pipelining) makes it hard to
beat a _good_ compiler.

The main principle still is to minimize the number of registers saved
at interrupt entry (and restored at exit). On a primitive processor only
the processor status word and program counter need to be saved (and
restored). Additional registers may need to be saved (and restored) if
the ISR uses them.

If the processor has separate FP registers and/or separate FP status
words, avoid using FP registers in ISRs.

Some compilers may have "interrupt" keywords or similar extensions, and
the compiler knows which registers need to be saved in the ISR. To
help the compiler, include all functions that are called by the ISR in
the same module (preferably in-lined) prior to the ISR, so that the
compiler knows what needs to be saved. Do not call external library
routines from an ISR, since the compiler doesn't know which registers
need to be saved and saves all of them.

Cortex-M automatically stacks the registers needed to call a regular C function,
and if it has an FPU it supports "lazy stacking", which means it keeps track of
whether the FPU is used and only stacks/un-stacks the FP registers when they are used.

It also knows that if another interrupt is pending at ISR exit it doesn't need
to un-stack/re-stack before calling the other interrupt handler.
 
On 1/15/2023 12:48, Lasse Langwadt Christensen wrote:
On Sunday, 15 January 2023 at 06:10:24 UTC+1, upsid...@downunder.com wrote:
On Sat, 14 Jan 2023 04:47:22 GMT, Jan Panteltje
<pNaonSt...@yahoo.com> wrote:

On a sunny day (Fri, 13 Jan 2023 15:46:16 -0800) it happened John Larkin
<jla...@highlandSNIPMEtechnology.com> wrote in
<q5p3shh8f34tt34ka...@4ax.com>:

What's the fastest periodic IRQ that you have ever run?

We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
down some to save power, so the ISR runs for about 7 usec max.

I ask because if I use a Pi Pico on some new projects, it has a
dual-core 133 MHz CPU, and one core may have enough compute power that
we wouldn't need an FPGA in a lot of cases. Might even do DDS in
software.

RP2040 floating point is tempting but probably too slow for control
use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
I guess.

I was also thinking that we could make a 2 or 3-bit DAC with a few
resistors. The IRQ could load that at various places and a scope would
trace execution. That would look cool. On the 1758 thing we brought
out a single bit to a test point and raised that during the ISR so we
could see ISR execution time on a scope. My C guy didn't believe that
a useful ISR could run at 100K and had no idea what execution time
might be.

Well in that sort of thing you need to think in asm, instruction times,
but I have no experience with the RP2040, and little with ASM on ARM.
Should be simple to test how long the C code takes, do you have an RP2040?
Playing with one would be a good starting point.
Should I get one? Was thinking just for fun...
In the past coding ISRs in assembly was the way to go, but the
complexity of current processors (cache, pipelining) makes it hard to
beat a _good_ compiler.

The main principle still is to minimize the number of registers saved
at interrupt entry (and restored at exit). On a primitive processor only
the processor status word and program counter need to be saved (and
restored). Additional registers may need to be saved (and restored) if
the ISR uses them.

If the processor has separate FP registers and/or separate FP status
words, avoid using FP registers in ISRs.

Some compilers may have "interrupt" keywords or similar extensions, and
the compiler knows which registers need to be saved in the ISR. To
help the compiler, include all functions that are called by the ISR in
the same module (preferably in-lined) prior to the ISR, so that the
compiler knows what needs to be saved. Do not call external library
routines from an ISR, since the compiler doesn't know which registers
need to be saved and saves all of them.

Cortex-M automatically stacks the registers needed to call a regular C function,
and if it has an FPU it supports "lazy stacking", which means it keeps track of
whether the FPU is used and only stacks/un-stacks the FP registers when they are used.

It also knows that if another interrupt is pending at ISR exit it doesn't need
to un-stack/re-stack before calling the other interrupt handler.

How many registers does it stack automatically? I knew the HLL nonsense
would catch up with CPU design eventually. Good CPU design still means
load/store machines, stacking *nothing* at IRQ, just saving PC and CCR
to special purpose regs which can be stacked as needed by the IRQ
routine, along with registers to be used in it. Memory accesses are
the bottleneck, and with HLL code being bloated as it is, chances
are some cache will have to be flushed to make room for stacking.
Some processors *really* well designed for control applications allow
you to lock a part of the cache, but I doubt ARM has that; they seem to
have gone the "make programming a two-click job" way to target a
wider audience.
 
On 1/14/2023 1:46, John Larkin wrote:
What's the fastest periodic IRQ that you have ever run?

We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
down some to save power, so the ISR runs for about 7 usec max.

I ask because if I use a Pi Pico on some new projects, it has a
dual-core 133 MHz CPU, and one core may have enough compute power that
we wouldn't need an FPGA in a lot of cases. Might even do DDS in
software.

RP2040 floating point is tempting but probably too slow for control
use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
I guess.

I was also thinking that we could make a 2 or 3-bit DAC with a few
resistors. The IRQ could load that at various places and a scope would
trace execution. That would look cool. On the 1758 thing we brought
out a single bit to a test point and raised that during the ISR so we
could see ISR execution time on a scope. My C guy didn't believe that
a useful ISR could run at 100K and had no idea what execution time
might be.
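The trace-DAC idea above could be sketched like this. The port write is stubbed out to record the codes (on an RP2040 it might be a masked GPIO write; that mapping is an assumption, not from this thread):

```c
#include <stdint.h>

/* Hypothetical 3-bit "trace DAC": three GPIO pins feeding a resistor
   ladder.  Writing a phase code at each stage of the ISR makes the
   scope draw a staircase showing where execution is.  The port write
   is stubbed here so the trace can be inspected in a test. */

#define TRACE_LEN 16
static uint8_t trace_log[TRACE_LEN];
static int     trace_n;

static inline void trace_dac(uint8_t phase)   /* phase = 0..7 */
{
    if (trace_n < TRACE_LEN)
        trace_log[trace_n++] = phase & 7;     /* stub for the GPIO write */
}

void control_isr(void)
{
    trace_dac(1);                 /* entered ISR        */
    /* ... read ADC ... */
    trace_dac(2);                 /* filtering done     */
    /* ... PID update ... */
    trace_dac(3);                 /* DAC written        */
    trace_dac(0);                 /* back to idle level */
}
```

On the scope, each step of the staircase marks one phase of the ISR, so both total execution time and time-per-phase are visible at a glance.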

10 us for a 100+ MHz CPU should be doable; I don't know about ARM
though, they keep on surprising me with this or that nonsense (never
used one, just by chance stumbling on that sort of thing).
What you might need to consider is that on modern-day CPUs you
don't have the nice prioritized IRQ scheme you must be used to from
the CPU32; once in an interrupt you are just masked for all interrupts.
They have some priority resolver which only resolves which interrupt
will come next *after* you get unmasked. Some I have used have a
second, higher-priority IRQ (like the 6809 FIRQ), but on the core I
have used it differs from the 6809's FIRQ in that the errata sheet says
it doesn't work.
On load/store machines latency should be less of an issue for the
jitter you will get, as long as you don't do division in the code to
be interrupted.
Make sure you look into any FPU you'd consider deeply enough; none
will get you your 32.32-bit accuracy. 64-bit FP numbers have a 52-bit
or so (can't remember exactly now) mantissa, the rest goes on the
exponent. I have found 32-bit FP numbers convenient for storing some
constants (on the core I use the load is 1 cycle, expanding
automatically to 64 bit), but did not find any other use for them.

Finally, to give you some numbers :). Back during the 80s I wrote
a floppy disk controller for the 765 on a 1 MHz 6809. It had about
10 us per byte IIRC; doing IRQ was out of the question. But the 6809
had a "sync" opcode: if IRQs were masked it would stop and wait
for an IRQ, and would just resume execution once the line was pulled.
This worked for the fastest of floppies (5" HD), so perhaps you
can use a 6809 :D. (I may have one or two somewhere here, 2 MHz
ones at that - in DIP40....)

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
 
On Sun, 15 Jan 2023 04:39:22 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

On a sunny day (Sat, 14 Jan 2023 10:21:59 -0800) it happened John Larkin
jlarkin@highlandSNIPMEtechnology.com> wrote in
epr5sh59k5q62qkapubhkfk8ubf9r0vnng@4ax.com>:

On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
'''newspam'''@nonad.co.uk> wrote:

On 13/01/2023 23:46, John Larkin wrote:
What's the fastest periodic IRQ that you have ever run?

Usually try to avoid having fast periodic IRQs in favour of offloading
them onto some dedicated hardware. But CPUs were slower then than now.

We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
down some to save power, so the ISR runs for about 7 usec max.

I ask because if I use a Pi Pico on some new projects, it has a
dual-core 133 MHz CPU, and one core may have enough compute power that
we wouldn't need an FPGA in a lot of cases. Might even do DDS in
software.

RP2040 floating point is tempting but probably too slow for control
use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
I guess.

It might be worth benchmarking how fast the FPU really is on that device
(for representative sample code). The Intel i5 & i7 can do all except
divide in a single cycle these days - I don't know what Arm is like in
this respect. You get some +*- for free close to every divide too.

The RP2040 chip has FP routines in the ROM, apparently code with some
sort of hardware assist, but they're callable subroutines, not native
instructions to a hardware FP engine. When it returns it's done.

Various web sites seem to confuse microseconds and nanoseconds. 150 us
does seem slow for a "fast" fp operation. We'll have to do
experiments.

I wrote one math package for the 68K, with the format signed 32.32.
That behaved just like floating point in real life, but was small and
fast and avoided drecky scaled integers.
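A signed 32.32 format like that maps naturally onto a 64-bit integer; a minimal sketch (assuming the GCC/Clang `__int128` extension for the wide intermediate product, which the 68K original would have done with multi-word arithmetic):

```c
#include <stdint.h>

/* Signed 32.32 fixed point: a value v is stored as the int64_t
   v * 2^32.  Add and subtract are plain int64 operations; multiply
   and divide need a 128-bit intermediate to keep the middle bits. */

typedef int64_t fix32_32;

#define FIX_ONE ((fix32_32)1 << 32)

static inline fix32_32 fix_from_int(int32_t n) { return (fix32_32)n << 32; }

static inline fix32_32 fix_mul(fix32_32 a, fix32_32 b)
{
    /* full 64x64 -> 128-bit product, then take the middle 64 bits */
    return (fix32_32)(((__int128)a * b) >> 32);
}

static inline fix32_32 fix_div(fix32_32 a, fix32_32 b)
{
    /* pre-shift the dividend so the quotient lands back in 32.32 */
    return (fix32_32)(((__int128)a << 32) / b);
}
```

With roughly ±2 billion of integer range and 2^-32 (about 0.2 nano-unit) resolution, it behaves much like floating point for control work while staying in integer registers.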


*BIG* time penalty for having two divides or branches too close
together. Worth playing around to find patterns the CPU does well.

Without true hardware FP, call locations probably don't matter.


Beware that what you measure gets controlled, but for polynomials up to 5
terms, or rationals up to about 5,2, call overhead may dominate the
execution time (particularly if the stupid compiler puts a 16-byte
structure across a cache-line boundary on the stack).

We occasionally use polynomials, but 2nd order and rarely 3rd is
enough to get analog i/o close enough.


Forcing inlining of small code sections can help. Do it to excess and it
will slow things down - there is a sweet spot. Loop unrolling is much
less useful these days now that branch prediction is so good.

I was also thinking that we could make a 2 or 3-bit DAC with a few
resistors. The IRQ could load that at various places and a scope would
trace execution. That would look cool. On the 1758 thing we brought
out a single bit to a test point and raised that during the ISR so we
could see ISR execution time on a scope. My C guy didn't believe that
a useful ISR could run at 100K and had no idea what execution time
might be.

ISR code is generally very short and best done in assembler if you want
it as quick as possible. Examining the code generation of GCC is
worthwhile since it sucks compared to Intel(better) and MS (best).

In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
when generating Intel CPU specific SIMD code with maximum optimisation.

MS compiler still does pretty stupid things like internal compiler
generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 bytes) and
having them crossing a cache line boundary.

Nobody has answered my question. Generalizations about software timing
abound, but hard numbers are rare. Programmers don't seem to use
oscilloscopes much.

That is silly
http://panteltje.com/panteltje/pic/scope_pic/index.html

Try reading the asm, it is well commented.
:)

And if you are talking Linux or other multi-taskers there is a lot more involved.

I was thinking about doing closed-loop control, switching power
supplies and dummy loads and such, using one core of an RP2040 instead
of an FPGA. That would be coded hard-metal, no OS or RTOS.

I guess I don't really need interrupts. I could run a single
persistent loop that waits on a timer until it's time to compute
again, to run at for instance 100 KHz. If execution time is reasonably
constant, it could just loop as fast as it can; even simpler. I like
that one.

I was amazed about the other thread about logic analyzers.
Why did I never need one for my code / projects?
All you need is a scope... especially an analog one, digital ones are liars!

I've never used a logic analyzer; they look hard to connect,
especially into a single-chip uP. But color digital scopes rock.

If you have no clue then having a hall full of equipment does not give you one!

mm
did I use a scope for any of this?
http://panteltje.com/panteltje/newsflex/download.html
I only have a 10 MHz analog dual trace one!

My usual scope is a 500 MHz 4-channel Rigol. And an old Tek 11802
sampler for the fast stuff and TDR. I have a 40 GHz plugin.
 
On Sun, 15 Jan 2023 16:29:00 +0200, Dimiter_Popoff <dp@tgi-sci.com>
wrote:

On 1/14/2023 1:46, John Larkin wrote:
What's the fastest periodic IRQ that you have ever run?

[...]


10 us for a 100+ MHz CPU should be doable; I don't know about ARM
though, they keep on surprising me with this or that nonsense (never
used one, just by chance stumbling on that sort of thing).
What you might need to consider is that on modern-day CPUs you
don't have the nice prioritized IRQ scheme you must be used to from
the CPU32; once in an interrupt you are just masked for all interrupts.
They have some priority resolver which only resolves which interrupt
will come next *after* you get unmasked. Some I have used have a
second, higher-priority IRQ (like the 6809 FIRQ), but on the core I
have used it differs from the 6809's FIRQ in that the errata sheet says
it doesn't work.

I'll be doing single-function bare-metal control, like a power supply
for example, on a dedicated CPU core. The only interrupt will be a
periodic timer, or maybe an ADC that digitizes a few channels and then
interrupts.

I'd like the power supply to be a mosfet half-bridge and an ADC to
digitize output voltage and current, and code to close the voltage and
current limit loops. I could use a uP timer to make the PWM into the
half-bridge. Possibly go full-bridge and have a bipolar supply.

I'm just considering new product possibilities now; none of this may
ever happen. Raspberry Pi Pico is sort of a solution looking for a
problem, bottom-up design.


On load/store machines latency should be less of an issue for the
jitter you will get, as long as you don't do division in the code to
be interrupted.
Make sure you look into any FPU you'd consider deeply enough; none
will get you your 32.32-bit accuracy. 64-bit FP numbers have a 52-bit
or so (can't remember exactly now) mantissa, the rest goes on the
exponent. I have found 32-bit FP numbers convenient for storing some
constants (on the core I use the load is 1 cycle, expanding
automatically to 64 bit), but did not find any other use for them.

The RP2040 doesn't have an FPU, and its semi-hardware FP calls look
too slow to run a decent control loop. The barbaric way to do this is
with signed 32-bit ints where the LSB is 1 microvolt.
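That "barbaric" scheme is enough for a whole PI loop with no FPU at all; a minimal sketch (the power-of-two gains and the 1 µV LSB are illustrative choices, not from this thread):

```c
#include <stdint.h>

/* Control loop in scaled integers: voltages are int32_t with an LSB
   of 1 microvolt, giving +/-2147 V of range -- no FPU needed.
   Gains are powers of two so the multiplies become shifts. */

typedef struct {
    int32_t setpoint_uv;   /* target output, in microvolts     */
    int32_t integ_uv;      /* integrator state, in microvolts  */
} pi_state;

/* One PI update: returns the new drive value in microvolts. */
static int32_t pi_step(pi_state *s, int32_t measured_uv)
{
    int32_t err = s->setpoint_uv - measured_uv;
    s->integ_uv += err >> 2;           /* Ki = 1/4: a shift, not a divide */
    return s->integ_uv + (err >> 1);   /* plus Kp = 1/2 proportional term */
}
```

Closing it around a simple first-order plant model (`y += (u - y) >> 3`) settles within a few hundred counts of a 1.000000 V setpoint in a few hundred iterations, and every operation is a single-cycle integer op.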

Finally, to give you some numbers :). Back during the 80s I wrote
a floppy disk controller for the 765 on a 1 MHz 6809. It had about
10 us per byte IIRC; doing IRQ was out of the question. But the 6809
had a "sync" opcode: if IRQs were masked it would stop and wait
for an IRQ, and would just resume execution once the line was pulled.
This worked for the fastest of floppies (5" HD), so perhaps you
can use a 6809 :D. (I may have one or two somewhere here, 2 MHz
ones at that - in DIP40....)

I wrote an RTOS for the MC6800! Longhand in Juneau Alaska! That was
fairly awful. I mean the RTOS; Juneau was great. The 6800 wouldn't
even push the index register onto the stack.

I did some 6802 and 6803 products too, but skipped the 6809 and went to
68K. You can still buy 68332's !!!!!!



======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
 
On Sun, 15 Jan 2023 08:00:39 -0800, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

[...]
I was thinking about doing closed-loop control, switching power
supplies and dummy loads and such, using one core of an RP2040 instead
of an FPGA. That would be coded hard-metal, no OS or RTOS.

I guess I don't really need interrupts. I could run a single
persistent loop that waits on a timer until it's time to compute
again, to run at for instance 100 KHz. If execution time is reasonably
constant, it could just loop as fast as it can; even simpler. I like
that one.

This is a very common approach, being pioneered by Bell Labs when
designing the first digital telephone switch, the 1ESS:

<https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>

The approach endures in such things as missile autopilots, but always
with some way to gracefully handle when the control code occasionally
runs too long and isn't done in time for the next frame to start.

Typically, the frames are started by arrival of a clock interrupt, and
there are no data interrupts.

The problem being that interrupts (including the hardware to monitor
10,000 lines) are expensive in both overhead and hardware cost, and so
are not worthwhile when doing something like scanning 10,000 phone
lines for new activity (like a phone having been picked up) where
individual lines change only rarely.


I was amazed about the other thread about logic analyzers.
Why did I never need one for my code / projects?
All you need is a scope... especially an analog one, digital ones are liars!

I've never used a logic analyzer; they look hard to connect,
especially into a single-chip uP. But color digital scopes rock.

Logic analyzers are useful for a board full of logic, but not for
things like power supplies. One does need to design the board to
accept the test leads. This gets interesting with GHz clocks.

Joe Gwinn
 
On a sunny day (Sun, 15 Jan 2023 08:00:39 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<6088shtd32gc5r7t4cksj9oqiviq5udjmr@4ax.com>:

I was thinking about doing closed-loop control, switching power
supplies and dummy loads and such, using one core of an RP2040 instead
of an FPGA. That would be coded hard-metal, no OS or RTOS.

Power supplies work great for me with a Microchip PIC 18F14K22.
It has all the PWM, 2 hardware comparators, a voltage reference, a multi-channel ADC,
and is fast enough to do cycle-by-cycle current limiting;
one of its hardware comparators is hardwired to the PWM generator and resets it in a few ns if needed.

I guess I don't really need interrupts. I could run a single
persistent loop that waits on a timer until it's time to compute
again, to run at for instance 100 KHz. If execution time is reasonably
constant, it could just loop as fast as it can; even simpler. I like
that one.

Yes,


I was amazed about the other thread about logic analyzers.
Why did I never need one for my code / projects?
All you need is a scope... especially an analog one, digital ones are liars!

I've never used a logic analyzer; they look hard to connect,
especially into a single-chip uP. But color digital scopes rock.

I started looking into building one once,
but for showing the i2c bytes? The i2c software I use is so good and has been running for decades
that it is much easier to look at the code...
And the scope for the waveforms; same for the other serial protocols.


If you have no clue then having a hall full of equipment does not give you one!

mm
did I use a scope for any of this?
http://panteltje.com/panteltje/newsflex/download.html
I only have a 10 MHz analog dual trace one!

My usual scope is a 500 MHz 4-channel Rigol. And an old Tek 11802
sampler for the fast stuff and TDR. I have a 40 GHz plugin.

If I wanted I could buy a Rigol or Tek...
Most gigahertz stuff I play with is done via an RTL-SDR stick with a converter for 2.4 GHz or 10 GHz (satellite).
http://panteltje.com/panteltje/xpsa/index.html
old version, the latest has many more functions.

Now I want a 1 TB (1000 GB) USB stick, found a cheap one for 75 USD online here...
Tomshardware just did a test:
https://www.tomshardware.com/best-picks/best-flash-drives

All your movies and stuff in your pocket when traveling...
Security? Maybe encrypt it with something simple.
Would still be more secure than storage in the cloud.
 
On Sun, 15 Jan 2023 12:16:36 -0500, Joe Gwinn <joegwinn@comcast.net>
wrote:

On Sun, 15 Jan 2023 08:00:39 -0800, John Larkin
jlarkin@highlandSNIPMEtechnology.com> wrote:

[...]
I was thinking about doing closed-loop control, switching power
supplies and dummy loads and such, using one core of an RP2040 instead
of an FPGA. That would be coded hard-metal, no OS or RTOS.

I guess I don't really need interrupts. I could run a single
persistent loop that waits on a timer until it's time to compute
again, to run at for instance 100 KHz. If execution time is reasonably
constant, it could just loop as fast as it can; even simpler. I like
that one.

This is a very common approach, being pioneered by Bell Labs when
designing the first digital telephone switch, the 1ESS:

<https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>

The approach endures in such things as missile autopilots, but always
with some way to gracefully handle when the control code occasionally
runs too long and isn't done in time for the next frame to start.

I was thinking of an endless loop that just runs compute bound as hard
as it can. The "next frame" is the top of the loop. The control loop
time base is whatever the average loop execution time is.

As you say, no interrupt overhead.
 
On 1/15/2023 18:20, John Larkin wrote:
On Sun, 15 Jan 2023 16:29:00 +0200, Dimiter_Popoff <dp@tgi-sci.com> wrote:

[...]

I'll be doing single-function bare-metal control, like a power supply
for example, on a dedicated CPU core. The only interrupt will be a
periodic timer, or maybe an ADC that digitizes a few channels and then
interrupts.

I'd like the power supply to be a mosfet half-bridge and an ADC to
digitize output voltage and current, and code to close the voltage and
current limit loops. I could use a uP timer to make the PWM into the
half-bridge. Possibly go full-bridge and have a bipolar supply.

I'm just considering new product possibilities now; none of this may
ever happen. Raspberry Pi Pico is sort of a solution looking for a
problem, bottom-up design.
Do you know whether it is documented enough to allow you to throw away
all the code that comes with it and write your own bare-metal one?
At 100 kHz you'd likely need to do so.

[...]

The RP2040 doesn't have an FPU, and its semi-hardware FP calls look
too slow to run a decent control loop. The barbaric way to do this is
with signed 32-bit ints where the LSB is 1 microvolt.
Nothing I'd call barbaric about that; you have the 32 bits, so why not
use them.

[...]

I wrote an RTOS for the MC6800! Longhand in Juneau Alaska! That was
fairly awful. I mean the RTOS; Juneau was great. The 6800 wouldn't
even push the index register onto the stack.

My first board was a 6809 one, but I had no terminal to talk to it,
so in order to make one I made a clone of Motorola's D5 kit (clone
meaning the debug monitor written by Herve Tireford worked on it;
its source was public). Then I designed a terminal board, 6800
based, programmed it on an Exorciser clone I had access to, and
debugged the code with the 6800 on the board being emulated by
the kit, a 40-pin DIP right from the kit's CPU via a flat cable...
(maybe there were buffers to drive the cable, I don't remember).
So I am also used to pushing the X register via a swi call, doing
tsx etc.; it taught us to be grateful for what the 68k gave us.

I did some 6802 and 6803 products too, but skipped the 6809 and went to
68K. You can still buy 68332's !!!!!!

Some years ago (10, maybe more) I tamed the mcf52211 in my
working environment; it is still available, too. You will feel quite
familiar with it, though it will probably be EOL-ed soon.
At 66 MHz you could do a lot; the ADC is true 12 bit, and it has
PWMs (clocked at 33 MHz though, and that resolution can be quite
an enemy, especially at higher PWM frequencies). I have done some
auxiliary HV sources with it for our netMCA, but they don't work
at 100 kHz (IIRC something like 5), and the stepwise change in
pulse width was still something I had to deal with. Probably
not a great idea to start a new product with it though; only
if you feel it will be best for you to write it in 68k assembler.

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
 
On Sun, 15 Jan 2023 11:16:45 -0800, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

On Sun, 15 Jan 2023 12:16:36 -0500, Joe Gwinn <joegwinn@comcast.net
wrote:

On Sun, 15 Jan 2023 08:00:39 -0800, John Larkin
jlarkin@highlandSNIPMEtechnology.com> wrote:

On Sun, 15 Jan 2023 04:39:22 GMT, Jan Panteltje
pNaonStpealmtje@yahoo.com> wrote:

On a sunny day (Sat, 14 Jan 2023 10:21:59 -0800) it happened John Larkin
jlarkin@highlandSNIPMEtechnology.com> wrote in
epr5sh59k5q62qkapubhkfk8ubf9r0vnng@4ax.com>:

On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
'''newspam'''@nonad.co.uk> wrote:

On 13/01/2023 23:46, John Larkin wrote:
What's the fastest periodic IRQ that you have ever run?

Usually try to avoid having fast periodic IRQs in favour of offloading
them onto some dedicated hardware. But CPUs were slower then than now.

We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
by its on-chip ADC at 100 kHz and does a bunch of filtering and runs a
PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
down some to save power, so the ISR runs for about 7 usec max.

I ask because if I use a Pi Pico on some new projects, it has a
dual-core 133 MHz CPU, and one core may have enough compute power that
we wouldn't need an FPGA in a lot of cases. Might even do DDS in
software.

RP2040 floating point is tempting but probably too slow for control
use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
I guess.

It might be worth benchmarking how fast the FPU really is on that device
(for representative sample code). The Intel i5 & i7 can do all except
divide in a single cycle these days - I don't know what Arm is like in
this respect. You get some +*- for free close to every divide too.

The RP2040 chip has FP routines in the ROM, apparently code with some
sort of hardware assist, but it's callable subroutines and not native
instructions to a hardware FP engine. When it returns it's done.

Various web sites seem to confuse microseconds and nanoseconds. 150 us
does seem slow for a "fast" fp operation. We'll have to do
experiments.

I wrote one math package for the 68K, with the format signed 32.32.
That behaved just like floating point in real life, but was small and
fast and avoided drecky scaled integers.
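A modern C sketch of that signed 32.32 format (my reconstruction, not the original 68K package: a compiler-supported 128-bit intermediate stands in for what the 68K version would have done with a multi-word multiply sequence):

```c
#include <stdint.h>

/* Signed 32.32 fixed point: integer part in the top 32 bits,
   fraction in the bottom 32. Behaves like FP over a +/-2^31 range
   with a constant 2^-32 resolution. */
typedef int64_t fix32_32;

static fix32_32 fx_from_int(int32_t n)
{
    return (fix32_32)n << 32;
}

/* Multiply: the 64x64 product needs 128 bits before rescaling.
   __int128 is a GCC/Clang extension. */
static fix32_32 fx_mul(fix32_32 a, fix32_32 b)
{
    return (fix32_32)(((__int128)a * b) >> 32);
}
```

Addition and subtraction are just the plain 64-bit operations, which is much of the format's appeal.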


*BIG* time penalty for having two divides or branches too close
together. Worth playing around to find patterns the CPU does well.

Without true hardware FP, call locations probably don\'t matter.


Beware that what you measure gets controlled, but for polynomials up
to 5 terms or rationals up to about 5,2, call overhead may dominate
the execution time (particularly if the stupid compiler puts a
16-byte structure across a cache-line boundary on the stack).

We occasionally use polynomials, but 2nd order and rarely 3rd is
enough to get analog i/o close enough.
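A 2nd- or 3rd-order correction evaluated with Horner's rule is only a handful of multiply-adds; a minimal sketch:

```c
/* Horner evaluation of a 3rd-order polynomial: three multiplies and
   three adds, no explicit powers. c[0] is the constant term. */
static float poly3(float x, const float c[4])
{
    return ((c[3] * x + c[2]) * x + c[1]) * x + c[0];
}
```

At this size, inlining the call (or letting the compiler do it) matters as much as the arithmetic itself, per the call-overhead point above.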


Forcing inlining of small code sections can help. Do it to excess and
it will slow things down - there is a sweet spot. Loop unrolling is
much less useful these days now that branch prediction is so good.

I was also thinking that we could make a 2 or 3-bit DAC with a few
resistors. The IRQ could load that at various places and a scope would
trace execution. That would look cool. On the 1758 thing we brought
out a single bit to a test point and raised it during the ISR, so we
could see ISR execution time on a scope. My C guy didn't believe that
a useful ISR could run at 100K and had no idea what execution time
might be.
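The test-point trick looks roughly like this in C. The register names here are stand-ins (on the LPC17xx the real set/clear registers are FIOSET/FIOCLR), and plain variables substitute for hardware so the sketch is self-contained:

```c
#include <stdint.h>

#define ISR_PIN (1u << 4)           /* whichever GPIO feeds the test point */

/* Stand-ins for memory-mapped GPIO set/clear registers. */
volatile uint32_t GPIO_SET, GPIO_CLR;

void adc_isr(void)
{
    GPIO_SET = ISR_PIN;   /* pin high at entry, visible on a scope */

    /* ... filtering and PID work goes here ... */

    GPIO_CLR = ISR_PIN;   /* pin low at exit: high time = ISR execution time */
}
```

With a periodic IRQ, the scope then shows both the execution time (pulse width) and the headroom to the next interrupt (gap between pulses) at a glance.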

ISR code is generally very short and best done in assembler if you want
it as quick as possible. Examining the code generation of GCC is
worthwhile since it sucks compared to Intel(better) and MS (best).

In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
when generating Intel CPU specific SIMD code with maximum optimisation.

MS compiler still does pretty stupid things, like internal compiler
generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 bytes)
having them cross a cache line boundary.

Nobody has answered my question. Generalizations about software timing
abound but hard numbers are rare. Programmers don't seem to use
oscilloscopes much.

That is silly
http://panteltje.com/panteltje/pic/scope_pic/index.html

Try reading the asm, it is well commented.
:)

And if you are talking Linux or other multi-taskers there is a lot more involved.

I was thinking about doing closed-loop control, switching power
supplies and dummy loads and such, using one core of an RP2040 instead
of an FPGA. That would be coded hard-metal, no OS or RTOS.

I guess I don't really need interrupts. I could run a single
persistent loop that waits on a timer until it's time to compute
again, to run at for instance 100 kHz. If execution time is reasonably
constant, it could just loop as fast as it can; even simpler. I like
that one.
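That timer-paced variant might be sketched like this. timer_us() and control_step() are simulated stand-ins here (so the sketch runs anywhere); on an RP2040 they would be the chip's free-running microsecond timer and the real control math, and 'frames' would be an endless loop:

```c
#include <stdint.h>

#define PERIOD_US 10u               /* 10 us tick = 100 kHz frame rate */

/* Stand-ins: a simulated microsecond counter and a step counter. */
static uint64_t sim_time;
static uint32_t steps_run;
static uint64_t timer_us(void)   { return sim_time++; }
static void     control_step(void) { steps_run++; }

/* Spin until the next tick, run the control step, repeat. Advancing
   'next' by a fixed amount (rather than re-reading the clock) keeps
   the cadence drift-free even if one frame runs slightly long. */
static void control_loop(uint32_t frames)
{
    uint64_t next = timer_us() + PERIOD_US;
    while (frames--) {
        while (timer_us() < next)
            ;                       /* busy-wait: no interrupt overhead */
        next += PERIOD_US;
        control_step();
    }
}
```

The free-running alternative in the quote is even simpler: delete the busy-wait and let the loop period be whatever the computation takes.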

This is a very common approach, being pioneered by Bell Labs when
designing the first digital telephone switch, the 1ESS:

<https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>

The approach endures in such things as missile autopilots, but always
with some way to gracefully handle when the control code occasionally
runs too long and isn't done in time for the next frame to start.

I was thinking of an endless loop that just runs compute bound as hard
as it can. The "next frame" is the top of the loop. The control loop
time base is whatever the average loop execution time is.

As you say, no interrupt overhead.

To be more specific, the frames effectively run at interrupt priority,
triggered by a timer interrupt, but we also run various background
tasks at user level utilizing whatever CPU is left over, if any. The
sample rate is set by controller dynamics, and going faster does not
help. Especially if FFTs are being performed over a moving window of
samples.

Joe Gwinn
 
