rowhammer mitigation...

Oops. I seem to have messed up the quoting.
Hopefully it's still legible.

--
Brian Gregory (in England).
 
On 21/04/2023 07:30, Don Y wrote:
On 4/20/2023 10:35 PM, Jasen Betts wrote:
On 2023-04-20, Brian Gregory <void-invalid-dead-dontuse@email.invalid>
wrote:
On 20/04/2023 19:55, Don Y wrote:
On 4/20/2023 11:48 AM, Brian Gregory wrote:
On 20/04/2023 15:10, Don Y wrote:
On small (resource starved) processors, I often implement multitasking
using a structure like:


Main:
      Call task1
      Call task2
      Call task3
      Jmp Main

And, within any task:

      Call Yield
Here:

unwinds the stack (to get the address IN THE TASK at which the Call was
executed) and plugs the return address to replace the "taskX" address
field of the associated "Call" within Main.  So, the next time around the
loop, "Call task1" may execute as "Call Here".

[This makes for incredibly fast, low-overhead context switches as the new PC
is "restored" directly *in* the invoking Call]

Doesn't work with XIP, though.

I can't see how that works. Do you save the task's stack somewhere
too?

In truly "resource starved" applications (think "a few KB of RAM -- or less,
total"), you don't have *space* for a stack per task.  And, some really
small devices don't have an "accessible" stack that one can preserve (and
restore) as part of a context switch.

I'd just write a proper task switcher, either interrupt driven or not
as suits the situation. It's usually no more than a handful of pages
of C and/or assembler.

It's not that it is difficult.  Rather, that it consumes runtime
resources.

But, being able to break an application into smaller, concurrent pieces
is too valuable a technique to discard just because "you can't afford it".


Then I don't get it. Unless maybe the calls to the tasks are just from
habit and could more straightforwardly be replaced by jumps or gotos.

The use of call puts the address of the instruction after the call on
the stack from where it can be fetched by the yield() call.

I'm guessing that Yield() pops the return address off the stack and
saves it in a register, then pops the next return address off the stack,
subtracts some constant, and puts it in a pointer register, and then saves
the first register there,

Or, encodes the contents of the first register into a valid instruction
that references that location -- depends on the specifics of the
instruction set encoding.

The point is, the only bit of task state that is preserved is
the PC (for the location after the yield) AND that it is
preserved in an instruction that can efficiently dispatch
to that address when next encountered.

[Storing the entire state of a task typically has the
PC "restored" as the last step in reestablishing the
task's state just prior to next invocation with the
PC value (as well as the rest of the state) retrieved
from some TCB structure in general purpose memory
(here, it is stored in "program memory")]

adds the constant back on and does a jump to that address.

Because you don't have to restore any of the rest of the
machine state, you are free to make that "jump" in whatever
means is easiest (consuming the entire machine state in the
process, if necessary)
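
To make that concrete, here's a rough C sketch of the same idea.  A function
pointer stands in for the patched "Call" operand, since C can't portably
rewrite instructions; all of the names below (task_slot, task1_start, etc.)
are invented for illustration, not taken from any real design:

#include <stdio.h>

typedef void (*task_fn)(void);

static task_fn task_slot;            /* the "address field of the Call"      */

static void task1_part2(void);       /* resume point, declared up front      */

static void task1_start(void)
{
    puts("task1: first half of the work");
    task_slot = task1_part2;         /* "YIELD": patch where the next        */
}                                    /* dispatch of this slot will land      */

static void task1_part2(void)
{
    puts("task1: second half of the work");
    task_slot = task1_start;         /* finished; start over next time       */
}

int main(void)
{
    task_slot = task1_start;

    for (int i = 0; i < 4; i++)      /* the "Jmp Main" round-robin loop      */
        task_slot();                 /* "Call task1", target already patched */

    return 0;
}

The "context switch" is just storing a pointer and returning; nothing else
about the task survives it.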

Okay, this makes more sense now.

--
Brian Gregory (in England).
 
On 4/21/2023 5:18 PM, Brian Gregory wrote:

This all makes perfect sense and is in line with techniques I have used on
occasion except this:

Main:
      Call task1
      Call task2
      Call task3
      Jmp Main

The point is that the problem being solved doesn't inherently
*require* preserving any additional state; keeping the PC
is all that is necessary for its correct operation.

For many (esp small) problems, algorithms can be rewritten
to adapt to this constraint relatively easily.

E.g., if you want to iterate a loop N times, there's no need
for N to reside in a register; you can explicitly preserve it BEFORE
you YIELD and explicitly restore it in the line of code that
follows the YIELD (or, later, depending on when you need to
access it)
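
A tiny sketch of that idiom, using the same function-pointer stand-in as the
C sketch earlier (the count of 10 and the names here are made up; the point
is only that the counter lives in memory, not in a register, across the
yield):

#include <stdio.h>

typedef void (*task_fn)(void);

static task_fn slot;                 /* the patched "Call" target            */
static unsigned count;               /* explicitly preserved across yields   */

static void work_item(void);

static void work_start(void)
{
    count = 10;                      /* N iterations                         */
    slot = work_item;                /* YIELD: resume at work_item next time */
}

static void work_item(void)
{
    printf("item %u\n", count);      /* count picked up from memory again    */
    if (--count == 0)
        slot = work_start;           /* done; start over                     */
    /* else stay parked here: the next dispatch handles the next item        */
}

int main(void)
{
    slot = work_start;
    for (int i = 0; i < 23; i++)     /* round-robin "Main"                   */
        slot();
    return 0;
}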

Preserving only the PC -- and in this particular manner -- lets
you take advantage of multitasking without the overhead of
something like a preemptive kernel. (This also simplifies
information sharing as you KNOW no other task can access
data/partial data unless you relinquish the CPU -- global
variables being relatively common in small designs)

Here's AN implementation of the flasher application in Z80 ASM:

Loop:
    LD HL,HALF_SECOND   ; manifest constant related to clock frequency
    LD (Timer7),HL      ; init Timer7 with duration of ON interval

    LD A,ON             ; turn on indicator
    LD (INDICATOR),A

    YIELD

    LD HL,Timer7        ; check time remaining on Timer7
    Call TestTimer
    RET NZ              ; can't proceed until timer expires

    LD HL,HALF_SECOND   ; manifest constant related to clock frequency
    LD (Timer7),HL      ; init Timer7 with duration of OFF interval

    LD A,OFF            ; turn off indicator
    LD (INDICATOR),A

    YIELD

    LD HL,Timer7        ; check time remaining on Timer7
    Call TestTimer
    RET NZ              ; can't proceed until timer expires

    JP Loop

Note that anything that must be persistent is simply reloaded
AFTER any preceding YIELD.

[TestTimer can be replaced by a shorter -- in time and space -- sequence
but then that would rely on a particular timer* implementation. Wrapping
the test in a subr lets the implementation change while the information
provided by the subr remains intact]
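
And, for comparison, the same flasher sketched in C with a function pointer
again playing the role of the patched Call operand.  HALF_SECOND, the
indicator, and the tick source below are stand-ins invented for the sketch,
and each "RET NZ" becomes "return without advancing the slot":

#include <stdbool.h>
#include <stdio.h>

#define HALF_SECOND 5                  /* placeholder: ticks per half second  */

typedef void (*task_fn)(void);

static task_fn flasher_slot;           /* resume point kept by the scheduler  */
static unsigned timer7;                /* decremented by the (notional) tick  */

static void indicator(bool on)         /* stand-in for writing INDICATOR      */
{
    puts(on ? "indicator ON" : "indicator OFF");
}

static void wait_on(void);
static void wait_off(void);

static void start(void)
{
    timer7 = HALF_SECOND;              /* init timer7 with ON interval        */
    indicator(true);
    flasher_slot = wait_on;            /* YIELD                               */
}

static void wait_on(void)
{
    if (timer7 != 0) return;           /* "RET NZ": yield again, no progress  */
    timer7 = HALF_SECOND;              /* init timer7 with OFF interval       */
    indicator(false);
    flasher_slot = wait_off;           /* YIELD                               */
}

static void wait_off(void)
{
    if (timer7 != 0) return;           /* "RET NZ"                            */
    flasher_slot = start;              /* "JP Loop"                           */
}

int main(void)                         /* crude Main loop plus a fake tick    */
{
    flasher_slot = start;
    for (int i = 0; i < 4 * HALF_SECOND; i++) {
        if (timer7) timer7--;          /* what the timer tick would do        */
        flasher_slot();                /* "Call flasher", target pre-patched  */
    }
    return 0;
}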
 
On 4/21/2023 5:24 PM, Brian Gregory wrote:
Because you don't have to restore any of the rest of the
machine state, you are free to make that "jump" in whatever
means is easiest (consuming the entire machine state in the
process, if necessary)

Okay, this makes more sense now.

I would sure hope so -- there are many products that have been
built on this approach! :>

The real value, though, is in illustrating that there is often
value to challenging your preconceived notions of what you "need"
in a solution space. People who "just write code" tend not to
examine these issues -- often completely ignorant of the
choices involved!

Do you really need 32-bit floats? Would 24-bit suffice? Maybe
shrink the exponent field to gain another bit in the mantissa?
Do you need support for overflow/underflow/denormalized values?
Do you need to be able to print floats? Hex/octal constants?
Long longs? Limit the width of a printed field? Zero-pad? etc.

Often, these assumptions have (relatively) high associated costs.
E.g., add a printf() invocation to a binary and notice how much
bigger the executable becomes! Note how costly a taskswitch is
*when* you've used a floating point calculation vs. not (most
reschedules are smart enough NOT to save/restore the FPU state
if you're not using it... many are smart enough to know not
to do so if you haven't used it *recently*!)
 
On Fri, 21 Apr 2023 19:14:00 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

[...]

Do you really need 32-bit floats? Would 24-bit suffice? Maybe
shrink the exponent field to gain another bit in the mantissa?
Do you need support for overflow/underflow/denormalized values?
Do you need to be able to print floats? Hex/octal constants?
Long longs? Limit the width of a printed field? Zero-pad? etc.

Often, these assumptions have (relatively) high associated costs.
E.g., add a printf() invocation to a binary and notice how much
bigger the executable becomes! Note how costly a taskswitch is
*when* you've used a floating point calculation vs. not (most
reschedules are smart enough NOT to save/restore the FPU state
if you're not using it... many are smart enough to know not
to do so if you haven't used it *recently*!)

Can you even get 24 bit floating point processors?

All of them that are built into my processors are 32 bit or if wanted,
more expensive 64 bit double precision IEEE FP.

If 32 bit FP is single precision, why would you want 24 bit?

I may have missed an important point here of course :)

boB
 
On 4/22/2023 1:29 PM, boB wrote:
On Fri, 21 Apr 2023 19:14:00 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

[...]

Can you even get 24 bit floating point processors?

Likely not. But, not all processors have FPUs. Those that don't
either trap to FP libraries *or* you *use* an FP library explicitly.

All of them that are built into my processors are 32 bit or if wanted,
more expensive 64 bit double precision IEEE FP.

If 32 bit FP is single precision, why would you want 24 bit?

Because you may not need 32b floats in your calculations.
Just like you may not need 64, 32 or 16 bit ints in all
calculations!

And, you almost assuredly don't need support for NaNs,
denormalized numbers, FP exceptions, etc. If you don't
have an FPU, all of those "features" have to be implemented
in code.
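
Purely as an illustration (not something from any product mentioned here),
one cheap way to get a "24-bit float" without an FPU is to store an IEEE-754
single with its low 8 mantissa bits dropped: 1 sign + 8 exponent + 15
mantissa bits.  Full range, less precision, one byte saved per value:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint8_t b[3]; } f24;      /* 3-byte storage format          */

static f24 f24_from_float(float f)
{
    uint32_t u;
    f24 out;
    memcpy(&u, &f, sizeof u);              /* grab the IEEE-754 bit pattern  */
    u >>= 8;                               /* drop the 8 low mantissa bits   */
    out.b[0] = (uint8_t)u;
    out.b[1] = (uint8_t)(u >> 8);
    out.b[2] = (uint8_t)(u >> 16);
    return out;
}

static float f24_to_float(f24 in)
{
    uint32_t u = in.b[0] | ((uint32_t)in.b[1] << 8) | ((uint32_t)in.b[2] << 16);
    float f;
    u <<= 8;                               /* dropped bits come back as zero */
    memcpy(&f, &u, sizeof f);
    return f;
}

int main(void)
{
    float x = 3.14159265f;
    printf("%.7f -> %.7f\n", x, f24_to_float(f24_from_float(x)));
    return 0;
}

Whether roughly four to five significant decimal digits is enough is exactly
the sort of question being raised above.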

Similarly, you may never need to print a long long value.
So, why have a printf() that has the capability of doing so?
Or, being able to print a field with specific widths or precisions,
accepting arguments indirectly ('*'), etc.

Does printf *need* to be able to handle:
printf("%*d", MAXINT, value)
(Are you sure yours *does*?)

> I may have missed an important point here of course :)

Simply that of "minimal resources" being available in some
environments.

It's usually informative to write token routines just to see what
various (commonly used) features *cost* in different environments.
E.g., a windowed application typically costs more than a console
application...

(see how long the above printf() example takes to execute for
practical vs. ridiculous values of the width parameter. It will
give you an idea of how it is implemented. Also, how *large*
the process becomes while executing)
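
For instance, a quick (and admittedly crude) probe along those lines; clock()
resolution and the libc's padding strategy will dominate, and MAXINT itself
may fail outright or take ages depending on the implementation, so a merely
absurd width is used here and the numbers are indicative only:

#include <stdio.h>
#include <time.h>

static double time_one(int width)
{
    clock_t t0 = clock();
    fprintf(stderr, "%*d\n", width, 42);   /* the formatted write under test */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("width 10      took %f s\n", time_one(10));
    printf("width 1000000 took %f s\n", time_one(1000000));
    return 0;
}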
 
