Idiot lights...

Don Y

Guest
My RTOS lets me migrate live processes between nodes (distributed
multiprocessor).

My hardware lets me power nodes up/down at will.

I leverage these capabilities to dynamically reassign hardware
resources to do load leveling, load shedding, etc. to adjust
the hardware's capabilities to the CURRENT nature of the
problem. I.e., if I don't need the I/Os that a node offers,
I can power down that node -- *if* I can seamlessly migrate its
active processes to other nodes. Or, I can power down its
field interface and just use the compute resources to address
a problem running on some other node!
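
A minimal sketch of that power-down decision, in Python. Everything named here (`Node`, `plan_migration`, the percentage CPU figures) is hypothetical and not taken from the actual RTOS; it assumes each process has a single scalar compute demand and uses a simple first-fit placement, largest process first.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    spare_cpu: int                              # unused compute capacity, in percent
    io_in_use: bool = False                     # True if the field I/Os are still needed
    procs: dict = field(default_factory=dict)   # process name -> CPU percent needed

def plan_migration(victim, others):
    """Return {process: target node name} if every process on `victim`
    fits on some other node, else None (the node must stay powered)."""
    if victim.io_in_use:
        return None                             # its I/Os are needed; keep it up
    spare = {n.name: n.spare_cpu for n in others}
    plan = {}
    # Place the most demanding processes first (first-fit decreasing).
    for proc, need in sorted(victim.procs.items(), key=lambda kv: -kv[1]):
        target = next((name for name, free in spare.items() if free >= need), None)
        if target is None:
            return None                         # no seamless migration possible
        spare[target] -= need
        plan[proc] = target
    return plan

a = Node("A", spare_cpu=0, procs={"logger": 20, "pid_loop": 50})
b = Node("B", spare_cpu=60)
c = Node("C", spare_cpu=30)
print(plan_migration(a, [b, c]))
```

If `plan_migration` returns a plan, the node can be wound down after the moves complete; if it returns None, the node stays up. A real scheduler would weigh memory, deadlines and communication costs as well.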

I've augmented the patch-panels that distribute the drops
to the nodes with "indicators" (visual and otherwise) that
allow a user to quickly assess the state of the system
("Any red lights? Too many 'black' lights??")

[users can be blind, deaf, CVD, etc.]

I prepared a simulation for colleagues to illustrate what
typical events look like. For example, a node being wound-down
as its I/Os were no longer needed (e.g., wire EDM being
taken off-line). Or, a hardware failure detected in a node
(I/O shorted, etc.). Or, a node unplugged/replugged. Or,
a new load coming on-line requiring additional I/Os. Or,
a process temporarily requiring additional compute resources.
Or, a power failure necessitating the shedding of nonessential
loads (processes), etc.

To someone who knows what's happening (or, likely happening),
the indicators prove a helpful reassurance that things are
happening as they are intended.

To a casual user, it's just a pretty "light show". <frown>

[In the case of faults, the "display" makes it easy to identify the
failed node and consult a "real" display for more information.
Note that because the panel is powered, it can display the status
of UNpowered nodes, too!]

Unfortunately, there are scant few "systems" that are
distributed physically in this way, so it is hard to extrapolate
from any "status indicators" that THEY might provide.

The closest *common* sort of application is that of a
network switch feeding a NoW. But, as those aren't
necessarily all "entwined", there's very little that
the switch can offer beyond link status/rate and activity
(because the switch is just *fabric* and has no awareness
of the application in which it is deployed).

The infotainment systems on aircraft might be similar
(I am assuming they aren't designed with an "all-or-nothing"
approach and can continue to operate with individual
"seat failures").

Other interconnected system examples seem like they would
just present a "check engine" indicator to the user in the
event of ANY sort of failure. E.g., the user would likely
not be able to tell if one of the headlight drivers in
his vehicle had shit-the-bed (without actually checking the
"light output" from the driver!)

Industrial deployments likewise rely on a "smart display"
to convey status to the user. (Does PROFINET instrument
switches in an application-specific way?)

What other apps (types of apps) can I look at?

[Note, I don't want to have to keep a "smart display"
running just so a user can assess the status of all of
the nodes when a panel of tiny LEDs can provide the
necessary information in a reduced form -- that also
makes it easy for the user to identify the *cable*
associated with the node!]
 
Don Y <blockedofcourse@foo.invalid> wrote:
[...]

Have you been drinking again?
 
On Saturday, December 11, 2021 at 4:17:47 AM UTC-8, Don Y wrote:
My RTOS lets me migrate live processes between nodes (distributed
multiprocessor).

My hardware lets me power nodes up/down at will.

I leverage these capabilities to dynamically reassign hardware
resources...
I've augmented the patch-panels that distribute the drops
to the nodes with "indicators" (visual and otherwise) that
allow a user to quickly assess the state of the system
("Any red lights? Too many 'black' lights??")

....
To someone who knows what's happening (or, likely happening),
the indicators prove a helpful reassurance that things are
happening as they are intended.

To a casual user, it's just a pretty "light show". <frown>

[In the case of faults, the "display" makes it easy to identify the
failed node and consult a "real" display for more information.
Note that because the panel is powered, it can display the status
of UNpowered nodes, too!]

Unfortunately, there are scant few "systems" that are
distributed physically in this way, so it is hard to extrapolate
from any "status indicators" that THEY might provide.

What other apps (types of apps) can I look at?

Alas, I'm going to suggest a log file; they're very bloated in
most multiuser/multitasking systems, but if they're searchable,
that helps.

Unfortunately, my experience with MacOS logs is that they
get far too gabby... and useful items (like battery status) get
completely ignored. Did the laptop power adapter fail? Or just
a bad connection to the [charge indicator] LED?

So, how about a little bunch of sound snippets, that play 'error tone' or
progress click/thump/whoosh sounds? If reseating a bad connector
makes the system play a different tune, almost anyone will notice
the status indicated has changed. There are more distinct sounds available
than LED colors, and lots of sound effects are memorable.
 
Cydrome Leader wrote:
===================

> Don Y <blocked...@foo.invalid> wrote:

** A pile of indigestible, verbal garbage.

> Have you been drinking again?

** Think he took too many laxatives and let loose a torrent of verbal diarrhoea.


But some of it reminded me of the infamous Yamaha P2200 audio power amplifier.
While having no fan cooling or thermal shutdown whatever - it did however have a tiny red LED on the front that warned of impending total failure in the next few minutes.

If you happened to see it.


...... Phil
 
On 12/11/2021 9:28 PM, whit3rd wrote:
On Saturday, December 11, 2021 at 4:17:47 AM UTC-8, Don Y wrote:
My RTOS lets me migrate live processes between nodes (distributed
multiprocessor).

My hardware lets me power nodes up/down at will.

I leverage these capabilities to dynamically reassign hardware
resources...
I've augmented the patch-panels that distribute the drops
to the nodes with "indicators" (visual and otherwise) that
allow a user to quickly assess the state of the system
("Any red lights? Too many 'black' lights??")

...
To someone who knows what's happening (or, likely happening),
the indicators prove a helpful reassurance that things are
happening as they are intended.

To a casual user, it's just a pretty "light show". <frown>

[In the case of faults, the "display" makes it easy to identify the
failed node and consult a "real" display for more information.
Note that because the panel is powered, it can display the status
of UNpowered nodes, too!]

Unfortunately, there are scant few "systems" that are
distributed physically in this way, so it is hard to extrapolate
from any "status indicators" that THEY might provide.

What other apps (types of apps) can I look at?

Alas, I'm going to suggest a log file; they're very bloated in
most multiuser/multitasking systems, but if they're searchable,
that helps.

Log files are great for recording *history* -- to any level
of detail desired!

But, the user has to make a special effort to *review* those.
They don't "command attention"; the user has to discipline
himself to review them for "events of interest" (which *he*
will likely have to define as how a system is used in one
deployment may differ significantly from another!)

I want a mechanism that:
- reassures the user when things are functioning normally
  ("Yup! The board is green!")
- alerts the user when something merits attention
  ("Hmmm, why is that indicator red/yellow/off??")
- requires very little effort to make these GENERAL assessments
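
Those three requirements can be sketched as a reduction from per-drop lamp states to a single at-a-glance answer. This is only an illustration in Python: the `Lamp` states and the `glance` function are hypothetical, and the state set is a simplified assumption (the real panel evidently has more colors).

```python
from collections import Counter
from enum import Enum

class Lamp(Enum):
    GREEN = "ok"
    MAROON = "degraded"
    RED = "fault"
    OFF = "unpowered"

def glance(panel):
    """Reduce a panel {drop number: Lamp} to the one-line answer a user
    wants from a quick look. RED always wins, so it is never masked."""
    counts = Counter(panel.values())
    if counts[Lamp.RED]:
        bad = sorted(d for d, lamp in panel.items() if lamp is Lamp.RED)
        return f"ATTENTION: fault on drop(s) {bad}"
    if counts[Lamp.MAROON]:
        return f"degraded: {counts[Lamp.MAROON]} drop(s) below par"
    return f"board is green ({counts[Lamp.OFF]} drop(s) powered down)"

print(glance({1: Lamp.GREEN, 2: Lamp.OFF, 3: Lamp.GREEN}))
print(glance({46: Lamp.RED, 89: Lamp.RED, 5: Lamp.MAROON}))
```

The point of the ordering is that any number of finer-grained states can be added below RED without changing what "something is definitely wrong" looks like.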

Note that the idiot lights on *your* network switch give you that
sort of "overview" -- even if you've not CAREFULLY examined them
to verify that they are "as expected". Are you sure a 100Mb link
hasn't been negotiated when the device is *actually* capable of
1000BASE-T? An FDX/HDX mismatch? etc. As the switch will continue
to operate (albeit in a DEGRADED mode), you may not be provoked into
chasing down the problem (which you'd do *if* comms ceased completely!)
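
As a small illustration of that "working, but not as expected" case, here is a Python sketch that classifies a link by comparing what was negotiated against what the attached device should manage. The function name and the idea that the expected rate is known in advance are assumptions made for the example.

```python
def link_lamp(expected_mbps, negotiated_mbps,
              expected_duplex="full", negotiated_duplex="full"):
    """Classify a link by comparing the negotiated rate/duplex with what
    the attached device is known to be capable of."""
    if negotiated_mbps == 0:
        return "down"
    if negotiated_mbps < expected_mbps or negotiated_duplex != expected_duplex:
        return "degraded"      # still passes traffic, so easy to overlook
    return "ok"

print(link_lamp(1000, 100))
print(link_lamp(1000, 1000, negotiated_duplex="half"))
print(link_lamp(1000, 1000))
```

A plain switch cannot make this call on its own, since it has no notion of what rate each device *should* have negotiated; that knowledge has to come from the application.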

I can only provide mechanism, not policy. E.g., a "red" light may
be insignificant for THAT node in THIS deployment but could be
serious if encountered on some other node (or in some other
deployment).

E.g., a failure in a camera's IR illuminator is only of interest
after dark. And, less significant if the camera is surveilling
the rooftop than the swimming pool (presumably, the swimming
pool can more readily be accessed than the rooftop and could
present a liability issue if used unattended)! If the camera
isn't being used at all, then what harm in the failure?!

Similarly, a node signaling a fault is of less concern if it has
a redundant hot backup than if it doesn't. Yes, there's something
that SHOULD be addressed, there, but it's not preventing
manufacturing from cranking out product!

So, "policy" has to be set by the user/application. And, it is hard
to codify for the system to use in its alerting criteria. You
don't want to require users to "write code" just to define *their*
particular conditions (faults?) of interest...
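
One way to keep policy out of code is to express it as a table that the deployment fills in. The Python sketch below is only an illustration of that idea using the camera example; the rule format, the fault and site names, and the severity levels are all invented for the example, not an existing scheme.

```python
# Each rule: (fault, site, applies_only_after_dark, severity). First match wins.
# The entries are made-up examples of what one deployment's policy might say.
RULES = [
    ("ir_illuminator", "pool",    True,  "urgent"),
    ("ir_illuminator", "rooftop", True,  "minor"),
    ("ir_illuminator", "*",       False, "ignore"),
]

def severity(fault, site, in_use, after_dark, has_hot_backup=False):
    """Look up how much a given fault matters in this deployment."""
    if not in_use:
        return "ignore"                  # unused device: what harm in the failure?
    for r_fault, r_site, r_dark_only, level in RULES:
        if r_fault != fault or r_site not in ("*", site):
            continue
        if r_dark_only and not after_dark:
            continue
        if has_hot_backup and level == "urgent":
            return "minor"               # a redundant backup lowers the concern
        return level
    return "urgent"                      # unknown combinations default to attention

print(severity("ir_illuminator", "pool", in_use=True, after_dark=True))
print(severity("ir_illuminator", "pool", in_use=True, after_dark=False))
```

The mechanism (the lookup) stays fixed; only the table changes from one deployment to the next. Whether a table like this is expressive enough for real deployments is exactly the open question.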

Unfortunately, my experience with MacOS logs is that they
get far too gabby... and useful items (like battery status) get
completely ignored. Did the laptop power adapter fail? Or just
a bad connection to the [charge indicator] LED?

Exactly. You'd want something that told you (superficially)
that there is no need to "look inside" *or* that there is
something of interest that should be identified and rectified.

In a pinch, a user could replace one of the switches with one that
doesn't have a boundary clock. Everything downstream can still
function properly -- provided they don't need to order their
activities wrt those of other nodes that are still synchronized.
*Local* (within a node) temporal relationships would function
perfectly!

Noting that all of the downstream nodes indicate as "degraded"
would act to remind/alert a user that they likely share something
that is causing them *all* to perform below par. ("Why the hell
all of these MAROON indicators??")
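
The "what do they all share?" inference can be sketched as finding the nearest common upstream device of the degraded nodes. This Python example assumes the wiring is a tree described by a hypothetical child-to-parent map; the node and switch names are made up.

```python
def path_to_root(node, parent):
    """List of node, its upstream switch, that switch's upstream, etc."""
    path = []
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path

def shared_upstream(degraded, parent):
    """Nearest device that every degraded node hangs off, or None."""
    common = None
    for n in degraded:
        ancestors = path_to_root(n, parent)[1:]    # exclude the node itself
        if common is None:
            common = ancestors
        else:
            common = [a for a in common if a in ancestors]
    return common[0] if common else None

# Assumed topology: two switches under one core, three nodes below them.
parent = {"sw1": "core", "sw2": "core",
          "n46": "sw1", "n89": "sw1", "n193": "sw2"}
print(shared_upstream(["n46", "n89"], parent))
print(shared_upstream(["n46", "n193"], parent))
```

On the panel, the user does this by eye: a block of adjacent MAROON lamps points at whatever feeds that block.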

Or, provide a local power source for a node to compensate for a failed
PSE channel -- as long as mains power doesn't fail, that node could
remain operational. But, this is degraded performance as it would,
otherwise, continue to operate in the complete absence of mains power
but *won't* with this new power source!

In a residential deployment, the user would likely only notice
"red" indications and might not be aware of other conditions that
could herald an upcoming fault. He *may* notice the sudden appearance
of other types of indications ("Hmmm, I don't recall that indicator
as being YELLOW, previously! And, I've NEVER seen CYAN before!!")
but, for the most part, it would just be a "light show"...

[How many idiot lights illuminate during your vehicle's "lamp test"
when you initially turn the key on? Can you even figure out what they
each *mean* -- if they stayed lit? Or, would you just visit the
dealership or other "authority"? One of my WWV clocks has a bunch
of magic indicators; I've not a clue as to why it needs more than
*one* ("signal acquired and locked"). Yet, I'm not curious
enough to chase down the directions for clarification!]

In a commercial/industrial deployment, it is likely that someone(s)
will be more actively monitoring (noticing) the indicators -- perhaps
because it is their *job* to do so! And, they will probably have a
more detailed understanding of the system's dynamics, so may be more
able to notice fault patterns before they manifest (e.g., MAROON
"degraded" indications eventually becoming RED "fault" indications).
And, more motivation to consult more detailed, interactive status
displays and logs to resolve the issue.

[A residential deployment is hundreds of nodes; a commercial one, *many*
hundreds; industrial, thousands!]

So, how about a little bunch of sound snippets, that play 'error tone' or
progress click/thump/whoosh sounds? If reseating a bad connector
makes the system play a different tune, almost anyone will notice
the status indicated has changed. There are more distinct sounds available
than LED colors, and lots of sound effects are memorable.

Sounds are transient. You'd have to be "observing" the panel when the
sound was emitted to perceive it. Or, if they were "persistent", there'd
be a cacophony of "noise" that would be impossible to sort out ("Is that
squeal associated with node #68 or #193? And, what about that buzzing??
Is that a *good* buzzing? Or, a *bad* buzzing? Is it from the camera
associated with node #46? Or, the camera on #89?" Could you resolve any
of that on a factory floor? Would you want those sounds emanating from
an "equipment closet" in your kitchen? Think about how annoying it is when
the smoke detector PERSISTENTLY tries to remind you to change its battery...)

Most distributed/physically large systems we've been discussing seem to
take an all-or-nothing approach; if a node/device/peripheral/component
fails, the *system* claims the fault and doesn't try to work around it. I
don't have those constraints; I can continue to operate in the presence
of (some) faults. Indeed, that's a design goal: you wouldn't want to
shut down "coating" because of unavailability of friability testing.
Or, render your car undriveable because the headlights weren't working
properly. Or, the bandsaw to be inoperable out of "sympathy" for a
defective lathe!

The consensus from folks who've played with my simulation seems to
be that "more information is better than less" -- as long as "more"
doesn't muddy the distinction between good/bad. There's quite a bit
of chatter as to why each color was chosen and why *those* "states"
are indicated, instead of some others.

Keeping certain "indications" as "definitely bad" (so they can always
be unambiguously "noticed" -- like RED) is sufficient and tolerates any
number of *other* indications that further qualify the *operational*
states of the nodes (for someone who "knows better").

How much *other* information I provide then becomes the question
of interest (along with how it is encoded). Trying to use a simple
display to convey too much information is foolhardy -- use one of
the more expressive displays for that added detail!

And how it is made universally accessible!
 
On Sunday, December 12, 2021 at 8:14:37 AM UTC-8, Don Y wrote:
On 12/11/2021 9:28 PM, whit3rd wrote:
On Saturday, December 11, 2021 at 4:17:47 AM UTC-8, Don Y wrote:

I've augmented the patch-panels that distribute the drops
to the nodes with "indicators" (visual and otherwise) that
allow a user to quickly assess the state of the system
("Any red lights? Too many 'black' lights??")

To a casual user, it's just a pretty "light show". <frown>

[In the case of faults, the "display" makes it easy to identify the
failed node and consult a "real" display for more information.

So, how about a little bunch of sound snippets, that play 'error tone' or
progress click/thump/whoosh sounds? If reseating a bad connector
makes the system play a different tune, almost anyone will notice
the status indicated has changed. There are more distinct sounds available
than LED colors, and lots of sound effects are memorable.

Sounds are transient. You'd have to be "observing" the panel when the
sound was emitted to perceive it. Or, if they were "persistent", there'd
be a cacophony of "noise" that would be impossible to sort out ("Is that
squeal associated with node #68 or #193? And, what about that buzzing??
Is that a *good* buzzing? Or, a *bad* buzzing? Is it from the camera
associated with node #46? Or, the camera on #89?" Could you resolve any
of that on a factory floor?

I've diagnosed many a hard drive problem from sounds, and a system with
a repetitive set of tasks will have a kind of 'tempo' that one will learn to recognize.
If your car's sounds changed, you'd know it, and take some kind of appropriate
action. A production test setup I once saw put a blizzard of patterns up on an oscilloscope
display; the techs learned what each short sequence meant and could get the units
tuned in the midst of that information stream, without getting any
detailed text message. So, a few noises in the background are going to be enough
detail for some problems, not all; it's more nuanced than a panel of LEDs,
with minimal hardware for its display, so it seems like it'd be a useful component.
Given a choice, look for heartbeat-rate tempo patterns for 'normal' operation; human
hearing really notices those. If they speed up or get erratic, an infant will start to cry.
 
Phil Allison <pallison49@gmail.com> wrote:
Cydrome Leader wrote:
===================

Don Y <blocked...@foo.invalid> wrote:

** A pile of indigestible, verbal garbage.

Have you been drinking again?

** Think he took too many laxatives and let loose a torrent of verbal diarrhoea.


But some of it reminded me of the infamous Yamaha P2200 audio power amplifier.
While having no fan cooling or thermal shutdown whatever - it did however have a tiny red LED on the front that warned of impending total failure in the next few minutes.

If you happened to see it.

Odd-looking amplifier. Was there a mute circuit/relay for power-on that they decided
not to wire to the protect indicator?
 
Cydrome Leader wrote:
===================
Don Y <blocked...@foo.invalid> wrote:

** A pile of indigestible, verbal garbage.

Have you been drinking again?

** Think he took too many laxatives and let loose a torrent of verbal diarrhoea.


But some of it reminded me of the infamous Yamaha P2200 audio power amplifier.
While having no fan cooling or thermal shutdown whatever - it did however have a tiny red LED on the front that warned of impending total failure in the next few minutes.

If you happened to see it.
Odd-looking amplifier. Was there a mute circuit/relay for power-on that they decided
not to wire to the protect indicator?

** No relays. No temp switches.
Just a pair of PTCs and an LED.

Overheating of the output devices was the main cause of failures too.
You needed a lackey sitting on a stool just to watch the red light.

Must be a Japanese concept .....


...... Phil
 
On 12/12/2021 6:56 PM, whit3rd wrote:
On Sunday, December 12, 2021 at 8:14:37 AM UTC-8, Don Y wrote:
On 12/11/2021 9:28 PM, whit3rd wrote:
On Saturday, December 11, 2021 at 4:17:47 AM UTC-8, Don Y wrote:

I've augmented the patch-panels that distribute the drops
to the nodes with "indicators" (visual and otherwise) that
allow a user to quickly assess the state of the system
("Any red lights? Too many 'black' lights??")

To a casual user, it's just a pretty "light show". <frown>

[In the case of faults, the "display" makes it easy to identify the
failed node and consult a "real" display for more information.

So, how about a little bunch of sound snippets, that play 'error tone' or
progress click/thump/whoosh sounds? If reseating a bad connector
makes the system play a different tune, almost anyone will notice
the status indicated has changed. There are more distinct sounds available
than LED colors, and lots of sound effects are memorable.

Sounds are transient. You'd have to be "observing" the panel when the
sound was emitted to perceive it. Or, if they were "persistent", there'd
be a cacophony of "noise" that would be impossible to sort out ("Is that
squeal associated with node #68 or #193? And, what about that buzzing??
Is that a *good* buzzing? Or, a *bad* buzzing? Is it from the camera
associated with node #46? Or, the camera on #89?" Could you resolve any
of that on a factory floor?

I've diagnosed many a hard drive problem from sounds, and a system with
a repetitive set of tasks will have a kind of 'tempo' that one will learn to recognize.

There's nothing wrong with sound -- when you're focussed on *a* device
(or problem). But, having hundreds of devices each signalling their
current state with a particular sound would be overwhelming. You
can't localize a particular emission to a particular device (network
drop, in this case).

As I suggested, above, how would the user know that a particular
sound was associated with node 46 vs. 89 -- given that the types of
devices associated with those nodes were the same (yet *different*
from other nodes)?

[I obviously use sound as a communication medium for folks that
can't rely on vision. But, I have to carefully manage the sound
sources presented so they can identify what is being emitted,
and "where" (I place sounds in a 3d space around the listener,
assuming he has functioning binaural hearing)]

If your car's sounds changed, you'd know it, and take some kind of appropriate
action. A production test setup I once saw put a blizzard of patterns up on an oscilloscope
display; the techs learned what each short sequence meant and could get the units
tuned in the midst of that information stream, without getting any
detailed text message. So, a few noises in the background are going to be enough
detail for some problems, not all; it's more nuanced than a panel of LEDs,
with minimal hardware for its display, so it seems like it'd be a useful component.
Given a choice, look for heartbeat-rate tempo patterns for 'normal' operation; human
hearing really notices those. If they speed up or get erratic, an infant will start to cry.

I think that those sorts of patterns would only apply to systems that
have definite patterns in their operation. You hear differences in
how your car sounds because it *always* sounds the same way. It
doesn't "chirp" one day and "buzz" the next and whistle the day
after -- *all* being sounds that "all is well". (But, beware if it
growls!)

In my case, what the system does varies in largely unpredictable ways.
And, how it "copes" with problems in each of those different
behaviors further alters how it might behave -- while still not
*mis*behaving.

This has been the disappointment we've found when trying to identify
"other" systems from which to copy implementation features; almost
all expect to operate "in their entirety" and make no provisions
for adjusting to differing loads and usage. And, few products
exist in multiple boxes (with multiple power cords, etc.). As
a result, few developers have first-hand experience addressing
systems with those characteristics.

For example, an air handler is likely just one component in a
bigger system. It will undoubtedly have multiple (mains) power
circuits -- for the blower, heater, dehumidification, etc.
And, multiple sensors (air speed/volume, air temperature,
dew point, etc.). Any of these may be defective (or switched
off!) at any given time.

Yet, the *system* that employs that AHU will likely be designed
to assume ALL of its capabilities (actuators, sensors, control
loops) are available -- even if some of them are not needed for
today's job! On top of that, the system may be designed with
that as an *implicit* assumption that is never EXPLICITLY
verified (i.e., does heater work? does RTD properly sense
temperature change brought about by application of heat?
does blower move air? does VFD alter rate of airflow as
expected? etc.).

[In my world, I know what capabilities are available at
any time and can then decide how best to meet a particular
"need" of a new "load" deployed. Or, indicate that I
can't meet the needs of that new load, before even trying
to deploy it! (as well as handling faults that manifest
*after* its deployment)]
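
A minimal sketch of that "check before deploying" idea, in Python, using the air handler as the example. The `admit` function and the capability names are hypothetical; a real check would presumably also consider capacity and timing, not just presence.

```python
def admit(load_needs, available):
    """Return (True, set()) if every capability the load needs is currently
    available; otherwise (False, the set of missing capabilities)."""
    missing = set(load_needs) - set(available)
    return (not missing, missing)

# Assumed current state of the air handler: the dehumidifier is faulted
# or switched off, everything else has been verified as working.
ahu_now = {"blower", "heater", "rtd"}

print(admit({"blower", "heater"}, ahu_now))
print(admit({"blower", "dehumidifier"}, ahu_now))
```

The first job is accepted even though the unit is not fully functional; the second is refused up front instead of failing at runtime when a control loop can't hold its setpoint.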

Instead, the product will assume all is well and likely
throw an error at *runtime* when a control loop fails to
maintain a desired setpoint (temperature, RH, flow rate).
That's really not a good approach as you probably wasted
some "materials" that you were thinking you could process
normally, had the subsystem actually been functional!

Or, refuse to allow itself to be activated unless it has
verified ALL of these subsystems are operational. So,
you're denied use of the equipment even when its
current state is more than adequate for the job at hand.

This is because of an uninspired approach to the problem space.

So, the sort of "status indication" they provide is
essentially go/no-go instead of a more nuanced
expression of "current capabilities".

["Camera #11 is operational -- but *not* its illuminator!"]

["Processor resources associated with Camera node #5 are
operational but the camera itself isn't producing video!"]

A knowledgeable user (e.g., "staff") would know (from
local experience/policy) what the consequences of a
given set of degraded capabilities are to their
organization and could adjust expectations to the
availability of "repairs". ("I can pull the rooftop
camera and use it to monitor the swimming pool -- until
we get a replacement camera from the manufacturer")
 
