Don Y
Guest
I use redundancy as a means of increasing availability. I.e.,
the redundant instances are duplicates of their sources.
I'm preparing a demo for an off-site I'll be hosting. I
invited a "local" colleague to preview it the other day.
I built a service ("program") that runs on the hardware.
Then I configured a "backup" service to be present in
case the primary failed, for any reason.
[The typical failure that I am guarding against is a node being
(accidentally) "unplugged" or physically destroyed.]
To demonstrate this, I started the service and then
unplugged one of the nodes involved in delivering that service
to show that the service persisted, uninterrupted.
I arranged for an idiot light on the active node(s) to illuminate.
So, the idiot light on the faulted node obviously was extinguished
and a light on the "backup" node illuminated.
[This is where the mental model associated with the
mechanism needs some refining.]
When I reconnected the original device, the service persisted
AND THE INDICATORS REMAINED UNCHANGED. I.e., the *backup* device
was still providing the service. My colleague had expected
the service to return to the original device (why??).
When I unplugged the backup device, the light moved to yet
*another* device -- which really confounded him! He had
expected the *first* device to resume the role. I explained
that the third device had been nominated as the backup while
the second device was serving as the active device (otherwise,
there would have been NO backup while the original device was
offline!)
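To make the hand-off policy concrete, here is a rough sketch in C of
the behavior described above (the names are purely illustrative, not
my actual API): when the active node fails, the standby is promoted
and a fresh standby is immediately elected from the nominated set; a
node that comes back online merely rejoins the candidate pool, it
does not reclaim the active role.

#include <stdio.h>
#include <stddef.h>

enum role { IDLE, STANDBY, ACTIVE };

struct node {
    int       id;
    int       alive;     /* still plugged in / heartbeating          */
    enum role role;
};

/* Elect a new standby from the nominated set.  "Best" is simply the
   first live idle node here; the real workload manager would score
   candidates instead.                                               */
static struct node *pick_standby(struct node *pool, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pool[i].alive && pool[i].role == IDLE)
            return &pool[i];
    return NULL;                       /* no backup available        */
}

/* Active node lost: promote the standby, then elect a fresh standby. */
static void on_failure(struct node *pool, size_t n,
                       struct node **active, struct node **standby)
{
    (*active)->alive = 0;
    (*active)->role  = IDLE;
    *active          = *standby;       /* backup takes over          */
    (*active)->role  = ACTIVE;
    *standby         = pick_standby(pool, n);
    if (*standby)
        (*standby)->role = STANDBY;
}

/* Faulted node returns: it rejoins the candidate pool only; it does
   NOT reclaim the active role.                                      */
static void on_rejoin(struct node *returned)
{
    returned->alive = 1;
    returned->role  = IDLE;
}

static void show(const struct node *a, const struct node *s)
{
    printf("active = node %d, standby = node %d\n",
           a->id, s ? s->id : -1);
}

int main(void)
{
    struct node pool[3] = {
        { 0, 1, ACTIVE }, { 1, 1, STANDBY }, { 2, 1, IDLE }
    };
    struct node *active = &pool[0], *standby = &pool[1];

    show(active, standby);              /* active = 0, standby = 1   */
    on_failure(pool, 3, &active, &standby);
    show(active, standby);              /* active = 1, standby = 2   */
    on_rejoin(&pool[0]);
    show(active, standby);              /* unchanged: no fail-back   */
    on_failure(pool, 3, &active, &standby);
    show(active, standby);              /* active = 2, standby = 0   */
    return 0;
}

The key point is that fail-back is deliberately absent: a recovered
node is just another candidate, which is why the indicators did not
change when the original device was reconnected.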
[My API lets me nominate a set of devices to provide the
service/backup... I let the workload manager decide which
device is \"best\" as this can vary, over time. So, the goal
is always to nominate the largest set of devices to give
the most flexibility in this assignment]
I, frankly, see nothing wrong with my implementation;
I don't see a need to bind a particular service -- and its
backup instance -- to specific devices (though I could,
by deliberately restricting the sets of nominated devices).
And I think any device that CAN provide the backup should
be allowed to assume that role (assuming conditions I've
set at design time can be met).
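For the sake of illustration, the nomination might look something
like the sketch below (hypothetical names and fields, NOT my actual
API): hand the workload manager the widest viable candidate set plus
the design-time conditions, and let it place the active and backup
instances from that set as conditions change.

#include <stddef.h>

typedef int device_id;

struct nomination {
    const char      *service;      /* service to be made redundant       */
    const device_id *candidates;   /* devices allowed to host it         */
    size_t           count;
    unsigned         min_ram_kb;   /* examples of design-time conditions */
    int              needs_fpu;    /*   the manager must honor           */
};

/* Provided by the (hypothetical) workload manager: returns nonzero if
   at least an active and a backup instance could be placed.            */
extern int wlm_nominate(const struct nomination *nom);

/* Usage: nominate every node that could conceivably host the service
   rather than pinning it to two specific devices.                      */
int nominate_widely(void)
{
    static const device_id all_nodes[] = { 0, 1, 2, 3, 4 };
    struct nomination nom = {
        .service    = "demo_service",
        .candidates = all_nodes,
        .count      = sizeof all_nodes / sizeof all_nodes[0],
        .min_ram_kb = 256,
        .needs_fpu  = 0,
    };
    return wlm_nominate(&nom);
}

Shrinking the candidates array to two specific devices would reproduce
a conventional fixed primary/backup pairing, at the cost of the
flexibility described above.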
I\'m thinking that the introduction of a second set of
indicators might make this "less surprising"? I.e.,
so the primary AND backup devices can be visually
identified at all times. This would make the choice of
device three as the new backup (when device two takes over)
more obvious and easier to rationalize -- instead of
having it appear, unexpectedly, as the active primary.
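Roughly, each node would then drive two lamps instead of one,
something like the fragment below (set_led() and the LED names are
stand-ins for whatever actually drives the idiot lights):

/* Sketch of the second indicator set: one lamp for "I am active" and
   another for "I am standby", so device three's promotion to backup
   is as visible as device two's promotion to active.                 */
enum role { IDLE, STANDBY, ACTIVE };           /* as in the earlier sketch */
enum lamp { LED_ACTIVE, LED_STANDBY };

extern void set_led(enum lamp which, int lit); /* board-specific stub */

void update_indicators(enum role self)
{
    set_led(LED_ACTIVE,  self == ACTIVE);
    set_led(LED_STANDBY, self == STANDBY);
}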
I've been looking at redundant HARDWARE systems to see if
there are any clues that I can take away to make this
easier to grok. But, most seem to be simple (dual)
redundant or, possibly, triple redundant. And, the devices
involved are very obvious -- because of packaging, physical
similarities, etc.
I think if folks were more familiar with things like 3DNS
it might be an easier "sell"...