Guaranteeing switch bandwidth...

On 8/25/23 11:35, Dan Purgert wrote:
On 2023-08-24, Don Y wrote:
On 8/24/2023 9:40 AM, Dan Purgert wrote:
Is there any way to guarantee the bandwidth available to a
subgroup of ports on a (managed/L3?) switch?

Not really. You can try implementing QoS on the switch to tag priority,
but that doesn't necessarily "guarantee" bandwidth; noting, of course,
that most (all) decent switches have enough capacity in their switch
fabric to run all ports at line rate simultaneously.
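
For what it's worth, the host side of that is just a socket option. A
minimal sketch (mine, not anything switch-specific; assumes a Linux-style
socket API, and the address, port, and DSCP class are arbitrary examples):

    import socket

    DSCP_EF = 46                 # "Expedited Forwarding" traffic class
    TOS = DSCP_EF << 2           # DSCP sits in the top 6 bits of the TOS byte

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS)

    # Frames carrying this datagram arrive at the switch pre-marked; whether
    # that buys you anything depends entirely on how the switch's queueing
    # is configured -- it is still not a bandwidth guarantee.
    sock.sendto(b"latency-sensitive payload", ("192.0.2.10", 5000))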

But that assumes the destination port is always available.
E.g., if N-1 ports all try to deliver packets to the Nth
port, then the packets are queued in the switch, using switch
resources.

Yes and no. Switches have *very* small buffers (your typical "decent
biz" rack-mount option having somewhere in the realm of 16 MiB ...
maybe).

If a link gets so congested that the buffers fill, the switch will just
start dropping traffic on the floor. This in turn will typically act as
a signal for the sending parties to retry (and slow down).

This would be A Bad Thing as it would have to rely on timeouts
to detect that packets are missing.

The whole idea is to drop frames early, so that you don't bog down other
parts of the network.

Note too that ethernet does include provisions for signalling frame
errors.


Bear in mind that "filling buffers" only happens in cases like you've
described -- if you've got 10 hosts all talking amongst themselves
(rather than 9 trying to slam the 10th), a decent switch will keep up
with that forever.

There are multiple executing threads on each host. Each host
"serves" some number of objects that can be referenced by
clients on other hosts (those other hosts serving objects
of their own). Additionally, the objects can be migrated to other
hosts, as can the servers that back them. (So, traffic patterns
are highly dynamic.)

Okay, so you've got basically a bog-standard network design ...

[...]
A good switch will happily switch all 24 / 48 ports at line rate all day
every day.

It's usually just easier to have bigger uplink connections (e.g.
10/100G) making up the backbone.

I'm not leaving the switch.

Then I'm not really sure why you're asking about N hosts talking to an
M'th host ... as all that traffic will enter (and exit) your switch ...

Any host (or hosts) on the switch can make a request (RPC)
of any other host at any time. Even for synchronous requests,
having *made* a request doesn't mean that other requests
(from other threads on your host) can't *also* be tickling the
switch -- possibly the same host that the first RPC targeted,
or possibly some other.

This description still has your data "leaving" the switch.


The switch is an approximation of a mesh. Under what conditions
does that approximation fall flat?

Decent manuals will provide three pieces of data for the switch:

- Non-Blocking Throughput -- should be equal to the number of ports.
How much can the switch transmit before it bogs down.

That assumes every destination port is "available".

Well, yeah, a switch can't magically talk out a port that's not
connected to anything :)

But it's good to know that the switch can constantly transmit at the
combined line rate of all ports.


- Switching Capacity -- should be 2x the number of ports. How much
total traffic the fabric can handle before it bogs down.

How does this differ from the above? Or, is this a measure of
how deep the internal store is?

The switching capacity is "how fast can the switch shuffle stuff around
its ASIC". Slight correction to my initial statement: switching capacity
should be 2x the sum of the bandwidths of all ports.

Consider those super-cheap 5 port things you'll find at your local
big-box electronics store. They (might) only have a switching capacity of
5 Gbps ... which cannot support the potential 10 Gbps of traffic 5 hosts
connected to it could generate. BUT, well, it's not meant for a scenario
where you have 5 hosts talking amongst themselves at line rate.

As another example, I have a 48 port switch that includes 4x SFP cages
(2x 1G SFP + 2x 1/10G SFP+). As I recall, its switching capacity is on
the order of 104 Gbps (i.e. 2x 52 [1 Gbps] ports). So I know it'll never
become the bottleneck if I don't use 10 Gbit SFP+ modules ... or, if I do
need 10G switching, I have to give up a bit on the copper port capacity,
OR just accept that the switch WILL be a bottleneck if I'm trying to use
it at port capacity with at least one 10G module.
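
To make that arithmetic explicit (a quick back-of-the-envelope sketch in
Python, using the numbers above):

    copper          = 48 * 1    # 48x 1G copper ports
    sfp             = 2 * 1     # 2x 1G SFP
    sfp_plus_at_1g  = 2 * 1     # SFP+ cages populated with 1G modules
    sfp_plus_at_10g = 2 * 10    # ... or populated with 10G modules

    # capacity needed = 2x the sum of port speeds (full duplex)
    need_all_1g   = 2 * (copper + sfp + sfp_plus_at_1g)    # 104 Gbps -> fits
    need_with_10g = 2 * (copper + sfp + sfp_plus_at_10g)   # 140 Gbps -> exceeds 104
    print(need_all_1g, need_with_10g)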


- Forwarding Rate -- should be about 1.5x the number of ports. How
many frames the switch can process before it bogs down.

As long as you're within these specs, "the switch" is not impacting the
traffic at all.
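
(If it helps, the "about 1.5x" figure is just the worst-case frame rate
for minimum-size frames. A rough derivation, standard Ethernet numbers:

    # a minimum frame occupies 64 B + 8 B preamble/SFD + 12 B inter-frame gap
    bits_per_min_frame = (64 + 8 + 12) * 8          # 672 bits on the wire
    max_fps = 1_000_000_000 / bits_per_min_frame    # ~1.488 Mpps per 1G port
    print(f"{max_fps / 1e6:.3f} Mpps")              # x48 ports -> ~71.4 Mpps

so a 48 port gigabit switch should quote a forwarding rate of roughly
70-72 Mpps.)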

If the switch is owned ENTIRELY by the application, then
these limits can be evaluated.

Evaluated? They're right there in the datasheet, the work's been done
for you.



But, if other applications are also sharing the switch,
then you (me) have to be able to quantify *their* impact on
YOUR application.

Application? Like "program"? Switches don't operate with "applications".
They operate on Ethernet frames.


Imagine serving iSCSI on a switch intended to support
a certain type of "application traffic". Suddenly,
there's all of this (near continuous) traffic as
the fabric tries to absorb the raw disk I/O.

iSCSI isn't served by a switch ... it's just SCSI commands from an
initiator to a target, wrapped in TCP. The ultimate bulk data transfer
on the network looks effectively like any other (TCP-based) data
transfer.

Target can only serve it back to the initiator as fast as it can upload
(e.g. 1 Gbps, although that's quite likely limited by disk read speed).
Likewise, initiator can only accept it as fast as it can download (e.g.
1 Gbps). And, well, a halfway decent switch can handle that all day every
day. If either "Target" or "Initiator" bogs down (because their 1 Gbps
link can only move 1 Gbps, and they want to do more than just transfer
block storage data back and forth), then frames start getting dropped,
and TCP starts backing off ... and the switch is not the bottleneck.


In conventional services, things just slow down. You
may wait many seconds before a request times out. And,
you may abandon that request.

But, if the application expects a certain type of performance
from the communication subsystem and something is acting as
a parasite, there, then what recourse does the application
have? It can't tell the switch "disable all other ports
because their activities are interfering with my expected
performance".

Your "application" is bottlenecked by your PC's network stack (and
ability to kick data onto the wire) before your theoretical switch gets
involved. If your "application" needs more throughput, you'll need to
handle it at the host it's running on. I mean, if we have a theoretical
switch with 500 Gbps capacity, that's all going to waste if the 48 hosts
connected to it only have gigabit ports ...


Keep it simple. Throughput depends on traffic volume in a CSMA/CD network,
due to collision retries. The only way to be sure is to test out with
various network loads. Plenty of open source tools for that sort of
thing now. A three node network with the third node running Kali Linux
to generate various levels of traffic, Wireshark as a monitor, might be
a good place to start...
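
(Even without Kali, a rate-limited blaster is only a few lines. A rough
sketch of my own, with the target address and offered load as placeholders;
iperf3 or the like is the more serious option:

    import socket, time

    TARGET    = ("192.0.2.20", 9000)   # placeholder destination
    RATE_MBPS = 200                    # offered load, adjust to taste
    PAYLOAD   = b"\x00" * 1400         # stay under a typical MTU

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = (len(PAYLOAD) * 8) / (RATE_MBPS * 1_000_000)  # sec per datagram

    next_send = time.monotonic()
    while True:                        # Ctrl-C to stop
        sock.sendto(PAYLOAD, TARGET)
        next_send += interval
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)

Watch the port counters on the switch, or Wireshark on the receiving side,
while you dial RATE_MBPS up and down.)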
 
On 8/28/2023 8:08 AM, chrisq wrote:
Keep it simple. Throughput depends on traffic volume in a CSMA/CD network, due to
collision retries. The only way to be sure is to test out with various network
loads. Plenty of open source tools for that sort of thing now. A three node
network with the third node running Kali Linux to generate various levels of
traffic, Wireshark as a monitor, might be a good place to start...

Open source tools assume you are running IP -- and that your traffic
resembles "typical traffic".

I have utilities built into the OS that let me see where packets are being
sent and from where received -- along with the time-in-transit (if I
target a null RMI) if I just want to measure transport delays.

But applications ("systems"), unlike "single services", rely on more than
just wire speeds.
 
On 8/28/23 16:46, Don Y wrote:
On 8/28/2023 8:08 AM, chrisq wrote:
Keep it simple. Throughput depends on traffic volume in a CSMA/CD
network, due to collision retries. The only way to be sure is to test
out with various network loads. Plenty of open source tools for that
sort of thing now. A three node network with the third node running
Kali Linux to generate various levels of traffic, Wireshark as a
monitor, might be a good place to start...

Open source tools assume you are running IP -- and that your traffic
resembles "typical traffic".

I have utilities built into the OS that let me see where packets are being
sent and from where received -- along with the time-in-transit (if I
target a null RMI) if I just want to measure transport delays.

But applications ("systems"), unlike "single services", rely on more than
just wire speeds.

Whatever, can only say what has worked here, but probably many
approaches to the problem...
 
On 8/28/2023 10:33 AM, chrisq wrote:
On 8/28/23 16:46, Don Y wrote:
On 8/28/2023 8:08 AM, chrisq wrote:
Keep it simple. Throughput depends on traffic volume in a CSMA/CD network, due
to collision retries. The only way to be sure is to test out with various
network loads. Plenty of open source tools for that sort of thing now. A
three node network with the third node running Kali Linux to generate
various levels of traffic, Wireshark as a monitor, might be a good place to
start...

Open source tools assume you are running IP -- and that your traffic
resembles "typical traffic".

I have utilities built into the OS that let me see where packets are being
sent and from where received -- along with the time-in-transit (if I
target a null RMI) if I just want to measure transport delays.

But applications ("systems"), unlike "single services", rely on more than
just wire speeds.

Whatever, can only say what has worked here, but probably many approaches to
the problem...

If you are using the network for bog-standard "applications",
then you can use tools that were designed to support those bog-standard
applications.

If, OTOH, you see the network as a set of components that can be used
in *other* ways, then you use tools that fit those other uses.

You can use a diode as a rectifier, capacitor, voltage reference,
mechanical standoff/spacer, fuse, heater, etc. Picking the appropriate
uses for YOUR application is what makes it engineering.

E.g., I can tell when a packet was queued on the sender, when it
hit the wire, when it arrived at the NIC on the target and when
it made its way "up" to the receiving stub. So, I can tell
whether a *specific* packet lingered in the output queue at the
source, or *in* the switch, or in the input queue at the target.
And, repeat this for the "reply" so I know whether I should
address the client side stubs, network interrupt handler,
packet priority assignment, fabric, server-side stubs, remote
procedure implementation, etc.

This lets me figure out *where* I need to address performance.

Tools that just look at the wire can only see the packet ON the wire.
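
The bookkeeping itself is trivial once you have the four timestamps. A
hypothetical sketch (the names are mine, not the actual OS utilities, and
it assumes the hosts' clocks are comparable):

    from dataclasses import dataclass

    @dataclass
    class PacketTrace:
        queued_at_sender: float    # handed to the sender's output queue
        on_wire: float             # left the sending NIC
        at_target_nic: float       # arrived at the receiving NIC
        at_receiving_stub: float   # delivered "up" to the stub

    def attribute_delay(t: PacketTrace) -> dict:
        return {
            "sender output queue": t.on_wire - t.queued_at_sender,
            "wire + switch fabric": t.at_target_nic - t.on_wire,
            "target input queue": t.at_receiving_stub - t.at_target_nic,
        }

    # e.g. a trace where most of the delay sat in the switch/wire segment:
    print(attribute_delay(PacketTrace(0.0000, 0.0002, 0.0021, 0.0023)))

Repeat for the reply and you know which side (and which layer) to go after.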
 
