Discussion:
[Simh] Cluster communications errors
Hunter Goatley
2018-07-18 15:37:40 UTC
Permalink
Good morning.

I recently set up SIMH running under Linux to replace some aging VAX
hardware. The SIMH instance is about 30% faster than the actual
hardware, which is a nice win. I'm running the current code from GitHub,
which I downloaded on Monday.

I have a dedicated Ethernet device on the Linux system for the SIMH
instance.

It's in a cluster of other machines, and all is working well except for
one thing. Every 15--60 seconds, it loses and re-establishes contact
with the cluster:

%CNXMAN, lost connection to system VADER
%CNXMAN, re-established connection to system VADER

And these OPCOM messages from VADER:

%%%%%%%%%%% OPCOM 18-JUL-2018 11:33:01.26 %%%%%%%%%%% (from node VADER a)
11:32:46.71 Node VADER (csid 00010078) lost connection to node DARTH

%%%%%%%%%%% OPCOM 18-JUL-2018 11:33:01.26 %%%%%%%%%%% (from node VADER a)
11:32:49.21 Node VADER (csid 00010078) re-established connection to node DARTH

It recovers every time, but everything hangs briefly while connectivity
is re-established, and, of course, it's generating a ton of OPCOM
messages, since this happens every 15--60 seconds.

Has anyone else seen this issue or have any suggestions?

Thanks!
--
Hunter
------
Hunter Goatley, Process Software, http://www.process.com/
***@goatley.com http://hunter.goatley.com/
Hunter Goatley
2018-07-18 15:57:40 UTC
Permalink
My mistake. I'm not running V4.0, I'm running V3.10-0 RC1.

After posting, it dawned on me that I should have tried SIMH V3.9.0, but
it fails to boot:

(BOOT/R5:0 DUA0



2..
-DUA0
1..0..

HALT instruction, PC: 00004C02 (HALT)
sim>

I'm not sure why. I'm using the KA655x.bin that came with V3.9.0 and a
new nvram.bin file, but everything else is the same as the V3.10-0 RC1
instance.

I just downloaded the current GitHub sources and compiled them (15fd71b
<https://github.com/simh/simh/commit/15fd71b97c8aaec29dc1bbbd3473c3f0d582c9ff>).
It boots, but I see the same behavior of losing connection to the cluster.

I also should have mentioned that this dedicated Ethernet card is
plugged into the same switch as all of the other cluster members, so
that /shouldn't/ be an issue.

Thanks.

Hunter
Dave Wade
2018-07-18 17:14:17 UTC
Permalink
Hunter,

Is it set to Autosense Speed and Duplex? Is it getting confused? Can it be set to a fixed speed?
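On the Linux side you can check what actually got negotiated with
something like this (assuming eth0 is the interface simh is attached to):

ethtool eth0

and look at the Speed, Duplex, and Auto-negotiation lines in the output.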

Dave



Mark Pizzolato
2018-07-18 18:20:26 UTC
Permalink
Hi Hunter,

Maybe Dave has something, but maybe not.

I just booted a diskless simh VAX instance from a simh VAX system running on the same LAN.

The second phase of the boot (after the MOP load of the secondary boot loader) had a few retries until the boot succeeded, and then the node came up and stayed up without issue. The booting system had this minimal configuration:

sim> set cpu 256
sim> set XQ mac=08-00-2b-11-22-44
sim> attach XQ eth2
sim> BOOT

Then entered BOOT XQ at the >>> prompt.

The simh host system in this case was running Windows. Just for grins, I tried the same thing from a Ubuntu 18.04 Linux system running in a VirtualBox VM on that same Windows host.

The simh Ethernet layer has dramatically more internal packet buffering (maybe 50 X) than anything real DEC hardware ever had. This might account for the relatively smooth behavior I’m seeing.
Meanwhile, from what you’ve mentioned it seems you’ve got a simh instance talking to real DEC hardware.

Using the 4.0 Current codebase, you might want to look at: HELP XQ CONFIG SET THROTTLE
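The full form (going from memory here, so check the actual help output) is
something like:

sim> SET XQ THROTTLE=TIME=n;BURST=n;DELAY=n

where TIME is the inter-packet gap (in milliseconds) below which packets are
counted as part of a burst, BURST is how many such back-to-back packets are
allowed before throttling kicks in, and DELAY is the pause inserted after a
burst.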

You may also want to show us the simh VAX configuration file you are using.



- Mark

Hunter Goatley
2018-07-18 20:54:41 UTC
Permalink
Hi, Mark.
Post by Mark Pizzolato
Meanwhile, from what you’ve mentioned it seems you’ve got a simh
instance talking to real DEC hardware.
Yes, the switch isn't DEC, but most of the other nodes in the cluster
are real DEC hardware.
Post by Mark Pizzolato
Using the 4.0 Current codebase, you might want to look at: HELP XQ CONFIG SET THROTTLE
Interesting. Thanks. I just read that and enabled throttling, but just
with a simple SET XQ THROTTLE=ON. I'll see what that does.

Not enough. It has lost connection several times during the boot.... I
haven't studied the timings enough to have any idea what I might specify
for TIME, BURST, or DELAY.
Post by Mark Pizzolato
You may also want to show us the simh VAX configuration file you are using

Something else I meant to include:

load -r /usr/local/vax/data/ka655x.bin
attach nvr /usr/local/vax/data/nvram.bin
set cpu 256m
set rq0 ra92
attach rq0 /usr/local/vax/data/darth.vdisk
set rl disable
set ts disable
attach xq eth0
set xq throttle=on
dep bdr 0
boot cpu

Thanks!

Hunter
Warren Young
2018-07-18 21:33:15 UTC
Permalink
Post by Mark Pizzolato
The simh Ethernet layer has dramatically more internal packet buffering
(maybe 50 X) than anything real DEC hardware ever had. This might account
for the relatively smooth behavior I’m seeing.
More buffering can also mean more delay in the feedback loop that controls
the underlying protocols, leading to *worse* performance as buffer space
goes up.

This is called Buffer Bloat in the TCP sphere:

https://www.bufferbloat.net/

Perhaps the low-level protocols involved in VAX clustering have the same
issue? They may be expecting to get some kind of feedback response, which
is getting delayed through the buffering, which causes the real VAXen to
kick the fake one out, thinking it's gone MIA.
Mark Pizzolato
2018-07-18 21:44:41 UTC
Permalink
There are no deliberate buffering delays in the Ethernet layer. There is merely one thread which receives (and filters) packets and queues the potentially interesting ones. That queue gets drained as fast as the simulated system happens to read the available data. This might affect some worst-case situations, but overruns due to speed mismatches and the limited capacity of the old physical hardware are much more likely to blame. Like I said, I've got multiple simulated LAVC nodes that can all talk just fine without the errors Hunter is seeing; if bufferbloat were a factor, it would presumably be worse there.



- Mark

Hunter Goatley
2018-07-18 20:38:28 UTC
Permalink
Thanks, Dave. I meant to try to double-check the settings. I don't have
physical access to the system, so I'll ask someone to double-check the card
and the switch.

I know it's currently set to autosense. I'll try forcing the speed and
duplex.

Thanks!

Hunter
-------
Post by Dave Wade
Hunter,
Is it set to Autosense Speed and Duplex? Is it getting confused? Can it be
set to a fixed speed.
Dave
Hunter Goatley
2018-07-18 21:03:57 UTC
Permalink
Post by Hunter Goatley
I know it's currently as set to autosense. I'll try forcing the speed
and duplex.
I was told:

The router is reporting that the port auto-sensed 1Gbit duplex, but
I just manually forced it to that to be sure.

No change in behavior, unfortunately.

Hunter
Paul Koning
2018-07-18 21:18:57 UTC
Permalink
Post by Hunter Goatley
I know it's currently set to autosense. I'll try forcing the speed and duplex.
The router is reporting that the port auto-sensed 1Gbit duplex, but I just manually forced it to that to be sure.
No change in behavior, unfortunately.
Hunter
You mentioned that some of this is real hardware and some is simulated. It might be helpful to post a map showing the setup, including interface models, link speeds, and switch models.

Are the interface speeds all the same? LAVC was built for 10 Mbps Ethernet, and while running it faster should be ok, running mixed speeds may create more congestion than the protocol is comfortable with. While any Ethernet protocol has to handle packet loss, some protocols assume packet loss is rare. DECnet wouldn't, but the cluster protocols (and LAT, for that matter) do.

Is there any way to show packet loss counts? Can you run DECnet, and if you put a significant load on DECnet connections, do the DECnet counters show any errors?
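On the VMS side, something along these lines should show whether packets are
being dropped (the device and line names will vary; these are just
placeholders):

$ MCR NCP SHOW KNOWN LINES COUNTERS
$ MCR LANCP SHOW DEVICE EWA0/COUNTERS

Look for counters like "System buffer unavailable" or "User buffer
unavailable" climbing under load.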

paul
Mark Pizzolato
2018-07-18 21:27:29 UTC
Permalink
Post by Hunter Goatley
I know it's currently set to autosense. I'll try forcing the speed and duplex.
The router is reporting that the port auto-sensed 1Gbit duplex, but I just
manually forced it to that to be sure.
It might be better to hard set the Linux simh host system's port to 10Mbit
on the switch. That would help with the potential for overrunning the original
DEC hardware...
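On the Linux host side, the equivalent (assuming the interface is eth0 and
the driver allows forcing it) would be something like:

ethtool -s eth0 speed 10 duplex full autoneg off

though forcing it on the switch port as well is probably the safer bet.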

- Mark
Paul Koning
2018-07-18 21:32:11 UTC
Permalink
Post by Mark Pizzolato
I know it's currently as set to autosense. I'll try forcing the speed and duplex.
The router is reporting that the port auto-sensed 1Gbit duplex, but I just
manually forced it to that to be sure.
It might be better to hard set the Linux simh host system's port to 10Mbit
on the switch. That would help with the potential for overrunning the original
DEC hardware...
DEC hardware tends to handle line rate traffic; a lot of other Ethernet hardware does not, especially not earlier models. I remember arguing with the DECnet/DOS folks that no, we would not modify the DECnet architecture to handle the single buffer "design" of the 3c501.

But if you have speed mismatches, you're likely to have congestion loss, unless the bursts are less than the switch buffer quota. Some switches have thousands of buffers; other (inexpensive) ones have only a surprisingly small number and can easily give you congestion loss.

paul
Mark Pizzolato
2018-07-18 21:36:15 UTC
Permalink
Post by Paul Koning
Post by Mark Pizzolato
Post by Hunter Goatley
I know it's currently as set to autosense. I'll try forcing the speed and duplex.
The router is reporting that the port auto-sensed 1Gbit duplex, but
I just manually forced it to that to be sure.
It might be better to hard set the Linux simh host system's port to
10Mbit on the switch. That would help with the potential for
overrunning the original DEC hardware...
DEC hardware tends to handle line rate traffic; a lot of other Ethernet
hardware does not, especially not earlier models. I remember arguing with the
DECnet/DOS folks that no, we would not modify the DECnet architecture to
handle the single buffer "design" of the 3c501.
But if you have speed mismatches, you're likely to have congestion loss, unless
the bursts are less than the switch buffer quota. Some switches have
thousands of buffers; other (inexpensive) ones have only a surprisingly small
number and can easily give you congestion loss.
Well, not all systems and hardware can actually handle back-to-back
packets even at 10Mbits. The XQ THROTTLING is based on the throttling that
Johnny Billquist implemented in his bridge, which was needed to allow his
physical systems to communicate with simulated systems without
crazy packet loss...

- Mark
Kevin Handy
2018-07-18 22:09:31 UTC
Permalink
When you are looking for packet loss/errors, are you just looking inside
the simulator, or are you also checking the host machine?
Your host OS may be hiding errors, giving you "cleaned up" traffic.
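For example, on the Linux host something like this shows the kernel's and
driver's view of drops and errors, which can differ from what the simulator
or VMS reports (eth0 being whatever interface simh is attached to):

ip -s link show eth0
ethtool -S eth0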
Johnny Billquist
2018-07-18 23:18:21 UTC
Permalink
Post by Mark Pizzolato
Well, not all systems and hardware can actually handle back to back
packets even at 10Mbits. The XQ THROTTLING is based on the throttling that
Johnny Billquist implemented in his bridge which was needed to allow his
physical systems to be able to communicated with simulated systems without
crazy packet loss...
It's probably worth pointing out that the reason I implemented that was
not because of hardware problems, but because of software problems.
DECnet can degenerate pretty badly when packets are lost. And if you
shove packets fast enough at the interface, the interface will
(obviously) eventually run out of buffers, at which point packets will
be dropped.
This is especially noticeable in DECnet/RSX at least. I think I know how
to improve that software, but I have not had enough time to actually try
fixing it. And it is especially noticeable when doing file transfers
over DECnet.

Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: ***@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
Paul Koning
2018-07-19 00:07:26 UTC
Permalink
...
It's probably worth pointing out that the reason I implemented that was not because of hardware problems, but because of software problems. DECnet can degenerate pretty badly when packets are lost. And if you shove packets fast enough at the interface, the interface will (obviously) eventually run out of buffers, at which point packets will be dropped.
This is especially noticeable in DECnet/RSX at least. I think I know how to improve that software, but I have not had enough time to actually try fixing it. And it is especially noticeable when doing file transfers over DECnet.
All ARQ protocols suffer dramatically with packet loss. The other day I was reading a recent paper about high-speed, long-distance TCP. It showed a graph of throughput vs. packet loss rate. I forget the exact numbers, but it was something like a 0.01% packet loss rate causing a 90% throughput drop. Compare that with the old (1970s) ARPAnet rule of thumb that 1% packet loss means 90% loss of throughput. Those both make sense; the old one was for "high speed" links running at 56 kbps, rather than the multi-Gbps of current links.

The other thing with nontrivial packet loss is that in any protocol with congestion control triggered by packet loss (such as recent versions of DECnet), the flow control machinery will severely throttle the link under such conditions.

So yes, anything you can do in the infrastructure to keep the packet loss well under 1% is going to be very helpful indeed.
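As a rough illustration, the usual back-of-the-envelope model puts TCP
throughput on the order of MSS / (RTT * sqrt(loss)). With a 1500-byte MSS, a
100 ms RTT, and a loss rate of 0.0001 (0.01%), that works out to roughly
12 Mb/s, no matter how fast the underlying link is.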

paul
Johnny Billquist
2018-07-19 00:22:26 UTC
Permalink
Post by Paul Koning
All ARQ protocols suffer dramatically with packet loss. The other day I was reading a recent paper about high speed long distance TCP. It showed a graph of throughput vs. packet loss rate. I forgot the exact numbers, but it was something like 0.01% packet loss rate causes a 90% throughput drop. Compare that with the old (1970s) ARPAnet rule of thumb that 1% packet loss means 90% loss of throughput. Those both make sense; the old one was for "high speed" links running at 56 kbps, rather than the multi-Gbps of current links.
The other thing with nontrivial packet loss is that any protocol with congestion control algorithms triggered by packet loss (such as recent versions of DECnet), the flow control machinery will severely throttle the link under such conditions.
So yes, anything you can do in the infrastructure to keep the packet loss well under 1% is going to be very helpful indeed.
Right. That said, TCP behaves much better than DECnet here, at
least if we're talking about TCP with the ability to deal with out-of-order
packets (which most implementations should have) and DECnet under RSX. The
problem with DECnet under RSX is that recovering from a lost packet because
of congestion essentially guarantees that congestion will happen again,
while TCP pretty quickly settles into a steady working state.

I have not analyzed other DECnet implementations enough to tell for sure
whether they also exhibit the same problem.

Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: ***@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
Paul Koning
2018-07-19 00:29:18 UTC
Permalink
Post by Johnny Billquist
Right. That said, TCP behaves extremely much better than DECnet here. At least if we talk about TCP with the ability to deal with out of order packets (which most should do) and DECnet under RSX. The problem with DECnet under RSX is that recovering from a lost packet because of congestion essentially guarantees that congestion will happen again, while TCP pretty quickly comes into a steady working state.
Out of order packet handling isn't involved in that. Congestion doesn't reorder packets. If you drop a packet, TCP and DECnet both force the retransmission of all packets starting with the dropped one. (At least, I don't think selective ACK is used in TCP.) DECnet described out of order packet caching for the same reason TCP does: to work efficiently in packet topologies that have multiple paths in which the routers do equal cost path splitting. In DECnet, that support is optional; it's not in DECnet/E and I wouldn't expect it in other 16-bit platforms either.
I have not analyzed other DECnet implementation enough to tell for sure if they also exhibit the same problem.
Another consideration is that TCP has seen another 20 years of work on congestion control since DECnet Phase IV. But in any case, it may well be that VMS handles these things better. It's also possible that DECnet/OSI does, since it is newer and was designed right around the time that DEC very seriously got into congestion control algorithm research. Phase IV isn't so well developed; it largely predates that work.

paul
Johnny Billquist
2018-07-19 00:53:52 UTC
Permalink
Post by Paul Koning
Out of order packet handling isn't involved in that. Congestion doesn't reorder packets. If you drop a packet, TCP and DECnet both force the retransmission of all packets starting with the dropped one. (At least, I don't think selective ACK is used in TCP.) DECnet described out of order packet caching for the same reason TCP does: to work efficiently in packet topologies that have multiple paths in which the routers do equal cost path splitting. In DECnet, that support is optional; it's not in DECnet/E and I wouldn't expect it in other 16-bit platforms either.
This is maybe getting too technical, so let me know if we should take
this off list.

Yes, congestion does not reorder packets. However, if you cannot handle
out of order packets, you have to retransmit everything from the point
where a packet was lost.
If you can deal with packets out of order, you can keep the packets you
received, even though there is a hole, and once that hole is plugged,
you can ACK everything. And this is pretty normal in TCP, even without
selective ACK.

So, in TCP, what normally happens is that a node is spraying packets as
fast as it can. Some packets are lost, but not all of them, leaving
some holes in the sequence of received packets.
TCP will, after some time or based on other heuristics, start retransmitting
from the point where packets were lost, and as soon as the receiving end has
plugged the hole, it will jump forward with the ACKs, meaning the sender
does not need to retransmit everything. Even better, if the sender does
retransmit everything, losing some of those retransmitted packets will
not matter, since the receiver already has them anyway. At some point,
you will get to a state where the receiver has no window open, so the
transmitter is blocked, and every time the receiver opens up a
window, which usually is just a packet or two in size, the transmitter
can send that much data. But that much data is usually less than the
number of buffers the hardware has, so there is no problem receiving
those packets, and TCP gets into a steady state where the transmitter
can send packets as fast as the receiver can consume them; apart
from a few lost packets in the early stages, no packets are lost.

DECnet (at least in RSX) on the other hand will transmit a whole bunch
of packets. The first few will get through, but at some point one or
several are lost. After some time, DECnet decides that packets were
lost, and will back up and start transmitting again from the point where
the packets were lost. Once more it will soon blast more packets than
the receiver can process, and you will once more get a timeout
situation. DECnet is backing off on the timeouts every time this
happens, and soon you are at a horrendous 127s timeout for pretty much
every other packet sent, meaning in effect you are only managing to send
one packet every 127s. This is worsened, I think, by something that
looks like a bug in the NFT/FAL code in RSX, where the code assumes it
is faster than the packet transfer rate, and can manage to do a few
things before two packets have been received. How much is to blame on
DECnet in general, and how much on NFT/FAL, I'm not entirely clear. Like
I said, I have not had time to really test things around this.
But it's very easy to demonstrate the problem. Just set up an old PDP-11
and a simh (or similar) machine on the same DECnet, try to transfer
a larger file to the real PDP-11, and check the network counters and observe
how things immediately come to a standstill.

Which is why I implemented the throttling in the bridge, which Mark
mentioned.

As far as path splitting goes, it is implemented in RSX-11M-PLUS, but
disabled. I tried enabling it once, but the system crashed. The manuals
have it documented, but I'm wondering if DEC never actually completed
the work.
Post by Paul Koning
I have not analyzed other DECnet implementation enough to tell for sure if they also exhibit the same problem.
Another consideration is that TCP has seen another 20 years of work on congestion control since DECnet Phase IV. But in any case, it may well be that VMS handles these things better. It's also possible that DECnet/OSI does, since it is newer and was designed right around the time that DEC very seriously got into congestion control algorithm research. Phase IV isn't so well developed; it largely predates that work.
Well, this isn't really about congestion control so much as just being
able to handle out of order packets. Although congestion control could
certainly also be applied to alleviate the problem.

I know that OSI originally stated the same basic assumption DECnet has:
links are 100% reliable and never drop or reorder packets.
That's a very bad assumption to build protocols on, and OSI eventually also
defined links and operations based on technology where these assumptions
were not true. So I would hope/assume that DECnet/OSI eventually got
better. But I strongly suspect that was not the case from the start.

Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: ***@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
Hunter Goatley
2018-07-20 03:18:20 UTC
Permalink
Another data point. After more playing around and several reboots, I can
confirm that, with tunneling through the host system's Ethernet device,
communication with other cluster members /only/ drops when DECnet is
started.

%%%%%%%%%%% OPCOM 19-JUL-2018 23:14:55.58 %%%%%%%%%%%
Message from user DECNET on DARTH
DECnet starting

%CNXMAN, lost connection to system QUEST
%CNXMAN, lost connection to system GALAXY
%CNXMAN, re-established connection to system FASTER
%CNXMAN, quorum lost, blocking activity
%CNXMAN, re-established connection to system VADER
%CNXMAN, re-established connection to system QUEST
%CNXMAN, quorum regained, resuming activity

That's not a full log, but as soon as I see the OPCOM message about
DECnet starting, I get the "lost connection" messages, then the
"re-established" messages, and then everything is fine afterward.

Hunter
Mark Pizzolato
2018-07-20 03:34:14 UTC
Permalink
Post by Hunter Goatley
Another data point. After more playing around and several reboots,
I can confirm that with tunneling using the host system's Ethernet
device, communications with other cluster members only drops
when DECnet is started.
%%%%%%%%%%% OPCOM 19-JUL-2018 23:14:55.58 %%%%%%%%%%%
Message from user DECNET on DARTH
DECnet starting
%CNXMAN, lost connection to system QUEST
%CNXMAN, lost connection to system GALAXY
%CNXMAN, re-established connection to system FASTER
%CNXMAN, quorum lost, blocking activity
%CNXMAN, re-established connection to system VADER
%CNXMAN, re-established connection to system QUEST
%CNXMAN, quorum regained, resuming activity
That's not a full log, but as soon as I see the OPCOM message about
DECnet starting, I get the "lost connection" messages, then the "re-established"
messages, and then everything is fine afterward.
The improvement by setting the port speed to 10Mbit suggests
that packet loss/overruns are happening and they are reduced
by limiting the wire speed.

If this wasn't a cluster, I'd say that DECnet starting might have
caused the XQ device's MAC address to be changed around that
time to reflect the DECnet Phase IV address switch that is done,
which might then have some effect on the switch's learning
of MAC addresses... However, in a cluster this change is done
when the LAN device is first brought online, using info in the
SYSGEN parameter SCSSYSTEMID.
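(For reference, the Phase IV MAC address is derived directly from the DECnet
address: the adapter gets set to AA-00-04-00-xx-yy, where xx-yy is
area*1024+node stored low byte first. For example, area 1 node 5 gives
1*1024+5 = 1029 = 0x0405, so the MAC becomes AA-00-04-00-05-04.)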

The arrival of DECnet's traffic might be causing a burst of traffic
that still ends up overrunning another system's ability to receive
it. Do things change if you throttle the simh VAX down?

sim> SET CPU NOIDLE
sim> SET THROTTLE 25%

- Mark
Hunter Goatley
2018-07-20 12:30:42 UTC
Permalink
Post by Mark Pizzolato
The improvement by setting the port speed to 10Mbit suggests
that packet loss/overruns are happening and they are reduced
by limiting the wire speed.
Agreed, though nothing ever indicated any errors or overruns: not the
switch, not NCP or LANCP on any nodes.
Post by Mark Pizzolato
The arrival of DECnet's traffic might be causing a burst of traffic
that still ends up overrunning another systems ability to receive
it. Do things change if you throttle the simh VAX down?
sim> SET CPU NOIDLE
sim> SET THROTTLE 25%
Wow. That was a flashback to 1987, when I was working on a VAX 11/730
with four other developers at the same time. ;-) We all got lots of
pleasure-reading done waiting for product builds....

Continued this morning: I ended up going to bed, it was taking so long.
I woke this morning to find that the startup took about four hours to
complete, and it had spent the next three hours losing and
re-establishing communications every 40 seconds. I'm guessing the system
was /so/ slow that it didn't respond fast enough to suit the other members.

So I took it down again and did SET THROTTLE 80%.  Still considerably
slower, but workable. And as soon as DECnet started, it lost
communication and re-established it. It's now two minutes farther into
the boot with no further drops.

It drops between the "Starting DECnet" OPCOM message and the first
"adjacency up" OPCOM message. After that, all is well.

Thanks.

Hunter
Mark Pizzolato
2018-07-20 15:42:32 UTC
Permalink
[...]
So I took it down again and did SET THROTTLE 80%.  Still considerably
slower, but workable. And as soon as DECnet started, it lost
communication and re-established it. It's now two minutes farther
into the boot with no further drops.
FYI: Throttling is merely part of identifying what is causing the problem.

Unless your simh VAX cluster member is VERY busy, throttling at 80%
will probably use at least 15 times more host system CPU cycles than
Idling. The 80% number will really use 80% of one CPU core continuously
even when nothing at all is going on in the running VMS environment.

- Mark
Paul Koning
2018-07-20 12:58:35 UTC
Permalink
Is that the Ethernet interface down/up that happens when DECnet sets the MAC address? I assume you don't have a card that supports multiple MAC addresses.

On the USB thing: USB bridge things are often consumer grade devices, and while they may "work" in the sense that you can get a packet in and out, I would not necessarily expect them to behave sanely under any nontrivial load. The same way I would not expect to run a cluster on a $50 Ethernet switch.

Good to hear things are looking better now.

paul
Hunter Goatley
2018-07-20 13:05:20 UTC
Permalink
Post by Paul Koning
Is that the Ethernet interface down/up that happens when DECnet sets the MAC address? I assume you don't have a card that supports multiple MAC addresses.
It probably is. That makes total sense, and I should have realized that.
Post by Paul Koning
On the USB thing: USB bridge things are often consumer grade devices, and while they may "work" in the sense that you can get a packet in and out, I would not necessarily expect them to behave sanely under any nontrivial load. The same way I would not expect to run a cluster on a $50 Ethernet switch.
True. We have some USB dongles we've used with CHARON-VAX for years
without incident, but I don't even know if these are the same brand
dongles. Even if they are, that doesn't mean anything, of course.

I'm not a hardware kind of guy, so I tend to miss some of the obvious
things, like remembering that the "dedicated card" is a USB dongle of
unknown make. ;-)
Post by Paul Koning
Good to hear things are looking better now.
Thank you all for your help!

Hunter
Mark Pizzolato
2018-07-20 14:20:23 UTC
Permalink
Post by Hunter Goatley
Post by Paul Koning
Is that the Ethernet interface down/up that happens when DECnet sets the
MAC address? I assume you don't have a card that supports multiple MAC
addresses.
It probably is. That makes total sense, and I should have realized that.
I'm quite sure that the MAC address change happens much earlier than
the DECnet startup when a VMS cluster is configured on the system. As
soon as the booting system gets its SYSGEN parameters and knows its
SCSSYSTEMID it 1) has enough info to set the DECnet MAC address and
2) is capable of engaging in cluster communications using this ID (and MAC).
Post by Hunter Goatley
Post by Paul Koning
On the USB thing: USB bridge things are often consumer grade devices, and
while they may "work" in the sense that you can get a packet in and out, I
would not necessarily expect them to behave sanely under any nontrivial load.
The same way I would not expect to run a cluster on a $50 Ethernet switch.
True. We have some USB dongles we've used with CHARON-VAX for years
without incident, but I don't even know if these are the same brand
dongles. Even if they are, that doesn't mean anything, of course.
I'm not a hardware kind of guy, so I tend to miss some of the obvious
things, like remembering that the "dedicated card" is a USB dongle of
unknown make. ;-)
I agree with Paul completely here and I wonder, at least for the sake of proving
the USB device is or is not a factor, why not merely share the host system's
primary LAN. Nothing special to get this to work except changing the
ATTACH XQ argument in the configuration file. When using that LAN interface,
without jumping through hoops configuring internal bridging, the host won't
be able to talk to the simh VAX instance, but I suspect that may not be a high
priority.
Post by Hunter Goatley
So I took it down again and did SET THROTTLE 80%. Still considerably slower,
but workable. And as soon as DECnet started, it lost communication and
re-established it. It's now two minutes farther into the boot with no further
drops.
It drops between the "Starting DECnet" OPCOM message and the first
"adjacency up" OPCOM message. After that, all is well.
I would be quite surprised if a USB LAN device actually provided reliable
status/statistic information to the host it is connected to beyond the
basics of link connection state and/or speed settings.

- Mark
Hunter Goatley
2018-07-20 15:58:57 UTC
Permalink
Post by Mark Pizzolato
I agree with Paul completely here and I wonder, at least for the sake of proving
the USB device is or is not a factor, why not merely share the host system's
primary LAN. Nothing special to get this to work except changing the
ATTACH XQ argument in the configuration file. When using that LAN interface,
without jumping through hoops configuring internal bridging, the host won't
be able to talk to the simh VAX instance, but I suspect that may not be a high
priority.
No, it's not, and I hadn't done that at first because I didn't remember
seeing the host system's primary LAN device when I first started. I must
have just overlooked it, because it's there now, and I just booted using
it (attach xq eth0). (I was probably so bent on using the dedicated
device that I overlooked the primary device when I did SHOW ETHER.)

I was also mistaken about the dedicated device. It's not a USB device.
It's actually an Intel PCI-X card that's in the host system. Which
now makes me even more confused. ;-)

The system booted using the host's primary device and is running fine,
though it still had the drop/re-establish when DECnet was started. But
everything else is working just fine. No subsequent drops; DECnet,
TCP/IP, and clustering are all working as expected.

We're just going to pull the second card and run off the primary LAN device.

Thanks again for all your help!

Hunter
Johnny Billquist
2018-07-20 20:53:33 UTC
Permalink
Post by Mark Pizzolato
Post by Hunter Goatley
Post by Paul Koning
Is that the Ethernet interface down/up that happens when DECnet sets the
MAC address? I assume you don't have a card that supports multiple MAC
addresses.
It probably is. That makes total sense, and I should have realized that.
I'm quite sure that the MAC address change happens much earlier than
the DECnet startup when a VMS cluster is configured on the system. As
soon as the booting system gets its SYSGEN parameters and knows its
SCSSYSTEMID it 1) has enough info to set the DECnet MAC address and
2) is capable of engaging in cluster communications using this ID (and MAC).
Well, I can at least confirm that on a real 8650, the network also briefly
goes down and up again when DECnet is started. If I remember correctly, it
even happens independent of whether you have the machine in a cluster or not.

If someone is really interested I can boot the machine up to VMS and
capture the output. I can also do some other tests and checks if anyone
is interested.

Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: ***@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
Hunter Goatley
2018-07-24 15:18:08 UTC
Permalink
Post by Johnny Billquist
Well, I can at least confirm on a real 8650, the network also briefly
goes down and up again when DECnet is started. If I remember it even
happens independent of if you have the machine in a cluster or not.
Thanks.

As a followup, the SIMH instance has been running without incident,
other than the single drop when DECnet is started, for over four days
now. Everything has been rock-solid. Apparently, that Intel network card
had some issue that was causing what I was seeing.

Thanks again for all the replies. It was a most enlightening discussion,
and I have a much better handle on how SIMH works now.

Hunter

Dave L
2018-07-20 08:05:11 UTC
Permalink
Hi Hunter

I run an ESXi host on a USDT system and use a USB3 LAN dongle to give me a
separate network for user/management traffic so I can use the onboard one
for iSCSI. This was done following the article here:

https://www.virtuallyghetto.com/2016/03/working-usb-ethernet-adapter-nic-for-esxi.html

I note that that USB interface can be dropping packets all the time; not a
big problem if the protocols can handle it, and RDP etc. suffers no real
issues. But running something like TotalNetworkMonitor on a VM there, you
do see that up to 50% or so of its ping probes are lost.
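A quick way to get a feel for that from the Linux side, if you want to
sanity-check the dongle, is just something like (substitute the address of
another machine on that segment):

ping -c 200 -i 0.2 192.168.1.10

and look at the packet-loss percentage in the summary line.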

Could be that you are seeing a similar behaviour where the protocol
doesn't handle lost packets too well...

regards
Dave



On Fri, 20 Jul 2018 03:15:34 +0100, Hunter Goatley wrote:

Here's where we stand on our cluster communications errors: nothing we
did worked. We tried different ports on the switch. We tried forcing
1Gbps. We tried forcing the port down to 10 Mbps. That actually seemed
to help slightly, in that we only lost communications every 63 seconds
or so, instead of every 15--60 seconds. But it would lose and
re-establish connection to the cluster every 63 seconds.

So I decided to try setting up and using a TAP device, just to see what
would happen.

Using the dedicated Ethernet card, it made no difference. It still lost
communications every 63 seconds.

When I say dedicated Ethernet card, I probably should have stated
earlier that it's a USB -> Ethernet device plugged into the system. I
don't know what brand or model, but I can find out, if anyone wants to
know.

So I decided to try tunneling through the "real" Ethernet port used by
the Linux system. After figuring out what to do for the missing tunctl
command under CentOS, I was able to set up a tunnel, and I did "attach
xq tap:tap0". I then booted the system and, wonder of wonders, miracle of
miracles, it was seven minutes into the boot (yes, it takes a long
time, mounting a slew of disks that needed to be rebuilt) before it lost
communications. But it re-established them immediately, and as of my
typing this, it has been twenty-nine minutes since that happened. No
further drops. Normally, I wouldn't think twenty-nine minutes is enough
to prove anything, but when it was dropping every 15--63 seconds for two
solid days, this sounds like a fix to me.

So what does it mean? One thing it suggests is that the USB Ethernet
device may be buggy or bad. I mean, it seems to work OK for TCP/IP
communications, etc., but it sure sounds like it may be the part
responsible for the problems. Especially since tunneling through the
built-in Ethernet card seems to work and tunneling through the USB
device did not.

Here's the host-side bridge/tap setup I used:

brctl addbr br0
ifconfig eno1 0.0.0.0          # eno1 is the host's Ethernet device
ifconfig br0 XXX.XX.XX.XX up   # the IP address of the host system
brctl addif br0 eno1
brctl setfd br0 0
#tunctl -t tap0
ip tuntap add tap0 mode tap    # replacement for tunctl on CentOS 7
brctl addif br0 tap0
ifconfig tap0 up

I then just did "attach xq tap:tap0" in the init file. I guess I should
set up a special MAC address, but I haven't yet, and so far, nothing
seems amiss.

While I thought having a dedicated Ethernet device would be the simplest
thing, I can live with tunneling it through the shared Ethernet device,
especially since it works and the former does not. ;-)

Thank you for all of your input over the past couple of days, and thank
you for all of your work on SIMH!

Hunter
--
Hunter Goatley
2018-07-20 12:36:52 UTC
Permalink
Hi, Dave.
Post by Dave L
I run an ESXi host on a USDT system and use a USB3 LAN dongle to give
me a seperate network for user/management traffic so I can use the
https://www.virtuallyghetto.com/2016/03/working-usb-ethernet-adapter-nic-for-esxi.html
Thanks for the link!
Post by Dave L
I note that that USB interface can be dropping packets all the time,
not a big problem if the protocols can handle that and RDP etc suffers
no real issues. But running something like TotalNetworkMonitor on a VM
there you do see that there are up to 50% or so ping packets lost in
its probes.
Could be that you are seeing a similar behaviour where the protocol
doesn't handle lost packets too well...
Yeah, that's what it sounds like. I'll try to run some other tests on
the USB dongle to see if I see anything else odd.

Thanks!

Hunter
Hunter Goatley
2018-07-18 21:31:32 UTC
Permalink
Post by Paul Koning
You mentioned that some of this is real hardware and some is simulated. It might be helpful to post a map showing the setup, including interface models, link speeds, and switch models.
I'll have to see about getting that. I think I mentioned that I'm not
physically located with the equipment.
Post by Paul Koning
Are the interface speeds all the same? LAVC was built for 10 Mbps Ethernet, and while running it faster should be ok, running mixed speeds may create more congestion than the protocol is comfortable with.
Good point.
Post by Paul Koning
Is there any way to show packet loss counts? Can you run DECnet, and if you put a significant load on DECnet connections, do the DECnet counters show any errors?
The counters I've checked via NCP and LANCP show no errors, no
collisions, no overruns.

Mark wrote:

It might be better to hard set the Linux simh host system's port to 10Mbit
on the switch. That would help with the potential for overrunning the original
DEC hardware...

I just asked my colleague to try forcing that.

And I take that back about turning on throttling not making a
difference. It has made a difference: the system is no longer coming
all the way up. I'm not sure why, as the reasons scrolled away long ago
because of all of the "lost connection" messages. I didn't think to
record them.

Thanks!

Hunter
Larry Baker
2018-07-21 01:48:48 UTC
Permalink
Post by Mark Pizzolato
Post by Hunter Goatley
Post by Paul Koning
Is that the Ethernet interface down/up that happens when DECnet sets the
MAC address? I assume you don't have a card that supports multiple MAC
addresses.
It probably is. That makes total sense, and I should have realized that.
I'm quite sure that the MAC address change happens much earlier than
the DECnet startup when a VMS cluster is configured on the system. As
soon as the booting system gets its SYSGEN parameters and knows its
SCSSYSTEMID it 1) has enough info to set the DECnet MAC address and
2) is capable of engaging in cluster communications using this ID (and MAC).
Well, I can at least confirm on a real 8650, the network also briefly
goes down and up again when DECnet is started. If I remember it even
happens independent of if you have the machine in a cluster or not.
Yeah, when I read Mark's comments I was thinking DECnet would still do what DECnet always does, which is change the MAC address to match the DECnet node number. There is no particular advantage in checking first whether that has already been done. The SCSSYSTEMID has to match the DECnet node number is all. Whether one or both actually set up the hardware MAC address is not specified, from what I recall.
Larry Baker
US Geological Survey
650-329-5608
***@usgs.gov