TCP connections don't time out

We have two applications running on a Solaris x86 host, which connect via TCP/IP to two serial ports (configured as prn with no flow control) on a TS4. This is a protocol link to a third party.

Tty  termtype  dev   sess  uid   edelay  auto  bin  group  dport  dest
1    vt100     prn   4     none  1       off   off  none   none   none
2    vt100     prn   4     none  1       off   off  none   none   none
3    vt100     term  4     none  1       off   off  none   none   none
4    vt100     term  4     none  1       off   off  none   none   none

If the Solaris host is powered off suddenly or reset, it cannot reconnect to the TS4 when it comes up again, because its new connections use different ephemeral port numbers (the TS4 answers the SYN from the host with an RST).

The problem also sometimes occurs on only one of the port connections, with unknown cause.

The TS4 shows (using ‘who’) that it already has two existing tty processes running and connected to ports 2101 and 2102. I assume they are waiting for data (that will never come) from the old ephemeral ports on the host.

>>> These tty processes never time out, and I see no config option for this, e.g. an inactivity timeout. <<<

I have configured TCP keepalives to reset the connection when the host socket disappears, and this works, but the minimum recovery time I can configure is about 50 seconds, which is too long.

TCP KeepAlive
Active : on
Byte : on
Idle : 00:00:10 (Minimum time)

TCP Probe
Count : 5 (Minimum)
Interval : 10 seconds (Minimum)
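
To show where the "about 50 seconds" comes from, here is a rough arithmetic sketch in Python. The timing model is an assumption, not something taken from the TS4 documentation: the first probe is assumed to go out after the idle time, the remaining probes one interval apart, and the unit may or may not wait one more interval after the last probe before dropping the session. The function name is made up for illustration.

```python
def keepalive_detection_window(idle_s, probe_count, probe_interval_s):
    """Return (earliest, latest) seconds until a dead peer is given up on.

    Assumed model, not confirmed for the TS4 firmware.
    """
    # First probe is sent after the idle period; the rest follow one interval apart.
    last_probe_sent = idle_s + (probe_count - 1) * probe_interval_s
    # Latest case: wait one more interval for a reply to the final probe.
    give_up = idle_s + probe_count * probe_interval_s
    return last_probe_sent, give_up


# Minimum values from the configuration shown above.
earliest, latest = keepalive_detection_window(idle_s=10, probe_count=5, probe_interval_s=10)
print(f"dead host detected after roughly {earliest} to {latest} seconds")
```

With these minimums, neither the idle time nor the probe settings can bring the total below 10 seconds.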

Is there another way to do this that will result in recovery times of less than 10 seconds?
Is this a bug? If not, can you enhance your firmware to provide a timeout config option, or lower the keepalive minimum values?

I have tried this on release_82000716_K and release_82000716_J firmware.

If I remember correctly, anything less than 50 seconds is a violation of an RFC.

You may want to consider generating some sort of script to kill the socket connections in the event the Solaris server reboots.

I typically use expect; it is the easiest tool for putting together a simple script.
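
As a rough sketch of the first half of such a script, the Python below only detects the symptom from the host side (a connection attempt that is refused, i.e. the SYN answered with an RST). It does not kill anything: the TS4 commands for clearing a session are not given in this thread, so that step is left out rather than guessed. The function name and the host name "ts4.example" are made up; ports 2101 and 2102 are the ones mentioned above.

```python
import socket


def probe_port(host, port, timeout_s=3.0):
    """Classify one TCP connection attempt as 'open', 'refused' or 'unreachable'."""
    try:
        # A successful connect is closed again immediately by the with-block.
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        # The peer answered our SYN with an RST.
        return "refused"
    except OSError:
        # Timeouts, name lookup failures, no route, and so on.
        return "unreachable"


# Example use (hypothetical host name):
#   for port in (2101, 2102):
#       print(port, probe_port("ts4.example", port))
```

Note that "refused" alone cannot tell a stale session from a port that is simply not listening, and a successful probe briefly opens a real session on the serial port, so this would only be run at host start-up before the applications connect.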

Thanks,
RFC1122 (4.2.3.6 TCP Keep-Alives) specifies that the default keep-alive interval must be no less than 2 hours, but I couldn’t find an RFC that specifies minimum configurable values. Do you have the reference?

RFC1122 and RFC2525 point to potential problems with small timeouts, but these are more concerned with devices on a large network experiencing delays and having their connections terminated prematurely.

In our application, the TS4 is on a dedicated 10/100baseT link to the Solaris host; it is acting more as a local peripheral to provide serial ports, and bandwidth and delays aren’t a big issue.

The script idea was considered, but we thought it would be better solved using configuration if possible.

We are using 3x TS16 units, and 12x TS4 units in this configuration in our client’s system.

Sorry, I do not know the specific RFC involved.

I do know that a value less than 50 seconds is not possible in the configuration of these products.

One other possibility might be the use of “port sharing”, so that a socket doesn’t block if it has been orphaned by a rebooted server, lost connection, etc. See the “set sharing” command in the command reference. You’d probably want to change the number of clients from the default of 1 to 2.

I believe the TS4 does not comply with RFC793 section 3.4 under “Half-open connections and other anomalies”. http://www.faqs.org/rfcs/rfc793.html [1]

Half-open connections are supposed to autorecover as follows:

In figure 10 of [1], the crashed host (TCP-A) attempts to reconnect and sends a SYN (e.g. with SEQ=0, ACK=0 in our case). This is step 3 in figure 10.
The sequence number 0 is outside the current TS4 TCP window, so the TS4 (TCP-B) should respond with an acknowledgement indicating which sequence number it next expects to hear (e.g. ACK=1234). This is step 4.
The host TCP-A sees that this segment does not acknowledge anything it sent for any existing connection, detects a half-open connection, and sends an RST with SEQ=1234 to the TS4, which then aborts the connection.

What actually happens is that, instead of the ACK in step 4, the TS4 immediately sends an RST with ACK=0, SEQ=0. This prevents the host from detecting the half-open connection: it never learns the next sequence number, and in any case the RST from the TS4 aborts the whole sequence.
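
The difference between the two exchanges can be sketched as a toy model in Python. This is only an illustration of the RFC 793 rules as I read them, not real TCP and not the TS4's code: the `Segment` class, the function names and the value 5000 are made up, and sequence number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str  # "SYN", "ACK" or "RST"
    seq: int = 0
    ack: int = 0


def rfc793_reply_to_stray_syn(rcv_nxt, snd_nxt):
    """Reply an ESTABLISHED peer should give to an out-of-window SYN (figure 10, step 4)."""
    return Segment("ACK", seq=snd_nxt, ack=rcv_nxt)


def restarted_host_reaction(reply, iss):
    """Reaction of the restarted host (in SYN-SENT, initial sequence number iss)."""
    if reply.flags == "RST":
        # The connection attempt is simply aborted; the peer's stale session survives.
        return None
    if reply.flags == "ACK" and not (iss < reply.ack <= iss + 1):
        # The ACK does not cover our SYN: half-open detected, so reset the stale
        # connection using the sequence number the peer says it expects.
        return Segment("RST", seq=reply.ack)
    return None


# Expected exchange: the TS4 replies with ACK=1234, the host answers RST SEQ=1234.
expected = restarted_host_reaction(rfc793_reply_to_stray_syn(rcv_nxt=1234, snd_nxt=5000), iss=0)
# Observed exchange: the TS4 replies with an RST, so the host sends nothing further.
observed = restarted_host_reaction(Segment("RST"), iss=0)
print(expected)
print(observed)
```

In the first case the host produces the RST that would clear the old tty session; in the second it has nothing to respond to, which matches what we see.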

I assume the TS4 resets the connection because it specifically disallows port sharing, i.e. there is one exclusive instance of the tty process, which is “mapped” to the specific user port on the host. The TS4 cannot create a new connection because its port is in exclusive use, so it sends an RST without aborting the current tty process.

However, this method defeats the detection and recovery of half-open ports as per RFC 793.

Thanks Michael,
I thought of that too (possibly in conjunction with keepalives, so the host would reconnect on the new socket while the old one was timing out), but the TS4 doesn’t support port sharing. Only the TS8, TS16, and the MEI types support it.

I still think the tty task inside the portserver should be configurable with a connection idle timeout set by the user. Surely this is a common issue?

It looks like a script will have to be the solution, or we put up with the 50-second keepalive delay.

You’re right, I was thinking TS4 MEI. Sorry about the misinformation.