RCM4300 and loss of TCP connection

Hello,

My configuration is:

  • RCM4300
  • Dynamic C 10.72E

On a new system delivered to one of our customers last year, we are experiencing problems with TCP connection losses.
The programme is fairly simple: it receives commands via TCP or the embedded website and changes the relay status accordingly.

We use a TCP connection (port 23) to send commands.
When first used, everything works normally: reading and writing, TCP session opening/closing.
The connection can be restarted several times without any impact.
Likewise, the Ethernet cable can be disconnected and reconnected, and the connection restarted easily.

After several days of operation (sometimes more than a week):
We can no longer establish TCP communication.
The module still answers pings from the PC.
The web interface remains accessible and allows control.
The TCP connection on port 23 is refused (e.g. from PuTTY, our dedicated software or other equivalents).
Unplugging and re-plugging the network cable, or closing the software on the PC, does not change anything.
The only way to re-establish communication is to power-cycle the module (switch OFF/ON) and then restart the PC software.

Do you have any idea what the problem is?
Is there a way to restart the TCP/IP connection in the event of inactivity, or to clean up the TCP/IP stack?
How does the TCP/IP stack behave in the event of inactive or badly closed connections?

Thanks in advance.

Since your HTTP server is unaffected and pings work, we know that the TCP/IP stack is still working.

Can you share your state machine for managing the telnet socket? My guess is that something is happening and the socket is stuck in limbo. It thinks it’s connected, but it isn’t, and there isn’t any code to detect that state and reset the socket.

The sock_established() API should tell you when you’ve lost a connection. At that point, you can call tcp_listen() on the socket again to accept inbound connections. Samples/tcpip/telnet/rxsample.c is a simple sample demonstrating a state machine to reset the socket to listening status after losing a connection.
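To make the idea concrete, here is a minimal sketch of that kind of state machine, written as plain C so the logic can be checked on a PC. It is not the code from rxsample.c. The names telnet_state, telnet_action and telnet_step are hypothetical, and the two inputs are assumed to come from tcp_tick(&sock) and sock_established(&sock) on the Rabbit.

```c
/* Portable sketch of a listen/accept/reset state machine for one socket.
 * The Dynamic C calls are kept out of the step function on purpose;
 * every name in this block is hypothetical. */

typedef enum { TS_INIT, TS_LISTEN, TS_OPEN } telnet_state;
typedef enum { ACT_NONE, ACT_LISTEN, ACT_ABORT_AND_LISTEN } telnet_action;

/*
 * alive:       nonzero while the socket is still usable
 *              (assumed to be the result of tcp_tick(&sock))
 * established: nonzero once a client is connected
 *              (assumed to be the result of sock_established(&sock))
 *
 * Returns the action the caller should perform on the real socket.
 */
telnet_action telnet_step(telnet_state *state, int alive, int established)
{
    switch (*state) {
    case TS_INIT:
        /* First pass: start listening. */
        *state = TS_LISTEN;
        return ACT_LISTEN;

    case TS_LISTEN:
        if (!alive) {
            /* Listen was lost (e.g. reset during handshake): listen again. */
            return ACT_LISTEN;
        }
        if (established) {
            *state = TS_OPEN;
        }
        return ACT_NONE;

    case TS_OPEN:
        if (!alive || !established) {
            /* Connection is gone or closing: drop it and go back to listening. */
            *state = TS_LISTEN;
            return ACT_ABORT_AND_LISTEN;
        }
        return ACT_NONE;
    }
    return ACT_NONE;
}
```

On the Rabbit, ACT_LISTEN would presumably map to something like tcp_listen(&sock, 23, 0, 0, NULL, 0), and ACT_ABORT_AND_LISTEN to sock_abort(&sock) followed by the same tcp_listen() call. Please check the exact signatures against the function reference for your Dynamic C version. The point is that every state has a path back to listening, so the socket cannot stay in limbo.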

If you can run your program from the Dynamic C debugger and reproduce the failure, you could stop program execution and look at some of your state variables, including the socket structure, to possibly identify what’s going on. Or if you have a serial debug output, you could have a routine that dumps as much state information as possible and maybe trigger that from an unused input on your hardware.

The tcp_Socket structure has elements related to tracking keepalive packets, and it should detect when the remote end has disappeared without properly closing the socket. But it looks like you need to manually enable that for the socket, using the tcp_keepalive() API. I think that enabling keepalive messages will probably resolve the issue, and you should start there. Make sure you can reproduce the failure first, so you know whether your changes actually fix the problem.
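Keepalives cover the case where the peer has vanished. Since you also asked about restarting on inactivity, an application-level idle timeout is a possible complement. The sketch below is plain C with hypothetical names (idle_watch and its two helpers). On the Rabbit the timestamp could come from SEC_TIMER, and I am assuming the keepalive itself is enabled with a call along the lines of tcp_keepalive(&sock, 60) once the connection is established.

```c
/* Hypothetical helper: decide whether an open connection has been
 * silent for too long. Timestamps are in seconds. */
typedef struct {
    unsigned long last_rx;  /* time of the last received byte */
    unsigned long limit;    /* allowed idle time before we give up */
} idle_watch;

/* Call whenever data arrives on the socket. */
void idle_watch_touch(idle_watch *w, unsigned long now)
{
    w->last_rx = now;
}

/* Returns nonzero once the connection has been idle for `limit` seconds. */
int idle_watch_expired(const idle_watch *w, unsigned long now)
{
    /* Unsigned subtraction stays correct when the counter wraps around. */
    return (now - w->last_rx) >= w->limit;
}
```

When idle_watch_expired() returns nonzero while the socket is open, the program would abort the socket and listen again, the same recovery path as for a lost connection. Whether an idle timeout is acceptable depends on how long your PC software legitimately stays silent.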

For testing, you might be able to reproduce this failure by breaking the network connection between the Rabbit and the client PC while the connection is open.

You could also make use of the pd_havelink() API to check for network disconnections, and automatically reset your telnet socket in that case. That’s probably overkill in this situation.
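If you do want to try it, a rough sketch of the link check could look like this. It is plain C with hypothetical names (link_monitor, link_monitor_poll), and it assumes the link_up input is fed from something like pd_havelink(IF_ETH0) in your main loop. It only reports a drop after several consecutive "down" polls, so a brief flap does not reset the socket.

```c
/* Hypothetical link monitor: reports a drop only after the link has
 * been down for `threshold` consecutive polls. */
typedef struct {
    unsigned down_polls;  /* consecutive polls with the link down */
    unsigned threshold;   /* polls required before reporting a drop */
    int reported;         /* nonzero once this outage has been reported */
} link_monitor;

/* link_up: nonzero when the interface reports link.
 * Returns 1 exactly once per outage, when the threshold is reached. */
int link_monitor_poll(link_monitor *m, int link_up)
{
    if (link_up) {
        /* Link is back: forget the outage. */
        m->down_polls = 0;
        m->reported = 0;
        return 0;
    }
    if (m->down_polls < m->threshold) {
        m->down_polls++;
    }
    if (m->down_polls >= m->threshold && !m->reported) {
        m->reported = 1;
        return 1;
    }
    return 0;
}
```

A return value of 1 would be the trigger to abort the telnet socket and put it back into the listening state.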