Thank you for replying; it is much appreciated. I should have indicated in my original post that the unexpected behavior can NOT be reproduced by rebooting the Digi device from the web interface. While I am not 100% certain about the underlying mechanism, it seems that when you reboot via the web interface or CLI, the Digi device signals the driver that it is going down. If I am monitoring serial output through cu when the Digi device reboots, I get the message “Got hangup signal. Disconnected.” Communication between the driver and the Digi device is then properly restored once the Digi device comes back up.
To reproduce the unexpected behavior, you need to interrupt network connectivity between the Digi device and the server without rebooting any of the involved devices and without shutting down or disabling any network interfaces directly on either device.
For instance, if I am running the driver on a CentOS VM under VMware ESXi and interrupt network connectivity by disconnecting the VM’s virtual network card (pretty much equal to pulling out the network cable from a real NIC), the driver behaves as expected and reconnects once the NIC is reconnected.
However, if I instead interrupt network connectivity through firewall rules, or by simply shutting down the WAN interface on the router that the server uses to reach the internet (and thus the Digi device, which is on a mobile broadband connection), I get the unexpected behavior where the tty device becomes unusable until re-initialized. If I’m watching serial output through cu when this happens, I do NOT get the message “Got hangup signal. Disconnected.”; cu just hangs until I kill it. Restarting cu and attempting to read from the tty device again causes it to hang again immediately.
The tty device also breaks if nothing is actively reading from or writing to it when the network connection drops. For instance, I can open cu, see serial data come in, shut down cu, interrupt network connectivity between the server and the Digi device, and restore it; cu will then block forever if I attempt to read from the affected tty device. The same goes for the application we’re using to gather serial data. It does not matter whether it was running when the network connection dropped: once it attempts to read from a tty device that has had a network drop, it simply hangs.
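To illustrate the blocking read described above, here is a minimal Python sketch of a read that gives up after a timeout instead of hanging. It is only a sketch under assumptions: the helper names and the device path in the usage note are placeholders, and it has not been tested against the Digi driver, so whether a tty in the broken state honors O_NONBLOCK and select() at all is unknown.

```python
import os
import select


def open_tty_nonblocking(path):
    """Open a tty read-only without blocking in open() and without
    making it the controlling terminal. 'path' is whatever device
    node the driver created (placeholder, not a known name)."""
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)


def read_with_timeout(fd, timeout_s, max_bytes=1024):
    """Return the bytes read from fd, or None if no data became
    readable within timeout_s seconds."""
    ready, _, _ = select.select([fd], [], [], timeout_s)
    if not ready:
        return None  # nothing arrived in time; the caller decides what to do
    return os.read(fd, max_bytes)
```

A caller could open the device with open_tty_nonblocking("/dev/ttyXX00") (a made-up path) and treat several consecutive None results as a hint that the tty may need re-initializing, rather than sitting in a read that never returns.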
Since all our Digi devices are on mobile broadband connections, most of them in areas with mediocre cellular coverage, they lose internet connectivity every now and then. Every time this happens we have to go in, shut down the application that is gathering serial data (the tty device cannot be re-initialized while that application is blocking on it), re-initialize the device with dgrp_cfg_node, and restart the monitoring application. Automating the re-initialization would be possible, but it is painful to implement in practice since we have to kill the monitoring application every time it happens.
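For reference, the manual kill, re-initialize and restart cycle could in principle be wrapped in a small supervisor along the following lines. This is a rough sketch, not a tested solution. It assumes the monitoring application writes something to stdout whenever serial data arrives, that the hung process can actually be killed (as cu can), and that select() works on pipes, which holds on Linux. The function names are invented for illustration, and reinit_cmd stands in for the real dgrp_cfg_node invocation, whose arguments are deliberately not guessed at here.

```python
import os
import select
import subprocess


def run_until_stalled(worker_cmd, stall_timeout_s):
    """Start worker_cmd and watch its stdout. Return "stalled" if it
    stays silent for stall_timeout_s seconds, or "exited" if it ends
    on its own. The worker is killed before returning."""
    proc = subprocess.Popen(worker_cmd, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    try:
        while True:
            ready, _, _ = select.select([fd], [], [], stall_timeout_s)
            if not ready:
                return "stalled"      # no output within the timeout
            if not os.read(fd, 4096):
                return "exited"       # EOF: the worker closed stdout
    finally:
        if proc.poll() is None:
            proc.kill()               # assumes the process is killable
        proc.wait()
        proc.stdout.close()


def supervise(worker_cmd, reinit_cmd, stall_timeout_s, max_cycles):
    """Run the worker; after each stall, run reinit_cmd (a placeholder
    for the re-initialization command) and start the worker again,
    for at most max_cycles attempts."""
    for _ in range(max_cycles):
        if run_until_stalled(worker_cmd, stall_timeout_s) == "exited":
            return "exited"
        subprocess.run(reinit_cmd, check=True)
    return "gave_up"
```

The stall timeout would have to be longer than the longest normal gap in serial data, otherwise a quiet but healthy link would trigger a needless re-initialization.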
Under Windows the driver reconnects just fine no matter how the network connection is interrupted, which is the behavior that I believe is intended. I have tried interrupting the Windows machine’s internet connectivity while monitoring serial output from the Digi device, and the output picks up within seconds of internet connectivity coming back. Doing the exact same thing under Linux breaks the tty device until it is re-initialized.
The monitoring application is written in Python, so for now we’re simply moving it and the Digi driver to a Windows server. I would really like to get this resolved, though, as the database that the monitoring application is dumping data to is running on a Linux server and I’d rather just keep everything on Linux.