On a socket connection, Windows randomly generates a bad packet (bad TCP checksum), then retransmits a malformed packet that is too small. When this happens, the ME 9210 (NetOS) locks up and dies.
Has anyone seen this before?
I recoded with a select() that will throw out any unwanted or mis-sized packets. I hope that will work, but I am not holding my breath.
I’m wondering whether one of our customers, with a CC9P 9215-based unit, is seeing this. He’s pinging our unit continuously (presumably at the Windows standard rate of about once per second), and the unit randomly locks up, within about 5-10 minutes at most. We use a select() in our own code, but I have no knowledge of what Digi's code might do.
Was the Windows machine attempting to talk to the ME9210 that locked?
As in most sockets-based APIs, the select call uses some resources. As in many real-time embedded systems, resources are limited. If you call the select API in any kind of loop, and an error or timeout could cause select to be called again immediately, make sure there is a small delay before your code calls select again (trap the timeout, delay, go back to the head of the loop). This gives the stack time to recover the resources.
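For anyone who wants to see the shape of that loop, here is a minimal sketch. It assumes a POSIX/BSD-style sockets API with the usual headers; NetOS header names may differ. wait_readable() and backoff_ms() are hypothetical names made up for this example, and on an RTOS you would probably replace nanosleep() with the kernel's own thread-sleep call.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical helper: sleep for ms milliseconds.
 * Assumption: nanosleep() is available; an RTOS may need its own sleep call. */
static void backoff_ms(int ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Returns 1 as soon as the socket descriptor fd is readable.
 * Returns 0 if all max_attempts calls to select() timed out or failed. */
int wait_readable(int fd, int timeout_ms, int delay_ms, int max_attempts)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        fd_set readfds;
        struct timeval tv;

        /* select() may modify both arguments, so rebuild them every pass. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc > 0 && FD_ISSET(fd, &readfds))
            return 1;

        /* Timeout (rc == 0) or error (rc < 0): pause before calling
         * select() again so the loop cannot spin on the stack. */
        backoff_ms(delay_ms);
    }
    return 0;
}
```

A call such as wait_readable(sock, 10000, 50, 3) would wait up to 10 seconds per attempt and pause 50 ms between attempts. The numbers are only placeholders to tune for your unit.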
Well, so far so good. It has lasted a week on four units without locking up.
I am calling select() with a 10 second timeout. Then I am calling it with a 50ms timeout to get the rest of the data. I haven’t had any select errors, but I have seen several recv() errors. It recovered nicely.
I am also running this from an accept() with multiple threads. It is nice that NetOS fixed the thread-safety issues with NetOS 7. (NetOS 5 and 6 were not thread safe with the network API calls.)
SteveD, yes it was a Windows machine. I used Wireshark to watch the network packets, and Windows is messing up and sending a bad packet or two. The standard recvfrom() and select()/recv() loops couldn’t handle it. I recoded to do a double select()/recv() loop with a quick timeout that throws away the data. The loop also throws out oversized packets. Works a lot better.
A note: NOTHING responds well to a constant ping. That is a common part of a DoS (denial-of-service) attack. I would change the Windows app to ping only every 5-10 seconds or so.
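In case it helps someone else, here is a rough sketch of the double select()/recv() idea, not the exact code running on the units. It assumes a POSIX/BSD-style sockets API (NetOS header names may differ) and a fixed expected message length. recv_message(), wait_for_data(), and the 256-byte scratch buffer are hypothetical choices for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns select()'s result for one socket
 * (> 0 readable, 0 timeout, < 0 error). */
static int wait_for_data(int sock, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(sock + 1, &readfds, NULL, NULL, &tv);
}

/* Read one burst of data: a long wait for the first bytes, then short
 * waits to collect (or drain) the rest.
 * Returns expected_len if exactly that many bytes arrived,
 *         0 if nothing arrived before the first timeout (or select() failed),
 *        -1 if the burst was too small or too large, the peer closed,
 *           or recv() reported an error. */
int recv_message(int sock, unsigned char *buf, size_t expected_len,
                 long first_timeout_ms, long drain_timeout_ms)
{
    unsigned char scratch[256];   /* oversized data lands here and is dropped */
    size_t total = 0;
    long timeout_ms = first_timeout_ms;

    assert(sock >= 0 && sock < FD_SETSIZE && buf != NULL);

    while (wait_for_data(sock, timeout_ms) > 0) {
        ssize_t n;

        if (total < expected_len)
            n = recv(sock, buf + total, expected_len - total, 0);
        else
            n = recv(sock, scratch, sizeof scratch, 0);

        if (n <= 0)
            return -1;            /* closed connection or recv() error */

        total += (size_t)n;
        timeout_ms = drain_timeout_ms;   /* quick timeout for the rest */
    }

    if (total == 0)
        return 0;
    return total == expected_len ? (int)expected_len : -1;
}
```

With the timeouts mentioned earlier in the thread, a call would look like recv_message(sock, buf, len, 10000, 50). On a -1 the caller simply ignores the buffer and goes back to waiting, since the extra bytes have already been drained.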