Ethernet connectivity problem ME9210

Hi,
I have developed an application that communicates over Ethernet and also uses SPI. It runs successfully for approx. a day. After some time it forgets its IP address and won't communicate any more. When I replug the Ethernet cable I can see in Wireshark how the Digi module asks for its fixed IP. But the module no longer responds on Ethernet (no ping, no netosprog /discovery). The internal application is still communicating over SPI, so the problem is only on the Ethernet side.

Any ideas where I should look for the bug?

Jirka

I had a similar problem and it turned out to be caused by overflowing the Ethernet TX queue. Apparently there isn’t any good size checking internally in the TCP stack.

I solved it by adding a select() check for TX empty before adding my packet to the queue.

-Erik

Interesting, I'm going to test some sample applications for long-term stability.

Can you publish a sample for checking TX empty?

Jirka

When I get in Monday morning I’ll post some code. select() is standard though and there are plenty of examples out there.

-Erik

I know how to use select, but I don't know how to check the Ethernet TX queue.

Sorry about the formatting, but this new forum software is for the birds…



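bool IsSocketReady(SOCKET sock)
{
    fd_set fdw;
    struct timeval tv;

    // Poll with a very short timeout (10 microseconds)
    tv.tv_sec = 0;
    tv.tv_usec = 10;

    // Watch only this socket for writability
    FD_ZERO(&fdw);
    FD_SET(sock, &fdw);
    int s = select(FD_SETSIZE, NULL, &fdw, NULL, &tv);
    if( s > 0 )
    {
        // Writable means there is room to queue more TX data
        if( FD_ISSET(sock, &fdw) )
        {
            return true;
        }
    }
    // Timeout (s == 0) or error (s < 0): do not send yet
    return false;
}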
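
For anyone wondering how a check like this fits around the actual send call, here is a minimal sketch. It is not taken from the code in this thread: wait_writable() and send_all_checked() are made-up names, and the headers are the POSIX ones, so on NET+OS you would presumably swap in the socket API headers and types your project already uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for the socket to become
 * writable, i.e. for the stack to have room for more TX data. */
static bool wait_writable(int sock, long timeout_ms)
{
    fd_set fdw;
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    FD_ZERO(&fdw);
    FD_SET(sock, &fdw);

    return select(sock + 1, NULL, &fdw, NULL, &tv) > 0 && FD_ISSET(sock, &fdw);
}

/* Hypothetical helper: send len bytes, checking for writability before
 * every send() call. Returns the number of bytes sent, or -1 if the
 * socket did not become writable in time or send() failed. */
static long send_all_checked(int sock, const char *buf, size_t len, long timeout_ms)
{
    size_t sent = 0;

    assert(buf != NULL);

    while (sent < len) {
        if (!wait_writable(sock, timeout_ms))
            return -1;              /* queue still full: give up, do not pile on */

        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n < 0)
            return -1;              /* send error */

        sent += (size_t)n;          /* partial sends are normal, keep going */
    }
    return (long)sent;
}
```

The point of returning -1 instead of blocking forever is that the caller can decide whether to drop the packet, retry later, or close the connection, rather than queueing more data behind a stalled link.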

I checked my Modbus server and I'm not using select. I'm following the socket API example for the Modbus server. My app sends data over TCP only when a client is connected.

I found out that the module works fine until I try to connect to the Modbus server; after that, TCP communication hangs. The server thread is waiting for a new connection in accept(), so it is not putting any messages on the TX queue.

Jirka

Did you manage to resolve this?

I am having a similar issue; my application has a TCP listening socket and also performs communications using the UART (currently using FIM). After around an hour, the module stops accepting any TCP connections; the UART side of things continues to operate normally. My server thread is stuck waiting in accept. The FTP and web server threads are similarly stuck. I have noticed that, when in this state, if I disconnect and reconnect the network cable the module normally resets after attempting to get an IP address through DHCP.

Does it accept multiple TCP sockets? You may be running out of sockets. Double check that you are closing the sockets when you are done. You need to close them even if the other end drops the connection.

-Erik
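
To illustrate the point about closing sockets even when the other end drops, here is a rough sketch of a per-connection handler. serve_client() is a made-up name, and close() is the POSIX call; use whatever close call your socket API provides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection handler: reads until the peer closes the
 * connection or an error occurs, then ALWAYS releases the socket.
 * Returns the total number of bytes received. */
static long serve_client(int client)
{
    char buf[256];
    long total = 0;

    assert(client >= 0);

    for (;;) {
        ssize_t n = recv(client, buf, sizeof buf, 0);
        if (n <= 0)
            break;      /* n == 0: peer closed; n < 0: error. Both end the session. */
        total += n;
        /* Request handling would go here. */
    }

    /* Single exit path, so the descriptor is freed in every case,
     * including when the remote side simply dropped the connection. */
    close(client);
    return total;
}
```

If the handler returns early somewhere without reaching the close call, each dropped connection leaks one socket, and eventually accept has nothing left to hand out.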

Well, after a long investigation I'm 90% sure that in my case it is caused by calling the naTimeZoneInstall function frequently (to set the current time). I can confirm it in the next few days. In my case the application fails after one day.

When I reconnect the Ethernet cable nothing happens; the module stays stuck. Only a power cycle or reboot helps.

Jirka

My problem appears to be related to SNTP also. I had chosen to include this when creating the project initially but hadn’t got round to doing anything with it. In this state, the device stopped accepting any TCP connections after 1 hour.

Since adding:-
in bsp_sys.h
#define BSP_INCLUDE_SNTP TRUE

in appconf.h
#define APP_USE_NETWORK_TIME_PROTOCOL

and some operational time servers, the problem seems to be fixed; I have now had a module running without problem for around 2 days.

Are you calling a time server in your time callback? Make sure you are cleanly closing all the sockets.

I am not having any problem now. The problem appears to have been related to choosing to add sntp into the project (using the wizard) but then not actually using it. Once I added in the defines, mentioned in my earlier post, the problem appears to have gone away.

Previously, the device consistently stopped accepting new connections at almost exactly 1 hour of up time, irrespective of the amount of activity on the network.

I can't check it now, but I have SNTP support disabled; I always start a project from the empty basic sample. But it seems there is some problem with the time support.

In my case I read the RTC value (in GMT) via SPI, set the time zone to 0, write the new time to the time structure, and set the time zone back to GMT+2 with daylight saving. I do this once per second.

It would be good if someone from Digi could check the implementation. I tried to check the code, but I ended up at the library function…

Well, I can confirm with 95% certainty that calling naTimeZoneInstall causes my system to hang. I'm going to run some independent tests, and if they confirm the problem I'm going to report it to Digi.

Jirka
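
If frequent time updates really are the trigger, one possible workaround (an assumption, not something confirmed in this thread) is to skip the update on most ticks and only resynchronise when the RTC and the system clock have drifted apart. A minimal sketch of that check, where needs_resync() is a made-up helper and both times are assumed to be in GMT:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: returns true only when the RTC reading and the
 * system clock differ by more than tolerance_s seconds, so the costly
 * time/time-zone update can be skipped on most once-per-second ticks. */
static bool needs_resync(time_t rtc_gmt, time_t system_gmt, long tolerance_s)
{
    assert(tolerance_s >= 0);

    double drift = difftime(rtc_gmt, system_gmt);
    if (drift < 0)
        drift = -drift;             /* drift in either direction counts */

    return drift > (double)tolerance_s;
}
```

With a tolerance of a couple of seconds, the time zone calls would only run when the clocks actually disagree instead of 86,400 times a day. It would not fix the underlying library problem, but it might stretch the time to failure enough to confirm the diagnosis.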