SNTP Error recovery

Has anyone established the default error handling process when none of the configured SNTP time servers are available?

At the moment I’ve implemented RDeabill’s suggestion of restarting the SNTP process on server timeout. It does work, but can clog up the network a bit - on my LAN it restarts five times/second.

I think it’s usually the case that ‘server not available’ indicates a network problem (in my case, a cable fallen out), so we don’t really want to try to reach the server too frequently. Equally, we don’t (necessarily) want to wait a whole synchronisation interval before retrying. For my application, I would go for a retry interval in the range 1-10 minutes.

Any thoughts on how to control the retry process a bit?

(Using Net+OS 7.1 ATM; moving to 7.3/7.4 shortly)

Hi,

I have not really looked at what happens when the restart does not work, but I would think a solution here is to use a ThreadX timer.

When initialising the module with the SNTP routines, create the timer but do not start it. Set up a simple timer function to do the actual restart. In the callback routine, when you get the timeout, start the timer rather than restarting SNTP directly. When the callback routine indicates success, disable the timer.

I have not checked this but it’s one way you may be able to do it.
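
To make the idea a bit more concrete, here is a rough, platform-neutral model of the logic in plain C. Nothing in it is a NET+OS or ThreadX call: the event names, the struct and the three functions are all made up for illustration, and I have not run it on a target. On the real system, arming and disarming would presumably map onto tx_timer_activate() and tx_timer_deactivate(), and the expiry case is where the restart routine would be called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the status values the SNTP callback receives. */
typedef enum { EVT_SYNC_OK, EVT_SERVER_TIMEOUT } sntp_event_t;

/* Models a one-shot retry timer: created at init, but not started. */
typedef struct {
    int           armed;       /* 1 while the retry timer is running   */
    unsigned long ticks_left;  /* ticks remaining until expiry          */
    unsigned long restarts;    /* restart requests issued so far        */
} retry_timer_t;

/* Create the timer in the idle (not started) state. */
void retry_init(retry_timer_t *t)
{
    assert(t != NULL);
    t->armed = 0;
    t->ticks_left = 0;
    t->restarts = 0;
}

/* Call from the SNTP callback: arm on timeout, disarm on success. */
void retry_on_callback(retry_timer_t *t, sntp_event_t evt,
                       unsigned long retry_ticks)
{
    assert(t != NULL);
    if (evt == EVT_SERVER_TIMEOUT) {
        if (!t->armed) {           /* repeated timeouts do not push the deadline back */
            t->armed = 1;
            t->ticks_left = retry_ticks;
        }
    } else {
        t->armed = 0;              /* sync worked, no retry needed */
    }
}

/* Call once per tick. Returns 1 on the tick the timer expires,
 * which is the point where the SNTP restart would be issued. */
int retry_on_tick(retry_timer_t *t)
{
    assert(t != NULL);
    if (!t->armed)
        return 0;
    if (t->ticks_left > 1) {
        t->ticks_left--;
        return 0;
    }
    t->armed = 0;                  /* one-shot: stays idle until re-armed */
    t->restarts++;
    return 1;
}
```

The point of the model is that the callback itself never restarts SNTP. It only arms or disarms the timer, so the restart rate is bounded by the retry interval rather than by how fast the timeouts arrive.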

Just had another go using Net+O/S 7.4 - no change from my previous post.

Seems to be no way to prevent SNTP either flooding the network connection, or locking up completely, in the event of a network or time server error.

If you want to delay the retry process, just put a tx_thread_sleep() in your callback before you restart the SNTP process. I.e. something like:

if (status == NASNTP_SERVER_TIMEOUT)   /* compare with ==, not assign with = */
{
    tx_thread_sleep(NABspTicksPerSecond * 60 * 10); /* ten minute delay */
    NArestartSntpServer(priAddr, secAddr);          /* retry with the same servers */
}

Finally got to play with this, and initial results not looking too promising.

Testing by simply unplugging the network cable. The ME duly recognises that it’s lost the network connection.
Next SNTP sync request fails, and I start a timer in the callback routine.

When the timer expires I try to restart the SNTP task (using same time servers) - it returns with error -1 (NASNTP_INVALID_STATE).

Everything then locks up (possibly not helped by the absence of retries in my code). Even when the network connection is restored, SNTP doesn’t appear to restart.
I’ve been using quite fast times for testing - 2-minute sync interval and 1-minute retry - and nothing happens even after an hour or two.

Anyone else tried something similar?

I believe the correct way to handle this is as follows:

In the callback routine, return success (even though the status sent to the callback routine told you that no sntp server is available). This tells the sntp thread to keep trying to find a server.

Also, there have been remedial changes made in this area. Make sure you have the latest patches from Digi’s web site.

Been playing with this some more, and not getting very far!

If you return NASNTP_SUCCESS from the callback when the server’s not available, the SNTP handler immediately retries - exactly the condition I’m trying to avoid!

I also tried returning a negative value from the callback, which the docs say stops SNTP. I also triggered a timer, and on expiry attempted to start SNTP (‘start’ seeming logical, since it should have stopped). That gives a NASNTP_SYSTEM_FAILURE error. If I try to restart the SNTP server after the timeout, I get a NASNTP_INVALID_STATE error instead.

Still using NET+O/S 7.1 (with latest patches) ATM - anyone got something like this to work yet?

Works nicely, thank you.

(I tend to think of a ‘callback’ as something that should happen instantly, and not 10 minutes later!)