I am looking for a way to do low-level monitoring of WiFi network “health” on a ConnectWiME-9210.
This would be similar to calling the void GetEthStats(Stacktest_stats_t *stats) function on the wired version.
History - We use both the ConnectME-9210 and ConnectWiME-9210 products. We have had a number of occasions where the WiME shows all the correct indications of being connected (amber light on solid, wlan_stats showing good signal strength, SSID, IP address, netmask, etc.) but all data activity is dead. There is no blinking green light, the internal connection logs show failures to reach the internet (seen after rebooting and viewing the log files), and all other threads appear to be working correctly. The device may have been up for a few days or for many weeks when this happens.
A power cycle is required to correct the condition, and it may not occur again for days, weeks, or months. We also have the watchdog timer enabled on the devices, but whatever is happening does not interfere with the thread scheduler, so the watchdog never triggers a reboot.
So as an interim measure we would like to come up with a method that can determine whether the Redpine-to-Digi interface or some other pipe is no longer functioning correctly, and then reboot the device to restore network connectivity.
Ideally we would simply take down the WiFi driver and restart it, so as not to interfere with the other functionality and ongoing tasks running on the box. There is a Redpine reset function exposed, but what is missing is the steps required to gracefully take the network down, reset the Redpine, and bring things back up.
The goal is not to reset things if we simply have no connection. Rather, we would first confirm that the driver reports a “valid” state via the naWlnGetStatus function, by examining variables like wlnStatus.state, wlnStatus.rx_signal, wlnStatus.bss_addr[x], wlnStatus.ssid, etc., and then examine data activity via IP traffic or some other measure. If the two conflict, that is, the driver reports a healthy link but no data is moving, then we reset it. We would log the event before the reset so that we have a history of it.
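To make the idea concrete, here is a rough sketch of the decision logic in C. Everything in it is hypothetical: link_sample_t, link_monitor_t, and link_monitor_poll are names made up for illustration, and the fields would have to be filled in from whatever naWlnGetStatus and the IP stack counters actually report. It also assumes each poll is preceded by some traffic we generate ourselves (for example a ping to the gateway), so that a flat receive counter really does mean a dead pipe and not just an idle network.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one poll. In a real build these would be copied
 * from the driver status and the IP stack counters (assumed, not verified). */
typedef struct {
    bool     associated;   /* driver reports an associated state */
    int      rx_signal;    /* reported signal strength; > 0 treated as usable */
    bool     bssid_valid;  /* BSSID is not all zeros */
    uint32_t rx_packets;   /* cumulative received-packet counter */
} link_sample_t;

typedef struct {
    uint32_t last_rx_packets;  /* counter value seen on the previous poll */
    unsigned stalled_polls;    /* consecutive polls with no new packets */
    unsigned stall_limit;      /* polls to tolerate before declaring a stall */
} link_monitor_t;

typedef enum { LINK_OK, LINK_DOWN, LINK_STALLED } link_verdict_t;

void link_monitor_init(link_monitor_t *m, unsigned stall_limit)
{
    assert(m != NULL);
    m->last_rx_packets = 0;
    m->stalled_polls = 0;
    m->stall_limit = stall_limit;
}

/* LINK_DOWN:    driver itself says there is no usable connection (do not reset).
 * LINK_STALLED: driver looks healthy but no packets arrived for stall_limit
 *               consecutive polls (the conflict case: log it, then reset).
 * LINK_OK:      anything else. */
link_verdict_t link_monitor_poll(link_monitor_t *m, const link_sample_t *s)
{
    assert(m != NULL && s != NULL);

    bool driver_ok = s->associated && s->bssid_valid && s->rx_signal > 0;
    bool traffic   = (s->rx_packets != m->last_rx_packets);
    m->last_rx_packets = s->rx_packets;

    if (!driver_ok) {
        m->stalled_polls = 0;   /* plain disconnect, not the failure we chase */
        return LINK_DOWN;
    }
    if (traffic) {
        m->stalled_polls = 0;
        return LINK_OK;
    }
    if (++m->stalled_polls >= m->stall_limit) {
        return LINK_STALLED;
    }
    return LINK_OK;
}
```

A monitoring thread would call link_monitor_poll on a timer and, on LINK_STALLED, write the log entry and then do whatever restart is available. The stall limit is there so that one quiet polling interval does not trigger a reset.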
We are still trying to determine whether this is a hardware-related issue, since replacing units in the field seems to have resolved the problem. The catch is that when we then set up the removed devices in our lab, they run without issue for weeks.
Another noteworthy point is that the wired 9210 devices never show this behavior while running the same user software, compiled separately for each version.
We are open to any and all advice, and if anyone has history with the ConnectWiME-9210, please share your experience.
I would much rather determine the cause of the issue than use the “wifi watchdog” approach, as it tends to simply hide real issues, and I am against that. However, reliability comes first, especially with devices that are being controlled remotely from great distances.
At a minimum, a good 'hook' in the data stream might provide clues as to what is broken and where.
Thanks!
Brooks