Data flow stops after 30+ minutes

Hardware: XBee3 modules and Grove PCBs from the XBee Zigbee Mesh Kit, running the 802.15.4 firmware at the latest version, FW 2012.
Set-up: Simple MicroPython test code flashed into an end-node radio and a coordinator radio. The end-node test code is just an infinite loop that calls transmit() inside a try/except, sending some dummy data every 100 ms. I am not doing any error handling in the except branch, just a dummy variable assignment so that an error doesn't lock up the MicroPython code. The coordinator runs in transparent mode, with a dummy variable assignment in an infinite loop to keep the radio running. The radios are within a few feet of each other on the test bench, with antennas attached and a stable power supply for the Grove PCBs.
Settings: CH fixed (no auto-channel via A1/A2), fixed ID, CA=0, CE set accordingly, A1=4 and A2=4 for auto-association, sleep modes off, and DH/DL and MY set properly. The coordinator is always set to AP=0; on the end nodes I tried AP=0, 1, 2 and 4, with ACKs both enabled and disabled. In short, there is no problem getting the network up and running.
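Because the end-node loop described above swallows every exception, a stall shows up only as silence. A minimal sketch of a loop that counts failures and calls a recovery hook after a run of consecutive failures might look like the following. It assumes `transmit` and `sleep` are passed in (on the radio they would presumably be xbee.transmit and time.sleep); `on_stall` and `stall_after` are hypothetical names, and what the hook should do (log, re-check AI, reset) is left open.

```python
def send_loop(transmit, sleep, dest, payload, period_s=0.1,
              iterations=None, stall_after=10, on_stall=None):
    """Send payload every period_s; return (sent_ok, failures).

    iterations=None loops forever (as on the radio); a number is for testing.
    on_stall (hypothetical hook) is called once each time the run of
    consecutive failures reaches stall_after, with the total failure count.
    """
    sent_ok = 0
    failures = 0
    consecutive = 0
    n = 0
    while iterations is None or n < iterations:
        n += 1
        try:
            transmit(dest, payload)
            sent_ok += 1
            consecutive = 0  # a success ends the current failure run
        except Exception:
            failures += 1
            consecutive += 1
            if consecutive == stall_after and on_stall is not None:
                on_stall(failures)
        sleep(period_s)
    return sent_ok, failures
```

Counting consecutive failures rather than total failures separates a genuine stall from the occasional lost packet, which the set-up tolerates anyway.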

Problem: Data flows as expected from the end node to the coordinator, but after roughly 30 minutes to an hour the data flow stops; the 30-minute mark is by far the most common. I have repeated this test from power-up many times and it is reproducible. Pressing the reset button on the Grove end node gets things running again, and then the problem repeats. The coordinator is never reset during this (neither by button nor by power-down).

Questions: In this overall set-up, should I expect it to run longer than I'm seeing before a glitch, or, since RF is never perfect, is this fairly normal? I understand I'll need to handle errors in the except branch to keep things running for many hours; I'm just looking to get the settings into the best shape so that fewer try/except errors are caused by bad settings. Any ideas on initial radio settings to test? Changing AP, ACKs and the collision-related settings does not seem to change the outcome.
Is there something in the firmware (or in MicroPython) that periodically does work on a many-minute time frame and could be the cause?
When AP is set to one of its modes (0, 1, 2, 4), does the transmit() function itself change the frame accordingly? For example, with AP=1 no escaping is used, but with AP=2 escaped framing is used.
Side question: When auto-channel is used via A1 and A2, does the firmware periodically change channels while the network runs (for example, rescanning to stay on a good channel), or, once association/discovery completes and a channel is selected, do all radios stay on that operating channel?
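On the AP question: as I understand it, AP selects the framing on the local serial interface (AP=1 API without escapes, AP=2 API with escapes, AP=4 MicroPython REPL), so the sketch below only illustrates what AP=1 and AP=2 framing look like on the UART; it does not show how xbee.transmit() builds the RF payload. It assumes the standard XBee API frame layout: a 0x7E start delimiter, a 2-byte length, the frame data, and a checksum of 0xFF minus the low 8 bits of the frame-data sum. In AP=2, every byte after the start delimiter that equals 0x7E, 0x7D, 0x11 or 0x13 is sent as 0x7D followed by that byte XOR 0x20.

```python
START = 0x7E
ESCAPE = 0x7D
NEEDS_ESCAPE = (0x7E, 0x7D, 0x11, 0x13)  # delimiter, escape, XON, XOFF


def build_api_frame(frame_data, escaped=False):
    """Wrap frame_data (frame type byte + fields) in an XBee API frame.

    escaped=False gives the AP=1 layout; escaped=True gives the AP=2 layout.
    """
    length = len(frame_data)
    checksum = 0xFF - (sum(frame_data) & 0xFF)
    body = bytes([(length >> 8) & 0xFF, length & 0xFF]) + bytes(frame_data) + bytes([checksum])
    if escaped:
        out = bytearray()
        for b in body:  # the start delimiter itself is never escaped
            if b in NEEDS_ESCAPE:
                out.append(ESCAPE)
                out.append(b ^ 0x20)
            else:
                out.append(b)
        body = bytes(out)
    return bytes([START]) + body
```

For example, an AT command frame querying NJ (frame type 0x08, frame ID 0x01, "NJ") comes out as 7E 00 04 08 01 4E 4A 5E in both modes, because none of its bytes need escaping.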

Thank you a whole bunch; once I figure this problem out, my real project is ready to go. This is the only problem left. (Yes, the real project shows the identical problem, and I narrowed it down to the test code here.)

This is something that should be handled by submitting a case to Digi Support: create a user account and log in to my.digi.com.

Make sure when you do that you include all necessary steps to reproduce the issue including any python code you are running.

mvut,
Understood. Since it's almost the weekend, I'd be very thankful for anything you can comment on.

Which radio is running your MicroPython code? It should have AP set to 4.

Is it the node running the MicroPython app that is failing, or a different one?

For the end node, I've tried all the AP settings with seemingly the same results, but based on your comment I'll stay with AP=4 while testing. The coordinator has always stayed at AP=0, which leads to a question. Remember that in the coordinator's infinite loop I am not calling p = receive(), decoding, and then putting the data on the UART; the code is literally just a while True: loop with x = 1 inside it. I thought that with AP=0 the firmware takes care of RF in and UART out. Will the same code work if I set the coordinator to AP=4, or do I need to call receive() and so on? Please comment on this.
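If the coordinator is switched to AP=4, the MicroPython code presumably has to do the forwarding itself instead of relying on transparent mode. A minimal sketch, assuming xbee.receive() returns None when nothing is queued and otherwise a dict with a "payload" entry, and assuming sys.stdout.buffer.write is available for raw UART output on the radio (check both against Digi's MicroPython documentation). Both calls are passed in so the function can be exercised off-device.

```python
def forward_pending(receive, write):
    """Drain all queued packets, writing each payload out; return the count.

    Assumed device wiring (not confirmed here):
        import sys, xbee
        while True:
            forward_pending(xbee.receive, sys.stdout.buffer.write)
    """
    forwarded = 0
    while True:
        packet = receive()
        if packet is None:  # nothing waiting right now
            return forwarded
        write(packet["payload"])
        forwarded += 1
```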

Just now, from testing, I'm pretty sure I'm getting AI error 0x0C ("end device failing to get an association request") on the end node when the data flow stops, and I'll look into that. I think we are closing in on the problem and I can figure it out from here. Thanks, mvut.
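Since AI is the indicator being watched here, one option is to replace a bare loop that waits for AI == 0 with a poll that gives up after a bounded number of tries and reports the last AI value it saw (for example 0x0C). This is a sketch with `read_ai` and `sleep` passed in; on the radio, read_ai would presumably be lambda: xbee.atcmd("AI") and sleep would be time.sleep. The name wait_for_association is hypothetical.

```python
def wait_for_association(read_ai, sleep, poll_s=0.5, max_polls=None):
    """Poll AI until it reads 0 (associated).

    Returns (associated, last_ai, polls). With max_polls=None it waits
    forever; otherwise it gives up after max_polls sleeps and returns the
    last AI code so the caller can log it.
    """
    polls = 0
    last_ai = read_ai()
    while last_ai != 0:
        if max_polls is not None and polls >= max_polls:
            return False, last_ai, polls
        sleep(poll_s)
        polls += 1
        last_ai = read_ai()
    return True, last_ai, polls
```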

mvut,
FIXED... I downgraded from FW 2012 to FW 200D and the exact same code runs perfectly, i.e. FW 2012 may have a bug. Some additional test info for you:
Test 1: 1 end node and 1 coordinator. Data flow stops after about 1 hour. Repeatable.
Test 2: 2 end nodes and 1 coordinator, with the same code on both end nodes. Data flow stops after about 1/2 hour. Repeatable.
Note: My mistake in the original post: with only one end node, the variation is 1/2 hour to 1 hour. I mixed up different setups and got my notes crossed.

The total amount of data passing through the coordinator before the data flow stops is the same in Test 1 and Test 2. My guess is that this points to a counting-like mechanism in FW 2012. A big thank you, mvut. Project good to go.
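The equal-total observation can be sanity-checked with simple arithmetic. This assumes both tests used the 100 ms send interval from the original set-up and the approximate stop times reported (1 hour vs. 1/2 hour); the post does not give exact frame counts, so these are estimates.

```python
def frames_through_coordinator(end_nodes, minutes, period_ms=100):
    """Frames the coordinator sees if each end node sends one frame per period."""
    return end_nodes * minutes * 60 * 1000 // period_ms


test1 = frames_through_coordinator(end_nodes=1, minutes=60)  # one node for about 1 hour
test2 = frames_through_coordinator(end_nodes=2, minutes=30)  # two nodes for about 1/2 hour
print(test1, test2)
```

Both come to 36,000 frames, which is consistent with the stop being tied to traffic volume through the coordinator rather than to elapsed time alone.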

mvut,
CHANGE: I needed to go back to FW 2012 because I need its 7 dB improvement, so I still have this "end nodes disassociate after so many minutes" issue to deal with. I'm going to assume there is a watchdog timer (WDT), a counter-like scheme, automatic channel shifting or the like within the firmware that is common knowledge I don't have, and that I'm supposed to handle periodically in my MicroPython code. If that is the case, can you tell me what it might be and how to handle it? Am I supposed to run MicroPython garbage collection? The code is simple and in-line, with simple math; machine.I2C() and transmit() are the only function calls.

My application and its MicroPython code on the radio are as basic as possible. The end node does sensor → I2C → sensor calibration calculations → transmit(), and this process takes 40-50 ms to complete. ACKs are disabled (this helps reduce collisions); occasional or even somewhat frequent missing data has no bearing for me, since it's a real-time application. The coordinator just needs to do RF in → transparent-mode UART → FTDI UART-to-USB IC → PC. I have tried lots of radio configurations with no luck. I'd appreciate any ideas. Thanks.

I would suggest looking at the MicroPython I2C example.

Yes, you should use the gc module.
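A minimal sketch of what using the gc module could look like in the end-node loop: force a collection every N iterations. Here `step` stands in for the I2C read, calibration and transmit; `collect_every=100` is an arbitrary illustrative value, and the function names are hypothetical. gc.mem_free() exists in MicroPython but not in CPython, so it is guarded. This is not a confirmed fix for the stall, just one way to rule out heap exhaustion.

```python
import gc


def free_memory():
    """Free heap bytes on MicroPython; None where gc.mem_free() is absent (CPython)."""
    mem_free = getattr(gc, "mem_free", None)
    return mem_free() if mem_free is not None else None


def run_with_gc(step, iterations=None, collect_every=100):
    """Call step() repeatedly, forcing gc.collect() every collect_every calls.

    iterations=None loops forever (as on the radio); a number is for testing.
    Returns the number of forced collections.
    """
    collections = 0
    n = 0
    while iterations is None or n < iterations:
        n += 1
        step()
        if n % collect_every == 0:
            gc.collect()
            collections += 1
    return collections
```

Printing free_memory() occasionally on the radio would show whether free heap actually trends down before the freeze.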

mvut,
Below is short MicroPython test code for the slave and the master, flashed into the radios, showing 'Hello' data flowing from slave to master once per second. It runs for the first 30+ minutes and then freezes. If left frozen for another 30+ minutes, the data flow resumes. The only requirements are FW 2012 and master AP=0 (because the code is written to work that way). With any other valid radio configuration that establishes a network and slave association, the test results are the same. Maybe someone can try the code themselves and provide some insights; I have found no viable work-around. This test code resembles the actual project code and exhibits exactly the same behavior. Thanks, mvut.

Coordinator/Master

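```python
import xbee, time

def network_status():
    # AI = 0 means the coordinator has started the network successfully
    return xbee.atcmd("AI")

while network_status() != 0:
    time.sleep(0.5)

# Keep discovering until at least one node answers
node_list = []
while len(node_list) == 0:
    node_list = list(xbee.discover())

# Idle loop keeps the MicroPython program alive; with AP=0 the firmware
# itself forwards received RF data out of the UART (transparent mode)
while True:
    pass
```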

Slave/End

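```python
import xbee, time

def network_status():
    # AI = 0 means the end node has associated with the coordinator
    return xbee.atcmd("AI")

while network_status() != 0:  # wait to join the network
    time.sleep(0.5)

while True:
    try:
        xbee.transmit(xbee.ADDR_COORDINATOR, "Hello\n")
    except Exception as err:
        pass  # errors are deliberately ignored in this test
    time.sleep(1)  # 1 s delay
```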
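To pin down when the freeze starts and what the radio reports at that moment, a diagnostic variant of the slave loop could log the elapsed time and the AI value each time transmit fails. This is a sketch with all radio calls passed in; on the device they would presumably be xbee.transmit, lambda: xbee.atcmd("AI"), a seconds clock such as time.time (or a ticks_ms-based equivalent) and time.sleep, with report=print. The function name is hypothetical. One caveat: it only logs when transmit() raises; if the radio silently stops delivering frames without raising, nothing would be reported.

```python
def diagnostic_loop(transmit, read_ai, now_s, sleep, report, dest,
                    payload=b"Hello\n", period_s=1.0, iterations=None):
    """Send payload every period_s; on each failure call
    report(elapsed_s, ai_code, err). Returns the number of failures.

    iterations=None loops forever (as on the radio); a number is for testing.
    """
    start = now_s()
    failures = 0
    n = 0
    while iterations is None or n < iterations:
        n += 1
        try:
            transmit(dest, payload)
        except Exception as err:
            failures += 1
            # Record when the failure happened and what AI says right then
            report(now_s() - start, read_ai(), err)
        sleep(period_s)
    return failures
```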

May I suggest you submit a case for this issue by logging into my.digi.com. When doing so, make sure you provide everything needed to reproduce the issue including any code and settings you may be using.