Data flow stops after 30+ minutes

Hardware: XBee 3 modules and Grove PCBs from the XBee Zigbee Mesh Kit, running the 802.15.4 firmware, latest version 2012.
Set-up: Simple uPy test code flashed into a radio end node and a radio coordinator. The end node test code is just an infinite loop calling transmit() inside a try-except, sending some dummy data every 100 ms. I am not handling errors in the except, just doing a dummy variable assignment so that an error doesn’t lock up the uPy code. The coordinator is in transparent mode, with a dummy variable assignment in an infinite loop to keep the radio running. Radios are within a few feet of each other on the test bench, with antennas attached and a stable power supply for the Grove PCBs.
Settings: CH fixed (no A1/A2 auto channel), fixed ID, CA=0, CE set accordingly, A1=4 and A2=4 for auto association, sleep modes off, DH/DL and MY set properly. The coordinator is always set to AP=0, and I tried AP=0, 1, 2 and 4 on the end nodes, with ACKs both enabled and disabled. In short, no problem getting the network up and running.
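
For reference, the end-node loop described under Set-up can be sketched roughly as follows, except that exceptions are counted instead of being silently discarded. This is only a sketch under assumptions: the original test code was not posted, `run_sender` and its parameters are hypothetical names, and the radio call is passed in as `transmit` so the same function can be exercised on a PC with a stand-in.

```python
import time


def run_sender(transmit, dest, payload, count, period_s=0.1, sleep=time.sleep):
    """Call transmit(dest, payload) `count` times, pausing between sends.

    Returns (sent, errors), where errors maps repr(exception) to a count.
    """
    sent = 0
    errors = {}
    for _ in range(count):
        try:
            transmit(dest, payload)
            sent += 1
        except Exception as exc:  # keep running, but remember what failed
            key = repr(exc)
            errors[key] = errors.get(key, 0) + 1
        sleep(period_s)  # 100 ms between sends, as in the test set-up
    return sent, errors
```

On the module this would presumably be called with `xbee.transmit` and a destination such as `xbee.ADDR_COORDINATOR` (an assumption about the poster's code). Pass a different `sleep` if float seconds are not supported on your build. Printing the returned `errors` from time to time shows whether failures start before the data flow stops.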

Problem: Data flows as expected from the end node to the coordinator, but after approximately 30 minutes to an hour the data flow stops. The 30-minute mark is by far the most common. I have repeated this test from power-up many times and it is reproducible. Pressing the reset button on the Grove end node gets things up and running again, and then the problem repeats. The coordinator is never reset during this, neither by button nor by power-down.

Questions: In this overall set-up, should it be expected to run longer than I’m experiencing before a glitch, or is this fairly normal since RF is not perfect? I understand I’ll need to handle errors in the except to keep things running for many hours. I’m just looking to get the settings into the best condition to reduce try-except errors due to “bad settings”. Any ideas on initial radio settings to test out? Playing with AP, ACKs and collision-related settings does not seem to change the outcome.
Is there something running in the FW (or uPy) that does “things” periodically on a time frame of many minutes, as a possible cause?
When AP is in one of the modes (0, 1, 2, 4), does the transmit() function itself change the “frame” accordingly? For example, in AP=1 no escaped framing is used, but in AP=2 escaped framing is used.
Side question: When using auto channel in A1 and A2, does the FW periodically change channels as the network runs (say, rescanning periodically to stay on a “good” channel), or do all radios remain on the operating channel selected once association/discovery is completed?
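
On the AP=1 versus AP=2 question, the difference between the two serial framings can be illustrated with a short sketch. It only shows how an API frame looks on the UART with and without escaping. It does not show what transmit() does internally, which this thread does not establish. `build_api_frame` is a hypothetical helper, and the frame data in the usage note is an illustrative 16-bit-address TX request.

```python
START = 0x7E
ESCAPE = 0x7D
NEEDS_ESCAPE = (0x7E, 0x7D, 0x11, 0x13)


def build_api_frame(frame_data, escaped=False):
    """Wrap frame_data in an XBee API frame: AP=1 style, or AP=2 if escaped."""
    length = len(frame_data)
    # Checksum is 0xFF minus the low byte of the sum of the frame data.
    checksum = 0xFF - (sum(frame_data) & 0xFF)
    body = bytes([length >> 8, length & 0xFF]) + bytes(frame_data) + bytes([checksum])
    if escaped:
        out = bytearray()
        for b in body:
            if b in NEEDS_ESCAPE:
                out.append(ESCAPE)    # escape marker
                out.append(b ^ 0x20)  # original byte XOR 0x20
            else:
                out.append(b)
        body = bytes(out)
    # The start delimiter itself is never escaped.
    return bytes([START]) + body
```

For the frame data `b"\x01\x01\x00\x13\x00Hi"`, the unescaped frame is `7e 00 07 01 01 00 13 00 48 69 39`, while the escaped frame replaces the `13` byte with `7d 33`. The length field stays 7 in both cases because it counts unescaped bytes.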

Thank you a whole bunch, as once I figure this problem out my real project is ready to go. This is the only problem left. (Yes, the real project has the identical problem, and I narrowed it down to the test code here.)

This is something that should be handled by submitting a case to Digi Support, which requires creating a user account and logging into my.digi.com.

Make sure when you do that you include all necessary steps to reproduce the issue including any python code you are running.

mvut,
Understood. Since it’s almost the weekend, I would be very thankful for anything you can comment on in the meantime.

Which radio is running your MicroPython code? It should have AP set to 4.

Is the node running the MicroPython app the one that is failing, or a different one?

For the end node, I’ve tried all the AP settings, seemingly with the same results; however, I’ll stay with AP=4 during testing per your comment. The coord has always stayed at AP=0, but I have a question. Remember that in the coordinator’s infinite loop I am not using p = receive(), decoding, and then putting the data on the UART. The code is literally a while loop with x=1 under it. I thought that with AP=0 the FW takes care of RF in and UART out. Will this same code work if I set the coord to AP=4, or do I need to call receive() . . . ? Please comment on this.
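
If it turns out that the coordinator needs its own receive loop in AP=4, a forwarding step might look like the sketch below. It assumes that the MicroPython `xbee.receive()` call returns either `None` or a dict with a `"payload"` entry. `forward_pending` is a hypothetical name, and `write` stands for whatever writes bytes to the UART on your build.

```python
def forward_pending(receive, write):
    """Forward the payload of every queued packet to `write`.

    Returns the number of packets forwarded in this pass.
    """
    forwarded = 0
    while True:
        packet = receive()
        if packet is None:  # receive queue is empty
            return forwarded
        write(packet["payload"])
        forwarded += 1
```

On the module this would be called repeatedly from the main `while True:` loop in place of the dummy `x=1` assignment.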

Just now from testing, I’m pretty sure I am getting an “AI” 0x0c error (“end device failing to get an association request”) on the end node when the data flow stops, and I’ll look into that. I think we are closing in on the problem and I think I can figure it out from here. Thanks mvut
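
One way to catch the AI status at the moment the data flow stops is to read it from the MicroPython code whenever a transmit fails. The sketch below assumes that `xbee.atcmd("AI")` returns the status as an integer and that 0x00 means associated. `association_status` is a hypothetical helper that takes the AT-command function as an argument so it can be tried off-device.

```python
AI_ASSOCIATED = 0x00  # assumed: 0 means successfully associated


def association_status(atcmd):
    """Read AI and return (is_associated, formatted_code)."""
    code = atcmd("AI")
    return code == AI_ASSOCIATED, "AI=0x%02X" % code
```

Called as `association_status(xbee.atcmd)` inside the except branch, this would give a value to print or store each time a send fails.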

mvut,
FIXED . . . downgraded from FW2012 to FW200D and the exact same code runs perfectly, i.e. FW2012 may have a bug. Some additional test info for you guys.
Test 1: 1 end node and 1 coord. Data flow stops in 1 hour (approx). Repeatable.
Test 2: 2 end nodes and 1 coord. Same code on ends. Data flow stops in 1/2 hour (approx). Repeatable.
Note: My bad in the original post, where I reported a 1/2 hour to 1 hour variation with only 1 end node. I intermixed different setups and got my notes crossed.

The total amount of data transferred through the coord in Test 1 before the data flow stops would equal the total in Test 2 before the data flow stops. My guess is that this points to a counting-like mechanism in FW2012. A big thank you, mvut. Project good to go.
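
The arithmetic behind that observation can be checked quickly. The sketch assumes each end node sends exactly one packet per 100 ms period and uses the approximate run times from Tests 1 and 2, so the counts are rough estimates, not measurements. `packets_through_coordinator` is a hypothetical helper.

```python
def packets_through_coordinator(nodes, minutes, period_ms=100):
    """Estimate packets reaching the coordinator before the flow stops."""
    return nodes * minutes * 60 * 1000 // period_ms


test1 = packets_through_coordinator(nodes=1, minutes=60)  # 1 end node, ~1 hour
test2 = packets_through_coordinator(nodes=2, minutes=30)  # 2 end nodes, ~1/2 hour
print(test1, test2)
```

Both estimates come out to 36,000 packets, which is consistent with the failure depending on a total count through the coordinator instead of on elapsed time.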