Deployed mesh network seems to be dropping messages

We are troubleshooting a deployed IoT application that is installed in a variety of sites like high-rises and closely clustered buildings. The IoT devices monitor values like temperature and publish this data through a mesh network when it has changed sufficiently (push) or if queried (polled). These IoT devices also receive commands like alarm setpoints or OTA updates. A gateway device is also installed at each site to monitor the network and relay data from the mesh network to and from the cloud. The mesh network is driven by XBee Pro S2C chips and their drop-in replacement XB3-24Z.

At small deployments (1-20 IoT devices per gateway) the system runs as expected. As we scale to more IoT devices (50-150 per gateway), we start observing inconsistent behaviour:

  • The IoT devices are polled every 5 minutes and their responses are stored in a cloud database. On inspection of the database, there is less data than expected (in some cases as little as 1% of the expected responses are received).
  • When cloud commands are sent to the remote IoT devices, they are not consistently received.
  • We connected the gateway’s coordinator XBee to XCTU and let it scan the network for about 15 minutes (after which the scan was halted due to time constraints). Not all of the ~100 IoT devices had been discovered when the scan was halted.
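
To put numbers on the shortfall described above, the expected response count per device can be compared with what actually reached the database. This is only a minimal sketch: the function name, the device IDs and the idea that each stored response can be reduced to a device ID are assumptions for illustration, not part of our actual system.

```python
from collections import Counter


def poll_delivery_ratio(received_ids, device_ids, poll_interval_s, window_s):
    """Return {device_id: received / expected} for one observation window.

    received_ids: one entry per stored poll response (hypothetical export
                  from the cloud database, reduced to the device ID).
    device_ids:   every device that should have answered.
    """
    expected = window_s // poll_interval_s  # polls sent to each device
    if expected <= 0:
        raise ValueError("window must cover at least one poll interval")
    counts = Counter(received_ids)
    # Devices with no stored responses get a ratio of 0.0.
    return {dev: counts[dev] / expected for dev in device_ids}


# One hour at a 5-minute poll interval means 12 expected responses per device.
ratios = poll_delivery_ratio(
    received_ids=["A", "A", "B"],
    device_ids=["A", "B", "C"],
    poll_interval_s=300,
    window_s=3600,
)
silent = [dev for dev, ratio in ratios.items() if ratio == 0.0]
```

Sorting devices by this ratio may show whether losses are spread evenly or concentrated on particular nodes, which would point towards a routing or placement problem rather than general congestion.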

Questions
  • Is it expected to take more than 15 minutes for an XCTU network discovery to find a mesh with ~100 nodes? What is a reasonable time estimate needed to fully discover a mesh network using XCTU and an XBee gateway?

  • It seems like messages are sporadically being dropped, causing missed commands and fewer poll responses. Is it possible there is an issue with the network that is causing messages to get dropped?
    • What happens if multiple IoT devices send push messages at the same time?
    • What is a reasonable time for a poll to take (both time for the request to arrive at the IoT device and for the IoT device’s response to be sent)? If there are multiple outstanding polls on the network can this cause a message collision?
    • Any other suggestions why we would see this behaviour?

This is something that really should be discussed with Digi support.

Separately from this forum post, we reached out to Digi and received some additional information:

Q: Is it expected to take more than 15 minutes for an XCTU network discovery to find a mesh with ~100 nodes? What is a reasonable time estimate needed to fully discover a mesh network using XCTU and an XBee gateway?
A: Depending on what settings you have, the amount of traffic on the network and interference, it may take up to 24 hrs for all of the nodes to respond to a Node Discovery.

Q: It seems like messages are sporadically being dropped, causing missed commands and fewer poll responses. Is it possible there is an issue with the network that is causing messages to get dropped?
A: Yes, it is possible that the network is not properly optimized, which would cause transmission issues. Options to help with this include:
- Adding additional routers may help.
- Ideally, a site survey of the location should be run before system deployment.
- For networks that are already deployed, using a Zigbee sniffer in different locations can help show how much traffic is occurring.
- Using the Network map function while the network is quiet can also help determine which routes are available for the end devices to send data.

Q: What is a reasonable time for a poll to take (both time for the request to arrive at the IoT device and for the IoT device’s response to be sent)? If there are multiple outstanding polls on the network can this cause a message collision?
A: If multiple devices try to send data at the same moment in time and if they are in close proximity to each other, a data collision can occur resulting in a random back off and retry of the data or loss of the data.
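
Given the collision behaviour described in that answer, one thing we could try is spreading the polls across the 5-minute interval instead of sending them in a burst. The following is a minimal sketch, assuming the gateway application controls when each poll goes out. The function name, device IDs and jitter parameter are hypothetical, and this was not part of Digi's reply.

```python
import random


def staggered_poll_offsets(device_ids, poll_interval_s, jitter_s=0.0, rng=None):
    """Assign each device a send offset (seconds) within one poll interval.

    Devices get evenly spaced slots; an optional random jitter, capped at
    one slot width, keeps the schedule from being perfectly periodic.
    """
    if not device_ids:
        raise ValueError("device_ids must not be empty")
    rng = rng or random.Random()
    slot = poll_interval_s / len(device_ids)  # spacing between polls
    offsets = {}
    for index, dev in enumerate(device_ids):
        jitter = rng.uniform(0.0, min(jitter_s, slot))
        offsets[dev] = index * slot + jitter
    return offsets


# 100 devices polled over 300 s gives one poll roughly every 3 s.
devices = [f"dev{i}" for i in range(100)]
offsets = staggered_poll_offsets(devices, poll_interval_s=300)
```

This only spaces out the gateway's requests. It does not account for multi-hop latency or retries inside the mesh, and push messages from the devices would need their own random delay in firmware.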

Answers to the remaining questions require additional context. Digi support can help with this, but it is considered large-network troubleshooting, which is a paid support function.
