Are the End-Devices fully awake, or do they sleep? Are you using AT or API mode?
End-Devices are not part of the Mesh. In fact, no one can talk to them directly. When they wake, they query one and only one ‘parent’ for pending messages. So they pull requests and push responses via the parent.
The parent therefore reserves several full-sized buffers for every associated child, which is a dedicated resource lost. The parent uses SN/SP to decide when a child has left, died, or been replaced. By default this is some short time like 320 msec, so a parent flushes the child away after less than a second of inactivity. That is if they sleep. If they are awake, I think they are active every 100 msec by default.
This might NOT be the problem, but it is worth trying.
SN/SP settings I commonly use are SP=0x07D0 (which represents 20 seconds), then SN of 0x03 for 1 minute, 0x0F for 5 minutes, 0x2D for 15 minutes and 0xB4 for 1 hour. The only downside to using larger SN/SP in routers (meaning the X4 and other routers) is that when you DO remove an end-device, it takes longer for the parent to free up the old resource.
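To check the arithmetic behind those values, here is a small Python sketch. It assumes one SP count is 10 msec (which is what SP=0x07D0 being 20 seconds implies) and that the window is simply SP times SN. The function name is made up for illustration.

```python
# Assumption: one SP count = 10 msec, inferred from SP=0x07D0 (2000) being 20 seconds.
SP_UNIT_MS = 10

def sleep_window_seconds(sp: int, sn: int) -> float:
    """Hypothetical helper: SP counts times SN periods, converted to seconds."""
    return sp * SP_UNIT_MS * sn / 1000.0

SP = 0x07D0  # 20 seconds per period
for sn in (0x03, 0x0F, 0x2D, 0xB4):
    minutes = sleep_window_seconds(SP, sn) / 60
    print(f"SP=0x{SP:04X} SN=0x{sn:02X} -> {minutes:g} minutes")
```

Plugging in the default-style values (SP=0x20, SN=1) gives 0.32 seconds, which matches the 320 msec mentioned earlier.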
Talking in parallel to 3 or 4 devices shouldn’t be a problem. Trying to talk to 20 at the same instant would be a problem, first because it congests the RF channel (bunches up messages), but also because an X4 won’t like running 20 threads at the same time.
In my own designs I tend to decouple the request/response as much as possible. So I send a request and do NOT wait for the response. At some time in the future I see a response, which (if the protocol permits) I just treat as a magic unsolicited answer. I suppose you are using AT-transparent mode, which complicates multi-slave behavior.
With API mode, one could for example send a request to slave #1, then 100 msec later send one to #2, and so on to all 20. At some point you’d have potentially 20 responses, each tagged with the MAC address of the slave (in API mode). So the single task then just sorts the responses to match them up to the appropriate slave.
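The sorting step could look roughly like the Python sketch below. It is plain Python with no XBee library involved. The function name, the MAC strings and the payloads are all made up for illustration, and it assumes each slave has at most one outstanding request.

```python
def sort_responses(pending, responses):
    """Match responses to outstanding requests by source MAC address.

    pending   -- dict of MAC -> request payload still awaiting an answer
    responses -- iterable of (MAC, payload) in whatever order they arrived
    Returns (matched, still_waiting, unsolicited).
    """
    waiting = dict(pending)  # copy, so the caller's dict is left untouched
    matched = {}
    unsolicited = []
    for mac, payload in responses:
        if mac in waiting:
            # First answer from a slave we asked: pair it with its request.
            matched[mac] = (waiting.pop(mac), payload)
        else:
            # Nobody asked (or already answered): treat as an unsolicited message.
            unsolicited.append((mac, payload))
    return matched, waiting, unsolicited

# Hypothetical addresses and payloads, arriving out of order.
pending = {"MAC-01": b"read 1", "MAC-02": b"read 2", "MAC-03": b"read 3"}
arrived = [("MAC-03", b"ok 3"), ("MAC-01", b"ok 1"), ("MAC-09", b"hello")]
matched, waiting, unsolicited = sort_responses(pending, arrived)
print(sorted(matched), sorted(waiting), unsolicited)
```

The point is that one task can handle all 20 slaves, since nothing blocks waiting for any single response. Anything left in `waiting` after a timeout can be retried or reported.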
With AT mode, the most you can do is raise the baud rate (to reduce the serial shift time), plus keep the RO setting low. The XBee shouldn’t really buffer data very long, but I don’t think there is a way to (for example) tell the XBee to discard what it has. That wouldn’t be very useful anyway, because either 1) the buffer is complete and within msecs it will be sent, or 2) the serial device is still sending data, and just flushing half of the message won’t stop the slave from finishing the response.
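To put rough numbers on the serial shift time, here is a Python sketch. It assumes 8N1 framing (10 bits on the wire per byte) and, as I understand the setting, that RO is counted in character times of idle line. The function names and the 100-byte message size are just examples.

```python
def serial_shift_ms(num_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Time to clock num_bytes through the serial port, in msec (8N1 assumed)."""
    return num_bytes * bits_per_byte * 1000.0 / baud

def ro_idle_ms(ro_chars: int, baud: int, bits_per_byte: int = 10) -> float:
    """Idle gap implied by RO, assuming RO is a count of character times."""
    return ro_chars * bits_per_byte * 1000.0 / baud

for baud in (9600, 38400, 115200):
    shift = serial_shift_ms(100, baud)  # hypothetical 100-byte response
    idle = ro_idle_ms(3, baud)          # RO=3 used as an example value
    print(f"{baud:>6} baud: 100 bytes take {shift:.2f} msec, RO=3 adds {idle:.2f} msec")
```

So going from 9600 to 115200 baud cuts the shift time for the same message by a factor of 12.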