Why do random WR21 units with near-identical settings refuse to establish a tunnel?

I have an odd issue with my WR21 units that I am hoping someone can shed some light on. We have over 400 units currently set up (all reporting back to a Digi VC7400), and every so often a random unit stops creating a tunnel back to us for reasons we cannot understand. We have verified that (a) the settings (apart from the name and the local IP for the VPN) are identical on the WR21 and the master device and (b) the device was communicating but simply stopped at some point without us changing anything in the configuration. When I run the analyser on the device I see the following:

----- 14-11-2018 12:48:01.640 -----
IKE DEBUG: Locating PH1 SA with ID: initiator, and IP: xx.xx.xxx.xxxx

----- 14-11-2018 12:48:01.640 -----
IKE DEBUG (1): Found existing uncompleted PH1 session

----- 14-11-2018 12:48:01.640 -----
IKE DEBUG: No PH1 SA available

----- 14-11-2018 12:48:01.640 -----
IKE DEBUG: Unable to process SA request

----- 14-11-2018 12:48:01.640 -----
IKE DEBUG: Resetting IKE context 1

So far the only ways I have been able to correct this issue are to
(1) change the configuration (for example, the algorithms used) on both the master and the WR21 and reboot, or
(2) remove the IP address of the master unit from the WR21, reboot it, and then re-enter that IP, which only sometimes works.

Does anyone have any idea why it keeps reporting this uncompleted PH1 session, which then leads to no PH1 SA being available? Is there something I can do (preferably from the Remote Management portal) to force the device to remove this uncompleted PH1 so that it starts negotiations from scratch? Please note that the output above is what I see even after removing the tunnel from the device, rebooting it, and then adding the tunnel information back in (i.e., I would expect that to have removed everything to do with the old tunnel connection, but that does not appear to be happening).


As you only have a trace from the initiator, you have a limited idea of what is wrong.
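
With around 400 units, it may still be worth checking saved initiator traces automatically for this particular signature, so you at least know which units are stuck and when. A minimal Python sketch, assuming the analyser output has been saved as plain text in the same layout as the trace in the question; `parse_trace` and `has_stuck_phase1` are hypothetical helper names, not part of any Digi tool:

```python
import re
from datetime import datetime

# Header lines look like: ----- 14-11-2018 12:48:01.640 -----
HEADER = re.compile(r"^-{5} (\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) -{5}$")


def parse_trace(text):
    """Return a list of (timestamp, message) pairs from saved analyser text."""
    entries = []
    stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Day-month-year order, as in the trace from the question.
            stamp = datetime.strptime(match.group(1), "%d-%m-%Y %H:%M:%S.%f")
        elif line and stamp is not None:
            entries.append((stamp, line))
    return entries


def has_stuck_phase1(entries):
    """True if an 'uncompleted PH1' message is directly followed by 'No PH1 SA available'."""
    messages = [message for _, message in entries]
    for first, second in zip(messages, messages[1:]):
        if "uncompleted PH1" in first and "No PH1 SA available" in second:
            return True
    return False


sample_trace = """\
----- 14-11-2018 12:48:01.640 -----
IKE DEBUG (1): Found existing uncompleted PH1 session

----- 14-11-2018 12:48:01.640 -----
IKE DEBUG: No PH1 SA available
"""

if __name__ == "__main__":
    print(has_stuck_phase1(parse_trace(sample_trace)))
```

This only tells you that a unit is in the stuck state; it does not explain why, which is where the responder side comes in.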

The VC7400 would be the place to look.

Make sure that the VC7400 configuration does not have any stray whitespace characters in it.
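
If you can export the configuration as text, a small script can do the whitespace check for you. This is only a sketch, assuming a plain-text export with one setting per line; `find_suspect_whitespace` is a hypothetical helper name and the sample lines are made up, not real VC7400 syntax:

```python
import unicodedata


def find_suspect_whitespace(config_text):
    """Return (line_number, description) pairs for lines with stray whitespace."""
    findings = []
    for number, line in enumerate(config_text.split("\n"), start=1):
        # Spaces, tabs, or carriage returns left at the end of a line.
        if line != line.rstrip():
            findings.append((number, "trailing whitespace"))
        for ch in line:
            if ch == " ":
                continue  # an ordinary space is expected between fields
            # Anything else that is whitespace, a control character, or an
            # invisible format character (e.g. a zero-width space) is suspect.
            if ch.isspace() or unicodedata.category(ch) in ("Cc", "Cf"):
                findings.append((number, f"unusual character U+{ord(ch):04X}"))
    return findings


if __name__ == "__main__":
    # Made-up sample: line 1 ends in a space, line 2 hides a non-breaking space.
    sample = "setting_a value \nsetting_b\u00a0value\nsetting_c value\n"
    for number, description in find_suspect_whitespace(sample):
        print(f"line {number}: {description}")
```

Any line it flags is worth retyping by hand on the VC7400 rather than pasting back in.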

Does the VC7400 show any event log entries when the router gets into this state?

On the remote end, have you tried switching off IPsec on the WAN interface for a few minutes and then re-enabling it?

ppp 1 ipsec off / on

Have you tried asking Digi support about this?