Digi Connect EM crash issue

Hi There,

I'd like to find out whether anyone else is experiencing a problem similar to mine. I am using the custom module, not the integration kit, and I am seeing these problems under NetOS 6.3 with the GNU tools.

My application runs fine for anywhere from 1 hour to 24 hours (overnight), but periodically it crashes with no indication of what happened. For example, none of the LEDs are flashing. After the crash I am unable to ping the device or use any of its other network functionality.

I am reasonably confident I have ruled out the following issues (in my code at least):

  • Memory leak - I have checked and double-checked that every call to malloc has a matching call to free.
  • Deadlock - the threads are structured very simply, so I cannot see anywhere that deadlock could occur.
  • Referencing memory that has been freed - I set my pointers to NULL after freeing them, so I would expect an immediate crash if I were doing this.
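One way to back up the malloc/free check with data rather than code inspection is to route allocations through counting wrappers and print the outstanding count from the shell now and then; a count that keeps climbing points to a leak. This is a minimal sketch assuming a single thread: `debug_malloc`, `debug_free` and `DEBUG_FREE_AND_NULL` are hypothetical helpers, not NetOS calls, and with several threads allocating concurrently the counter would need a mutex around it. Note also that nulling one pointer does not protect other copies of the same address, so a use-after-free through an alias can still go unnoticed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical debug counter (not part of NetOS).
 * Single-threaded sketch: guard it with a mutex in a multithreaded app. */
static long debug_alloc_count = 0;

/* Wrap malloc so every successful allocation is counted. */
void *debug_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        debug_alloc_count++;
    }
    return p;
}

/* Wrap free so every release of a non-NULL pointer is counted. */
void debug_free(void *p)
{
    if (p != NULL) {
        /* More frees than allocations suggests a double free. */
        assert(debug_alloc_count > 0);
        debug_alloc_count--;
        free(p);
    }
}

/* Free the pointer and clear the caller's copy in one step. */
#define DEBUG_FREE_AND_NULL(ptr) \
    do { debug_free(ptr); (ptr) = NULL; } while (0)

/* Number of allocations not yet freed; print this from the shell. */
long debug_outstanding_allocations(void)
{
    return debug_alloc_count;
}
```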

So far I have been unable to reproduce the problem in the debugger (JTAG module running on the development board), although I have not put a lot of effort into this, partly because our carrier board differs from the dev board and has some additional peripheral devices.

When the crash occurs, the module is not doing much work: there is very little or no network activity.

To help with debugging, I have integrated a modified version of the sample shell application into my application. Since doing this I have been unable to reproduce the problem, although I only did this today, so it may yet happen.

I have about six of my own threads running, each with a stack of at least 4K. Running the “ps” command from the shell application shows a total of 23 threads, most of them suspended, sleeping, or waiting for an event.

I have checked the available NetOS 6.3 patches and none of them seems to describe the exact problem I am having. I am reluctant to patch my NetOS distribution blindly in case I introduce new bugs for no real benefit.

Has anyone experienced a similar problem?

Thank you in advance for any help.

Tracking down a crash can be difficult, even more so if you can't run the EM under the debugger. The two most useful (and sometimes most difficult) things you can do are to reproduce the problem consistently and to reproduce it while running the debugger.

What services are you running? It would be worthwhile to read through the patches to see whether anything you're using has been fixed. For example, if you're creating a lot of sockets, you'd definitely want the TCP/IP patch. Beyond that, you can try increasing your network heap (APP_NET_HEAP_SIZE in appconf.h) as well as your thread stack sizes (in case it's a stack overflow issue).
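To check whether a stack overflow is actually happening, rather than only enlarging the stacks, a common technique is stack painting: fill each thread's stack with a known byte before the thread starts, then later measure how much of that pattern has been overwritten. This is a host-runnable sketch assuming a descending stack (typical on ARM) and a caller-supplied stack buffer; the function names and the 0xA5 fill value are arbitrary choices for illustration, not NetOS APIs. If a thread happens to write the fill value itself, the estimate comes out slightly low, so leave a healthy margin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Arbitrary fill byte chosen for this sketch. */
#define STACK_FILL_BYTE 0xA5u

/* Fill the whole stack buffer with the pattern before the thread starts. */
void stack_paint(unsigned char *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/* Assuming the stack grows downward (from stack[size - 1] toward stack[0]),
 * count the bytes at the low end that still hold the pattern. */
size_t stack_unused_bytes(const unsigned char *stack, size_t size)
{
    size_t unused = 0;
    while (unused < size && stack[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}

/* Peak usage (high-water mark) is the total size minus the untouched region. */
size_t stack_peak_usage(const unsigned char *stack, size_t size)
{
    return size - stack_unused_bytes(stack, size);
}
```

A hypothetical shell command could then report each thread's peak usage; a thread whose peak sits close to its 4K allocation is a good candidate for a larger stack.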

Thanks charliek. For the benefit of other readers of the forum, I did what charliek suggested and the problem was resolved.

Thanks.