I’m working on an application for a DigiConnect ME running Net+OS 6.0. The DigiConnect is connected to my board and is a standard module (no JTAG). When I run several threads, I get an exception whose cause I cannot for the life of me find. If I run a single thread with some dummy network traffic, along with the naftpapp, it runs OK. As soon as I add a second thread that monitors the serial port and does some further network traffic (gets data from the com port, sends it to the server, gets the response from the server, sends it to the com port), I get sporadic crashes. If I enable the advanced web server (AWS), the crashes are more frequent. If I add a server (gets messages from the network, sends them to the com port, gets the response from the com port, sends it to the network), then it crashes every time I try to get data off the network (using recv). I believe this is in part because the network code in 6.0 is non-reentrant. But because these are older ME modules, I can’t upgrade to 6.3, where this is fixed. Also, I have no control over the FTP or AWS networking code, since the source for these is not provided.
I’ve modified the exception handler to use syslog (taken from the weather app demo) to send a message that includes the exception number (usually 4, sometimes 3) to my syslog server. This doesn’t tell me where the crash occurred, however. Is there any way to examine the call stack at the time of the exception so I can include this information in my syslog message? Even if it is just the addresses of the call stack, I can use the map file to figure out what routine was executing when the exception occurs, which is some help.
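As a rough illustration of the map-file idea, here is a minimal sketch in plain C. It assumes the module's ARM core, where the abort-mode link register points 4 bytes past the faulting instruction for a prefetch abort and 8 bytes past it for a data abort. How the NET+OS exception handler exposes that saved register is not shown here, and the table entries (names and addresses) are made up; real ones would be copied from the linker map file.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ABORT_PREFETCH, ABORT_DATA } abort_kind_t;

/* On ARM, the abort-mode link register points past the faulting
   instruction: by 4 for a prefetch abort, by 8 for a data abort. */
uint32_t faulting_address(abort_kind_t kind, uint32_t lr_abt)
{
    return lr_abt - (kind == ABORT_DATA ? 8u : 4u);
}

/* One entry per routine. Hypothetical values for illustration only. */
typedef struct {
    uint32_t    start;   /* first address of the routine */
    const char *name;
} map_entry_t;

static const map_entry_t map_table[] = {   /* must be sorted by start */
    { 0x00008000u, "reset_handler"   },
    { 0x00008400u, "serial_service"  },
    { 0x00009100u, "network_service" },
    { 0x0000A800u, "recv"            },
};
#define MAP_COUNT (sizeof map_table / sizeof map_table[0])

/* Return the routine with the greatest start address <= addr,
   or NULL if addr lies below the first entry. */
const char *map_lookup(uint32_t addr)
{
    size_t lo = 0, hi = MAP_COUNT;
    while (lo < hi) {                       /* binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (map_table[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : map_table[lo - 1].name;
}
```

The lookup would normally run on the PC that receives the syslog message, so the module only has to send the raw address.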
Or is it likely that I will simply be unable to run multiple network threads on these older modules under 6.0?
One thing I’ve thought of is making one thread handle all network traffic and using queues to pass the data to all of the other threads, but this would require a major redesign of the application, and I’d like to avoid it, especially since I have no idea whether that would even solve the problem.
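To make the single-network-thread idea concrete, here is a minimal sketch of the data flow in plain C. On the target, the ThreadX queue services (tx_queue_send() and tx_queue_receive()) would normally do this job and handle the locking. The ring buffer below does no locking at all; the sizes and names are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MSG_MAX   64   /* payload bytes per message (assumed) */
#define QUEUE_LEN 8    /* messages per queue (assumed) */

typedef struct {
    size_t        len;
    unsigned char data[MSG_MAX];
} msg_t;

typedef struct {
    msg_t  slot[QUEUE_LEN];
    size_t head;    /* next slot to read */
    size_t count;   /* messages currently queued */
} msg_queue_t;

/* Copy a message into the queue. Fails if the queue is full
   or the message is too long. */
bool queue_put(msg_queue_t *q, const void *buf, size_t len)
{
    if (q->count == QUEUE_LEN || len > MSG_MAX)
        return false;
    msg_t *m = &q->slot[(q->head + q->count) % QUEUE_LEN];
    memcpy(m->data, buf, len);
    m->len = len;
    q->count++;
    return true;
}

/* Copy the oldest message out of the queue. Fails if it is empty. */
bool queue_get(msg_queue_t *q, msg_t *out)
{
    if (q->count == 0)
        return false;
    *out = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return true;
}
```

Only the network thread would ever call the socket routines; the other threads would put outgoing data on one queue and take incoming data from another, so every message is copied rather than shared.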
I don’t think your problem comes from version 6.0. I use NETOS6.0f and it works fine with several threads, several TCP/IP and FIP stacks, a web server…
Maybe the bug is more trivial?
It may be, but it seems to always crash in recv(), and that’s not my code. Also, I’m using non-blocking sockets, which may make a difference. I needed to do that because, when the host system was down, the connect() call took three minutes to time out, which was way, way too long.
I’m really having a tough time, because each piece works fine on its own; it is only when they are all put together that there are problems. There is no interaction between the threads other than a semaphore to restrict access to the com port.
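For readers unfamiliar with the technique, a bounded connect on a non-blocking socket usually looks something like the sketch below. It is written against the POSIX sockets API, which is an assumption: the NET+OS 6.0 stack may use different calls to switch a socket to non-blocking mode and may report an in-progress connect differently. The function name and the timeout parameter are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Try to connect, waiting at most `seconds` for the result.
   Returns 0 on success, -1 on failure or timeout.
   The socket is left in non-blocking mode. */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t len, int seconds)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(fd, addr, len) == 0)
        return 0;                      /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;                     /* hard failure */

    /* The socket becomes writable when the connect attempt finishes. */
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { seconds, 0 };
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
        return -1;                     /* timed out or select failed */

    /* Writable does not mean connected: check the pending error. */
    int err = 0;
    socklen_t elen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0 || err != 0)
        return -1;
    return 0;
}
```

With a non-blocking socket, recv() can also return immediately with no data, so every call site has to treat "would block" as a normal result rather than an error.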
The application is rather big, to my way of thinking (over 300K when all pieces are included). Is there an easy way to figure out if perhaps I’m using too much RAM/ROM/stack, which might be causing the problem?
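One common way to answer the stack part of that question is to "paint" each thread's stack with a known pattern before the thread starts and later count how much of the pattern survives. A minimal sketch follows, assuming a stack that grows downward (as on ARM) and an arbitrary fill value; neither the pattern nor the function names come from NET+OS or ThreadX.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xEFEFEFEFu   /* assumed fill pattern, any unlikely value works */

/* Paint the whole stack area before the thread is created. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++)
        stack[i] = STACK_FILL;
}

/* Count the words at the low end that still hold the pattern.
   With a downward-growing stack the lowest addresses are used last,
   so this is the headroom the thread has never touched. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL)
        n++;
    return n;
}
```

A result of zero, or close to it, after the application has run for a while would suggest that the thread has overflowed its stack and may be overwriting whatever sits below it in memory.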
I tried using tx_thread_identify() and tx_thread_info_get() in my exception handler, and this indicated that the thread named “ace thread” was running, but I don’t know whether that is because that thread handles the exceptions or because it is the thread that caused the exception. The tx_thread_identify() documentation indicates that, if called in an interrupt handler, it returns the thread that was interrupted, but I don’t think an exception handler is quite the same as an interrupt handler.
This does not mean that the recv() routine has a bug in it. It could be that your code has a bug and writes to memory that is not reserved for you, and then the NET+OS routines crash.
I didn’t necessarily mean it did, but if I could see what memory it was accessing, then perhaps I could see what I might be stomping on, if in fact I am.
I assume that your threads transfer their data to each other. Do you use the same buffers for all the threads (for example, a global structure)? Or do you transfer the data between the threads by copying it?
I had your problem (exception handler) when I had several threads trying to read and write the same buffers. I corrected this by copying the buffers.
I use common buffers, but with a semaphore to ensure that only one thread uses them at a time. I do not transfer data with queues.
Since every process revolves around serial port access, and that access was protected by a semaphore, each thread had to operate sequentially for the most part anyway. So I simply removed the threading and now call what was the meat of each service loop from the main thread. Only AWS and FTP remain in their own threads (which I have no choice about). After doing this, I am no longer crashing. Basically, I had a bunch of threads, each with a “while (1)” loop; now I have one “while (1)” loop with a call to each piece of code that was in the old ones, and nothing else changed. Since the code had to operate that way anyway, it is probably better not to have the threads at all. I worry that the AWS and FTP threads will cause the same sorts of crashes, but so far they haven’t.
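The resulting structure could be sketched roughly as follows. The three step functions are hypothetical stand-ins for the bodies of the old per-thread loops, and the counters exist only so the sketch has something observable; the actual application code is not shown in this thread.

```c
#include <stddef.h>

typedef void (*service_fn)(void);

/* Hypothetical service steps. Each does one bounded unit of work
   and returns, so no step can starve the others. */
static unsigned serial_steps, client_steps, server_steps;

static void serial_monitor_step(void) { serial_steps++; }
static void client_traffic_step(void) { client_steps++; }
static void server_traffic_step(void) { server_steps++; }

static const service_fn services[] = {
    serial_monitor_step,
    client_traffic_step,
    server_traffic_step,
};

/* One pass over every service, in a fixed order. */
void run_services_once(void)
{
    for (size_t i = 0; i < sizeof services / sizeof services[0]; i++)
        services[i]();
}

/* The main thread then reduces to:
       while (1)
           run_services_once();
*/
```

This arrangement relies on every step returning promptly, which is why it pairs naturally with non-blocking sockets: a step that blocked in connect() or recv() would stall all the other services as well.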