Abort Exception, customizeExceptionHandler not being reached

I’m having a strange issue with a Digi Connect ME. After some amount of running time, the following message is displayed in the STDIO Serial Console:

Abort Exception
CONTEXT: THREAD
Thread Name:

This message gets displayed over and over again (like it’s in a loop) and then the board reboots, presumably due to the watchdog not being serviced.

I’ve set a breakpoint in customizeExceptionHandler, but the breakpoint is never reached.

Since I can’t examine the registers, I have no idea where this abort is occurring.

Is there a reason that customizeExceptionHandler would not be called?

Hello

My initial guess would be that memory is so corrupted that you are not even getting to customizeExceptionHandler.

Here is what I do in such situations:

In src\bsp\init\arm7\INIT.s set breakpoints at the following places:

Undefined_Handler
Prefetch_Handler
Abort_Handler
Address_Error_Handler
crash

If you are developing in the IDE (Digi ESP), you may have to load INIT.s into ESP as follows (in the Debug perspective):
File->Open File
In the file dialog, find INIT.s (at the location described above).
Once it is loaded into the debugger, set breakpoints as normal.

Hopefully that will help you track down the bug.

One additional thought: I wonder if you somehow crunched the exception vectors. They are at the beginning of RAM, around address 0. If those are crunched you will get weird crash results. Look for an underflow of a buffer, string, or other object that could have wiped out the exception vectors.
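One way to test the crunched-vectors theory, sketched here as an assumption rather than anything NET+OS provides: snapshot the vector area early in startup, then compare it periodically from a thread so you can stop in the debugger at the first check that sees a change. The function names and the 16-word size (the eight ARM7 vector slots plus a literal table of handler addresses that many ARM startup files place right after them) are hypothetical; check INIT.s for the actual layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size: 8 vector slots + 8 handler-address literals (check INIT.s). */
#define VECTOR_GUARD_WORDS 16u

static uint32_t vector_snapshot[VECTOR_GUARD_WORDS];
static const volatile uint32_t *vector_base;

/* Record a copy of the vector area once, before application threads start. */
void vector_guard_init(const volatile uint32_t *base)
{
    vector_base = base;
    for (size_t i = 0; i < VECTOR_GUARD_WORDS; i++)
        vector_snapshot[i] = base[i];
}

/* Return the index of the first word that changed, or -1 if all are intact. */
int vector_guard_check(void)
{
    for (size_t i = 0; i < VECTOR_GUARD_WORDS; i++)
        if (vector_base[i] != vector_snapshot[i])
            return (int)i;
    return -1;
}
```

On the target you would pass a pointer to the vector area (around address 0, per the post above) to vector_guard_init, then set a breakpoint on the branch that runs when vector_guard_check() returns something other than -1. The buffer writes that ran just before that check are the prime suspects.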

Thank you for the responses. Using the GDB console, I set breakpoints at those places, but I didn’t set them in INIT.s.

After I posted the original question, I “dumbed down” my software a bit in that now, I’m basically calling recvfrom on a non-blocking UDP socket every 10ms. It’s still crashing, but seems to be providing better information.

I will set up the breakpoints as you’ve suggested and see if I can get some more useful info.

In the meantime, though, is there a known problem with non-blocking UDP sockets?

Here is the initialization code I’m using:
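    /* udpsock, addr (struct sockaddr_in) and udpPort are declared elsewhere */
    int optval = 1;

    udpsock = socket(AF_INET, SOCK_DGRAM, 0);
    if (udpsock == -1)
    {
        printf("udpInit: socket failed\n");
        return;
    }
    /* SO_NONBLOCK: socket calls on udpsock return immediately instead of waiting */
    if (setsockopt(udpsock, SOL_SOCKET, SO_NONBLOCK, (char *)&optval, sizeof(optval)) == -1)
    {
        printf("udpInit: setsockopt SO_NONBLOCK failed\n");
        return;
    }
    /* SO_BROADCAST: allow sending to broadcast addresses */
    if (setsockopt(udpsock, SOL_SOCKET, SO_BROADCAST, (char *)&optval, sizeof(optval)) == -1)
    {
        printf("udpInit: setsockopt SO_BROADCAST failed\n");
        return;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(udpPort);          /* assumes udpPort is in host byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(udpsock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
    {
        printf("udpInit: bind failed\n");
        return;
    }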

Never mind, I had a “dense moment” about the breakpoints. Sorry.

To answer your question, there are no KNOWN problems with non-blocking UDP sockets.

A couple of things to be mindful of when using non-blocking sockets:

I’ll assume you are using select calls before any reads/writes to the non-blocking sockets.

Presumably you have a timeout on the select, in case there is nothing to do.

If you get an error but it is either EWOULDBLOCK or EAGAIN, that means there is nothing to do for now. Any other error is a real error.

Presumably you are in an infinite loop performing a select and an FD_ISSET on the non-blocking socket. If the select times out, or you get either EWOULDBLOCK or EAGAIN, perform a small delay (tx_thread_sleep(NS_MILLISECONDS_TO_TICKS(some ms delay))) before calling continue and returning to the beginning of the infinite loop. (In case you need it, NS_MILLISECONDS_TO_TICKS is defined in bsp_api.h.)
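A minimal sketch of the select loop described above, written against POSIX sockets so it can be compiled and tried on a host. Assumptions: the socket is already non-blocking (SO_NONBLOCK in the init code earlier in the thread, O_NONBLOCK on POSIX); errors are read from errno, which may not be exactly how NET+OS reports them; short_sleep_ms is a hypothetical stand-in for tx_thread_sleep(NS_MILLISECONDS_TO_TICKS(ms)); udp_wait_and_drain and udp_service_forever are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical stand-in for tx_thread_sleep(NS_MILLISECONDS_TO_TICKS(ms)). */
static void short_sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Wait up to timeout_ms for the (non-blocking) socket to become readable,
   then read every queued datagram. Returns the number of datagrams read,
   0 if the select timed out, or -1 on a real error. */
int udp_wait_and_drain(int sock, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    char buf[512];
    int count = 0;

    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);
    int ready = select(sock + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0 || !FD_ISSET(sock, &readfds))
        return 0;                        /* timed out: nothing to do */

    for (;;) {
        ssize_t n = recvfrom(sock, buf, sizeof buf, 0, NULL, NULL);
        if (n >= 0) {
            count++;                     /* process buf[0..n) here */
            continue;
        }
        if (errno == EWOULDBLOCK || errno == EAGAIN)
            return count;                /* queue drained for now */
        return -1;                       /* any other errno is a real error */
    }
}

/* Body of a dedicated thread (never a timer function); does not return. */
void udp_service_forever(int sock)
{
    for (;;) {
        if (udp_wait_and_drain(sock, 100) <= 0)
            short_sleep_ms(10);          /* idle or error: back off briefly */
    }
}
```

Draining until EWOULDBLOCK/EAGAIN after each successful select means a burst of datagrams is handled in one pass instead of one per loop iteration. A real implementation would also log the -1 case rather than silently backing off.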

BTW for completeness, what module and what version of NET+OS are you using?

I’m using a Digi Connect ME, running NET+OS 7.5.

Actually, no, I’m not using select. I’m simply calling recvfrom periodically, every 100 ms, via a timer callback. The timer was created with the following code:

status = tx_timer_create(&udp_timer, "my_udp_timer", udp_timer_func, 0x00, NS_MILLISECONDS_TO_TICKS(100), NS_MILLISECONDS_TO_TICKS(100), TX_AUTO_ACTIVATE);

tx_timer_create does not get called until the udpInit function has completed successfully.

The previous crash info looks like it may have been in the NET+OS timer calls. The following is from my image.map file:

0x0016837c                _tx_timer_system_clock_hi

Is there, maybe, a problem with calling recvfrom on a timer like this?

I have a breakpoint set in customizeExceptionHandler at the point where naCustomizeExceptionHandlerClearToContinue = TRUE;. However, it appears as though the breakpoint is being disabled!

This is what my GDB server log shows whenever the breakpoint is reached (customizeExceptionHandler is at 0x00019b38 in image.map):

…Breakpoint reached @ address 0x00019B4C
Reading all registers
Removing breakpoint @ address 0x00019B4C, Size = 4
Removing breakpoint @ address 0x00018544, Size = 4

I hope you are NOT calling recvfrom from within the timer’s function. That would be very bad.

Well, the timer function calls into my “work” function which does call recvfrom. If that’s wrong (why is that, BTW?), then what is the right thing to do? Should I be calling it from a thread? Should I be using a mutex to control reentrancy of the work function?

This comment was the key. As you pointed out, I should not have been calling recvfrom from a tx_timer function. Instead, I created a thread with an appropriate amount of stack space and put the calls to my “work” functions there. Voilà. No crash. Thank you so much!!!

Hello

When you are running at interrupt context (timer functions, ISRs, etc.) there are a number of things you must not do. One is to call ANYTHING that might delay. Now, I know you are calling recvfrom on a non-blocking socket, but that is non-blocking only from the socket's perspective; it says nothing about how recvfrom is actually implemented. I am sure there are mallocs or other calls within the TCP/IP stack that are inappropriate for use at interrupt context.

So what SHOULD you do? One option is a separate thread, running in a loop, that waits on a global flag. Have the timer task set the global flag to 1 to tell the separate thread to perform a recvfrom, and have the separate thread set the flag back to 0 (zero).
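A minimal sketch of that flag hand-off, written in portable C11 so it compiles on a host. The names udp_timer_func (borrowed from the tx_timer_create call earlier in the thread), worker_poll_once, worker_thread_entry, and do_udp_work are illustrative; nanosleep stands in for tx_thread_sleep, and on NET+OS worker_thread_entry would be the entry function of a thread created with a generous stack.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* The "global flag": 1 means the timer has requested a receive. */
static atomic_int recv_requested = 0;

/* Hypothetical stand-in for the real recvfrom()-based work function. */
static int work_done = 0;
static void do_udp_work(void) { work_done++; }

/* Timer expiration function: set the flag and return immediately.
   The unsigned long parameter mirrors the expiration input passed to
   tx_timer_create (0x00 in the call above). */
void udp_timer_func(unsigned long input)
{
    (void)input;
    atomic_store(&recv_requested, 1);
}

/* One pass of the worker loop; returns true if work was done.
   Clearing the flag atomically *before* doing the work means a request
   that arrives while the work is running is kept for the next pass. */
bool worker_poll_once(void)
{
    if (atomic_exchange(&recv_requested, 0) == 0)
        return false;
    do_udp_work();
    return true;
}

/* Worker thread entry: all heavy lifting happens on this thread's stack. */
void worker_thread_entry(unsigned long input)
{
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 10 * 1000000L };
    (void)input;
    for (;;) {
        if (!worker_poll_once())
            nanosleep(&nap, NULL);   /* on NET+OS: tx_thread_sleep(...) */
    }
}
```

The timer side does a single atomic store and nothing else, which keeps it within the "do as little as possible" rule regardless of how small the timer's stack is.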

You could also dispense with the timer task entirely and use select with a timeout to control your recvfrom calls. If you are unfamiliar with select, many web sites describe its use. Generally, a non-blocking I/O call is made from within a select/FD_ISSET pair.

But to return to the beginning: I am super glad to hear that we solved your crash issue. I, for one, hate hearing about crashes.

The chapter entitled “NET+OS Kernel Managed Interrupts” in the NET+OS kernel user’s guide lists the kernel calls that are allowed at interrupt context. Generally speaking, calling anything else should be considered forbidden. As stated earlier, if you want your interrupt/timer task to cause something to happen, have the service routine signal another thread to actually perform the work. Do not do the work in the service routine.

Good luck.

One additional item: 99% of what I stated is true. It turns out that timer functions do not run at interrupt context; they are run from the timer task, which runs with an extremely small stack. Thus calling anything that pulls in a lot of code, such as C library or socket calls, is a bad idea. My best recommendation is to treat such calls as though they ran at interrupt context: do as little as possible in the timer task, and have a separate thread with a larger stack perform the actual work.

Thank you! This answered the question precisely. I wasn’t aware that the timer task is allocated only a small amount of stack. It makes total sense to me that timers should be considered “close” to an interrupt. I just had not made the mental connection between tx_timer_create and an interrupt.