non-reentrant NetOS and threads

I have multiple threads running several copies of select() simultaneously (Telnet client, SSH server, several internal pipes). Digi just told me that NetOS is not reentrant, which is a very serious problem: I get 2-3 thread crashes a night. Does anyone have experience with this? I wonder how Digi handles it in their canned routines.

-Erik

Hey Cameron,

Any word from Digi/NetSilicon on how the reentrancy problem was solved in the canned FTP and HTTP servers?

-Erik

I have a solution for this if anyone is interested.

-Erik

Hey, I’m not sure if I have the same problem, but I think any piece of code would help me understand how threads work.

What is reentrancy anyway?

mik

Reentrant code means that more than one thread can call it at once. Non-reentrant code can hang or crash if it is called by more than one thread at once.
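
To make that concrete, here is a small generic illustration in plain C. It is not NetOS code, and the function names label_shared() and label_r() are made up for the example. The first function keeps its result in a static buffer that every caller shares, so a second call overwrites what the first caller is still looking at. The second function has the caller pass in its own buffer, so nothing is shared.

```c
#include <stdio.h>
#include <string.h>

/* Non-reentrant: every caller shares the same static buffer. */
const char *label_shared(int id)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "conn-%d", id);
    return buf;
}

/* Reentrant: the caller owns the buffer, so no state is shared. */
char *label_r(int id, char *buf, size_t len)
{
    snprintf(buf, len, "conn-%d", id);
    return buf;
}
```

If two threads call label_shared() at about the same time, each can end up reading the other's text. With label_r() each thread works on its own storage.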

This chunk of code makes select() safe to call from several threads by serializing the calls with a mutex. Just call __select() in place of select().

(Note: This forum blocks out tabs, that is why this looks funny.)

/////////////////////////////////////

static BOOL _bSelectInitialized = FALSE;
static TX_MUTEX _mutexBuffer;

#ifndef howmany
#define howmany(x,y) (((x)+((y)-1))/(y))
#endif

int qselect(int width, fd_set * readset, fd_set * writeset, fd_set * exceptset)
{
struct timeval tv;
int ret;

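      // Make sure the mutex is initialized.
      // NOTE: this first-call check is not itself thread-safe; call qselect()
      // once from startup code before other threads can reach it.
if( !_bSelectInitialized )
{
	UINT status = tx_mutex_create(&_mutexBuffer, "SELECTMUTEX", TX_NO_INHERIT );
	if( status != TX_SUCCESS )
		return -1;
	_bSelectInitialized = TRUE;
}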

 // Wait for my turn
tx_mutex_get(&_mutexBuffer, TX_WAIT_FOREVER);

tv.tv_sec = 0;
tv.tv_usec = 8;
errno = EAGAIN;
ret = select(width, readset, writeset, exceptset, &tv);

 // my turn is up
tx_mutex_put(&_mutexBuffer);

if( ret >= 0 )
	errno = 0;

return ret;

}

int __select(int width, fd_set * readset, fd_set * writeset, fd_set * exceptset, struct timeval * timeout)
{
long countdown;
int ret;

if( timeout ) 
{
	countdown = timeout->tv_sec*1000;
	if( timeout->tv_usec > 0 )
		countdown += timeout->tv_usec/1000;
}
else 
{
	countdown = 0x7FFFFFFF;
}

if( countdown <= 10 )
{
	return qselect(width, readset, writeset, exceptset);
}
 // Loop in wait
fd_set	rs, ws, es, *prs, *pws, *pes;
int		fdsetsz = howmany((int)width, NFDBITS) * sizeof(fd_mask);
do
{
	 // Preserve the calling fd_set
	prs = pws = pes = NULL;
	if( readset )
	{
		prs = &rs;
		memcpy(prs, readset, fdsetsz);
	}
	if( writeset )
	{
		pws = &ws;
		memcpy(pws, writeset, fdsetsz);
	}
	if( exceptset )
	{
		pes = &es;
		memcpy(pes, exceptset, fdsetsz);
	}

	 // quickie socket check
	ret = qselect(width, prs, pws, pes);

	 // Have something besides a timeout?
	if( ret != 0 )
	{
		 // copy 'em back
		if( readset )
			memcpy(readset, prs, fdsetsz);
		if( writeset )
			memcpy(writeset, pws, fdsetsz);
		if( exceptset )
			memcpy(exceptset, pes, fdsetsz);

		 // return it
		return ret;
	}

	 // wait 50ms (5 ticks, assuming a 10 ms tick) to let other threads do their thing
	tx_thread_sleep(5);
	countdown -= 50;
}
while( countdown > 0 );

if( readset )
	memcpy(readset, prs, fdsetsz);
if( writeset )
	memcpy(writeset, pws, fdsetsz);
if( exceptset )
	memcpy(exceptset, pes, fdsetsz);

return 0;

}
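
For anyone who wants to try the lock-then-poll idea off the target, here is a rough desktop sketch of the same pattern using POSIX calls instead of the ThreadX ones. This is an assumption-laden port, not what runs on NetOS: pthread_mutex_lock()/pthread_mutex_unlock() stand in for tx_mutex_get()/tx_mutex_put(), nanosleep() stands in for tx_thread_sleep(), and the names poll_once() and locked_select() are made up. The timeout is passed in milliseconds to keep it short.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Statically initialised, so there is no first-call race. */
static pthread_mutex_t select_lock = PTHREAD_MUTEX_INITIALIZER;

/* One zero-timeout poll with the lock held: only one thread is ever
   inside select(), and it never blocks there. */
static int poll_once(int width, fd_set *r, fd_set *w, fd_set *e)
{
    struct timeval tv = { 0, 0 };
    int ret;

    pthread_mutex_lock(&select_lock);
    ret = select(width, r, w, e, &tv);
    pthread_mutex_unlock(&select_lock);
    return ret;
}

/* Poll in 50 ms slices until something is ready, an error occurs,
   or timeout_ms runs out. */
int locked_select(int width, fd_set *r, fd_set *w, fd_set *e, long timeout_ms)
{
    fd_set rs, ws, es;
    struct timespec slice = { 0, 50 * 1000 * 1000 };
    int ret;

    FD_ZERO(&rs);
    FD_ZERO(&ws);
    FD_ZERO(&es);
    for (;;) {
        /* Work on copies: select() overwrites the sets it is given. */
        if (r) rs = *r;
        if (w) ws = *w;
        if (e) es = *e;

        ret = poll_once(width, r ? &rs : NULL, w ? &ws : NULL, e ? &es : NULL);
        if (ret != 0 || timeout_ms <= 0)
            break;

        /* Sleep outside the lock so other threads get their turn. */
        nanosleep(&slice, NULL);
        timeout_ms -= 50;
    }

    /* Hand the result sets back to the caller. */
    if (r) *r = rs;
    if (w) *w = ws;
    if (e) *e = es;
    return ret;
}
```

The trade-off is the same as in the NetOS version: the lock is only held for a very short poll, so no thread blocks the others for long, but a waiting thread can notice new data up to one sleep slice late.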

Message was edited by: egawtry

How can this help if you are running the FTP server and HTTP server? You can’t control their calls to select, and thus they can cause all the same problems your own threads and connections can cause, can they not?

No, I meant that it solves the reentrancy problem for user code only.

According to Digi, the reentrancy problem is fixed in version 6.3 of NetOS. My solution is for 6.0. I will test 6.3 when I get time (I am porting my app to it). If there is still a problem I will post it.

-Erik

Ah. I have 6.3 as well, but a bunch of old modules that I can’t use with 6.3. So I’m stuck with 6.0 at the moment.

The documentation for select() (NET+OS 7.1) says this:
“If two tasks attempt to use select on the same socket for the same conditions, an error occurs.”

So can I conclude that the select() in NET+OS 7.1 is reentrant?

That just goes back to the whole ‘You can use two sockets in two different threads at the same time’. Select itself should be re-entrant (i.e. using two different groups of sockets, one for each thread).

From that quote, it appears that the problem is still there.

-Erik

I think you are allowed to use select() from two different threads on the same socket, as long as you don’t wait for the same condition. So one thread could wait for reading and another for writing. That would be ok, according to the documentation.