non-reentrant NetOS and threads

egawtry · May 27, 2004, 4:49pm

I have multiple threads running several copies of select() simultaneously (Telnet client, SSH server, several internal pipes). Digi just told me that NetOS is not reentrant, which is a very serious problem (I get 2-3 thread crashes a night). Does anyone have experience with this? I wonder how Digi handles it in their canned routines?

-Erik

egawtry · June 11, 2004, 6:58pm

Hey Cameron,

Any word from Digi/NetSilicon on how the reentrancy problem was solved in the canned FTP and HTTP servers?

-Erik

egawtry · June 20, 2004, 2:55am

I have a solution for this if anyone is interested.

-Erik

mikelis · June 22, 2004, 7:23am

Hey, I’m not sure if I have the same problem, but I think any piece of code would help me understand how threads work.

What is reentrancy anyway?

mik

egawtry · June 22, 2004, 4:02pm

Reentrant code means that more than one thread can safely call it at once. Non-reentrant code can hang or crash if it is called by more than one thread at once.
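A small illustration in plain C may help. This is not NET+OS code, and the function names are made up for the example. The first function keeps its result in a static buffer that all callers share, so it is not reentrant; the second lets each caller bring its own buffer.

```c
#include <stdio.h>
#include <string.h>

/* Non-reentrant: every caller shares the one static buffer, so two threads
   calling this at the same time can overwrite each other's result. */
const char *label_shared(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "conn-%d", id);
    return buf;
}

/* Reentrant: the caller supplies the storage, so nothing is shared. */
const char *label_reentrant(int id, char *buf, size_t len)
{
    snprintf(buf, len, "conn-%d", id);
    return buf;
}
```

Even in a single thread, a second call to label_shared() overwrites the string returned by the first call. With two threads the same thing can happen in the middle of a call.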

This chunk of code makes select() reentrant. Just call __select() in place of select().

/////////////////////////////////////

static BOOL _bSelectInitialized = FALSE;
static TX_MUTEX _mutexBuffer;

#ifndef howmany
#define howmany(x,y) (((x)+((y)-1))/(y))
#endif

// One quick poll of select(), serialized across threads by the mutex
int qselect(int width, fd_set * readset, fd_set * writeset, fd_set * exceptset)
{
    struct timeval tv;
    int ret;

    // Make sure the mutex is initialized
    // (this check is not itself thread-safe, so make the first call
    //  before the other threads start)
    if( !_bSelectInitialized )
    {
        UINT status = tx_mutex_create(&_mutexBuffer, "SELECTMUTEX", TX_NO_INHERIT );
        if( status != TX_SUCCESS )
            return -1;
        _bSelectInitialized = TRUE;
    }

    // Wait for my turn
    tx_mutex_get(&_mutexBuffer, TX_WAIT_FOREVER);

    // Very short timeout so the mutex is only held briefly
    tv.tv_sec = 0;
    tv.tv_usec = 8;
    errno = EAGAIN;
    ret = select(width, readset, writeset, exceptset, &tv);

    // my turn is up
    tx_mutex_put(&_mutexBuffer);

    if( ret >= 0 )
        errno = 0;

    return ret;
}

int __select(int width, fd_set * readset, fd_set * writeset, fd_set * exceptset, struct timeval * timeout)
{
    long countdown;    // remaining wait in milliseconds
    int ret;

    if( timeout )
    {
        countdown = timeout->tv_sec*1000;
        if( timeout->tv_usec > 0 )
            countdown += timeout->tv_usec/1000;
    }
    else
    {
        // No timeout given: wait (practically) forever
        countdown = 0x7FFFFFFF;
    }

    // Short timeouts get a single quick check
    if( countdown <= 10 )
    {
        return qselect(width, readset, writeset, exceptset);
    }

    // Loop in wait
    fd_set rs, ws, es, *prs, *pws, *pes;
    int fdsetsz = howmany((int)width, NFDBITS) * sizeof(fd_mask);
    do
    {
        // Preserve the calling fd_set
        prs = pws = pes = NULL;
        if( readset )
        {
            prs = &rs;
            memcpy(prs, readset, fdsetsz);
        }
        if( writeset )
        {
            pws = &ws;
            memcpy(pws, writeset, fdsetsz);
        }
        if( exceptset )
        {
            pes = &es;
            memcpy(pes, exceptset, fdsetsz);
        }

        // quickie socket check
        ret = qselect(width, prs, pws, pes);

        // Have something besides a timeout?
        if( ret != 0 )
        {
            // copy 'em back
            if( readset )
                memcpy(readset, prs, fdsetsz);
            if( writeset )
                memcpy(writeset, pws, fdsetsz);
            if( exceptset )
                memcpy(exceptset, pes, fdsetsz);

            // return it
            return ret;
        }

        // wait 50ms to let other threads do their thing
        // (5 ticks, which is 50ms with a 10ms tick)
        tx_thread_sleep(5);
        countdown -= 50;
    }
    while( countdown > 0 );

    // Timed out: hand back the (empty) sets from the last check
    if( readset )
        memcpy(readset, prs, fdsetsz);
    if( writeset )
        memcpy(writeset, pws, fdsetsz);
    if( exceptset )
        memcpy(exceptset, pes, fdsetsz);

    return 0;
}
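
For anyone who wants to try the same serialize-and-poll idea on a desktop, here is a rough sketch using POSIX threads. It is not NET+OS code and is only meant to show the pattern: pthread_mutex_t stands in for TX_MUTEX, nanosleep() stands in for tx_thread_sleep(), and the names poll_once() and locked_select() are made up for this example. One difference from the code above is that the mutex is initialized statically, so there is no first-call race on creating it.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Statically initialized, so there is no first-call race on creation. */
static pthread_mutex_t select_lock = PTHREAD_MUTEX_INITIALIZER;

/* One zero-timeout poll, serialized across threads by the mutex. */
static int poll_once(int width, fd_set *rs, fd_set *ws, fd_set *es)
{
    struct timeval tv = { 0, 0 };
    int ret;

    pthread_mutex_lock(&select_lock);
    ret = select(width, rs, ws, es, &tv);
    pthread_mutex_unlock(&select_lock);
    return ret;
}

/* Hypothetical wrapper: poll in 50 ms slices until something is ready,
   an error occurs, or timeout_ms runs out. */
int locked_select(int width, fd_set *readset, fd_set *writeset,
                  fd_set *exceptset, long timeout_ms)
{
    const struct timespec slice = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    fd_set rs, ws, es;
    int ret;

    FD_ZERO(&rs);
    FD_ZERO(&ws);
    FD_ZERO(&es);
    for (;;) {
        /* Work on copies so the caller's sets survive an empty poll. */
        if (readset)   rs = *readset;
        if (writeset)  ws = *writeset;
        if (exceptset) es = *exceptset;

        ret = poll_once(width, readset ? &rs : NULL,
                        writeset ? &ws : NULL, exceptset ? &es : NULL);
        if (ret != 0 || timeout_ms <= 0)
            break;

        nanosleep(&slice, NULL);   /* let other threads take the lock */
        timeout_ms -= 50;
    }

    /* Hand back whatever the last poll reported (empty sets on timeout). */
    if (readset)   *readset = rs;
    if (writeset)  *writeset = ws;
    if (exceptset) *exceptset = es;
    return ret;
}
```

The trade-off is the same as in the NET+OS version: the lock is only held for a moment, but a socket that becomes ready is noticed up to 50 ms late.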

Message was edited by: egawtry

jwormsley · September 5, 2006, 9:33pm

How can this help if you are running the FTP server and HTTP server? You can’t control their calls to select, and thus they can cause all the same problems your own threads and connections can cause, can they not?

egawtry · September 8, 2006, 3:18pm

No, I meant that it solves the reentrancy problem in user code.

According to Digi, the reentrancy problem is fixed in version 6.3 of NetOS. My solution is for 6.0. I will test 6.3 when I get time (I am porting my app to 6.3). If there is still a problem I will post it.

-Erik

jwormsley · September 11, 2006, 1:20pm

Ah. I have 6.3 as well, but a bunch of old modules that I can’t use with 6.3. So I’m stuck with 6.0 at the moment.

compie · June 29, 2007, 11:50am

The documentation for select() (NET+OS 7.1) says this:
“If two tasks attempt to use select on the same socket for the same conditions, an error occurs.”

So can I conclude that the select() in NET+OS 7.1 is reentrant?

charliek · June 29, 2007, 1:34pm

That just goes back to the whole ‘You can use two sockets in two different threads at the same time’. Select itself should be re-entrant (i.e. using two different groups of sockets, one for each thread).

egawtry · June 29, 2007, 2:54pm

From that quote, it appears that the problem is still there.

-Erik

compie · July 2, 2007, 10:55am

I think you are allowed to use select() from two different threads on the same socket, as long as you don’t wait for the same condition. So one thread could wait for reading and another for writing. That would be ok, according to the documentation.
