ConnectPort X4 python app: "error: can't start new thread"

Hi guys,

I wrote a Python program that implements the following functions on a ConnectPort X4 gateway:
(1) A TCP server that listens for new client connection requests.
(2) When a new request is detected, the TCP server accepts it and starts a new thread for that client socket connection. TCP clients send requests over independent sockets; the client-handling threads process the requests and put them into the request queue of the ZigBee server (see (3)).
(3) A ZigBee server that scans the request queue (filled by the TCP client threads) and forwards each request to the specified node in the ZigBee network.

I use multiple threads so that several clients can be connected simultaneously. Because client connections may be created and closed frequently (much like HTTP connections), the TCP server has created a lot of threads after the system has run for a couple of minutes. Then this error appears:

Exception in thread Thread-1:
Traceback (most recent call last):
File "./threading.py", line 442, in __bootstrap
File "D:\Users\joeytian\workspace\lms_cpx4_server runk\lms_cpx4_server\src ulip_tcp_server.py", line 61, in run
File "D:\Users\joeytian\workspace\lms_cpx4_server runk\lms_cpx4_server\src ulip_tcp_server.py", line 88, in listen
File "./threading.py", line 416, in start
error: can't start new thread

I think this may be caused by improper thread termination when each TCP client connection closes. The code of the TCP client thread looks like this (only run() is shown here; self.close is set by the TCP server thread when an empty byte string, indicating a disconnect, is received from the listening socket):


    def run(self):
        # Only check data rx/tx.
        self.close = False

        r_list = [self.c_skt]
        w_list = r_list

        while not self.close:
            try:
                s_read, s_write = select(r_list, w_list, [])[:2]

                # Scan the readable list.
                # if self.c_skt in s_read:
                if s_read:
                    self.__rx_handler__()

                # if self.c_skt in s_write:
                if s_write:
                    self.__tx_handler__()

                # Process requests.
                self.process_request()

            except:
                print "Thread quit because of exception!"
                raise
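
A side observation on the loop above, not necessarily the cause of the thread error: because the same socket sits in both r_list and w_list, select() will normally report it as writable straight away, so the loop spins even when the client is idle. The sketch below illustrates this with a local socket pair standing in for a client connection; the names a, b and pending_tx are made up for the illustration, and it is written for a current desktop Python rather than tested on the gateway.

```python
import select
import socket

# Hypothetical stand-in for one connected client socket and its peer.
a, b = socket.socketpair()

# Same socket in both lists, as in run(): with nothing to read, select()
# still returns at once because the socket has room in its send buffer.
readable, writable, _ = select.select([a], [a], [], 5.0)
idle_spin = (readable == [] and writable == [a])

# Asking about writability only when data is queued lets select() wait.
pending_tx = b""                      # nothing waiting to be sent
w_list = [a] if pending_tx else []
readable2, writable2, _ = select.select([a], w_list, [], 0.1)
idle_wait = (readable2 == [] and writable2 == [])

a.close()
b.close()
```

Passing a timeout to select(), as in the second call, also gives the loop a regular chance to re-check its exit flag.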

Another problem may be that some resources are not released when this thread exits, because they are still used by other threads (for example, the ZigBee server thread holds a reference to variables of the TCP client thread).

So my question is simple, though not easy to answer briefly:
How do I terminate or exit from a thread?

I will hazard a guess that the ‘crashed’ thread doesn’t free up all of its resources. Just exiting run() cleanly after clearing all resources would be safer. Plus, when using multiple threads you must deal with concurrency issues, passing data via Queue or other ‘safe’ methods. So a lot of RAM and CPU time is wasted.
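
As a rough illustration of "exiting run() cleanly", here is a minimal sketch in which the thread simply returns from run() and a finally block closes the socket on every exit path. ClientThread, stop_event and received are hypothetical names, not taken from the original program, and the code targets a current desktop Python; an older interpreter on the gateway may need small adjustments.

```python
import select
import socket
import threading


class ClientThread(threading.Thread):
    """Hypothetical per-client worker that always releases its socket."""

    def __init__(self, c_skt):
        threading.Thread.__init__(self)
        self.c_skt = c_skt
        self.stop_event = threading.Event()  # the server sets this to request exit
        self.received = []

    def run(self):
        try:
            while not self.stop_event.is_set():
                # The timeout lets the loop notice stop_event on an idle socket.
                readable = select.select([self.c_skt], [], [], 0.1)[0]
                if readable:
                    data = self.c_skt.recv(1024)
                    if not data:      # empty read: the peer closed the connection
                        break
                    self.received.append(data)
        finally:
            # Runs on a normal return and on an exception alike.
            self.c_skt.close()


# Local socket pair standing in for a real TCP client connection.
server_side, client_side = socket.socketpair()
worker = ClientThread(server_side)
worker.start()
client_side.sendall(b"ping")
client_side.close()                   # peer disconnect ends run()
worker.join(2.0)
```

There is no supported way to kill a Python thread from outside; the thread ends when run() returns, so the flag-plus-return pattern is the usual approach.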

Given that you are just handling small transactions and not state machines, I would leave everything in one thread and build up the r_list with all clients for a single select. You do need a slightly more complex data structure so that ZB responses can be returned to the correct client, but since you have no promise that ZB request/response cycles stay in order, you need this anyway (meaning response #2 may return faster than response #1 … unless you block thread #2 and don’t send any more requests until thread #1 gets its response or times out).
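
A minimal sketch of that single-thread idea, assuming plain TCP on localhost: one select() call watches the listening socket and every client socket, and a dictionary maps each socket to a client id so that a later ZB response could be routed back. poll_once, clients and requests are illustrative names only, and using fileno() as the id is just one possible choice.

```python
import select
import socket


def poll_once(listener, clients, requests, timeout=1.0):
    """One pass of a single-threaded server loop (hypothetical helper).

    clients maps socket -> client id; requests collects (client id, data).
    """
    r_list = [listener] + list(clients)
    readable = select.select(r_list, [], [], timeout)[0]
    for skt in readable:
        if skt is listener:
            conn, _addr = listener.accept()
            clients[conn] = conn.fileno()   # any unique id would do
        else:
            data = skt.recv(1024)
            if data:
                requests.append((clients[skt], data))
            else:
                # Peer disconnected: forget it and free the socket.
                del clients[skt]
                skt.close()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))             # port 0: let the OS pick a port
listener.listen(5)
clients, requests = {}, []

c1 = socket.create_connection(listener.getsockname())
poll_once(listener, clients, requests)      # accepts c1
c1.sendall(b"req-1")
poll_once(listener, clients, requests)      # reads c1's request
c1.close()
poll_once(listener, clients, requests)      # notices the disconnect
listener.close()
```

No thread is created per connection here, so a disconnect costs only a dictionary entry and a socket close.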

The reason a Linux ‘server’ uses 1 thread per client is that Linux includes clean-up which absolutely frees up the resources of a dying thread, so it is safe and easy. The X4’s OS doesn’t have such support, so THAT particular value of the client threads is not there.

Yes. After some digging with the ConnectPort X4’s built-in tools (logging in via telnet and entering “display memory” to watch memory usage while my Python program runs), I can confirm that you’re right: the memory is used up, and at that point my program throws the exception “can’t start new thread”.

I do use a Queue to pass data from the TCP client threads to a separate ZigBee service thread. I defined a ZBNodeRequest class that contains a field identifying the TCP socket instance the request came from, so once a request from a TCP client is pushed into the ZigBee queue, the TCP socket ID is tracked. The ZigBee serving thread then forwards the request data to the specified node and waits for the node’s response (here I assume requests are processed one at a time). If there is no response from the node within a certain period, the request is marked as timed out and popped from the request queue (that is, removed).
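
For illustration only, the request tracking described above might look roughly like this. The field names (client_id, node_addr, payload, deadline), the helper expire_requests and the node names are guesses, not the real class, and the sketch assumes every request uses the same timeout so the oldest request is always at the front of the queue.

```python
import time
from collections import deque


class ZBNodeRequest(object):
    """Illustrative stand-in; the field names are assumptions."""

    def __init__(self, client_id, node_addr, payload, timeout_s=5.0, now=None):
        now = time.time() if now is None else now
        self.client_id = client_id    # which TCP client to answer
        self.node_addr = node_addr    # target ZigBee node
        self.payload = payload
        self.deadline = now + timeout_s


def expire_requests(pending, now=None):
    """Pop requests whose deadline has passed from the front; return them."""
    now = time.time() if now is None else now
    expired = []
    while pending and pending[0].deadline <= now:
        expired.append(pending.popleft())
    return expired


pending = deque()
# Explicit 'now' values keep the example deterministic.
pending.append(ZBNodeRequest(1, "node-A", b"read", timeout_s=5.0, now=100.0))
pending.append(ZBNodeRequest(2, "node-B", b"read", timeout_s=5.0, now=103.0))

timed_out = expire_requests(pending, now=106.0)
```

Each expired request still carries its client_id, so the matching TCP client can be told about the timeout.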

I spent a lot of time thinking about how to handle communication between the TCP clients and the ZigBee nodes properly. At one point I believed this kind of task is so common in applications that there must be an existing design pattern that solves it painlessly. But every solution has limitations. Take select() in a single thread, for example: I don’t know how to handle the blocking issue. If one client blocks the thread, the other clients will not be served. This is what I did at the beginning.
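
One common way around the blocking worry, sketched here under assumptions, is to read only what select() reports as ready and keep a per-client buffer until a full request has arrived. The newline-terminated framing and the helper names read_available and pop_line are invented for the example; the real protocol may frame requests differently.

```python
import select
import socket


def read_available(skt, buffers, timeout=0.0):
    """Append whatever skt has ready to buffers[skt] without waiting on it.

    Returns False when the peer has closed, True otherwise.
    """
    readable = select.select([skt], [], [], timeout)[0]
    if not readable:
        return True               # nothing yet; move on to the next client
    data = skt.recv(1024)         # select reported data (or end of stream)
    if not data:
        return False
    buffers[skt] = buffers.get(skt, b"") + data
    return True


def pop_line(skt, buffers):
    """Return one complete newline-terminated request, or None if incomplete."""
    buf = buffers.get(skt, b"")
    if b"\n" not in buf:
        return None
    line, rest = buf.split(b"\n", 1)
    buffers[skt] = rest
    return line


# Local socket pairs standing in for a slow and a fast TCP client.
slow, slow_peer = socket.socketpair()
fast, fast_peer = socket.socketpair()
buffers = {}

slow_peer.sendall(b"half a requ")         # slow client stalls mid-request
fast_peer.sendall(b"whole request\n")

for skt in (slow, fast):
    read_available(skt, buffers, timeout=0.5)

slow_result = pop_line(slow, buffers)     # incomplete, nothing to process yet
fast_result = pop_line(fast, buffers)     # complete request is served

for s in (slow, slow_peer, fast, fast_peer):
    s.close()
```

The stalled client only delays its own request; the loop moves on and serves the other one.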

Multithreading is a classic solution on many platforms and is recommended by many of the books I have read. I’m glad to hear from you about the “resource free-up” stuff, and I will check it in detail in my program. Anyhow, someone like me who has been programming in Python for just a month has plenty of chances to run into these issues.