We are running dgrp-1.9-24 on an x86-64 linux-2.6.31 kernel (Intel Xeon X3430 processor), connected to an Etherlite 80.
Following a successful call to select(2) which indicates that the file descriptor is writable, a call to write(2) to write 10 bytes sometimes blocks, seemingly indefinitely (at least 24 hours). The application is multi-threaded, but all access to the serial port is protected by a mutex, which the thread concerned has obtained. Other ports on the Etherlite (used by the same application) continue to work normally. Immediately prior to issuing the select & write, the application sets DTR to on using the TIOCMSET ioctl.
What firmware version is being used?
What does dpa.dgrp show for the active port signals when the port is in this condition?
How is the port recovered (i.e. killing the application, rebooting the Etherlite, etc…)?
Is it possible there was a short network outage that was undetected by the application? Check the /var/log/messages file for any corresponding messages.
Firmware version V1.6
In this condition dpa.dgrp shows a floating point exception and crashes when I try to access the details of the ‘hung’ port (other ports show correctly)
Using the /sys/ interface, ‘cat baud_info’ generates a segfault
‘cat state_info’ shows Open
‘cat msignals_info’ shows RTS CTS DTR DSR
‘cat cflag_info’ shows 0
‘cat digiflag_info’ shows 80
As a result of the ‘cat baud_info’, there is a stack trace in the kernel log:
divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/tty/tty_dgrp_aa_2/baud_info
CPU 1
Modules linked in: dgrp
Pid: 1294, comm: cat Not tainted 2.6.31-gentoo-r10 #1 PowerEdge R210
RIP: 0010:[] [] dgrp_tty_baud_show+0x59/0x70 [dgrp]
RSP: 0018:ffff88009b8cfe88 EFLAGS: 00010246
RAX: 00000000001c2000 RBX: fffffffffffffffb RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88023c4ba800
RBP: ffff88009b8cfe88 R08: ffff88009ba85000 R09: 0000000000000017
R10: ffffea000220cd18 R11: 0000000000000246 R12: ffffffffa00185e0
R13: ffff88023d93d320 R14: ffff8801aa42d5c0 R15: ffff88023c4ba810
FS: 00007f8bfc63e6f0(0000) GS:ffff88002804d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8bfd61d038 CR3: 000000009da67000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cat (pid: 1294, threadinfo ffff88009b8ce000, task ffff88023a6dc230)
Stack:
ffff88009b8cfea8 ffffffff812edfba ffffffffffffffed ffff8801aa42d5a0
<0> ffff88009b8cff08 ffffffff8111aeb0 ffff88009b8cfee8 ffff88009b8cff48
<0> 0000000000008000 00007f8bfd615000 ffffffff8176cd10 ffff88023dd5cd80
Call Trace:
[] dev_attr_show+0x2a/0x60
[] sysfs_read_file+0xb0/0x170
[] vfs_read+0xc8/0x1a0
[] sys_read+0x50/0x90
[] system_call_fastpath+0x16/0x1b
Code: 18 01 a0 be 00 10 00 00 4c 89 c7 31 c0 e8 30 71 21 e1 c9 48 98 c3 0f 1f 40 00 0f b7 b2 70 01 00 00 ba 00 20 1c 00 89 d0 c1 fa 1f fe 89 c1 eb cb 90 31 c0 c9 c3 66 66 66 2e 0f 1f 84 00 00 00
RIP [] dgrp_tty_baud_show+0x59/0x70 [dgrp]
RSP
---[ end trace 9af1b5ef220e0966 ]---
The port is recovered by restarting the application. In all the cases where it has failed, it has been when sending the AT command to the attached modem in preparation for answering an incoming call. Because this is the only place where DTR is raised before sending data, I suspected it might be a race condition, so I tried adding a 0.5 s delay after issuing the ioctl to raise DTR. Following this it went longer before failing, but this may just have been a coincidence.
A wakeup from select() on a write fd guarantees only that at least 1 byte can be written without blocking; it does not guarantee room for anything more.
If you then write 10 bytes, the write could block, possibly forever, if the tty is not in nonblocking mode.
From the WRITE(P) man page, which gives a better description of blocking/nonblocking on write IO:
When attempting to write to a file descriptor (other than a pipe or FIFO) that supports non-blocking writes and cannot accept the data immediately:
* If the O_NONBLOCK flag is clear, write() shall block the calling thread until the data can be accepted.
* If the O_NONBLOCK flag is set, write() shall not block the thread. If some data can be written without blocking the thread, write() shall write what it can and return the number of bytes written. Otherwise, it shall return -1 and set errno to [EAGAIN].
Did you open the tty in nonblocking mode, or switch it to nonblocking mode at runtime after opening it?
If not, could you add this and try again?
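For reference, here is a rough sketch of what that could look like. It is not taken from your application or from the dgrp driver; the helper names set_nonblocking() and write_all_timeout() are made up for illustration, and the timeout value is an assumption you would tune. The idea is to set O_NONBLOCK with fcntl(), then loop on select() and write(), handling short writes and EAGAIN, so the thread can never sit inside write() indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch an already-open descriptor to nonblocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: try to write all len bytes to a nonblocking fd.
 * Each wait in select() is limited to timeout_sec seconds (an assumed value).
 * Returns the number of bytes written (short if a wait timed out),
 * or -1 on a real error.
 */
static ssize_t write_all_timeout(int fd, const char *buf, size_t len,
                                 int timeout_sec)
{
    size_t done = 0;

    assert(buf != NULL);

    while (done < len) {
        fd_set wfds;
        struct timeval tv;
        int ready;
        ssize_t n;

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        ready = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (ready == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, wait again */
            return -1;
        }
        if (ready == 0)
            break;                  /* timed out: report what was written */

        /* select() only promises room for some data, so expect short writes. */
        n = write(fd, buf + done, len - done);
        if (n == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;           /* nothing accepted this time, wait again */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

You would call set_nonblocking() once after open() (or pass O_NONBLOCK to open() directly), then use write_all_timeout() wherever you currently do the select and write pair. A return value shorter than the requested length tells you the port stopped accepting data, which you can log instead of hanging.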