missing bytes (serial / RS232)

I have connected a Digi Connect ME to an Elmo Whistle motion controller via RS232.

Sometimes I’m missing a number of bytes (For example: I should receive 5 bytes but I’m only receiving one).

I have also connected a PC with HyperTerminal to the Tx pin of the Elmo Whistle and the PC does receive all bytes! So I’m 100% sure that the problem is on the Digi side.

I have checked the structure filled by tcgetcounters() but there are no overruns or errors.

What should I do to solve this problem?

Notes:
I’m using a low baudrate (19200), 8N1, no flow control.
I’m missing bytes only occasionally, most of the time I receive all bytes.
I have tried blocking and non-blocking I/O, but this doesn’t matter.

#define BSP_SERIAL_PORT_API BSP_SERIAL_API_TERMIOS
#define BSP_SERIAL_PORT_1 BSP_SERIAL_UART_DRIVER

Update: I’ve discovered this:

I keep polling read() for three seconds to receive bytes. It keeps returning -1 with getErrno() == EWOULDBLOCK. If at that point I write one more byte to the serial port, read() starts working again! It returns the ‘missing’ bytes.

That’s really strange, because I polled read() for 3 seconds and it keeps saying there are no bytes in the buffer, while there ARE bytes in the buffer.

So my code looks like this:

  • keep polling read() for three seconds
  • no bytes are received
    tcgetcounters() -> rbytes = 213, tbytes = 86
    tcgetbuffers() -> rxbuf = 0, txbuf = 0
  • write(1 byte)
    tcgetcounters() -> rbytes = 217, tbytes = 87
    tcgetbuffers() -> rxbuf = 4, txbuf = 0
  • read() returns the missing bytes!
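
The polling sequence above can be sketched roughly as follows. This is a minimal sketch against plain POSIX calls, not NET+OS code: it checks errno where the code in this thread uses getErrno(), and set_nonblocking() and poll_read() are hypothetical helper names. It only shows how to tell "no data yet" (EWOULDBLOCK) apart from a real error while polling; it does not work around a driver that holds bytes back.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: poll a non-blocking descriptor until `want` bytes
 * have arrived or roughly `timeout_s` seconds have passed (time() has
 * one-second resolution). Returns the number of bytes read, which may be
 * less than `want`, or -1 on a real error. */
ssize_t poll_read(int fd, unsigned char *buf, size_t want, int timeout_s)
{
    size_t got = 0;
    time_t deadline = time(NULL) + timeout_s;
    struct timespec pause = { 0, 1000000L };   /* 1 ms between polls */

    while (got < want && time(NULL) < deadline) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n > 0) {
            got += (size_t)n;                  /* partial data: keep going */
        } else if (n == 0) {
            break;                             /* end of file */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            nanosleep(&pause, NULL);           /* nothing yet, try again */
        } else {
            return -1;                         /* real error, give up */
        }
    }
    return (ssize_t)got;
}
```

If poll_read() returns fewer bytes than expected while the peer has sent everything, as in the counters shown above, the data is stuck below the read() call and no amount of polling in application code will release it.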

I think we need a fix/patch from Digi for this problem.

Hello all, I thought I might make a few comments. It looks like there’s at least 5 different serial problems all attached to the same thread.

Compie’s issue is tied to a NET+OS 7.1 serial driver issue where 1 - 3 bytes could get stuck in the FIFO because of character gap timings. The best fix is to move up to 7.2; the second best is to play with your character gap timings (i.e. directly mess with the registers) after you’ve set up the serial port.

nfgaida’s issue is with his code; it’s just bad (sorry). I’d use the attached code as a good reference point on how to use the serial port with select (and a TCP socket connection).

Joris’s first issue is that by default we enable software flow control. This causes 0x11’s and 0x13’s to be stripped from the data stream (and the data in between). Disable software flow control if you don’t plan on using it (again the attached example demonstrates how to use it).
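
As a rough illustration of switching software flow control off, here is a minimal sketch written against standard POSIX termios. Whether the NET+OS termios layer accepts exactly these flags and speed functions is an assumption, and configure_8n1_no_flow() and apply_port_settings() are hypothetical names. The settings match the ones mentioned earlier in the thread (19200 baud, 8N1, no flow control).

```c
#include <termios.h>

/* Hypothetical helper: adjust `t` for 19200 baud, 8 data bits, no parity,
 * 1 stop bit, with XON/XOFF software flow control disabled so that 0x11
 * and 0x13 pass through as ordinary data. */
int configure_8n1_no_flow(struct termios *t)
{
    t->c_iflag &= ~(tcflag_t)(IXON | IXOFF);            /* no XON/XOFF */
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB); /* no parity, 1 stop */
    t->c_cflag |= CS8 | CREAD | CLOCAL;                 /* 8 data bits */

    if (cfsetispeed(t, B19200) != 0 || cfsetospeed(t, B19200) != 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read the current settings of an open port, change
 * them as above and apply them immediately. Returns 0 on success. */
int apply_port_settings(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    if (configure_8n1_no_flow(&t) != 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &t);
}
```

Reading the settings first and changing only the relevant flags leaves everything else the driver set by default untouched.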

Joris’s second issue is very likely bad coding or a transceiver that’s been put to sleep (see this article if you’re using the Digi Connect ME on the old dev board: http://www.digi.com/support/kbase/kbaseresultdetl.jsp?id=751). Data doesn’t just get ‘shifted’ by a couple of bits as it comes out the serial port.

sofjk’s issue is actually pretty interesting. Overrun errors are described as:

Indicates that a receive overrun error condition has
been found. An overrun condition indicates that the
FIFO was full while data needed to be written by the
receiver. When the FIFO is full, any new receive data
will be discarded; the contents of the FIFO before the
overrun condition remains the same.

Which means the serial driver wasn’t able to service the FIFO quickly enough. If the driver you’re using supports DMA (O_DMA when opening the port), I would recommend using it to pull the data off the FIFO quicker. But this is also dependent on the module and NET+OS version you’re using (for example: I would only use DMA with the Connect ME in NET+OS 7.4).

Have any of the patches released by Digi addressed this?

Try to use the select() function for reading. It is better than polling and it also works better.

Jirka

Select seems to be for sockets. Are sockets usable with serial UART communication?

As of NET+OS V6.3, the select() API supports BOTH sockets and serial ports. So yes, select() is supported on a serial port.

Yes it is possible to use for UART.

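Here is an example:

int fd;                            /* serial port descriptor, opened in init_RS() */
fd_set read_set;
int ccode;
int n;
char buf[256];                     /* receive buffer, size chosen for illustration */
struct serial_buffer_t serial_buf;
struct timeval wait;

void init_RS(void)
{…}

void Read_data(void)
{
    /* set the timeout on every call, in case select() modifies it */
    wait.tv_sec = 1;
    wait.tv_usec = 0;

    FD_ZERO (&read_set);
    FD_SET (fd, &read_set);
    ccode = select (FD_SETSIZE, &read_set, (fd_set *) 0, (fd_set *) 0, &wait);

    if (ccode == 0)                              /* timeout */
        return;
    if (ccode < 0 || !FD_ISSET(fd, &read_set))   /* some other error */
        return;

    tcgetbuffers(fd, &serial_buf);               /* get number of received bytes */
    n = serial_buf.rxbuf;
    if (n > (int) sizeof(buf))                   /* never read more than buf holds */
        n = (int) sizeof(buf);

    n = read(fd, buf, n);
    if (n < 0)
    {}                                           /* read failed: check the error code here */
}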

Jirka

Huh. Thanks. Wouldn’t have gotten that from the API doc page on select.

That piece of missing documentation is addressed in NET+OS V7.4. Digi was made aware of it after V7.3 shipped. It is included in the description of select() under internetworking\sockets\functions\select

Where can I get NET+OS v7.4? I don’t see it on digi’s website. (at least not in the options for what version of net+os I have)

Please find the URL to the API reference guide for V7.4. As far as access to the kit, you’d have to talk with your distributor or Digi sales type.

http://ftp1.digi.com/support/patches/CurrentApiReference.zip

Is the “kit” in this case referring to the Digi ESP/eclipse environment, or the actual software + hardware?

I’d only be interested in getting the latest 7.4 software

(it would be awesome if they either upgraded eclipse or made it easy to use the digi esp environment in the latest version of eclipse).

Also, are the changes between 7.3 and 7.4 documented somewhere? The API doc doesn’t seem to have that information.

Thanks

If you are covered by a support contract then Digi sends you an upgrade to the software. You mention eclipse so I’ll assume you are running the GNU version. So I believe the upgrade includes NET+OS + ESP. Now what version of ESP and with what version of eclipse ESP inter-relates, I do not know.

If you are not covered by a support contract then you would need to contact your digi sales type or distributor to purchase an upgrade.

If you are covered by the support contract that comes with a jumpstart kit, then there is some limited window in which you can get an upgrade as described above.

I hope that helps.

Roughly following your example, I have the same problem I had without using select. Basically, I’m waiting for 208 bytes.

After first power-on, I send my data out and wait for the response. I read() the number of bytes that tcgetbuffers says is there. However, that number of bytes is less than I’m expecting. The most recent example was 36 bytes. I send again, and this time I have 172 bytes waiting (36 + 172 = 208). All sends after this had 208 bytes. Then, if I turn off the other end and do a send/receive one more time, there are 208 bytes waiting for me again (obviously they were sitting there from the previous send). After that, all send/receive attempts end up with no bytes received.

Thoughts?

Can you publish your receive procedure?

For some reason, Digi has enabled some settings by default. Check whether the missing data consists of control characters, which would then not be received or transmitted.
Mostly the values 0x11, 0x12 and 0x13 will be missing.

I’ve attached the basic routine I am using for the receive thread.

Thanks in advance for looking.

Hi,
I quickly looked at your code. I found one possible problem:

In this part of code:

nBytes_Received = read( (int)fd, pRecvBuffer, serial_buf.rxbuf );

if( nBytes_Received == -1 ){
/* everything in this part of the code is wrong */
}

See the documentation for the read() function.
When read() returns an error, you have to check errno rather than just start select() again.

I will take a look at your code tonight …

Jirka

Those don’t seem to be the missing characters. In fact, there aren’t any characters that seem to go “missing”, just that the serial buffer lies about how many bytes are waiting.
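
One way to avoid depending on the byte count the driver reports is to keep reading until the expected number of bytes has arrived or the line has gone quiet. The following is a minimal sketch using POSIX select() and read(), not tested on NET+OS; read_exact() is a hypothetical helper name, and it checks errno where NET+OS code would call getErrno().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: collect exactly `want` bytes from `fd`, waiting up
 * to `timeout_s` seconds for each new chunk. Returns the number of bytes
 * collected, which is less than `want` if the line went quiet or the
 * descriptor hit end of file, or -1 on a real error. */
ssize_t read_exact(int fd, unsigned char *buf, size_t want, int timeout_s)
{
    size_t got = 0;

    while (got < want) {
        fd_set rset;
        struct timeval tv;
        int rc;
        ssize_t n;

        /* Rebuild the set and the timeout before every select() call. */
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        tv.tv_sec = timeout_s;
        tv.tv_usec = 0;

        rc = select(fd + 1, &rset, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal, retry */
            return -1;
        }
        if (rc == 0)
            break;                   /* timeout: return what we have */

        n = read(fd, buf + got, want - got);
        if (n > 0)
            got += (size_t)n;        /* partial chunk, keep collecting */
        else if (n == 0)
            break;                   /* end of file */
        else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;               /* real error */
    }
    return (ssize_t)got;
}
```

With the 208-byte response described above, a call such as read_exact(fd, buf, 208, 1) would gather the 36-byte and 172-byte chunks into one buffer, provided the driver eventually delivers them.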