flash corruption

I have a multithreaded application which makes use of the flash driver. All access to the flash driver is done by a single thread. My problem is that sometimes a byte or two is written to the wrong segment of the flash device. Where and how often this happens seems to be random. The one consistent thing is that the affected bits always go from ‘1’ to ‘0’. I have yet to find any other pattern. I have tried many different things, but the result is always the same. I hope some of you have an idea of what is causing this problem.

Bits in a flash can only go from 1 to 0 when written. You cannot write a 1 over a 0; only an erase sets bits back to 1. So this does not give any indication, it’s just normal behaviour (except that it is written in the wrong sector, of course). What platform are you using? Since the application is multithreaded, are you sure that only one thread is writing to the flash at a time? You should explain more about how you tested this. Are you sure that the error occurs when writing, and not when you read the flash?
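
To make the 1-to-0 rule concrete, here is a minimal sketch of how programming a word behaves on this kind of flash. It is only a model in plain C, not driver code; the names flash_program_model and flash_needs_erase are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of programming one 16-bit word: a program operation can only
 * clear bits, so the result is the AND of the old contents and the data. */
uint16_t flash_program_model(uint16_t current, uint16_t data)
{
    return (uint16_t)(current & data);
}

/* True if 'wanted' cannot be reached from 'current' by clearing bits,
 * i.e. at least one bit would have to go from 0 back to 1 (erase needed). */
bool flash_needs_erase(uint16_t current, uint16_t wanted)
{
    return (uint16_t)(current & wanted) != wanted;
}
```

In this model, writing 0x1234 over an erased word (0xFFFF) gives 0x1234, while writing 0xFFFF over 0x1234 changes nothing. That is why a stray write can only ever show up as bits going from 1 to 0.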

I am positive that only one thread is accessing the flash device at a time. All access goes through a singleton class, which is semaphore protected. All writing is done by a low-priority thread owned by the singleton class. This thread uses the same singleton to ensure I don’t read before I am done writing, or the other way around. The test I perform is to write about 3 blocks to the flash every 10-15 minutes. After a few hours, if I reboot the module or read out the contents of the flash ram, I can see that changes have been made in places where I have not written (at least not on purpose). I have attempted to catch any call to LWriteWord in the naflash.c file that writes outside my accepted area of the flash, but I have been unable to catch any such call. Hope this is enough info to help.
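
A check of the kind described above could look roughly like the following sketch. The type flash_window_t, the function flash_write_allowed and the 2-byte word size are assumptions for illustration; the real signature of LWriteWord is not shown in this thread, so the check is written as a stand-alone helper that a wrapper around the low-level write could call.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_WORD_SIZE 2u  /* assumed: the device is written one 16-bit word at a time */

/* Hypothetical description of the only region the application may write. */
typedef struct {
    uint32_t start;  /* first allowed byte address (inclusive) */
    uint32_t end;    /* one past the last allowed byte address */
} flash_window_t;

/* Returns true only if [addr, addr + len) is word aligned and lies
 * completely inside the window. Written to avoid address overflow. */
bool flash_write_allowed(const flash_window_t *w, uint32_t addr, uint32_t len)
{
    if (len == 0 || (addr % FLASH_WORD_SIZE) != 0 || (len % FLASH_WORD_SIZE) != 0) {
        return false;
    }
    if (addr < w->start || addr >= w->end) {
        return false;
    }
    return len <= w->end - addr;
}
```

A wrapper would call flash_write_allowed before every low-level write and halt or log on failure. If such a guard never fires and corruption still appears, the stray writes are probably not coming through this code path with a bad address.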

What do you mean by “read out the contents of the flash ram”? If the thread is low-priority, maybe the write procedure is interrupted by other, higher-priority threads and the data gets corrupted. You have to write a full page in a timely manner. You said you checked after a few hours. What is the result if you check immediately? Is the data correct? If yes, it may be a problem with the flash memory degrading over time. It can be erased up to about 100,000 times, which is a big number, but if you do it in a program, you can easily reach this figure.

“read out the contents of the flash ram” is done as follows: I use the debugger’s memory manipulation routine for dumping memory contents to a file. I dump the program part of the flash (about 2.3 megabytes). I then perform a binary compare of the dumped contents against the binary file I originally wrote to the flash. If I only write 1 or 2 times, it is normally ok. It appears to be random how many times I have to write before the problem occurs. I am fairly sure I have not reached the 100,000 times the flash can be programmed (I have verified that it is actually 100,000 times for the flash device I use). You write that I have to write a full page in a timely manner. Can you tell me more about what a “full page” means? There is no doubt that my writing thread is interrupted by other threads during the write process. I was under the assumption that this was not a problem, but perhaps I am mistaken. Any extra information about this will be much appreciated. Thanks for all the help so far Adrian, it is excellent to be able to get competent help so fast:)
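
The binary compare can also classify each difference, which helps separate stray programming (bits only cleared) from read or dump errors (bits set). The following is a minimal sketch on in-memory buffers; compare_images and diff_summary_t are illustrative names, and loading the two files into memory is left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t differing;      /* bytes that differ at all */
    size_t cleared_only;   /* differing bytes where bits only went 1 -> 0 */
    size_t with_set_bits;  /* differing bytes with at least one 0 -> 1 change */
    size_t first_offset;   /* offset of the first difference (valid if differing > 0) */
} diff_summary_t;

/* Compare the original image against the dump read back from flash. */
diff_summary_t compare_images(const uint8_t *image, const uint8_t *dump, size_t len)
{
    diff_summary_t s = { 0, 0, 0, 0 };

    for (size_t i = 0; i < len; i++) {
        if (image[i] == dump[i]) {
            continue;
        }
        if (s.differing == 0) {
            s.first_offset = i;
        }
        s.differing++;
        /* Bits that are 1 in the dump but 0 in the image cannot come
         * from a program operation. */
        if ((uint8_t)(dump[i] & ~image[i]) != 0) {
            s.with_set_bits++;
        } else {
            s.cleared_only++;
        }
    }
    return s;
}
```

If with_set_bits is ever non-zero in a region that was never erased, the difference is more likely a read or dump problem than a stray write.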

Some flash memories require you to write a full page at a time and, if you interrupt the write before it finishes, the data is corrupted. It depends on what memory you have. But I don’t think this is the case, as you probably have a Netsilicon dev board, where memory can be written byte by byte. Anyway, the memory can be erased only by block, so if you want to write just one byte and preserve the others, the block is copied into RAM and then erased, the byte is modified in RAM, and then the block is rewritten completely. The same applies for more bytes. If the data in RAM is corrupted somehow, corrupted data will be written into flash. So this is another thing to check. If you have a different flash than those on the dev boards, you should rewrite the BSP.
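
The copy, erase, modify and rewrite sequence can be sketched against a small simulated flash. Everything here is an assumption for illustration: the sim_ names, the tiny 64-byte sector and the RAM array standing in for the device. A real driver would issue the device's erase and program commands instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_SECTOR_SIZE 64u  /* hypothetical, kept tiny for the simulation */
#define SIM_SECTORS      4u

uint8_t sim_flash[SIM_SECTORS * SIM_SECTOR_SIZE];

/* Erase sets every bit of the sector back to 1. */
void sim_erase_sector(size_t sector)
{
    memset(&sim_flash[sector * SIM_SECTOR_SIZE], 0xFF, SIM_SECTOR_SIZE);
}

void sim_init(void)
{
    for (size_t s = 0; s < SIM_SECTORS; s++) {
        sim_erase_sector(s);
    }
}

/* Programming can only clear bits, hence the AND. */
void sim_program(size_t addr, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        sim_flash[addr + i] &= src[i];
    }
}

/* Update 'len' bytes at 'addr' while preserving the rest of the sector. */
bool sim_update(size_t addr, const uint8_t *data, size_t len)
{
    uint8_t shadow[SIM_SECTOR_SIZE];
    size_t sector = addr / SIM_SECTOR_SIZE;
    size_t base = sector * SIM_SECTOR_SIZE;
    size_t off = addr - base;

    if (len == 0 || sector >= SIM_SECTORS || len > SIM_SECTOR_SIZE - off) {
        return false;  /* this sketch does not span sectors */
    }
    memcpy(shadow, &sim_flash[base], SIM_SECTOR_SIZE);  /* 1. copy sector to RAM */
    memcpy(&shadow[off], data, len);                    /* 2. modify the RAM copy */
    sim_erase_sector(sector);                           /* 3. erase the sector */
    sim_program(base, shadow, SIM_SECTOR_SIZE);         /* 4. rewrite it completely */
    return memcmp(&sim_flash[base], shadow, SIM_SECTOR_SIZE) == 0;  /* 5. verify */
}
```

Note that the final verify compares flash against the RAM copy, so it cannot detect a RAM copy that was corrupted between steps 1 and 4. Checking that would need an independent checksum of the intended contents.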

The board I am running on is a custom board developed by my company, based on the reference design by Netsilicon. We use a flash device not originally supported by Netsilicon, and we have therefore modified the BSP to accommodate this flash. This flash operates as you describe, with erase being a full sector and write being 1 word (2 bytes) at a time. The routine used always erases, writes all bytes and verifies. During this whole process, no other flash activity is present. In my latest test, which has run for the last couple of days, I raised the priority of the flash thread above all but a simple watchdog thread. Since then I have not had any corruption of the flash device. But the software crashes from time to time (it was never designed for this priority, so I am not too surprised).
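
If the corruption really is tied to the flash thread being preempted in the middle of a write (an assumption; the thread does not confirm the mechanism), a narrower alternative to raising the whole thread's priority would be to protect only each single word write. The sketch below shows the shape of that idea with hypothetical enter/leave hooks; on a real target these would map to whatever the RTOS offers for disabling preemption or interrupts, and the plain memory write stands in for the device's program command sequence and status polling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks for a short critical section. */
typedef struct {
    void (*enter)(void);
    void (*leave)(void);
} critical_hooks_t;

/* Demo hooks that only count calls, so the sketch can run on a host. */
int demo_depth = 0;
int demo_enter_count = 0;

void demo_enter(void) { demo_depth++; demo_enter_count++; }
void demo_leave(void) { demo_depth--; }

/* Program one word with the whole operation inside the critical section.
 * The AND models flash behaviour: programming can only clear bits. */
bool program_word_atomic(volatile uint16_t *dst, uint16_t value,
                         const critical_hooks_t *hooks)
{
    if (hooks != NULL && hooks->enter != NULL) {
        hooks->enter();
    }
    *dst = (uint16_t)(*dst & value);  /* stand-in for command sequence + polling */
    if (hooks != NULL && hooks->leave != NULL) {
        hooks->leave();
    }
    return *dst == value;  /* verify after leaving the critical section */
}
```

Keeping the protected region to one word write keeps the time with preemption disabled short, which may avoid the crashes seen when the entire flash thread runs at high priority.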