New bootloader image won't correctly recover using DHCP/TFTP

I’ve been working on testing the ability to recover bad image.bin uploads using the bootloader’s DHCP/TFTP recovery mechanism. I’m using tftpd32, which didn’t work on Vista, but on XP it worked to upload a new image on modules that had an OLD bootloader image. But after updating the bootloader using a newly compiled (from NET+OS v7.3) rom.bin file, the recovery sequence does send out a DHCP discovery request, and the DHCP server program offers up a valid IP address, but the module ignores the offer and tries again 2 more times before failing and moving on to attempt a serial recovery.

When capturing debug output during the recovery attempt, I see “TFTP…”, indicating that the downloadImageUsingTftp() routine in blmain.c is being called. Then, after the 3 attempts at getting IP parameters via DHCP, I get the message “DHCP fails”, indicating that the board_initialize_dhcp() call failed.

Has anyone else had this problem using a bootloader built from v7.3?

Two questions

  1. Have you pulled down, applied and rebuilt with all patches currently available on the Digi Web Site for NET+OS V7.3?

  2. On what device are you testing?

Hi, I’ve been having the same trouble as nfgaida: solid green and yellow lights from startup, and serial output as follows:

Starting Recovery
TFTP…

Abort Exception

After that, nothing. Has anyone gotten anywhere with this?

No I haven’t yet applied any patches. I am close to release with this product and if possible I would like to be able to release it without any patches applied, since having to apply patches would add to the complexity of reproducing the design environment. Of course it looks like I may need a patch to fix this problem. I did look through the available patches and I didn’t see any that addressed my exact problem, although I did see that the bsp updates patch addresses a problem with storing an image downloaded by TFTP, which I’ll probably need anyway.

I am developing on the Connect ME. I have gotten this result on three different Connect ME modules, two of which are JTAG modules, one is a regular -C module.

Did you solve this problem? I’m running into the same situation.

Actually, after extensive troubleshooting and learning about DHCP and BOOTP standards I ended up learning that the boot file name can be specified either in the header OR as an option (option 67). Previous Connect ME bootloaders worked when the boot file name was only given in the header, but a bootloader built with NET+OS 7.3 will only pull the file if the bootfile name option is included.

I drilled down in the NET+OS code and I can’t figure out what changed, because the code in blmain.c and in blDhcp.c doesn’t seem to have changed. In fact the hasTftpInfo() function in blDhcp.c seems to look for the filename as an option, and then in the header, which seems like it should work.
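For anyone following along, the lookup order I describe (option first, then header) can be sketched roughly like this. This is not the NET+OS source and not the real hasTftpInfo(); find_boot_file() and the enum names are made up for illustration. The offsets follow the BOOTP/DHCP message layout in RFC 2131, and the sketch skips the magic cookie check and option overload (option 52) to stay short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    DHCP_FILE_OFFSET    = 108,  /* 'file' field in the fixed header (RFC 2131) */
    DHCP_FILE_LEN       = 128,
    DHCP_OPTIONS_OFFSET = 240,  /* 236-byte fixed header + 4-byte magic cookie */
    DHCP_OPT_PAD        = 0,
    DHCP_OPT_BOOTFILE   = 67,
    DHCP_OPT_END        = 255
};

/*
 * Hypothetical helper: copies the boot file name from a DHCP reply into out.
 * Returns 1 if it came from option 67, 2 if it came from the header 'file'
 * field, and 0 if no name was found.
 */
int find_boot_file(const uint8_t *pkt, size_t len, char *out, size_t out_len)
{
    size_t i, n, olen;

    if (out_len == 0 || len < DHCP_OPTIONS_OFFSET) {
        return 0;
    }
    out[0] = '\0';

    /* 1. Walk the options area looking for option 67. */
    i = DHCP_OPTIONS_OFFSET;
    while (i < len) {
        uint8_t code = pkt[i];
        if (code == DHCP_OPT_END) break;
        if (code == DHCP_OPT_PAD) { i++; continue; }
        if (i + 1 >= len) break;              /* truncated option header */
        olen = pkt[i + 1];
        if (i + 2 + olen > len) break;        /* truncated option value */
        if (code == DHCP_OPT_BOOTFILE && olen > 0) {
            n = (olen < out_len - 1) ? olen : out_len - 1;
            memcpy(out, &pkt[i + 2], n);
            out[n] = '\0';
            return 1;
        }
        i += 2 + olen;
    }

    /* 2. Fall back to the NUL-terminated 'file' field in the header. */
    n = 0;
    while (n < DHCP_FILE_LEN && n < out_len - 1 &&
           pkt[DHCP_FILE_OFFSET + n] != '\0') {
        out[n] = (char)pkt[DHCP_FILE_OFFSET + n];
        n++;
    }
    out[n] = '\0';
    return (n > 0) ? 2 : 0;
}
```

With logic like this, a reply that carries the name only in the header should still be accepted, which is why the behavior I am seeing with the 7.3 bootloader surprises me.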

I am using Tftpd32, as suggested in another post, and under “additional option” I have “67” and “image.bin”. This will add the boot file option to the DHCP response, and this did trigger my NET+OS 7.3 bootloader to pull the image.bin file from the TFTP server.
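For reference, a DHCP option on the wire is a code byte, a length byte, and then the value (RFC 2132), so the “67” / “image.bin” entry should show up in the offer as 67, 9, and the nine characters of the name. A minimal sketch of that encoding in C follows. append_dhcp_option() is a made-up helper, and this only illustrates the resulting bytes; it is not how Tftpd32 is implemented, and I have not checked whether Tftpd32 also appends a terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DHCP_OPT_BOOTFILE = 67 };  /* bootfile name option code (RFC 2132) */

/*
 * Hypothetical helper: appends one option (code, length, value) at buf[*pos]
 * and advances *pos. Returns 1 on success, 0 if the value is too long for a
 * one-byte length or does not fit in the remaining buffer space.
 */
int append_dhcp_option(uint8_t *buf, size_t cap, size_t *pos,
                       uint8_t code, const void *val, size_t val_len)
{
    if (val_len > 255 || *pos > cap || cap - *pos < 2 + val_len) {
        return 0;
    }
    buf[(*pos)++] = code;
    buf[(*pos)++] = (uint8_t)val_len;
    memcpy(&buf[*pos], val, val_len);
    *pos += val_len;
    return 1;
}
```

Calling it with DHCP_OPT_BOOTFILE and the 9-byte string "image.bin" writes 11 bytes in total, which is what to look for in a packet capture of the offer.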

Note that I am running Tftpd32 under (a virtual install of) Windows XP. I first attempted using it with Windows Vista, but found that there are issues with the program under Vista.

Also note that I never actually recovered the module, but that may be due to the other known issue (which is addressed with a patch) regarding TFTP recovery. But adding the boot file as an option did get the bootloader to actually retrieve the boot file from the TFTP server.

The “additional option” under Tftpd32 was what was missing for me. After setting that and triggering a reset from my Digi board, it downloaded the image, applied it, and reset itself. I almost missed the download; it happened so fast that I barely saw a flash as the progress window opened and closed.

I do have all of the patches applied, so I have the TFTP recovery bug covered I guess.

Now to figure out which pins match to GPIO so I can force an update via TFTP if certain pins are set.

Thanks for the response.

By default (starting at NET+OS v7.2 I think), it checks for pins 18 and 20 pulled low during startup to automatically trigger an image download. This is done in the blmain.c file, which is in the workspace, which means of course that you can customize it if you wish.
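If you do customize it, the check itself can be tiny. Here is a rough sketch, not the actual blmain.c code: recovery_requested() is a made-up name, it assumes you have already sampled the pin levels into a word where bit N holds the level of pin N (the real pin-to-GPIO mapping depends on the module and has to come from the hardware reference), and it assumes both pins must be low.

```c
#include <stdbool.h>
#include <stdint.h>

enum { RECOVERY_PIN_A = 18, RECOVERY_PIN_B = 20 };

/*
 * Hypothetical check: pin_levels has bit N set when pin N reads high.
 * Returns true only when both recovery pins read low.
 */
bool recovery_requested(uint32_t pin_levels)
{
    bool a_low = ((pin_levels >> RECOVERY_PIN_A) & 1u) == 0;
    bool b_low = ((pin_levels >> RECOVERY_PIN_B) & 1u) == 0;
    return a_low && b_low;
}
```

Taking the sampled levels as a parameter keeps the decision logic separate from the register reads, so it can be tested on a PC before it goes into the bootloader.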

Hi,
I had almost the same problem last month. It is very important to install all patches to NET+OS! There are several updates to the bootloader, etc. After this, upload rom.bin to the Digi module (this may update the bootloader). Then try uploading image.bin.

But image recovery via TFTP is not the normal process for uploading new firmware to a Digi module; it’s only a failsafe recovery. For uploading new firmware you have to add FTP upload capability.

I have all the patches applied, and with that it seems to work.

The rom.bin file is the bootloader (as I understand it) so uploading that file will replace the bootloader on the module.

I wanted the TFTP update to be working in case I needed to update the application image. (Let’s say in case my FTP upload capability wasn’t working correctly, etc.)

I seem to be having the same issue.

My app was working with 7.0, but Digi began shipping new modules (distinguishable by having a T under the barcode) and when you load a 7.0 app into one of these, it doesn’t work. Digi said I needed to upgrade to 7.3, so I did. That seemed to work, but apparently it will work for a while, then fails again. I power up, yellow and green both on, after a bit, yellow goes off for a sec, then comes back on, and from then on, both lights on solid. A working unit will have the green turn off when the yellow comes back on, then flash as it begins communicating.

Since this was done on straight 7.3, I’ve since loaded all the patches, and rebuilt my image.

Trying to recover these modules, I got Tftpd32 running, and it acts like it is pulling in the fresh image, but nothing changes.

I suppose I could try the 7.4 that just came out, but it was a PITA just having to update to 7.3 a few weeks ago.

I think you have a different issue there.

My unit had solid orange+green lights from power-on. Putting the connectme in the DevBoard and looking at the serial output, I could see that there was an abort exception happening almost instantly after power-on.

If you are able to get a Tftpd32 response, that is more than I got. I believe that the bootloader was corrupted somehow, though Digi says that if it were, it wouldn’t print anything. However, their recovery method failed to work as well; it just went right to the abort exception. (On a working module, I was able to force the recovery every time.)

Have you checked to make sure you are providing an updated image to tftpd32? I know I’ve done that a few times: build new release image, force module into recovery, wait as it downloads image and reboots, only to realize that I had forgotten to put the newly compiled image.bin in the tftp root folder.

I don’t know for sure, but I suspect that the “factory fresh” modules ship with a bootloader compiled from the latest NET+OS. And as the NET+OS 7.3 TFTP recovery issue patch documentation explains, a bootloader built with v7.3 does not use the image uploaded via TFTP (i.e. recovery doesn’t work!). So I’m betting that there is a set of modules out there that have that bad bootloader image on them.

I’m currently planning to upload my own bootloader image immediately after uploading my application image just to be sure that I know that the installed bootloader image will work properly (unless hardware changes).

Message was edited by: jfichtner

Yeah, maybe my problem is different. I do have a new image, but…

The release notes of one of the patches says this:


Title
TFTP recovery failure
Case: none
Date Fixed: 04/18/08
Description
Image download via TFTP completes, but does not execute.

Solution
Changed all members of the tftpc_conn_t structure to “volatile” as this structure is
accessed both from the bootloader code and the Ethernet interrupt processing code.
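
To illustrate what that fix is getting at, here is a minimal sketch of state shared between an interrupt handler and a foreground polling loop. The names are made up and these are not the real members of tftpc_conn_t; on a PC the “ISR” is just a normal function call. Without “volatile”, an optimizing compiler is allowed to read the flag once and never notice the interrupt handler changing it.

```c
/* Hypothetical stand-in for a connection structure shared with an ISR. */
typedef struct {
    volatile int      transfer_done;   /* set by the interrupt handler */
    volatile unsigned bytes_received;  /* updated by the interrupt handler */
} demo_conn_t;

/* In a real system this would run in Ethernet interrupt context. */
void demo_rx_isr(demo_conn_t *c, unsigned n, int last_block)
{
    c->bytes_received += n;
    if (last_block) {
        c->transfer_done = 1;
    }
}

/*
 * Foreground polling loop. Because transfer_done is volatile, it is re-read
 * from memory on every iteration. Returns 1 if the flag was seen within
 * max_polls iterations, otherwise 0.
 */
int demo_wait_done(const demo_conn_t *c, unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if (c->transfer_done) {
            return 1;
        }
    }
    return 0;
}
```

Note that “volatile” only stops the compiler from caching the values; it does not by itself make multi-step updates atomic.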


This tells me that even though I am downloading the new code, chances are I’m never executing it. If this is the case, I now have a ton of modules that have code in them that can’t be replaced (they are soldered onto a PCB and can’t use the serial recovery console, at least not without a ton of work to make my app pass through serial data from that port to another).

Jeff.

It seems like that bug would have to be present in the pre-loaded bootloader (scary?), not in your image. Have you tried uploading an app with FTP (like the sample) and uploading the bootloader (rom.bin) built with your application, and then uploading your new image via recovery?

Though, if this were the case even the FTP app wouldn’t take, and your modules would basically all need to be RMA’d, as no image would “take”.

Well, the original bootloader present when the modules were shipped accepted my 7.3 no patch image.bin. It would appear that, via TFTP, those modules will not accept any other bin. And since the firmware is locked for some other reason that prevents me from doing a normal FTP, it looks like I’m well and truly screwed.

The bootloader that comes on the modules is not what’s used to upload a new image.bin. There is an application running that allows image uploads via FTP, or via the netosprog.exe program. The problem (it seems) is that the normal TFTP bootloader recovery method doesn’t work with these modules, so since you’ve uploaded an image that doesn’t work, the module is now a paper weight!

This news does scare me, though, since it seems to mean that Digi actually changed something in hardware that is not compatible with images compiled under previous NET+OS versions. And they did so without any warning. So in a year when they do that again, my production image will now have to be updated to the newest NET+OS? And every year after that? I really hope that is not the case. Especially if there’s no advanced notice. This would mean that my company would have to stop production until I ported to the new OS and fully tested! That is very, very bad news indeed.

Have either of you confirmed this with Digi (the hardware change, that is)?

It seems like that would be a pretty major change.

The ConnectMe modules that we just purchased came with the FTP app from 6.X (at least, that is what it said when I logged in). I was able to upload my initial firmware via the FTP, and (on a working module) upload emergency firmware via TFTP. So maybe the modules I have aren’t the “new new” ones that have this issue that you are seeing? It seems odd that something that huge would make it past quality control.

I definitely have not confirmed, nor have I seen, the suspected hardware change issue. Only the TFTP recovery issue.

The original issue I had revolved around DNS lookups. My 7.0 app worked fine on older modules, up to and including the module which has an R under the end of the bar code. The next module we got had a T under the end of the bar code. With these modules, DNS lookups no longer worked. This was confirmed and I worked with Charlie to get a fix. The fix was two-fold: first, I had to make a few changes to my app to disable secondary interfaces, and second, I had to upgrade to 7.3. However, with my 7.3 apps, I am finding that they will work for a while, then lock up as described earlier (and that may or may not be related to 7.3, or my code), and when that happens, I can’t recover because the TFTP recovery method doesn’t work. If I could recover, I could probably figure out if the lockup I am experiencing is caused by 7.3 or my code. But since I can’t recover, I’m doing nothing but making $50 bricks.