ESP32 CAN Web Interface can't recover from failed update

johu
Site Admin
Posts: 6708
Joined: Thu Nov 08, 2018 10:52 pm
Location: Kassel/Germany
Has thanked: 367 times
Been thanked: 1536 times

Post by johu »

The STM32 bootloader enables the independent watchdog and the firmware is expected to feed that at least once a second or so. If it fails to do so, the STM32 is reset.

This is used to recover from a failed update. If a firmware image is only partially flashed, it is usually not runnable and won't feed the dog. The STM32 ends up in a reset loop where the boot loader asks for an update roughly every 2 s. This works fine with the update over UART (both Python script and ESP) and with the CAN Python script, but it doesn't work with the ESP32 CAN.

I simulated the situation by pulling down the STM32's reset pin, starting an update and then releasing the pin. Everything looks good:
[Attachment: grafik.png]
The boot loader sends its 0x33 hello message with the serial number, the ESP32 reflects the serial number, the STM32 asks for the size (S), the ESP32 replies with 0x8C or 140 pages, the STM32 asks for the first page (P) and then the ESP32 successfully transmits 1024 bytes.

It becomes dodgy afterwards:
[Attachment: grafik.png]
The last 8 bytes are requested (P) and transmitted, the CRC is requested (C) and transmitted, but then, instead of requesting the next page with "P", the boot loader reports update finished with "D". Consequently the ESP also goes into the done state.
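
The numbers in the exchange above can be checked with a little arithmetic. The following is only a sketch, not the ESP32 or boot loader code: the request letters, the 1024-byte page and the 8-byte transfers come from the description, while `Request`, `pagesForImage`, `framesPerPage` and `offsetOfFrame` are names invented here, and rounding the page count up is an assumption.

```cpp
#include <cstddef>

// Values taken from the post: 1024 bytes per page, 8 bytes per transfer.
constexpr std::size_t kPageSize = 1024;
constexpr std::size_t kCanPayload = 8;

// Request characters sent by the boot loader, as named in the post.
enum class Request : char { Size = 'S', Page = 'P', Crc = 'C', Done = 'D' };

// Page count announced in reply to 'S'.
// Assumption: a partially filled last page counts as a whole page.
constexpr std::size_t pagesForImage(std::size_t imageBytes) {
    return (imageBytes + kPageSize - 1) / kPageSize;
}

// Number of 8-byte transfers needed to move one full page.
constexpr std::size_t framesPerPage() {
    return kPageSize / kCanPayload;
}

// Byte offset inside a page of the transfer with the given index.
constexpr std::size_t offsetOfFrame(std::size_t frameIndex) {
    return frameIndex * kCanPayload;
}
```

With these definitions a 140 KiB image gives 140 (0x8C) pages, each page takes 128 transfers, and the last transfer of a page covers bytes 1016 to 1023.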

This is the code snippet in question:
[Attachment: grafik.png]
The code decrements the remaining number of pages and goes to the done state once it reaches 0. So it seems like numPages is set to 1 instead of 140.
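
A toy model of that countdown shows why a page count of 1 produces exactly the observed behaviour. This is a sketch under assumptions, not the code from the screenshot: `State`, `afterPage` and `pagesUntilDone` are hypothetical names, and only the "decrement, then done at 0" logic is taken from the description.

```cpp
#include <cstdint>

// Only the two states needed for the countdown are modelled here.
enum class State { Page, Done };

// Called after one page has been transferred and checked:
// decrement the remaining page count and report the next state.
constexpr State afterPage(std::uint32_t& remaining) {
    if (remaining > 0) {
        --remaining;
    }
    return remaining == 0 ? State::Done : State::Page;
}

// Count how many pages are processed before the loader reports done.
constexpr std::uint32_t pagesUntilDone(std::uint32_t numPages) {
    std::uint32_t processed = 0;
    State state = State::Page;
    while (state != State::Done) {
        state = afterPage(numPages);
        ++processed;
    }
    return processed;
}
```

`pagesUntilDone(140)` is 140, while `pagesUntilDone(1)` is 1, which matches a loader that sends "D" right after the first page.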

Here is the file: https://github.com/jsphuebner/stm32-CAN ... loader.cpp

I suspect some timing issue as I don't see what else should be different between sending via PC or ESP.
Support R/D and forum on Patreon: https://patreon.com/openinverter - Subscribe on odysee: https://odysee.com/@openinverter:9

Re: ESP32 CAN Web Interface can't recover from failed update

Post by johu »

Found it! It's indeed a timing issue:
[Attachment: grafik.png]
The CAN messages come in so fast that "if (state == PAGECOUNT || state == PAGE)" hasn't been reached. Instead we are in state CRC, the entire block is skipped and we proceed straight to sending "D".

OK, I will do something about it in the boot loader, but also in the ESP for backward compatibility.

Re: ESP32 CAN Web Interface can't recover from failed update

Post by johu »

Easy fix: I just included the PROGRAM state in the if.

Code:

if (state == PAGECOUNT || state == PAGE || state == PROGRAM)

The PROGRAM state can only be left by the actual programming routine, so that is now safe. Now I just have to worry about a backward compatibility hack.
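
To make the effect of the change concrete, the old and new conditions can be compared state by state. A minimal sketch: the enum lists only states named in this thread (the real boot loader may have more, in a different order), and `guardBefore` and `guardAfter` are names invented for this comparison.

```cpp
// States mentioned in this thread; the real enum may differ.
enum class State { PageCount, Page, Crc, Program, Done };

// Condition before the fix.
constexpr bool guardBefore(State s) {
    return s == State::PageCount || s == State::Page;
}

// Condition after the fix: PROGRAM is accepted as well.
constexpr bool guardAfter(State s) {
    return s == State::PageCount || s == State::Page || s == State::Program;
}
```

The only state whose result changes is PROGRAM; CRC and DONE are still rejected by both versions.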

Re: ESP32 CAN Web Interface can't recover from failed update

Post by johu »

Added a sub version number. The new version now advertises '3', '1' instead of '3', '0'. When '0' is advertised the ESP will insert an additional delay before sending the page count.

Created a new release: https://github.com/jsphuebner/stm32-CAN ... s/tag/v1.2
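
The compatibility rule described above could look roughly like this on the sending side. It is a sketch, not the released ESP32 code: `LoaderVersion`, `needsLegacyDelay` and `delayBeforePageCount` are hypothetical names, and the 100 ms value is a placeholder because the post does not say how long the extra delay is.

```cpp
#include <chrono>

// Version characters advertised by the boot loader:
// '3', '0' for the old loader and '3', '1' for the fixed one.
struct LoaderVersion {
    char major;
    char minor;
};

// Placeholder value; the actual delay is not given in the thread.
constexpr std::chrono::milliseconds kLegacyDelay{100};

// Old loaders advertise sub version '0' and need the extra delay.
constexpr bool needsLegacyDelay(LoaderVersion v) {
    return v.minor == '0';
}

// Time to wait before sending the page count.
constexpr std::chrono::milliseconds delayBeforePageCount(LoaderVersion v) {
    return needsLegacyDelay(v) ? kLegacyDelay : std::chrono::milliseconds{0};
}
```

A loader advertising '3', '1' gets the page count immediately, while '3', '0' gets it after the placeholder delay.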
davefiddes
Posts: 288
Joined: Mon Jan 18, 2021 12:39 pm
Location: Edinburgh, Scotland, UK
Has thanked: 69 times
Been thanked: 88 times

Re: ESP32 CAN Web Interface can't recover from failed update

Post by davefiddes »

Interesting. I'm in the middle of writing a new upgrade client for openinverter_can_tool, so I will need to take this into account. I've never found either upgrade mechanism particularly reliable and this might explain why.

It may be a silly question, but what is the official mechanism to upgrade the bootloader? Just openocd?

Re: ESP32 CAN Web Interface can't recover from failed update

Post by johu »

Indeed the CAN Bootloader can only be upgraded via openocd. bootupdater (bootloader in application space that writes to boot loader space) only supports uart so far. https://github.com/jsphuebner/tumanako- ... ootupdater

Re: ESP32 CAN Web Interface can't recover from failed update

Post by davefiddes »

I think I've spotted another flaw in the state machine which could break upgrades. The "D" frame used to indicate the upgrade is done is always emitted by the bootloader. This could cause problems if a second board boots in the middle of an upgrade. The upgrade client could see this frame as coming from the board being upgraded and terminate the process early.

It seems to me that boards should only emit HELLO frames on boot and only a single identified device should then be allowed to send S, P, C and D frames. This cleanly separates the discovery from the upgrade parts of the process. During an upgrade clients obviously have to ignore any HELLO frames they see.

I don't think that fixing this would affect any existing upgrade clients.
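
The proposed separation can be written down as two small rules, one per side. This is only a sketch of the idea, not an implementation of the protocol: `FrameType`, `deviceMayEmit` and `clientHandles` are invented names, and "selected" is assumed to mean that the client reflected this board's serial number.

```cpp
// Frame kinds named in this thread.
enum class FrameType { Hello, Size, Page, Crc, Done };

// Device side: every board may announce itself with HELLO,
// but only the selected board may send S, P, C and D.
constexpr bool deviceMayEmit(FrameType type, bool selected) {
    return type == FrameType::Hello || selected;
}

// Client side: while an upgrade is running, HELLO frames
// from other boards are ignored.
constexpr bool clientHandles(FrameType type, bool upgradeRunning) {
    return !(upgradeRunning && type == FrameType::Hello);
}
```

Under these rules a second board that boots mid-upgrade can only emit HELLO, which the client drops, so it can no longer end the upgrade with a stray "D".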

As I write my client I'm trying to document the protocol and write tests to exercise it. I realise you are pushed for flash space so need to take a few short cuts. My ultimate aim is to write a C2000 MCU bootloader for the Tesla M3 DU where we have 2MiB of flash space... :twisted:

Re: ESP32 CAN Web Interface can't recover from failed update

Post by johu »

Excellent idea, will change that
Mitchy
Posts: 116
Joined: Sun Nov 14, 2021 12:16 pm
Has thanked: 4 times
Been thanked: 62 times

Re: ESP32 CAN Web Interface can't recover from failed update

Post by Mitchy »

One quick documentation update: the initial readme refers to a 0x08002000 start address, but in fact it is 0x08001000. The readme says:

Notes:
- By checksum I mean the one calculated by the STMs integrated CRC32 unit.
- The actual firmware has a reset command the cycle through the bootloader
- The main firmware must be linked to start at address 0x08002000
- The bootloader starts at address 0x08000000 and can be 4k in size
  (right now its 3.9k)

I checked a few forum threads and the true start address was a bit unclear, so correcting the readme on GitHub should resolve it if anyone else is wondering.
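
The corrected address is consistent with the readme's own numbers, assuming the firmware is placed directly after the 4k boot loader region. A short check, with `kFlashBase`, `kBootloaderSize`, `firmwareStart` and `bytesReserved` as names made up for this example:

```cpp
#include <cstdint>

// Values from the readme notes quoted above.
constexpr std::uint32_t kFlashBase = 0x08000000;     // boot loader start
constexpr std::uint32_t kBootloaderSize = 4 * 1024;  // "can be 4k in size"

// First address after the boot loader region.
// Assumption: the firmware starts immediately after it.
constexpr std::uint32_t firmwareStart() {
    return kFlashBase + kBootloaderSize;
}

// Flash space in front of a given firmware start address.
constexpr std::uint32_t bytesReserved(std::uint32_t start) {
    return start - kFlashBase;
}
```

`firmwareStart()` is 0x08001000, whereas a start address of 0x08002000 would leave 8k in front of the firmware, twice the size the readme gives for the boot loader.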

Re: ESP32 CAN Web Interface can't recover from failed update

Post by johu »
