ESP32 CAN Web Interface can't recover from failed update
- johu
- Site Admin
- Posts: 6708
- Joined: Thu Nov 08, 2018 10:52 pm
- Location: Kassel/Germany
- Has thanked: 367 times
- Been thanked: 1536 times
ESP32 CAN Web Interface can't recover from failed update
The STM32 bootloader enables the independent watchdog and the firmware is expected to feed that at least once a second or so. If it fails to do so, the STM32 is reset.
This is used to recover from a failed update. If a firmware is only partially flashed, it is usually not runnable and won't feed the dog. The STM32 then ends up in a reset loop in which the boot loader asks for an update roughly every 2 s. This works fine with the update over UART (both the Python script and the ESP), and it also works with the CAN Python script, but it doesn't work with the ESP32 over CAN.
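To illustrate the recovery mechanism, here is a minimal model of an independent watchdog. The timeout is a parameter and the values used with it are assumptions (the post only says "once a second or so"); `WatchdogModel` and `countResets` are hypothetical names. Firmware that feeds the dog in time never triggers a reset, while firmware that never feeds it (for example a partially flashed image) is reset once per timeout. The model covers only the watchdog, not the time the boot loader spends waiting for an update, so it does not reproduce the roughly 2 s cycle.

```cpp
#include <cassert>
#include <cstdint>

// Model of an independent watchdog: once started it resets the MCU unless it
// is fed before the timeout elapses. Timing values are assumptions.
class WatchdogModel {
public:
    explicit WatchdogModel(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);
    }
    void feed() { sinceFeedMs_ = 0; }
    // Advance time; returns true if the watchdog fired (the MCU resets).
    bool tick(uint32_t ms) {
        sinceFeedMs_ += ms;
        if (sinceFeedMs_ >= timeoutMs_) {
            sinceFeedMs_ = 0; // a reset restarts the counter
            return true;
        }
        return false;
    }
private:
    uint32_t timeoutMs_;
    uint32_t sinceFeedMs_ = 0;
};

// Count resets over durationMs for firmware that feeds the dog every
// feedPeriodMs, or never feeds it when feedPeriodMs == 0.
uint32_t countResets(uint32_t timeoutMs, uint32_t feedPeriodMs, uint32_t durationMs) {
    WatchdogModel wdg(timeoutMs);
    uint32_t resets = 0;
    for (uint32_t t = 1; t <= durationMs; ++t) {
        if (feedPeriodMs != 0 && t % feedPeriodMs == 0) wdg.feed();
        if (wdg.tick(1)) ++resets;
    }
    return resets;
}
```

For example, `countResets(1000, 0, 10000)` returns 10: a firmware that never feeds the dog is reset ten times in ten seconds, while `countResets(1000, 500, 10000)` returns 0.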
Now I simulated the situation by pulling down the STM's reset pin, starting an update and then releasing the pin. Everything looks good:
The boot loader sends its 0x33 hello message with its serial number, the ESP32 reflects the serial number, the STM asks for the size (S), the ESP replies with 0x8C, i.e. 140 pages, the STM asks for the first page (P), and the ESP32 successfully transmits 1024 bytes.
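The numbers in this handshake fit together as follows. This sketch assumes 1024-byte pages (as described) and 8 data bytes per classic CAN frame, so each page takes 128 frames and 140 pages cover up to 143,360 bytes. `pagesForImage` is a hypothetical helper, and rounding a partial last page up is an assumption about how the host pads the image.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPageSize = 1024;    // bytes per flash page, per the post
constexpr uint32_t kBytesPerFrame = 8;  // max data bytes in a classic CAN frame
constexpr uint32_t kFramesPerPage = kPageSize / kBytesPerFrame;

// Page count the host would announce for an image of imageBytes,
// rounding up so a partial last page is still sent (assumption).
uint32_t pagesForImage(uint32_t imageBytes) {
    assert(imageBytes > 0);
    return (imageBytes + kPageSize - 1) / kPageSize;
}
```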
It gets dodgy afterwards: the last 8 bytes are requested (P) and transmitted, the CRC is requested (C) and transmitted, but then, instead of requesting the next page with "P", the boot loader reports the update as finished with "D". Consequently the ESP also goes into the done state.
The code in question decrements the remaining number of pages and, once that reaches 0, goes to the done state. So it seems numPages is set to 1 instead of 140.
Here is the file: https://github.com/jsphuebner/stm32-CAN ... loader.cpp
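A hypothetical model of that countdown shows why a count of 1 produces exactly the observed symptom: one page, one CRC, then "D". `PageCountdown` and `pagesBeforeDone` are illustrative names, not the boot loader's actual code.

```cpp
#include <cassert>
#include <cstdint>

// After each completed page the remaining count is decremented; the boot
// loader then requests the next page ('P') or reports done ('D').
struct PageCountdown {
    uint32_t numPages = 0;

    void start(uint32_t announcedPages) {
        assert(announcedPages > 0);
        numPages = announcedPages;
    }

    // Called once per completed page; returns the next request character.
    char pageCompleted() {
        assert(numPages > 0);
        --numPages;
        return numPages == 0 ? 'D' : 'P';
    }
};

// How many pages get requested before 'D' for a given announced count.
uint32_t pagesBeforeDone(uint32_t announcedPages) {
    PageCountdown c;
    c.start(announcedPages);
    uint32_t pages = 1; // the first page has already been requested
    while (c.pageCompleted() == 'P') ++pages;
    return pages;
}
```

With 140 announced pages all 140 are transferred; with 1, the transfer ends after the first page, matching the trace above.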
I suspect some timing issue as I don't see what else should be different between sending via PC or ESP.
Support R/D and forum on Patreon: https://patreon.com/openinverter - Subscribe on odysee: https://odysee.com/@openinverter:9
- johu
Re: ESP32 CAN Web Interface can't recover from failed update
Found it! It's indeed a timing issue:
The CAN messages come in so fast that "if (state == PAGECOUNT || state == PAGE)" hasn't been reached yet. Instead we are already in state CRC, so the entire block is skipped and we proceed straight to sending "D".
OK, I will do something about it in the boot loader, but also in the ESP for backward compatibility.
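The race can be modelled as a sequence of states the boot loader passes through while frames arrive, with the main loop looking at the state only once. Only the state names and the guard come from the post; the order of states and `stateSeenByMainLoop` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// State names from the thread; their order in the example is an assumption.
enum class State { PAGECOUNT, PAGE, PROGRAM, CRC };

// Guard in the boot loader's main loop, as quoted in the post.
bool originalGuard(State s) {
    return s == State::PAGECOUNT || s == State::PAGE;
}

// states[i] is the state after i frames have arrived. The main loop only looks
// after framesBeforeCheck frames, so it sees one, possibly later, state.
State stateSeenByMainLoop(const std::vector<State>& states, std::size_t framesBeforeCheck) {
    assert(!states.empty());
    std::size_t i = framesBeforeCheck < states.size() ? framesBeforeCheck : states.size() - 1;
    return states[i];
}
```

With the sequence PAGECOUNT, PAGE, PROGRAM, CRC, a check after one frame sees PAGE and enters the block; a check after three frames sees CRC and skips it.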
- johu
Re: ESP32 CAN Web Interface can't recover from failed update
Easy fix: I just included the PROGRAM state in the if.
The PROGRAM state can only be left by the actual programming routine, so that is now safe. Now I just have to worry about a backward compatibility hack.
```cpp
if (state == PAGECOUNT || state == PAGE || state == PROGRAM)
```
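Why including PROGRAM closes the gap can be sketched with a hypothetical receive handler in which PROGRAM is sticky, mirroring the statement that only the programming routine can leave it. However many receive events happen before the main loop looks, it finds a state the fixed guard accepts. `onFrame` and `afterFrames` are illustrative; each `onFrame` call stands for one receive event, not a real CAN frame count.

```cpp
#include <cassert>
#include <cstddef>

enum class State { PAGECOUNT, PAGE, PROGRAM, CRC };

// Guard after the fix.
bool fixedGuard(State s) {
    return s == State::PAGECOUNT || s == State::PAGE || s == State::PROGRAM;
}

// Hypothetical receive side: events move the state forward, but PROGRAM is
// sticky because only the main-loop programming routine may leave it.
State onFrame(State s) {
    switch (s) {
        case State::PAGECOUNT: return State::PAGE;
        case State::PAGE:      return State::PROGRAM;
        case State::PROGRAM:   return State::PROGRAM;
        case State::CRC:       return State::CRC;
    }
    return s;
}

// State the main loop sees after 'frames' events since it last looked.
State afterFrames(State start, std::size_t frames) {
    assert(fixedGuard(start));
    State s = start;
    for (std::size_t i = 0; i < frames; ++i) s = onFrame(s);
    return s;
}
```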
- johu
Re: ESP32 CAN Web Interface can't recover from failed update
Added a sub version number: the new version now advertises '3', '1' instead of '3', '0'. When '0' is advertised, the ESP inserts an additional delay before sending the page count.
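On the ESP side, the backward compatibility hack comes down to checking the advertised sub version. A minimal sketch, assuming the two version characters have already been extracted from the hello frame; `pageCountDelayMs` is a hypothetical name and the delay value `kLegacyDelayMs` is made up, since the post doesn't say how long the ESP waits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical delay; the real value is not given in the thread.
constexpr uint32_t kLegacyDelayMs = 10;

// Delay before answering the page count request: old boot loaders
// advertising '3','0' need extra time, '3','1' and later do not.
uint32_t pageCountDelayMs(char major, char minor) {
    assert(major == '3' && "only protocol version 3 is modelled here");
    return minor == '0' ? kLegacyDelayMs : 0;
}
```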
Created a new release: https://github.com/jsphuebner/stm32-CAN ... s/tag/v1.2
-
- Posts: 288
- Joined: Mon Jan 18, 2021 12:39 pm
- Location: Edinburgh, Scotland, UK
- Has thanked: 69 times
- Been thanked: 88 times
Re: ESP32 CAN Web Interface can't recover from failed update
Interesting. I'm in the middle of writing a new upgrade client for openinverter_can_tool, so I will need to take this into account. I've never found either upgrade mechanism particularly reliable, and this might explain why.
This may be a silly question, but what is the official mechanism for upgrading the bootloader? Just openocd?
- johu
Re: ESP32 CAN Web Interface can't recover from failed update
Indeed, the CAN bootloader can only be upgraded via openocd. bootupdater (a bootloader running in application space that writes to the boot loader space) only supports UART so far. https://github.com/jsphuebner/tumanako- ... ootupdater
-
Re: ESP32 CAN Web Interface can't recover from failed update
I think I've spotted another flaw in the state machine which could break upgrades. The "D" frame used to indicate that the upgrade is done is always emitted by the bootloader. This could cause problems if a second board boots in the middle of an upgrade: the upgrade client could mistake it for the "D" from the board being upgraded and terminate the process early.
It seems to me that boards should only emit HELLO frames on boot and only a single identified device should then be allowed to send S, P, C and D frames. This cleanly separates the discovery from the upgrade parts of the process. During an upgrade clients obviously have to ignore any HELLO frames they see.
I don't think that fixing this would affect any existing upgrade clients.
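A sketch of the proposed separation, with hypothetical names: the boot loader only sends S, P, C or D once it has been selected (its serial number reflected by the host), and the client ignores hello frames (first byte 0x33, as in the handshake described at the top of the thread) while an upgrade is in progress. This is one possible reading of the proposal, not the implemented protocol.

```cpp
#include <cassert>

constexpr char kHello = 0x33; // '3', first byte of the hello frame

enum class ClientState { DISCOVERY, UPGRADING, DONE };

class UpgradeClient {
public:
    // Called once a device has been selected during discovery.
    void begin() {
        assert(state_ == ClientState::DISCOVERY);
        state_ = ClientState::UPGRADING;
    }

    // Returns true if the frame was acted upon, false if it was ignored.
    bool onFrame(char type) {
        if (state_ != ClientState::UPGRADING) return false;
        if (type == kHello) return false; // another board just booted: ignore it
        if (type == 'D') state_ = ClientState::DONE;
        return true; // S, P, C or D, which only the selected device may send
    }

    ClientState state() const { return state_; }

private:
    ClientState state_ = ClientState::DISCOVERY;
};

// Boot loader side: before selection, hello is the only frame allowed.
bool bootloaderMayEmit(char type, bool selected) {
    return type == kHello || selected;
}
```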
As I write my client I'm trying to document the protocol and write tests to exercise it. I realise you are pushed for flash space, so you need to take a few shortcuts. My ultimate aim is to write a C2000 MCU bootloader for the Tesla M3 DU, where we have 2 MiB of flash space...

- johu
Re: ESP32 CAN Web Interface can't recover from failed update
Excellent idea, will change that
Re: ESP32 CAN Web Interface can't recover from failed update
One quick documentation update: the initial README refers to the 0x08002000 start address, but in fact it is 0x08001000,
so correcting it on GitHub should resolve this for anyone else who is wondering.
I checked a few forum threads, and the true start address was a bit unclear.
Notes (quoted from the README):
- By checksum I mean the one calculated by the STMs integrated CRC32 unit.
- The actual firmware has a reset command to cycle through the bootloader
- The main firmware must be linked to start at address 0x08002000
- The bootloader starts at address 0x08000000 and can be 4k in size (right now it's 3.9k)
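The corrected address follows from the README's own layout: a boot loader at 0x08000000 that may occupy 4k ends at 0x08001000. A compile-time check, with hypothetical constant names and assuming "4k" means 4 KiB reserved for the boot loader:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kFlashBase = 0x08000000;        // boot loader start
constexpr uint32_t kBootloaderMaxSize = 4 * 1024;  // "can be 4k in size"
constexpr uint32_t kAppStart = kFlashBase + kBootloaderMaxSize;

static_assert(kAppStart == 0x08001000, "main firmware must be linked after the 4 KiB boot loader");

// Does a boot loader image of imageBytes fit in front of the application?
bool bootloaderFits(uint32_t imageBytes) {
    return imageBytes <= kBootloaderMaxSize;
}
```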
- johu
Re: ESP32 CAN Web Interface can't recover from failed update
Fixed https://github.com/jsphuebner/stm32-CAN ... 04194bd795 and fixed https://github.com/jsphuebner/stm32-CAN ... 7f8f1e7dc4