ESP32 CAN Web Interface can't recover from failed update

Posted: Thu Jul 25, 2024 3:52 pm
by johu
The STM32 bootloader enables the independent watchdog and the firmware is expected to feed that at least once a second or so. If it fails to do so, the STM32 is reset.

This is used to recover from a failed update. If a firmware is only partially flashed it is usually not runnable and won't feed the dog. The STM32 ends up in a reset loop where the boot loader asks for an update roughly every 2s. This works fine with the update over uart (both python script and ESP), it also works with the CAN python script, but it doesn't work with the ESP32 CAN.
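The recovery loop described above can be pictured with a toy model. This is only a sketch, not the STM32 IWDG API: the class name WatchdogModel, its methods and the 1000 ms timeout are assumptions chosen to match "once a second or so".

```cpp
#include <cstdint>

// Toy model of an independent watchdog (hypothetical, not the STM32 register API).
// A counter runs down and the MCU is reset when it reaches zero.
class WatchdogModel {
public:
    explicit WatchdogModel(uint32_t timeoutMs)
        : timeoutMs_(timeoutMs), remainingMs_(timeoutMs) {}

    // Healthy firmware calls this regularly to reload the counter.
    void Feed() { remainingMs_ = timeoutMs_; }

    // Advance simulated time. Returns true if the watchdog fired (MCU reset).
    bool Tick(uint32_t elapsedMs) {
        if (elapsedMs >= remainingMs_) {
            remainingMs_ = timeoutMs_;  // counter starts over after the reset
            ++resetCount_;
            return true;
        }
        remainingMs_ -= elapsedMs;
        return false;
    }

    uint32_t ResetCount() const { return resetCount_; }

private:
    uint32_t timeoutMs_;
    uint32_t remainingMs_;
    uint32_t resetCount_ = 0;
};
```

A firmware that calls Feed() between ticks never triggers a reset, while a partially flashed one that never feeds is reset over and over, which hands control back to the boot loader each time.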

Now I simulated the situation by pulling down the STM's reset pin, starting an update and then releasing the pin. Everything looks good:
grafik.png
The boot loader sends its 0x33 hello message with serial number, the ESP32 reflects the serial number, the STM asks for size (S), the ESP replies with 0x8C or 140 pages, the STM asks for the first page (P) and then the ESP32 successfully transmits 1024 bytes.
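The exchange above can be sketched from the updater's point of view. The request letters and the 1024-byte page come from the post. The enum, the function names, the 8-byte chunking and the rounding up of a partial last page are assumptions for illustration, not the ESP32 code.

```cpp
#include <cstddef>

constexpr std::size_t kPageSize = 1024;  // bytes per page, as stated in the post
constexpr std::size_t kFrameSize = 8;    // payload limit of a classic CAN frame

// Hypothetical names for what the updater does next.
enum class Reply { SendPageCount, SendData, SendCrc, Finish, Ignore };

// Number of pages needed for a firmware image (assumes a partial page counts as one).
constexpr std::size_t PageCount(std::size_t firmwareBytes) {
    return (firmwareBytes + kPageSize - 1) / kPageSize;
}

// Map the single-letter request from the boot loader to the updater's reply.
constexpr Reply ReplyFor(char request) {
    switch (request) {
        case 'S': return Reply::SendPageCount;  // boot loader asks for the size
        case 'P': return Reply::SendData;       // boot loader asks for page data
        case 'C': return Reply::SendCrc;        // boot loader asks for the checksum
        case 'D': return Reply::Finish;         // boot loader reports update done
        default:  return Reply::Ignore;
    }
}
```

With these numbers a 140-page image is 143360 bytes, the page count byte is 0x8C, and one page takes 128 CAN frames of 8 bytes.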

It becomes dodgy afterwards:
grafik.png
The last 8 bytes are requested (P) and transmitted, the CRC is requested (C) and transmitted but then instead of requesting the next page with "P" the boot loader reports update finished "D". Consequently the ESP also goes into done state.

This is the code snippet in question:
grafik.png
Decrement remaining number of pages and if we reached 0 go to done state. So it seems like numPages is set to 1 instead of 140.

Here is the file: https://github.com/jsphuebner/stm32-CAN ... loader.cpp

I suspect some timing issue as I don't see what else should be different between sending via PC or ESP.

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Thu Jul 25, 2024 4:14 pm
by johu
Found it! It's indeed a timing issue:
grafik.png
The CAN messages come in so fast that "if (state == PAGECOUNT || state == PAGE)" hasn't been reached. Instead we are in state CRC, the entire block is skipped and we proceed straight to sending "D".

OK, I will do something about it in the boot loader, but also in the ESP for backward compatibility.

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Thu Jul 25, 2024 4:24 pm
by johu
Easy fix. Just included the PROGRAM state in the if

Code:

if (state == PAGECOUNT || state == PAGE || state == PROGRAM)
The PROGRAM state can only be left by the actual programming routine, so that is now safe. Now just have to worry about a backward compatibility hack.
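To make the effect of the changed condition concrete, here is a minimal sketch that compares the old and the new guard. Only the state names mentioned in this thread are modelled. The enum layout and the function names are assumptions, and the real state machine in bootloader.cpp has more to it.

```cpp
// Simplified state list, limited to the states named in the thread (assumption).
enum class State { PAGECOUNT, PAGE, CRC, PROGRAM, DONE };

// Guard before the fix: data is only processed in PAGECOUNT or PAGE.
constexpr bool AcceptsDataOld(State s) {
    return s == State::PAGECOUNT || s == State::PAGE;
}

// Guard after the fix: a frame that arrives during PROGRAM is processed too.
constexpr bool AcceptsDataNew(State s) {
    return s == State::PAGECOUNT || s == State::PAGE || s == State::PROGRAM;
}

// The fix changes the outcome for exactly one state.
static_assert(!AcceptsDataOld(State::PROGRAM), "old guard skips the block");
static_assert(AcceptsDataNew(State::PROGRAM), "new guard enters the block");
```

A frame from a fast sender that lands while the boot loader is still in PROGRAM was skipped by the old guard and is handled by the new one. All other states behave as before.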

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Thu Jul 25, 2024 4:48 pm
by johu
Added a sub version number. The new version now advertises '3', '1' instead of '3', '0'. When '0' is advertised the ESP will insert an additional delay before sending the page count.
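The backward compatibility rule could look roughly like the sketch below. The function name, the parameter names and the 100 ms value are assumptions. Only the '3', '0' versus '3', '1' distinction comes from the post.

```cpp
#include <cstdint>

// Assumed delay value, the post does not say how long the real delay is.
constexpr uint32_t kLegacyDelayMs = 100;

// Extra wait before sending the page count, based on the advertised bytes.
// magic is the '3' from the hello message, subVersion is the byte after it.
constexpr uint32_t PageCountDelayMs(char magic, char subVersion) {
    return (magic == '3' && subVersion == '0') ? kLegacyDelayMs : 0;
}
```

An old boot loader that advertises '3', '0' gets the extra wait, while a new one that advertises '3', '1' is served without it.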

Created a new release: https://github.com/jsphuebner/stm32-CAN ... s/tag/v1.2

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Sat Jul 27, 2024 4:32 pm
by davefiddes
Interesting. I'm in the middle of writing a new upgrade client for openinverter_can_tool, so I will need to take this into account. I've never found either upgrade mechanism particularly reliable and this might explain why.

It may be a silly question, but what is the official mechanism to upgrade the bootloader? Just openocd?

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Sat Jul 27, 2024 4:53 pm
by johu
Indeed the CAN Bootloader can only be upgraded via openocd. bootupdater (bootloader in application space that writes to boot loader space) only supports uart so far. https://github.com/jsphuebner/tumanako- ... ootupdater

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Sun Jul 28, 2024 1:12 pm
by davefiddes
I think I've spotted another flaw in the state machine which could break upgrades. The "D" frame used to indicate the upgrade is done is always emitted by the bootloader. This could cause problems if a second board boots in the middle of an upgrade. The upgrade could see this as coming from the first board being upgraded and terminate the process early.

It seems to me that boards should only emit HELLO frames on boot and only a single identified device should then be allowed to send S, P, C and D frames. This cleanly separates the discovery from the upgrade parts of the process. During an upgrade clients obviously have to ignore any HELLO frames they see.
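The proposed separation of discovery and upgrade can be written down as two small rules. This is a sketch of the proposal only, not existing bootloader or client code, and all type and function names are made up for illustration.

```cpp
// Frame kinds named in the thread: HELLO plus the S, P, C and D requests.
enum class Frame { Hello, Size, Page, Crc, Done };

// Hypothetical device-side rule.
struct Device {
    // Set once the updater has echoed this device's serial number.
    bool identified = false;

    // An unidentified board may only announce itself,
    // the identified board may only send upgrade frames.
    bool MayEmit(Frame f) const {
        return (f == Frame::Hello) ? !identified : identified;
    }
};

// Hypothetical client-side rule: HELLO frames are ignored during an upgrade.
constexpr bool ClientHandles(Frame f, bool upgradeInProgress) {
    return !(upgradeInProgress && f == Frame::Hello);
}
```

Under these rules a second board that boots in the middle of an upgrade can only send HELLO, which the client ignores, so it can no longer end the running upgrade with a stray "D".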

I don't think that fixing this would affect any existing upgrade clients.

As I write my client I'm trying to document the protocol and write tests to exercise it. I realise you are pushed for flash space so need to take a few short cuts. My ultimate aim is to write a C2000 MCU bootloader for the Tesla M3 DU where we have 2MiB of flash space... :twisted:

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Sun Jul 28, 2024 5:44 pm
by johu
Excellent idea, will change that

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Mon Jul 29, 2024 10:48 am
by Mitchy
One quick documentation update: the initial readme refers to the 0x08002000 start address, but in fact it is 0x08001000.
Notes:
- By checksum I mean the one calculated by the STMs integrated CRC32 unit.
- The actual firmware has a reset command the cycle through the bootloader
- The main firmware must be linked to start at address 0x08002000
- The bootloader starts at address 0x08000000 and can be 4k in size
(right now its 3.9k)
I checked a few forum threads and the true start address was a bit unclear,
so correcting the readme on github should resolve it if anyone else is wondering.
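The corrected address also follows from the other numbers in the quoted notes: a boot loader that starts at 0x08000000 and may occupy 4k ends right before 0x08001000. A short check, with constant names chosen just for this example:

```cpp
#include <cstdint>

constexpr uint32_t kFlashBase = 0x08000000;      // boot loader start, from the readme notes
constexpr uint32_t kBootloaderSize = 4 * 1024;   // "can be 4k in size"

// First address after the boot loader region.
constexpr uint32_t kFirstFreeAddress = kFlashBase + kBootloaderSize;

static_assert(kFirstFreeAddress == 0x08001000,
              "4k above the flash base is 0x08001000, not 0x08002000");
```

0x08002000 would only be the next free address if the boot loader region were 8k.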

Re: ESP32 CAN Web Interface can't recover from failed update

Posted: Mon Jul 29, 2024 1:04 pm
by johu