[Battlemesh] BattleMeshV7: testing incremental fw upgrade feature

Sat Mar 29 20:20:07 UTC 2014

On 29/03/2014 20:06, l0aCk3r [matteo] wrote:
> 
> On Mar 29, 2014, at 6:05 PM, John Crispin <john at phrozen.org>
> wrote: [CUT]
>> Any volunteers that would like to help me work on this ? ideally
>> we get organized before the event so as not to lose too much time
>> on site setting things up, but can get straight to the actual
>> testing.
> 
> Really interesting feature, I can help to make the setting.

thanks, here is how it works...

normally we have squashfs + a jffs2 overlay. what we do instead is this ...

we don't use rootfs_data for jffs2 anymore but instead use it to store
a chain of erasesize-aligned blocks. each block has a header and a tar
file inside it. the header has size, hash, and type. there are snapshot
and volatile blocks. the volatile entry can only exist once, and only
as the last block (the sentinel) of the chain.
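
a minimal python sketch of such a chain, to make the layout concrete.
the field order, the field widths, the sha256 hash, the type codes and
the 64k erasesize are all assumptions made for illustration, not the
real on-flash format:

```python
import hashlib
import struct

ERASESIZE = 64 * 1024                # assumed flash erase block size
TYPE_SNAPSHOT = 1                    # assumed type codes
TYPE_VOLATILE = 2
HEADER = struct.Struct(">II32s")     # assumed layout: type, size, sha256


def pack_block(block_type: int, tar_bytes: bytes) -> bytes:
    """Return header + tar payload, padded up to a multiple of ERASESIZE."""
    digest = hashlib.sha256(tar_bytes).digest()
    raw = HEADER.pack(block_type, len(tar_bytes), digest) + tar_bytes
    padding = -len(raw) % ERASESIZE
    return raw + b"\xff" * padding   # erased flash reads back as 0xff


def parse_chain(flash: bytes):
    """Walk the chain; stop at erased flash, a bad hash, or the volatile block."""
    blocks, offset = [], 0
    while offset + HEADER.size <= len(flash):
        block_type, size, digest = HEADER.unpack_from(flash, offset)
        if block_type not in (TYPE_SNAPSHOT, TYPE_VOLATILE):
            break                    # erased or unknown data ends the chain
        start = offset + HEADER.size
        payload = flash[start:start + size]
        if len(payload) != size or hashlib.sha256(payload).digest() != digest:
            break                    # truncated or corrupted block
        blocks.append((block_type, payload))
        if block_type == TYPE_VOLATILE:
            break                    # volatile is always the last entry
        total = HEADER.size + size
        offset += total + (-total % ERASESIZE)
    return blocks
```

every block starts on an erasesize boundary, so a block can be erased
and rewritten without touching its neighbours.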

upon boot, the unit mounts squashfs, does an overlayfs mount, but uses
a tmpfs instead of the now nonexistent jffs2. we unpack the tar files
in the order found inside the chain. once all snapshot blocks are
unpacked, we do another stacked overlayfs mount with a tmpfs and
unpack our volatile block into it. we now have a stacked overlay root
with /snapshot holding the delta of the first and /overlay holding the
delta of the second mount.
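
the boot order can be modelled in a few lines of python. this is only
a simulation: dicts stand in for squashfs and the two tmpfs layers, and
make_tar, unpack_into and boot are hypothetical helper names, not part
of the real tool:

```python
import io
import tarfile


def make_tar(files: dict) -> bytes:
    """Build an in-memory tar from {path: bytes} (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_into(layer: dict, tar_bytes: bytes) -> None:
    """Unpack regular files from a tar into a dict that stands in for a tmpfs."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if member.isfile():
                layer[member.name] = tar.extractfile(member).read()


def boot(squashfs: dict, snapshots: list, volatile=None):
    """Return (merged root, /snapshot delta, /overlay delta)."""
    snapshot_layer = {}
    for tar_bytes in snapshots:          # chain order: later blocks win
        unpack_into(snapshot_layer, tar_bytes)
    overlay_layer = {}
    if volatile is not None:             # second, stacked tmpfs
        unpack_into(overlay_layer, volatile)
    # upper layers shadow lower ones, as in a stacked overlayfs mount
    root = {**squashfs, **snapshot_layer, **overlay_layer}
    return root, snapshot_layer, overlay_layer
```

a file changed in two snapshots ends up with the content of the later
one, and anything in the volatile block shadows both.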

the system is now essentially similar to an initramfs image, where the
changes in /etc/config/ get lost on reboot. so there is a mechanism to
write the content of /overlay into the volatile block.

if at any point i am happy with my current config, i can also snapshot
the system. this will essentially convert an existing volatile entry
to a snapshot entry.

to ensure that we never lose data while overwriting blocks (i.e.
writing a new volatile over an existing one or while converting a
volatile to a snapshot), we use the last few sectors of the flash as a
back buffer. this way we have a valid copy of the "data to be deleted"
while we are overwriting it. this allows us to always be able to fall
back to the last known working version.
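
a rough python simulation of the back buffer idea. the step order, the
single-block back buffer, the tiny erasesize and the way recovery is
told which block to restore are assumptions for the demo; the real tool
may do this differently:

```python
ERASESIZE = 16                       # tiny erase block, for the demo only
ERASED = b"\xff" * ERASESIZE


class PowerLoss(Exception):
    """Simulated power cut in the middle of an overwrite."""


def _slices(flash: bytearray, index: int):
    target = slice(index * ERASESIZE, (index + 1) * ERASESIZE)
    back = slice(len(flash) - ERASESIZE, len(flash))   # last block of flash
    return target, back


def safe_overwrite(flash: bytearray, index: int, new: bytes, fail_midway=False):
    """Overwrite block `index`, keeping the old data in the back buffer."""
    assert len(new) == ERASESIZE
    target, back = _slices(flash, index)
    flash[back] = flash[target]      # 1. keep a valid copy of the old data
    flash[target] = ERASED           # 2. erase the block to be replaced
    if fail_midway:
        raise PowerLoss("power lost after erase")
    flash[target] = new              # 3. write the new data
    flash[back] = ERASED             # 4. release the back buffer


def recover(flash: bytearray, index: int) -> bool:
    """On boot: if the back buffer still holds data, restore it."""
    target, back = _slices(flash, index)
    if flash[back] != ERASED:
        flash[target] = flash[back]
        flash[back] = ERASED
        return True
    return False
```

at every step either the target block or the back buffer holds a
complete copy, which is what makes the fallback possible.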

you can test this in trunk images right now. the tool is deployed in
the default config. once your system has booted and jffs2 init is
done, simply call "/sbin/snapshot convert". this will turn your
current jffs2 overlay delta into a block chain with a single volatile
block.

there is also "/sbin/snapshot upgrade". this will try to pull opkg
updates from the server, for example a security fix, and if it does
find one or more it will automagically install the updates and then
save this as a snapshot entry in the chain.

there are many use cases where this makes sense, however it should not
be seen as the new replacement for jffs2. this stuff is aimed at
deployments, where many units run the same firmware and should all get
the same updates at the same time. we aim to store config and small
fixes in the block chain. after all, it is a tmpfs we run on.

the cool thing is that you can simply roll back to an existing
snapshot if anything fails. a trigger for a rollback could be "can't
connect to mesh anymore" etc.
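
a toy python model of snapshot and rollback on such a chain. the chain
is just a list of (type, payload) tuples here, and snapshot, rollback,
upgrade_with_rollback and the mesh_ok callback are hypothetical names
for illustration, not the interface of /sbin/snapshot:

```python
def snapshot(chain: list) -> None:
    """Convert a trailing volatile entry into a snapshot entry."""
    if chain and chain[-1][0] == "volatile":
        chain[-1] = ("snapshot", chain[-1][1])


def rollback(chain: list, keep: int) -> None:
    """Keep the first `keep` entries and drop everything after them."""
    del chain[keep:]


def upgrade_with_rollback(chain: list, update, mesh_ok) -> bool:
    """Append an update as a snapshot; roll back if the health check fails."""
    snapshot(chain)                  # assumption: freeze pending config first
    known_good = len(chain)
    chain.append(("snapshot", update))
    if not mesh_ok():                # e.g. "can't connect to mesh anymore"
        rollback(chain, known_good)
        return False
    return True
```

for example, upgrade_with_rollback(chain, fix, mesh_ok) leaves the
chain unchanged when mesh_ok() returns False, so the unit comes back up
on the last known working snapshot.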

i have so far only tested this in single unit deployments, so i am
hoping to gather some experience at the wbm to find out what software
stack we need to build to remotely manage this for N units.

	John