[Battlemesh] Battlemesh v5 tests

Fri Mar 9 09:56:36 CET 2012

Hey Gabriel,

thanks for bringing the discussion to the batman ml and giving some constructive
input. I wrote this bonding/alternating feature some time ago, and we released
it at WBMv3 together with the short documentation found in the wiki. Actually,
I considered the feature rather simple and therefore did not write much about it
- because there is not really much to write about, or so I thought. Obviously,
some things were unclear, so thanks for pointing me/us to that.

When implementing, it is easy to miss some things that are not that obvious
to outsiders, so please feel free to ask or suggest things. We'll rework the
bonding/interface alternating part in the next few days, and would be
happy to include your suggestions. :)

Usually, we write the protocol documentation for review and as a reference
for other batman-adv devs - and we don't expect that they all
fall on the head at the same time. It is meant to describe the concept and
not the actual implementation with all its nasty details.

On Wed, Mar 07, 2012 at 11:18:48PM +0100, Gabriel Kerneis wrote:
> [CC: b.a.t.m.a.n at lists.open-mesh.org, see note 3 in particular]
> 
> Antonio,
> 
> On Wed, Mar 07, 2012 at 06:17:52PM +0100, Antonio Quartulli wrote:
> > Technical details about what? Interface-alternating? It is there!
> > Gabriel wrote the link.  
> 
> No. Please re-read my email carefully.  The wiki contains a rough explanation of
> the general principle (ie. “same interface = bad, different interface = good”).
> Not the actual algorithm used by batman-adv (quoting from the wiki: “the
> algorithm tries to avoid forwarding packets on the interface which just received
> the packet”).
> 
> Note that the wiki has been updated since then, by Simon with a few more
> details [1], and by Marek with benchmark results from WBMv3.
> 

Maybe "algorithm" is a big word for a little feature like that. The bonding
and interface alternating basically work in two steps:

 1) detect that a neighbor is reachable via two different links
 2) use the two different links for various manipulations (bonding, interface alternation)

1) The detection part is batman-specific: we use the PRIMARIES_FIRST_HOP flag
to do that. As a reminder (that might be documented somewhere else):

 * OGMs from the primary interface are broadcast on ALL interfaces and are spread over
   the mesh (big TTL) --> these get the PRIMARIES_FIRST_HOP flag, which is cleared
   when forwarded by other nodes
 * OGMs from the secondary interfaces are only broadcast on their respective interface
   and are only used for local link sensing (TTL = 1)

When we receive OGMs with the PRIMARIES_FIRST_HOP flag set on different interfaces,
we know that they came from the same neighbor, just from different interfaces. We
have two links to this neighbor.
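
To make the detection step concrete, here is a minimal Python sketch. It is not
the batman-adv code: the tuple layout (originator, flags, incoming interface),
the function name detect_links and the flag value are assumptions made up for
illustration only.

```python
from collections import defaultdict

# Illustrative flag value; treat it as an assumption, not the wire format.
PRIMARIES_FIRST_HOP = 0x10


def detect_links(ogms):
    """Map each primary originator to the set of local interfaces on which
    an OGM with PRIMARIES_FIRST_HOP still set was received from it."""
    links = defaultdict(set)
    for orig, flags, if_incoming in ogms:
        # The flag is cleared when other nodes forward the OGM, so a set
        # flag means the OGM came directly from the neighbor itself.
        if flags & PRIMARIES_FIRST_HOP:
            links[orig].add(if_incoming)
    return dict(links)


ogms = [
    ("aa:aa", PRIMARIES_FIRST_HOP, "wlan0"),
    ("aa:aa", PRIMARIES_FIRST_HOP, "wlan1"),
    ("bb:bb", 0, "wlan0"),  # forwarded OGM, flag already cleared
]
links = detect_links(ogms)
# Neighbors reachable over more than one link.
multi_link = {orig for orig, ifaces in links.items() if len(ifaces) > 1}
```

Here only "aa:aa" ends up in multi_link, because its first-hop OGMs were seen
on two different interfaces.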

2) The manipulation step is independent of the routing protocol, as long as the routing
protocol routes packets based on their destination and does not care which
interface they arrive on.

Because we already made our routing decision (we have chosen a neighbor), it does not
matter on which link we send the frame. We use this freedom to either use a different
interface from the one the frame came in on (interface alternation) or round-robin over
the available, detected links (bonding). Note that this would work with any routing
protocol and is independent of the BATMAN routing.

However, we rely on the fact that we are on layer 2 and can decide per packet which link
to use in batman-adv. This would not work so easily with static layer 3 routing tables, I suppose.
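
The two manipulations can be sketched roughly like this in Python. Again, this
is only an illustration under assumptions: the dict layout with "iface" and
"tq" keys, the function names, and the fallback to the best link when no other
interface is available are mine, not taken from the batman-adv source.

```python
from itertools import cycle


def alternate(links, if_incoming):
    """Interface alternation: prefer the best link that is NOT on the
    interface the frame arrived on. Falling back to the best link overall
    when there is no alternative is an assumption of this sketch."""
    others = [link for link in links if link["iface"] != if_incoming]
    return max(others or links, key=lambda link: link["tq"])


def bonding(links):
    """Bonding: endless round-robin iterator over the detected links."""
    return cycle(links)


links = [{"iface": "wlan0", "tq": 250}, {"iface": "wlan1", "tq": 230}]

out = alternate(links, "wlan0")  # frame came in on wlan0, so send on wlan1
rr = bonding(links)
order = [next(rr)["iface"] for _ in range(4)]
```

Both functions only pick a link to the neighbor that was already chosen; neither
of them touches the routing decision itself.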

> > Gabriel said he has not enough time to look into it. I'm sorry, but I don't think
> > this is a good reason to blame batman-adv devs :P
> 
> I finally decided to settle this issue and spent my breakfast reading
> batman-adv/routing.c [2] instead of my favorite newspaper.  Here is what I
> understood:
> 
>     At all times, batman-adv maintains a list of "bonding candidates" for each
>     node (bonding_candidate_add, called from bat_iv_ogm.c:699).
>     Some node "neigh" is a bonding candidate for another node "orig" if and only
>     if:
>     - neigh and orig have the same primary address, ie. are in fact the same
>       router,

that's right - we are talking about one neighbor, and the bonding candidates are the
available links to this neighbor.

>     - the links to reach them have the same quality up to some additive
>       constant (BONDING_TQ_THRESHOLD = 50) [3],

Yep, it would be useless if one link works perfectly and the other one
is dropping all the packets. We want links with similar TQ.

>     - orig does not already have another bonding candidate for the same
>       interface, because it could interfere – but what if the neigh has a better
>       link quality, isn’t it a pity to ignore it?

If it had a better quality, it would have been chosen as router already - at least
we expect that here. Maybe this is a little rough, but using the same interface/frequency
is far worse, IMHO.
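
Put together, the two conditions discussed so far (quality close enough to the
router, no existing candidate on the same interface) could be sketched as
follows. This is a simplified Python illustration, not the kernel code: the
dict layout and the function name are hypothetical, and only the threshold
value 50 comes from this thread.

```python
BONDING_TQ_THRESHOLD = 50  # value quoted in this thread


def is_bonding_candidate(neigh, router, candidates):
    """Return True if the link `neigh` may join the bonding candidates."""
    # Quality check: reject links that are much worse than the router.
    if neigh["tq"] < router["tq"] - BONDING_TQ_THRESHOLD:
        return False
    # Interference check: reject if another candidate already uses the
    # same interface (and therefore most likely the same frequency).
    return all(c["iface"] != neigh["iface"] for c in candidates)


router = {"iface": "wlan0", "tq": 200}
candidates = [router]

good = is_bonding_candidate({"iface": "wlan1", "tq": 160}, router, candidates)
too_weak = is_bonding_candidate({"iface": "wlan1", "tq": 140}, router, candidates)
same_if = is_bonding_candidate({"iface": "wlan0", "tq": 200}, router, candidates)
```

With these numbers only the first link is accepted: the second is more than 50
below the router's TQ, and the third shares wlan0 with an existing candidate.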

> 
>     Then, assuming that "interface alternating" is enabled, the list of bonding
>     candidates is used on every route selection (find_ifalter_router, called
>     from routing.c:769).

That's right. Interface alternating is always enabled, BTW.

>     More precisely, once batman has chosen a next-hop router for a packet based
>     on its classical routing algorithm, it walks the list of the bonding
>     candidates associated to the primary interface for this router [4].  It
>     selects the actual next-hop on the following criteria:
>     - it must not be on the same interface as the packet came in,
>     - its quality must be as high as possible (given the previous constraint).
> 
> This is the kind of explanation I would have loved to find on the wiki.  By the
> way, consider it public domain and feel free to copy/paste/correct it if you
> wish.

Thanks for sharing your explanation. I will happily include it in the rework of
this section.

> 
> It is still not clear to me exactly why this works, but I believe this is what
> the code does, and is definitely easier to discuss than generic, unsubstantiated
> claims.
> 
> Best regards,
> Gabriel
> 
> [1] “Interface alternating is only performed if the two candidate links to the
>     next hop have a similar quality.”
>     http://www.open-mesh.org/wiki/batman-adv/Multi-link-optimize
> 
> [2] http://www.open-mesh.org/projects/batman-adv/repository/revisions/master/entry/routing.c
> 
> [3] By the way, there is something I don’t understand: neigh_node->tq_avg will be
>     accepted even if it is far greater than router->tq_avg + BONDING_TQ_THRESHOLD.
>     Shouldn’t it be: abs(neigh_node->tq_avg - router->tq_avg) > BONDING_TQ_THRESHOLD?
>     http://www.open-mesh.org/projects/batman-adv/repository/revisions/master/entry/routing.c#L166

We expect that router->tq_avg is already the highest, so neigh_node->tq_avg shouldn't
be (far) higher than router->tq_avg.
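
A small Python check of that argument, under the assumption that the existing
test only rejects links more than the threshold below the router (the function
names are made up for this illustration):

```python
BONDING_TQ_THRESHOLD = 50  # value quoted in this thread


def one_sided(neigh_tq, router_tq):
    """Accept unless the link is more than the threshold below the router."""
    return neigh_tq >= router_tq - BONDING_TQ_THRESHOLD


def symmetric(neigh_tq, router_tq):
    """Accept only if the absolute TQ difference is within the threshold."""
    return abs(neigh_tq - router_tq) <= BONDING_TQ_THRESHOLD


router_tq = 200
# As long as no neighbor is better than the router, both checks agree.
agree = all(
    one_sided(tq, router_tq) == symmetric(tq, router_tq)
    for tq in range(0, router_tq + 1)
)
# They only differ for a neighbor that is far better than the router.
differ = one_sided(255, router_tq) != symmetric(255, router_tq)
```

So the two formulations only diverge in the case we do not expect to happen,
namely a neighbor link that is far better than the currently selected router.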
> 
> [4] Why the primary and not the chosen router directly? Is the bonding
>     candidates list always associated to the primary interface?

We might have chosen the originator of a secondary interface, but we should also
have the originator of the primary interface (as explained above, we receive
its OGMs over the secondary interfaces as well). The primary orig will have all
neighbors from secondary interfaces as well, and yes, the bonding candidates are
only associated with this primary originator (to avoid duplicating the same
information), so this is the proper originator to choose for bonding/alternation.
This is merely an implementation detail, and does not change the routing
decision.

Thanks again for your comments - I'll notify you when we have updated
the protocol documentation for your review, if that's okay?

Cheers,
	Simon