[Battlemesh] Routing Tables of Death

Wed Mar 8 00:19:42 UTC 2017

On 08/03/17 00:29, Dave Taht wrote:
> On Tue, Mar 7, 2017 at 2:29 PM, Pau <pau at dabax.net> wrote:
>> On 07/03/17 22:56, Dave Taht wrote:
>>> On Tue, Mar 7, 2017 at 1:29 PM, Pau <pau at dabax.net> wrote:
>>>> It would be interesting to see if bmx7 (with its security extensions [1])
>>>> is able to minimize the impact of this kind of attack. Of course the
>>>> network flooding will still be a problem, but if the routing protocol can
>>>> survive it, the impact of the attack would be drastically reduced.
>>>>
>>>> [1] http://bmx6.net/documents/30
>>>
>>> My principal observation on this front at the moment is that routers tend
>>> to bottleneck on serving packets, and starve userspace daemons under
>>> loads like that.
>>>
>>> More testing is needed in these circumstances. Of late, I have been
>>> using the flent tools to stress the forwarding path, and observing many
>>> anomalies in routing behavior. Here, for example, is what happened to
>>> me with 10k routes on an apu2, which, as a quad-core x86_64 box, is
>>> fully capable of forwarding well over a gbit/sec; but as the routing
>>> daemon ate more than a core and couldn't keep up, it would periodically
>>> lose connectivity.
>>>
>>> http://www.taht.net/~d/apu_router_failure/apu2_out_of_cpu.png
>>>
>>> I've seen the same things happen at smaller scales on weaker boxes.
>>> (I'd been saying for years that many routing problems were congestive
>>> in nature, but hadn't got around to reliably simulating it until this
>>> past month, figuring that with fq_codel and ATF many of them would go
>>> away. They didn't, because we haven't fixed wifi multicast yet.)
>>>
>>> I was also recently doing myself in with some bad interactions between
>>> odhcpd, network-manager, and babeld in multiple circumstances.
>>>
>>> There's a ton of mostly crappy work going on in my rabeld repo, trying
>>> to improve error handling and to deal better with cpu overload and
>>> network bufferbloat, which I'm in the process of organizing and
>>> cleaning up. Don't look there. It's a mess currently, with only 6 or so
>>> good ideas out of 200+ commits.....
>>>
>>> As for bmx7:
>>>
>>> Adding userspace crypto into the mix may make things worse rather
>>> than better. Perhaps punting more of the work to the kernel, and
>>> adopting a harder real-time (RT) perspective in multiple daemons, so
>>> that vital packets like hellos and IHUs still get out while other
>>> processing is going on, will help.
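One way to read "getting out vital packets like hellos and IHUs while doing other processing" is strict-priority transmit scheduling inside the daemon, with a bounded amount of work per pass. The sketch below is hypothetical: the priority class names, the per-slice budget and the `TxScheduler` class are illustrative, not taken from babeld, rabeld or bmx7, and it does not model the kernel or real-time OS scheduling side mentioned above.

```python
import heapq
import itertools

# Hypothetical priority classes (lower number = transmitted first).
PRIO_HELLO_IHU = 0   # link-liveness messages that must never be starved
PRIO_UPDATE = 1      # ordinary route updates
PRIO_BULK = 2        # large work, e.g. re-announcing a 10k-route table


class TxScheduler:
    """Toy strict-priority transmit queue, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving FIFO order

    def enqueue(self, prio, msg):
        heapq.heappush(self._heap, (prio, next(self._seq), msg))

    def run_slice(self, budget):
        """Send at most `budget` messages, highest priority first.

        Priority lets a hello jump ahead of a large backlog; the budget
        lets the daemon return to its event loop (and timers) between
        slices instead of grinding through the whole backlog at once.
        """
        sent = []
        while self._heap and len(sent) < budget:
            _, _, msg = heapq.heappop(self._heap)
            sent.append(msg)
        return sent


if __name__ == "__main__":
    sched = TxScheduler()
    for i in range(3):
        sched.enqueue(PRIO_BULK, f"dump-{i}")
    sched.enqueue(PRIO_HELLO_IHU, "hello")
    sched.enqueue(PRIO_HELLO_IHU, "ihu")
    print(sched.run_slice(3))  # ['hello', 'ihu', 'dump-0']
```

The hello and IHU were queued last but leave first; the bulk dump resumes on later slices.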
>>
>> I've not got deeply into your research, so correct me if I'm saying
>> something incorrect.
> 
> Until recently I was mostly working on implementing the ideas
> discussed at battlemesh two years back[1,2] - which have all mostly
> landed in lede and in the upstream linux kernel.
> 
> But we didn't fix multicast. Analyzing that and fixing it is next, but
> I'm not sure if we'll get around to implementing the obvious fixes before
> this year's bmesh.
> 
> (answer: short, fq'd, drop-head mcast/powersave queues)
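The one-line answer (short, fq'd, drop-head mcast/powersave queues) can be modelled in a few lines. This is a toy sketch under stated assumptions: flows are keyed by a hypothetical source identifier, the per-flow limit is arbitrary, and a real fix would live in the kernel's wifi queueing rather than in a userspace Python object. It shows the two properties that matter: a flooding source cannot crowd out other sources, and when a queue is full the stalest packet is discarded, so what finally goes out is fresh routing state rather than minutes-old hellos.

```python
from collections import OrderedDict, deque


class ShortFqDropHeadQueue:
    """Short per-flow FIFOs, served round-robin, dropping from the head."""

    def __init__(self, per_flow_limit=4):
        self.limit = per_flow_limit
        # flow id -> deque of packets; dict order is the round-robin order.
        # Invariant: only non-empty deques are kept in the dict.
        self.flows = OrderedDict()
        self.drops = 0

    def enqueue(self, flow, pkt):
        q = self.flows.setdefault(flow, deque())
        if len(q) >= self.limit:
            q.popleft()  # drop head: discard the oldest packet of this flow
            self.drops += 1
        q.append(pkt)

    def dequeue(self):
        """Return the next packet, rotating across flows; None if empty."""
        if not self.flows:
            return None
        flow, q = next(iter(self.flows.items()))
        pkt = q.popleft()
        if q:
            self.flows.move_to_end(flow)  # back of the round-robin rotation
        else:
            del self.flows[flow]
        return pkt


if __name__ == "__main__":
    q = ShortFqDropHeadQueue(per_flow_limit=2)
    for seq in range(5):
        q.enqueue("nodeA", f"A{seq}")  # a flooding source
    q.enqueue("nodeB", "B0")           # a well-behaved neighbour
    print([q.dequeue() for _ in range(3)])  # ['A3', 'B0', 'A4']
    print(q.drops)                          # 3
```

Node B's single packet goes out second even though node A queued five packets before it, and A's three oldest packets were dropped rather than delivered late.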
> 
> ...
> 
> I've run out of time to pursue some of the ideas generated by the
> rabeld effort. I will try to write up the ideas that worked, and those
> that didn't, over the coming week.
> 
>> As far as I understand bmx7 (not much yet), it uses crypto to verify
>> and identify nodes. Then you can create trusted rings and different
>> routing policies/strategies. For instance, you can set the daemon not to
>> relay hello or OGM messages from untrusted origins or from nodes doing
>> suspicious things. In that case, creating 1k fake nodes would not have
>> any impact on the network (beyond the flooding). This is what I mean by
>> mitigating the impact of the attack.
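The trust-based relaying described above can be sketched as a relay filter. This is a hypothetical model, not bmx7's code or configuration: identities are plain strings rather than key-derived IDs, and the trusted and suspicious sets are assumed to have been populated by whatever mechanism bmx7 really uses. It only illustrates why 1k fake origins would stay local: they are still received (and still cost airtime in the attacker's collision domain) but are never re-flooded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ogm:
    origin: str  # hypothetical: a plain name standing in for a key-derived node ID
    seqno: int


def should_relay(ogm, trusted, suspicious):
    """Re-flood an OGM only if its origin is trusted and not flagged."""
    return ogm.origin in trusted and ogm.origin not in suspicious


if __name__ == "__main__":
    trusted = {"alice", "bob"}
    suspicious = set()
    # An attacker invents 1k identities; one honest OGM arrives as well.
    received = [Ogm(f"fake{i}", 1) for i in range(1000)] + [Ogm("alice", 7)]
    relayed = [o for o in received if should_relay(o, trusted, suspicious)]
    print(len(received), len(relayed))  # 1001 1
```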
> 
> A) The cpu overhead of validating each packet via authentication is high.
> 
> (anyone know what the pps is for bmx7?)

I've been following the development of bmx7 a bit and discussing some
topics with Axel. He was very concerned about the CPU overhead, and it
is one of his main focuses. Check the document in the link I posted
before, page 38: it scales quite linearly.

> As one example of a further mitigation, you could register that a
> given node is being evil and start dropping most of its packets (via
> netfilter? ipset? bpf) without doing the "verify signature step" at
> all.
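The "drop known-evil nodes before the verify-signature step" idea comes down to ordering: a cheap set lookup first, the expensive check second. The sketch below is a userspace illustration under explicit assumptions: HMAC-SHA256 from the Python standard library stands in for whatever public-key signature the routing daemon really verifies, and `RxFilter` and the strike limit are hypothetical. In practice the blocked set would more plausibly be pushed into the kernel (netfilter with an ipset, or a BPF filter, as suggested above) so those packets never reach the daemon at all; that part is not shown.

```python
import hashlib
import hmac

STRIKE_LIMIT = 3  # hypothetical: failed verifications before an origin is blocked


class RxFilter:
    """Drop packets from known-bad origins before any crypto is done."""

    def __init__(self, keys):
        self.keys = keys        # origin -> shared key (stand-in for a public key)
        self.strikes = {}
        self.blocked = set()
        self.verifications = 0  # how often the costly path actually ran

    def accept(self, origin, payload, tag):
        if origin in self.blocked:
            return False        # dropped without any verification work
        self.verifications += 1
        key = self.keys.get(origin)
        ok = key is not None and hmac.compare_digest(
            hmac.new(key, payload, hashlib.sha256).digest(), tag)
        if not ok:
            self.strikes[origin] = self.strikes.get(origin, 0) + 1
            if self.strikes[origin] >= STRIKE_LIMIT:
                self.blocked.add(origin)
        return ok


if __name__ == "__main__":
    f = RxFilter({"alice": b"k-alice"})
    good = hmac.new(b"k-alice", b"ogm", hashlib.sha256).digest()
    print(f.accept("alice", b"ogm", good))  # True
    for _ in range(10):
        f.accept("mallory", b"ogm", b"\x00" * 32)
    print(f.verifications)  # 4: one for alice, three for mallory before the block
```

After three failures, the remaining seven packets from "mallory" cost only a set lookup each.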
> 
> B) If the router is trusted, but either overburdened itself or the
> network is, what happens?

Of course the node will not work properly, but it won't affect the rest
of the network any more than an ordinary crash of a single node does.

> C) The flooding effects are significant, regardless. We can end up
> flooding the local mcast queue so much that, in one test on one
> chipset, I had 11 minutes of queuing there. (no kidding)
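For scale, the standing delay T of a FIFO is roughly its backlog B divided by its drain rate R. Wifi multicast is typically sent at a low basic rate; assuming, purely for illustration, a drain rate of 1 Mbit/s (not a figure from the test above), 11 minutes (660 s) of queuing corresponds to a backlog of:

```latex
T = \frac{B}{R}
\quad\Rightarrow\quad
B = T \cdot R = 660\,\mathrm{s} \times 10^{6}\,\mathrm{bit/s}
  = 6.6 \times 10^{8}\,\mathrm{bit} \approx 82.5\,\mathrm{MB}
```

When R is this low, bounding B is the only direct way to bound T, which is the argument for keeping multicast queues short.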

Of course, I'm not saying the network flooding is not significant. WiFi
networks are weak by definition: you can saturate the channel and
nothing works. But in a (let's say) 100-node network, if the routing
protocol is secured the way bmx7 is, a single attacker can only
disturb its own collision domain. That makes a huge difference compared
to other routing protocols, where a single attacker can collapse all 100
nodes by sending malicious routing packets.

In any case, I don't want to get into a discussion comparing routing
protocols. My only thought here was "it would be interesting to see if
the security extensions of bmx7 are able to mitigate this kind of
attack". So IMO "network attacks VS routing protocols" would be an
interesting topic for the next BattleMesh.

Cheers.

> ...
> 
> 
> [1]  https://www.youtube.com/watch?v=Rb-UnHDw02o&t=242s
> [2]  https://arxiv.org/abs/1703.00064
> _______________________________________________
> Battlemesh mailing list
> Battlemesh at ml.ninux.org
> http://ml.ninux.org/mailman/listinfo/battlemesh
> 

-- 
./p4u
