Fido.Net Status Updates

Network Instability 17.02.09

Following on from yesterday’s global outage caused by bogus BGP advertisements, we continued to experience a number of smaller blips as the route, which had originally been filtered, re-appeared through smaller peers and transit links.

Having applied a vendor patch to our border routers at 21:15 on Monday evening, we initially believed that the problem had been resolved. However, at approximately 01:10 on Tuesday morning we experienced a similar failure, caused by an overflow in the route decision engine on 2 of our 3 border gateway routers.

We attempted to resolve the problem; however, the flaps which resulted from these failures meant a number of routes were damped for a period of time. This became worse when the original problem re-appeared, causing a further crash on the now repeatedly patched routers.
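For readers unfamiliar with why flapping leads to routes being damped, the following is a minimal sketch of the general route flap damping idea: each flap adds a penalty, the penalty decays exponentially, and a route is suppressed until the penalty falls back below a reuse limit. The parameter values and function names are illustrative assumptions only; this post does not state which damping settings our peers were using.

```python
# Illustrative values only (commonly cited defaults, NOT taken from this post).
PENALTY_PER_FLAP = 1000   # added each time the route flaps
SUPPRESS_LIMIT = 2000     # route is suppressed once the penalty exceeds this
REUSE_LIMIT = 750         # route is usable again once the penalty drops below this
HALF_LIFE_MIN = 15.0      # penalty halves every 15 minutes


def decay(penalty, minutes):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def simulate(flap_minutes, end_minute):
    """Step one minute at a time and return (minute, penalty, suppressed) samples.

    flap_minutes: whole-minute timestamps at which the route flapped (hypothetical input).
    """
    flaps = set(flap_minutes)
    penalty = 0.0
    suppressed = False
    timeline = []
    for minute in range(end_minute + 1):
        if minute > 0:
            penalty = decay(penalty, 1)
        if minute in flaps:
            penalty += PENALTY_PER_FLAP
        if penalty > SUPPRESS_LIMIT:
            suppressed = True
        elif penalty < REUSE_LIMIT:
            suppressed = False
        timeline.append((minute, penalty, suppressed))
    return timeline


if __name__ == "__main__":
    # Three flaps, two minutes apart, then the route is stable.
    for minute, penalty, suppressed in simulate([0, 2, 4], 40):
        if minute % 4 == 0:
            print(minute, round(penalty), "suppressed" if suppressed else "ok")
```

Under these assumed values, three flaps in quick succession are enough to suppress a route, and it then stays suppressed for roughly half an hour after the flapping stops, which is why repeated session resets can keep prefixes unreachable well after the underlying fault has cleared.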

By 07:15 we had applied heavy filtering, additional patches and more aggressive security techniques, restored all sessions, and cleared all damped routes, so that we were back to full service.

During the night, whilst we were operating on reduced capacity, we were still advertising and receiving a full table. However, some of these routes were then expunged owing to the nature of the “bad bgp spec traffic”, and we appreciate that customers on our hosting and transit networks will have experienced a drop in traffic during this time.

We would like to apologise again to those customers who were affected, and assure you we are doing everything within our power to ensure these issues do not re-occur. Our software vendors have worked extremely quickly to provide patches and to help us to diagnose these issues – and for this we are extremely grateful.

At this time we believe we have resolved all issues; however, we are aware that service will have been patchy through the night, owing to a large number of route flaps as peers all over the world recovered, dropped and reloaded sessions.

More information on this problem can be found at the following links:

http://www.renesys.com/blog/2009/02/the-flap-heard-around-the-worl.shtml

http://asert.arbornetworks.com/2009/02/ahh-the-ease-of-introducing-global-routing-instability/

http://www.merit.edu/mail.archives/nanog/

Jon
