At approximately 08:30 this morning one of our trusted peers started to experience network issues, which resulted in it announcing a greatly increased number of prefixes over our BGP (Border Gateway Protocol) interconnect (around 500,000 routes instead of the usual 10,000).
Our automated systems detected this and shut the sessions down; however, owing to a configuration timeout issue, the sessions re-established a few minutes later and the same 500,000 routes reappeared, causing the routes to flap.
Engineers quickly spotted this and shut down the offending sessions manually. They also resolved the timeout issue, so this sort of thing should not happen again: if a peer now trips our “max prefix” limits, those sessions will stay down until they are manually cleared.
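For readers curious about the change in behaviour, here is a minimal Python sketch (illustrative only, not our actual tooling; the peer name and limit are hypothetical) of the policy we now enforce: once a peer exceeds its prefix limit the session is administratively shut and never re-establishes on its own, instead waiting for an engineer to clear it.

```python
# Illustrative sketch only -- real enforcement happens on the routers themselves.

class PeerSession:
    def __init__(self, name, max_prefixes):
        self.name = name
        self.max_prefixes = max_prefixes
        self.established = True
        self.admin_down = False          # stays True until manually cleared

    def on_prefix_update(self, prefix_count):
        """Shut the session if the peer exceeds its max-prefix limit."""
        if prefix_count > self.max_prefixes:
            self.established = False
            self.admin_down = True
            print(f"{self.name}: {prefix_count} prefixes exceeds limit of "
                  f"{self.max_prefixes}; session shut down")

    def try_reestablish(self):
        """Old behaviour: a timeout brought the session back automatically,
        causing the flap. New behaviour: an admin-down session stays down."""
        if self.admin_down:
            return False
        self.established = True
        return True

    def manual_clear(self):
        """Operator intervention is now required to bring the session back."""
        self.admin_down = False
        self.established = True


session = PeerSession("peer-example", max_prefixes=10_000)
session.on_prefix_update(500_000)          # trips the limit, session goes down
assert session.try_reestablish() is False  # no automatic flap this time
session.manual_clear()                     # engineer re-enables the session
```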
As a result of the flapping routes, one of our core routers ran into what we assume was a memory problem and became unresponsive. Unfortunately, routes for which this router was primary ended up stuck on our route servers, and despite the core router going offline (and a backup taking over), a number of routes to servers in IP House remained pinned to the broken router.
By 09:25 engineers were on site and had started investigating the issue, which we initially thought was a break in part of our fibre ring. Once this had been ruled out, we power cycled the affected router and normal service was restored by approximately 09:45.
We would like to apologise for any inconvenience caused by this period of instability. Engineers are monitoring the situation and will take further action should the network exhibit any further instability today.