During the previously mentioned emergency maintenance on one of our border routers, carried out to investigate and resolve data errors on a line card affecting the cross-site links between Harbour Exchange and Telehouse North, there was an outage caused by an operational misconfiguration (human error).
The outage started at 21:51 UTC and lasted approximately 45 minutes.
While isolating the faulty line card, one of our senior network engineers mistyped a command sequence, which disabled the wrong interface on one of the border routers. This left both the primary interface on the Hex (Harbour Exchange) router, which was being isolated, and the secondary interface on the Telehouse router, which should have become the primary, disabled.
This created a situation often referred to as a “split brain”: both sites remained live, but the core network was split in two, leaving no routes between the Telehouse and Hex networks. Traffic is normally routed automatically across the network using OSPF and BGP; however, certain aspects of our network currently expect layer 2 communication between all three London sites, and in a split brain the sites can no longer see each other.
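As a purely illustrative sketch (not the actual router configuration), the split can be modelled as two sites joined by two cross-site links. The link names `primary` and `secondary`, the site labels, and the helper functions are all hypothetical. Disabling only the primary link leaves the sites connected over the secondary; disabling both partitions them.

```python
from collections import deque

# Hypothetical model: two cross-site links between the two sites.
# Names are illustrative only and do not reflect real interface labels.
LINKS = {
    "primary": ("hex", "telehouse"),
    "secondary": ("hex", "telehouse"),
}


def reachable(start, disabled):
    """Return the set of sites reachable from `start` over enabled links."""
    seen = {start}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for name, (a, b) in LINKS.items():
            # Skip links that are disabled or do not touch this site.
            if name in disabled or site not in (a, b):
                continue
            neighbour = b if site == a else a
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def is_split(disabled):
    """True if disabling the given links leaves some site unreachable from hex."""
    all_sites = {site for pair in LINKS.values() for site in pair}
    return reachable("hex", disabled) != all_sites


# Intended change: isolate only the primary link.
print(is_split({"primary"}))
# What the mistyped command effectively did: both links down.
print(is_split({"primary", "secondary"}))
```

A check of this kind, run against the planned change before it is applied, is one possible way to flag a command that would disable the last remaining path between sites.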
We are looking into alternative ways to provision the core network to minimise the risk of similar issues, and we are also working to improve our interface labelling schemes to help avoid a similar event in the future.
On behalf of FidoNet (and as the engineer who made the typing error), I would like to apologise for any inconvenience this outage has caused. We take our commitment to service and performance seriously, and want to assure customers that we are taking every step to ensure this sort of error does not happen again.
Jon Morby
Managing Director