It seems 2016 will be remembered as a very bad year, not only for the long list of notable people who passed away but also as a year in which cyberattacks made front-page news and high-profile outages caused significant financial and reputational damage.
January 2016 began with outages for the banks, with RBS and NatWest customers unable to access their online banking accounts. However, HSBC seemed to suffer particularly badly, with an IT outage causing havoc early in the month followed by a major disruption to its services as a result of a DDoS attack just a few weeks later.
But cyberattacks were not confined to the banking industry in January. Linode suffered 10 days of hell after a DDoS attack, and Lincolnshire County Council's systems were down for four days following a ransomware attack. The whole public sector took a major bashing in September when the National Audit Office issued a damning report on the UK government's approach to digital security.
Meanwhile, in February, Russia and America traded insults about who had carried out a cyberattack in Ukraine which cut power to thousands of people across the country. Some companies found themselves in hot water, such as Seagate, whose own staff decided to sue the company in September over an earlier data breach, and TalkTalk, which rounded out the year in October when it was handed a record fine for its data breach.
Hackers had also begun to turn their eye to the Internet of Things in April, with news that malware had been found which targets smart home technology, and skimping on security cost one Bangladesh bank more than money when a cheap router was compromised, allowing hackers to steal money from its customers' accounts. But security experts were puzzled in June when a major spam and malware network inexplicably fell silent.
But hackers were not to blame for all of the outages last year.
Human error played its part when GPS users suffered 12 hours of disruption after a satellite decommissioning went wrong in February. Kent County Council was left with egg on its face in the same month when an accidental discharge of the fire suppression systems in its data centre resulted in major downtime for its IT systems.
More human error was to blame later in the month when a single Telstra engineer caused a network-wide outage, followed in March by Google engineers patching the wrong router, resulting in downtime for the search giant's services. ING fell victim in September when a fire suppression test in its data centre also went badly wrong.
It appears that even big names such as BT are not immune to data centre downtime: a faulty router caused problems for phone and broadband customers in February. They were in the firing line again in July when power failures inside their data centres at the Equinix-owned Telehouse North caused more problems for their customers.
Towards the end of the year, it seemed as though someone had it in for data centres, with Global Switch customers suffering downtime after power outages in its data centres and a Fujitsu data centre outage knocking the UK's FCA website offline for three days. And SSP decided to turn the lights off for good at its Solihull data centre in September after a power blackout fried its broker systems in August.
Of course, I have left the biggest to last…
In August, Delta Air Lines had an outage that grounded flights and caused disruption across America. This was the latest in a line of incidents affecting airlines, including United, JetBlue and Southwest. The incident started early on a Monday morning when a critical power control module at a Delta data centre malfunctioned, causing a surge to the transformer and a loss of power, Delta COO Gil West said in a statement posted to the airline's website. Power was quickly restored, but "critical systems and network equipment didn't switch over to backups," he said, and the systems that did switch over were unstable. The outage cost the airline over $150 million and raised serious concerns over legacy systems and unstable failover services.
Although these are just a sample of some of the bigger events in the technology industry in 2016, they do highlight two key issues.
Firstly, many organisations have DR and backup services but don't maintain and test them, so unsurprisingly they don't work properly when they are needed. Secondly, while training staff with the technical skills to do their day-to-day job is seen as a high priority, there is often a lackadaisical approach to risk awareness among staff and to emergency response training for technical teams.
Whether it's a cyberattack or a power failure, if your systems are not ready or your staff are not trained on how to deal with them, then two things are certain: failures will happen more frequently, and they will have a greater impact on the organisation.
At a time when our society and, more importantly, our economy are increasingly dependent on digital infrastructure, failure is simply not an option. We therefore need to ensure that all systems are properly maintained for maximum reliability, that we regularly test them to confirm they function correctly, and that our people are trained properly so that when the worst does happen, they are able to deal with it. I know this sounds like common sense to many people, but the evidence above would indicate that we have failed to learn those lessons.
If we continue to fail to learn these lessons, then the question you should be asking yourself is: ‘do I want to be the one who has to stand up in front of the CEO, or worse the media, and explain why this has happened?’
