Another 2017 data centre outage highlights the case for forensic engineering

Dec 15, 2017 | Articles, Consulting

A UK data centre operator suffered what we hope is the last major outage of 2017. Ensuring it doesn’t happen again will require a thorough root-cause analysis.

It seems that, despite the large number of high-profile outages so far in 2017, the year isn’t content to finish without at least one more incident.

It was reported earlier this week that UKFast, a hosting and data centre provider based in Manchester, UK, suffered an outage at one of its Manchester facilities after the supply of grid power was apparently interrupted.

However, as with most outages, it’s difficult to get a clear picture of what really happened and which equipment or processes were to blame.

Rather than speculate, we at Future-tech would suggest that the incident again highlights the importance of undertaking a thorough forensic engineering analysis after any outage.

As Future-tech’s chief executive James Wilman recently explained to industry publication Data Centre Dynamics, an outage is usually down to the interplay of multiple factors rather than one piece of faulty equipment.

“For example, a piece of aging equipment develops a fault but this in itself doesn’t cause an outage as the system has redundancy,” said Wilman.

“Then an operative attempts to isolate the faulty equipment but, due to out of date information or a lack of training/knowledge, incorrectly carries out the bypass procedure and this causes further issues with the result of dropping the critical load.”

So far in 2017 there have already been a number of high-profile outages at airlines, co-location operators, financial services companies and IT services providers. These resulted in reputational damage, as well as direct costs running into the tens of millions, for the businesses affected.

As we explained in our recent article on Forensic Engineering, we have seen a steady increase in demand for our services and have conducted half a dozen investigations across medium to large sites over the last 12 months.

Future-tech is usually called into sites where there has been an outage, initially to establish the root cause of the downtime. Most engagements also involve suggesting measures to harden a site, or specific infrastructure equipment, to prevent repeat events.

If your data centre has experienced an outage or, perhaps more importantly, if you wish to proactively assess a data centre’s resilience, design and infrastructure before a major outage occurs, contact us at info@future-tech.co.uk for a confidential discussion with one of our Design Engineers or Consultants.