Top Mistakes When It Comes To Data Centre Maintenance Pt 1

by | Jun 14, 2023 | Articles, Maintenance & Management

‘What are the top mistakes you see when it comes to Data Centre maintenance?’ is a question I am asked a lot. It is also widely researched and often discussed anecdotally.

I have worked in the data centre industry for over 10 years, dealing with aspects of design, build, maintenance and operational management. My views are based on personal experience and on things I have seen in facilities we look after, and in those we don’t.

I’m not going to promise you will agree with everything I mention, but if you want a considered, honest, and open conversation about data centres, based on direct experience of making sure that facilities we manage run as efficiently and reliably as possible, for as long as possible, then read on.

This is the first part of a two-part article on data centre maintenance mistakes and how to rectify them, with what I consider to be industry best practices based on lessons I have learned over the last decade working in the data centre sector.

In no particular order, here is the first set of top mistakes I have seen when it comes to Data Centre Maintenance and Management:

FM Providers

Future-tech have been designing, building, and maintaining data centres since 1982, long before anyone knew what data centres really were, before they were talked about in the media, and before the expanding data bubble began to grow at the exponential rate that we now see. Even just 10 years ago there were only 2 or 3 specialists, including Future-tech, based in the UK that really knew what they were doing.

Then, several ‘big companies’ jumped on the bandwagon, and overnight became ‘specialists’ in Data Centre Maintenance and Management.

The reason this model was deemed successful was primarily down to reduced cost. A lot of General Facilities Management (FM) Providers already had a presence on a site where a data centre may have also been housed, so many clients utilised the on-site ‘engineering’ team.

However, when a plumber is sent to isolate a UPS in a flooded data centre (true example), and the issue becomes life or death, you must take a serious look at what you think you are getting vs what you are actually getting. At the end of the day, it’s your money, your data centre, and your appetite for business risk.

A generic non-specialist FM model might work for you, BUT a screen door is still a door, yet you wouldn’t put it on a submarine to replace a hatch just because it’s cheaper… or would you?

Many enterprise data centre operators still follow this model using generalist FM providers because of the perceived ‘value’, but as a veteran of a well-known mission-critical engineering services provider once said to me, ‘do you really want chefs with spanners looking after your business-critical assets?’

Generator Testing

One of my absolute favourites.

You go to all the effort of installing a generator, and then refuse to initiate a black building test to see how it performs when the utility power goes out… even though generators absolutely must work and hold the critical load in the event of a power outage.

So, what’s supposed to happen?

Black tests are normally carried out to test high availability, performance, business continuity plans, and recovery capabilities in a disaster-like scenario. A black building test is a simulated utility power outage, exactly the scenario that generators are designed to deal with. Electrical power to the entire building is shut off, imitating a power outage and forcing the UPS to hold the critical IT load on batteries for the brief period it takes for the generators to start and be ready to take the building load.
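To make the sequence concrete, here is a minimal Python sketch of the timing question a black building test answers: does the UPS battery autonomy comfortably cover the gap while the generators start and take the load? The function name, the safety factor, and every figure below are hypothetical, chosen purely for illustration and not taken from any real facility or standard.

```python
def ups_bridges_generator_start(ups_autonomy_s, generator_start_s,
                                transfer_s, safety_factor=2.0):
    """Return True if UPS battery autonomy covers the start-up gap with margin.

    All arguments are in seconds. safety_factor is an assumed margin,
    not an industry figure.
    """
    # The gap the batteries must bridge: generator start plus load transfer.
    gap_s = generator_start_s + transfer_s
    return ups_autonomy_s >= gap_s * safety_factor


# Hypothetical figures: 10 minutes of battery, 15 s start, 5 s transfer.
print(ups_bridges_generator_start(ups_autonomy_s=600,
                                  generator_start_s=15,
                                  transfer_s=5))
```

A calculation like this only tells you what should happen on paper. The black building test is what tells you whether the batteries, generators, and switchgear actually behave that way.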

This is exactly what your data centre is designed to do, so why are people so scared to test it?

There are many reasons why, but realistically none of them are valid. If you regularly test this feature of your data centre, you’ll gain more confidence each time you do it. The first time you do a test, particularly if it has not been done in a while, is going to be squeaky bum time, BUT it is better to find out in a controlled scenario than during peak business hours.

Black building tests should be frequently carried out as part of your data centre maintenance, to ensure your generators will start as intended and supply power to the systems if the mains power does fail. Simple.

As a data centre goes through its life cycle, upgrades and changes occur within the mechanical and electrical systems, and the IT load to be supported is likely to increase. Whilst generator systems are designed to have a certain amount of redundancy, if they are left untested additional loads can place a strain on the generator systems, to the point where they can fail.
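As a rough illustration of how load growth can quietly use up generator redundancy, the sketch below works out the remaining headroom once the redundant units are set aside. The function name, ratings, unit counts, and loads are all hypothetical and exist only to show the arithmetic.

```python
def generator_headroom_kw(unit_rating_kw, units_installed,
                          redundant_units, building_load_kw):
    """Return spare capacity in kW with the redundant units held in reserve.

    A negative result means the load can only be carried by running the
    redundant units too, so the designed redundancy no longer exists.
    """
    usable_units = units_installed - redundant_units
    return unit_rating_kw * usable_units - building_load_kw


# Hypothetical N+1 site: three 500 kW sets, one held as the redundant unit.
at_handover = generator_headroom_kw(500, 3, 1, building_load_kw=800)
after_growth = generator_headroom_kw(500, 3, 1, building_load_kw=1100)

print(at_handover)   # 200
print(after_growth)  # -100
```

In this made-up case the site still has enough total capacity on paper after the load grows, but only by leaning on the set that was meant to be spare. That is exactly the kind of drift regular testing under real load is meant to expose.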

If the generator fails to start or take the load, the risk of severe business disruption is high. Most companies carry out this test outside of normal working hours to minimise any disruption, with properly qualified engineers standing by to react to any faults.

Black building testing is one of the most important, but least recognised or even known, back-up plans in the UK. It’s recommended that full black building tests are carried out at least once a year. I have clients who do this monthly, as repeated testing inspires trust in the systems to do exactly as they are designed.

With increased testing and the resulting confidence, utility power outages move from potentially catastrophic failures to routine and anticipated events without impact on critical services. Alternatively, just wait for the next power cut, cross your fingers, and hope for the best.

Discover more mistakes in part 2.

….

Richard Stacey is the Director of Operational Infrastructure at Future-tech, an accredited Uptime AOS Specialist who’s been working in data centres for over 10 years.