Data Centre Best Practices Article 8 – General Data Centre Operational Best Practice

by | Jun 1, 2020 | Articles, Best Practice

This article is the eighth in a series of Data Centre Best Practice articles provided by Future-tech Ltd.

In this article we highlight general data centre best practices which are considered beneficial to all sites regardless of size or location. These practices are based upon lessons learned by Future-tech during decades of operational experience delivering high-availability services.

The overall aim of data centre operations management is to minimise the risk of service interruption and maximise IT service availability, while making the fullest possible use of available site resources at minimum cost. This article focuses on key tried-and-tested practices that help to achieve these goals.

General Data Centre Best Practice

Effective communication and co-operation between IT and Facilities Management (FM) / Mechanical & Electrical (M&E) Engineering is essential to service reliability and availability. Having both the IT and facilities teams trained in the basic principles of the ITIL Service Delivery and Service Management framework is a good way to achieve the clear and unambiguous communication essential to robust and reliable data centre operations. This includes reliable, consistent reporting and the timely delivery of accurate information from diverse areas of responsibility.

Clear communication fosters a close working relationship within the data centre and helps to establish clearly defined areas of responsibility between the disparate teams involved in operational reliability and consistent service delivery. This is vital to avoid misunderstanding, reduce risk, avoid potential conflict and eliminate dangerous assumptions.

If ITIL is deployed within an organisation, facilities and engineering departments should wholeheartedly embrace ITIL language, concepts and goals, actively using the framework to apply rigour to IT deployment and management in the data centre. The same is true of corporate-wide ISO standards, KPIs and metrics.

ITIL can also be used as a first step towards the implementation of a truly integrated IT and facilities management team by using a common language and integrated tools. This can unify site operations and eliminate potential issues arising from interdisciplinary misunderstandings or poorly defined responsibilities.

Resource and Capacity Management

It is essential for any data centre operation that the wider business fully understands and endorses the chosen risk profile for the site(s). Understanding the business appetite for risk is an essential element in achieving stakeholder satisfaction and successful operational delivery. The business must fully understand and define its appetite for risk for each of the services delivered from the data centre, as well as the true cost of either mitigating or accepting that risk.

It is imperative to talk to the application owners about deployed services and equipment. Application owners drive the demand for IT services and the consumption of data centre resources. In far too many cases, though, the application owner ‘forgets’ to inform operations when equipment is no longer in use or is approaching end of life. Conversely, anticipated projects or predicted increases in demand are often not passed on to those responsible for managing data centre capacity.

Without effective and accurate asset tracking and resource management tools, regular power and thermal audits should be performed to ensure that site resources are being used efficiently and that capacity is not being wasted. New systems, upgrades and room changes can have unanticipated consequences if not planned correctly, so it is important to monitor and fully understand airflow, temperature and other environmental factors. Effective and intuitive tools incorporating expert knowledge are now available to minimise the overhead of this ongoing management exercise.
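Where audits are done by hand, a basic headroom check can be as simple as comparing each cabinet's measured peak draw against the power provisioned to it. The sketch below assumes a hypothetical audit record (`CabinetAudit`) and illustrative utilisation thresholds of 40% and 90%; real thresholds should come from the site's own capacity policy and redundancy design.

```python
from dataclasses import dataclass


@dataclass
class CabinetAudit:
    """One cabinet's power audit record (hypothetical schema)."""
    cabinet_id: str
    provisioned_kw: float    # power capacity allocated to the cabinet
    measured_peak_kw: float  # peak draw observed during the audit


def audit_headroom(cabinets, low_util=0.4, high_util=0.9):
    """Classify each cabinet by measured peak as a fraction of provisioned power.

    The thresholds are illustrative assumptions, not values from any standard.
    """
    report = {}
    for cab in cabinets:
        if cab.provisioned_kw <= 0:
            raise ValueError(f"{cab.cabinet_id}: provisioned power must be positive")
        util = cab.measured_peak_kw / cab.provisioned_kw
        if util > high_util:
            status = "at risk"            # little headroom left
        elif util < low_util:
            status = "stranded capacity"  # power allocated but largely unused
        else:
            status = "ok"
        report[cab.cabinet_id] = (round(util, 2), status)
    return report


audit = [
    CabinetAudit("A01", 8.0, 2.4),
    CabinetAudit("A02", 8.0, 7.6),
    CabinetAudit("A03", 5.0, 3.0),
]
for cabinet_id, (util, status) in audit_headroom(audit).items():
    print(cabinet_id, util, status)
```

Cabinets flagged as stranded capacity are candidates for reallocating power budget; cabinets flagged as at risk should be reviewed before any further equipment is added to them.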

Financial Management

In addition to the above, cost control should always be a significant consideration, especially in relation to improved efficiency and the reduction of power and cooling costs. Critical to achieving this is the maintenance of accurate (95-100%) IT and Mechanical & Electrical infrastructure asset registers, which should also include projected lifecycle replacement plans and costs. In the absence of effective asset tracking tools and policies, registers should be audited at least twice per year to maintain their accuracy and contribute to a reliable ‘Single Source of Truth’ upon which disparate systems can rely for well-informed decision making, or even for control where appropriate. The ‘Single Source of Truth’ is likely to be based on federated data sets drawn from multiple systems, including those collecting real-time information.
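Register accuracy can be measured directly at each audit by comparing the register with what was physically found on site. The sketch below is a minimal illustration: it assumes a hypothetical schema in which both the register and the audit map an asset tag to a location string, and it counts assets missing from either side as discrepancies.

```python
def register_accuracy(register, audit):
    """Compare an asset register with a physical audit.

    Both arguments map asset tag -> recorded location (hypothetical schema).
    Assets missing from either side count as discrepancies.
    Returns (accuracy_percent, sorted list of discrepant tags).
    """
    all_tags = set(register) | set(audit)
    if not all_tags:
        return 100.0, []
    discrepancies = sorted(
        tag for tag in all_tags if register.get(tag) != audit.get(tag)
    )
    accuracy = 100.0 * (len(all_tags) - len(discrepancies)) / len(all_tags)
    return accuracy, discrepancies


TARGET_ACCURACY = 95.0  # lower bound of the 95-100% accuracy range in the text

register = {"SRV-001": "Hall1/R01", "SRV-002": "Hall1/R02",
            "UPS-01": "Plant/UPS", "SRV-003": "Hall1/R03"}
audit = {"SRV-001": "Hall1/R01", "SRV-002": "Hall1/R05",
         "UPS-01": "Plant/UPS", "SRV-004": "Hall1/R04"}

accuracy, issues = register_accuracy(register, audit)
print(f"{accuracy:.1f}% accurate, meets target: {accuracy >= TARGET_ACCURACY}")
print("Investigate:", issues)
```

Counting both unrecorded assets and "ghost" entries keeps the figure honest; a register that only tracks what it already knows about can look accurate while missing equipment that was installed outside the process.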

Operating and capital budgets for critical facilities should be kept separate from those of non-critical facilities and not pooled with other buildings or groups of buildings. This should include a documented process to ensure that funding is sufficient and available to support the site infrastructure in line with business expectations at all times.

Effective Capacity Utilisation

Deployments in data centres should follow an established master plan based on predicted capacity utilisation. This may involve different deployment strategies based on specific usage and equipment density models. The effort required in this area can be reduced by using intelligent tools to establish the optimum cabinet, cabling, IT, network and storage equipment layouts. These tools can also offer automatic provisioning, accurate recording of assets and locations, and the creation of work orders to support and co-ordinate both local and remote provisioning activities.

Guidelines and procedures for power and thermal management should become an integral component of daily data centre operations. All elements from temperature and humidity settings to new system and cable deployments should follow well established and understood guidelines and policies that optimise available power utilisation as well as cooling efficiency and minimise airflow obstructions and hot/cold air mixing.
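One way to make thermal guidelines part of daily operations is an automated check of rack inlet temperatures. The sketch below defaults to 18-27 °C, the recommended inlet envelope published by ASHRAE TC 9.9, but a site should substitute the limits from its own policy. The supply-to-inlet rise limit of 3 °C is an illustrative assumption used as a rough indicator of hot/cold air mixing; function and parameter names are hypothetical.

```python
def check_inlets(inlet_temps_c, supply_temp_c, low_c=18.0, high_c=27.0,
                 max_rise_c=3.0):
    """Flag rack inlet temperatures that fall outside policy or suggest air mixing.

    inlet_temps_c: rack id -> measured inlet temperature (deg C)
    supply_temp_c: cooling unit supply air temperature (deg C)
    max_rise_c: illustrative limit on the supply-to-inlet rise; a larger rise
    may indicate hot exhaust air recirculating into the cold aisle.
    """
    findings = {}
    for rack, temp in inlet_temps_c.items():
        issues = []
        if temp < low_c:
            issues.append("below range")
        if temp > high_c:
            issues.append("above range")
        if temp - supply_temp_c > max_rise_c:
            issues.append("possible hot-air recirculation")
        if issues:
            findings[rack] = issues
    return findings


readings = {"R01": 19.5, "R02": 21.5, "R03": 28.0, "R04": 17.5}
for rack, issues in check_inlets(readings, supply_temp_c=17.0).items():
    print(rack, ", ".join(issues))
```

A large rise is only a prompt to investigate: missing blanking panels, poor cable management or open floor tiles are typical causes, but the check cannot distinguish between them.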

If a data centre contains equipment with significantly differing environmental and cooling requirements (e.g. tape storage, tape silos, mainframes, telecoms equipment, batteries etc.), locate this equipment in separate areas with individual environmental controls to avoid compromising cooling across the entire data centre.

Co-ordinated Management

Prevent any unplanned installations and ensure that equipment is only installed following Change Management approval, preceded by detailed space planning and equipment specification. All under-floor access should be subject to Change Management approval. Facilities should participate in IT Change Management planning and approval, and vice versa. This should be accomplished through an integrated set of IT and Facilities Management (FM) / M&E Engineering Change Management, Incident Management and Capacity Planning procedures. IT and FM / M&E Engineering should both be included in technical space capacity planning and sign-off for all IT equipment installations.
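The joint sign-off rule can be expressed as a simple gate in whatever workflow tool handles change requests. The sketch below is hypothetical (the `ChangeRequest` fields and team names are assumptions, not any particular product's data model): an installation is only permitted once space planning and equipment specification are complete and both IT and FM / M&E Engineering have approved.

```python
from dataclasses import dataclass, field

# Teams that must both approve any IT equipment installation (assumed names).
REQUIRED_APPROVERS = {"IT", "FM/M&E"}


@dataclass
class ChangeRequest:
    """Minimal change request record (hypothetical schema)."""
    description: str
    space_plan_done: bool = False
    equipment_spec_done: bool = False
    approvals: set = field(default_factory=set)


def can_install(cr):
    """Return (allowed, outstanding items) for an installation change request."""
    outstanding = []
    if not cr.space_plan_done:
        outstanding.append("space planning")
    if not cr.equipment_spec_done:
        outstanding.append("equipment specification")
    for team in sorted(REQUIRED_APPROVERS - cr.approvals):
        outstanding.append(f"{team} approval")
    return (not outstanding, outstanding)


request = ChangeRequest("Install two storage arrays, Hall 1 row C",
                        space_plan_done=True, equipment_spec_done=True,
                        approvals={"IT"})
approved, outstanding = can_install(request)
print(approved, outstanding)
```

Returning the outstanding items rather than a bare yes/no makes it clear to both teams what is still blocking the change.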

From the outset, IT, FM / M&E Engineering and management personnel should all be involved in the design process to achieve solutions that save energy and meet reliability, performance, cost control and other requirements. The wider teams built from this inclusive engagement should use life-cycle costing as a primary decision-making tool, with both IT and FM / M&E Engineering having common goals, objectives and incentives within an aligned management structure.
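A basic life-cycle cost comparison discounts each year's running costs back to present value and adds them to the up-front capital cost. The sketch below assumes constant annual running costs paid at the end of each year; the two options and all figures are hypothetical, and a real appraisal would also include items such as maintenance escalation, refresh cycles and residual value.

```python
def life_cycle_cost(capex, annual_opex, years, discount_rate):
    """Present value of up-front capital cost plus discounted annual running costs.

    Running costs are assumed constant and paid at the end of each year.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    pv_opex = sum(
        annual_opex / (1 + discount_rate) ** year for year in range(1, years + 1)
    )
    return capex + pv_opex


# Hypothetical comparison: option B costs more up front but uses less energy.
option_a = life_cycle_cost(capex=100_000, annual_opex=40_000, years=10,
                           discount_rate=0.05)
option_b = life_cycle_cost(capex=150_000, annual_opex=30_000, years=10,
                           discount_rate=0.05)
print(f"A: {option_a:,.0f}  B: {option_b:,.0f}")
```

Judged on capital cost alone, option A looks cheaper; over ten years of discounted running costs, option B comes out lower. Surfacing this kind of trade-off is why both IT and FM / M&E Engineering need to share the same costing view.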

Standards are Important

Understand and introduce the concepts from genuine global standards such as the ISO/IEC 30134 series and the emerging ISO/IEC TS 22237 series. For example, if PUE is being used or reported, all measurements and reporting should comply with ISO/IEC 30134-2. Reporting that does not meet the requirements of this standardised global KPI is not true PUE.
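At its core, PUE as defined in ISO/IEC 30134-2 is the ratio of total data centre energy consumption to IT equipment energy consumption over the same period, typically reported on an annual basis. The sketch below covers only that arithmetic; it does not model the standard's measurement point and reporting requirements, so it is not a substitute for compliant measurement. Function names and meter figures are hypothetical.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """PUE = total data centre energy / IT equipment energy, over the same period."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


def annual_pue(monthly_total_kwh, monthly_it_kwh):
    """Annual PUE from 12 monthly meter totals: the ratio of the summed
    energies, not the mean of the monthly PUE values."""
    if len(monthly_total_kwh) != 12 or len(monthly_it_kwh) != 12:
        raise ValueError("expected 12 monthly readings for each meter")
    return pue(sum(monthly_total_kwh), sum(monthly_it_kwh))


# Hypothetical meter data: higher cooling overhead in the summer months.
total = [140_000] * 6 + [170_000] * 6
it = [100_000] * 12
print(round(annual_pue(total, it), 3))
```

Computing the annual figure from summed energy rather than averaging monthly ratios weights each month by how much energy was actually consumed, which is what a period-based energy ratio requires.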

Reference and apply voluntary industry best practices such as the European Code of Conduct for Data Centre Energy Efficiency which is freely available from:

Building and Equipment Maintenance

The purpose of maintenance is to keep the data centre site in “like new” condition. This both reduces operating risk due to equipment failure and keeps operational efficiency high, reducing electrical consumption and therefore energy costs.

Effective maintenance starts with a full and comprehensive commissioning programme which is vital to ensure that the site infrastructure functions according to the design specifications at the outset.

Employ predictive (condition-based) as well as preventative maintenance where possible. This practice should make use of trend analysis and lifecycle analysis as well as the regular audits highlighted above. Operating in this way can both reduce costs and deliver more effective risk reduction.
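Trend analysis for condition-based maintenance can start with a straight-line fit to periodic condition readings, extrapolated to an alarm threshold. The sketch below uses ordinary least squares on hypothetical vibration readings from a cooling unit fan bearing; the 4.5 mm/s threshold is an illustrative assumption. Real degradation is often non-linear, so the result is a prompt to schedule inspection rather than a prediction of failure.

```python
def fit_trend(times, values):
    """Ordinary least-squares straight line: returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("readings must span more than one point in time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def days_until_threshold(times, values, threshold):
    """Days from the latest reading until the fitted trend reaches threshold.

    Returns None if the trend is flat or improving; 0.0 if already past it.
    """
    slope, intercept = fit_trend(times, values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - max(times))


# Hypothetical bearing vibration readings (mm/s) taken every 30 days.
days = [0, 30, 60, 90]
vibration = [2.0, 2.3, 2.6, 2.9]
print(days_until_threshold(days, vibration, threshold=4.5))
```

Refitting the trend after each new reading lets the maintenance window move earlier or later as the condition data changes.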

Future-tech has been designing, building and managing business critical data centres since 1982. The experience gained from being involved in the data centre sector from the outset has resulted in Future-tech sites achieving 99.999% uptime over 35+ years of operation. Future-tech has a team of experienced, skilled and highly trained in-house Data Centre Engineers capable of properly maintaining and operating business critical data centre sites of all sizes. For more details please contact Richard Stacey on 0845 900 0127 or at