This article is the seventh in a series of Data Centre Best Practice articles provided by Future-tech Ltd.
In this article we seek to highlight those basic best practices which are essential to ensure that a new project or entire site is truly ready for live operation and customer service delivery. The intention is to prevent an unfinished project being handed to the operational delivery and site management teams, in order to avoid reliability problems and stakeholder or customer dissatisfaction. Failure to have all the elements listed below in place may result in direct risk to the operational reliability of the site and may also lead to long term reliability issues. These best practices should involve all site management teams including, but not limited to, facility engineering / operational management and IT hardware management teams.
Project or Building Acceptance
No responsibility for ‘completed’ construction areas should be taken on by the site operational management team(s) without the formal acceptance of the project, area or building according to pre-defined criteria. This should include a formal sign-off and handover process involving the team(s) who will be responsible for site operations and ongoing site reliability. The handover should include the following elements as a minimum:
The handover of an authorised Practical Completion document with all outstanding issues (Snagging List) noted and agreed, and with a complete and accurate asset list made available to the site operational teams.
A full commissioning programme that has been successfully completed up to and including Integrated Systems Testing (IST), with all commissioning records fully updated and made available.
All new systems are fully integrated, properly commissioned and proved to be working correctly in conjunction with existing systems. The handover should include all commissioning records and operational documentation as well as all control system details and maintenance requirements. All new systems should be proved to be functioning correctly and properly maintainable with accompanying maintenance schedules and maintenance procedure documentation.
All hard and soft copies of O&M and H&S manuals and CDM documentation must be up to date, correct and made available in their entirety to the site Operational Management team before they take any operational responsibility.
Acceptance testing may be required by the operational teams to confirm maintainability. The operational teams should not undertake any management responsibility until they have been properly trained on the new systems and have satisfied themselves that these systems are working and can be properly maintained.
The site management team(s) should have the opportunity to recruit and train staff well before live production operations commence. Ideally the core site operations staff should be present during commissioning.
The following is a list of key documentation that should be made available to the site management team(s) prior to handover into live operations:
- Up to date and accurate “As-Built” records
- Engineering single line drawings
- A full set of O&M manuals, including SOPs, MOPs, EOPs, escalation procedures etc.
- Comprehensive Commissioning Records
- An up to date and accurate Asset Register
- A documented Planned Maintenance Schedule and a full set of maintenance records
- All documentation required for compliance with statutory regulation (QHSE etc.)
- All documentation required for compliance with voluntary standards and certificates
- Authorisation / Certification and Staff Training records
- A complete roles and responsibilities matrix across all departments
- Customer and supplier contract details, OLAs and SLAs etc.
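The documentation list above lends itself to a simple readiness check before sign-off. The following is a minimal sketch only, assuming each item is tracked as either received or not received; the identifiers and function names are illustrative and are not part of any standard or of Future-tech's own process.

```python
from typing import Iterable, List

# Keys mirror the documentation list above; the identifiers themselves are
# hypothetical names chosen for this sketch.
REQUIRED_DOCUMENTS = [
    "as_built_records",
    "single_line_drawings",
    "om_manuals",
    "commissioning_records",
    "asset_register",
    "planned_maintenance_schedule",
    "statutory_compliance",
    "voluntary_standards",
    "training_records",
    "roles_and_responsibilities",
    "contracts_olas_slas",
]


def missing_documents(received: Iterable[str]) -> List[str]:
    """Return the required items not yet received, in checklist order."""
    received_set = set(received)
    return [doc for doc in REQUIRED_DOCUMENTS if doc not in received_set]


def ready_for_handover(received: Iterable[str]) -> bool:
    """Handover is acceptable only when no required item is outstanding."""
    return not missing_documents(received)
```

In practice each item would also need to be reviewed for accuracy and currency, which a presence check of this kind cannot confirm.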
Reporting is a key element of data centre Operations and Management and should always consider both the type of information required by the business and the audience for the reports produced. There is little point in reporting merely for the sake of it and yet accurate and focussed reporting is essential to monitor and manage operational performance.
The types of reports produced typically include:
1. Internal Data Centre Operations Team Management Reports. These are the most detailed reports and provide Data Centre Operations management with the information needed to both fine tune ongoing site operations according to business requirements and to manage external suppliers and contractors. These reports would normally also include references to capacity and resource utilisation as well as energy, cost and general performance metrics.
2. Senior IT Management Reports. These will typically provide information on how effectively the Data Centre is being run and managed at a high level. There will be fixed elements to the report, such as run-rates, response times, failure rates, SLA compliance, costs and energy efficiency.
3. Reporting overhead can be reduced by introducing “Reporting by Exception”. This reduces detail and resource overhead by providing information on the basis of exception according to a clear and well defined set of metrics and key performance indicators (KPIs). These will typically include noted outstanding risks to site operations, equipment failures, health and safety incidents, supplier performance issues, customer complaints, critical outages or major planned work items etc. These exceptions can usefully be presented in a simplified dashboard format.
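As an illustration of “Reporting by Exception”, the sketch below keeps only those KPIs that fall outside an agreed limit, so that a report or dashboard lists exceptions rather than every metric. It is a minimal example under stated assumptions: the `Kpi` structure, its field names and the idea of a single numeric limit per KPI are hypothetical, and real limits would come from the agreed SLAs.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Kpi:
    name: str
    value: float
    limit: float
    # True when higher values are good (e.g. SLA compliance %);
    # False when lower values are good (e.g. number of incidents).
    higher_is_better: bool = True

    def is_exception(self) -> bool:
        """A KPI is an exception when it is on the wrong side of its limit."""
        if self.higher_is_better:
            return self.value < self.limit
        return self.value > self.limit


def exceptions_only(kpis: Iterable[Kpi]) -> List[Kpi]:
    """Return only the KPIs that breach their limit, preserving input order."""
    return [kpi for kpi in kpis if kpi.is_exception()]
```

A KPI that sits exactly on its limit is treated as compliant here; whether that is appropriate is a design choice to be agreed alongside the metrics themselves.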
Reporting will always form an integral part of the overall framework of site operations and management. With this in mind it is recommended that a reporting schedule is established prior to formal handover and the commencement of live operations. The schedule should establish both the detail to be included and the regularity with which reports are delivered. It is likely that reporting will be based on a weekly and monthly schedule, with supporting daily bulletins depending on the content required and deemed to be appropriate. The agreed schedule should always consider the resource overhead involved in creating and disseminating these reports.
It is also suggested that regular Service Review Meetings are held on a monthly basis to examine the overall level of service across the data centre site(s). These should cover both the delivery of service to customers or internal stakeholders and the performance of external suppliers or contractors, as measured against contractual requirements and agreed SLAs.
Future-tech has been designing, building and managing business-critical data centres since 1982. The experience gained in being involved in the data centre sector from the outset has resulted in Future-tech sites achieving 99.999% uptime during 35+ years of operation. Future-tech has a team of experienced, skilled and highly trained in-house Data Centre Engineers capable of properly maintaining and operating business-critical data centre sites of all sizes. For more details please contact Richard Stacey on 0845 900 0127 or at email@example.com