Length, cost and severity of datacentre outages continue to rise, Uptime Institute research confirms
Despite the best efforts of datacentre operators the world over to reduce the amount of downtime their facilities suffer, the severity and financial impact of server farm outages continue to spiral.
That is according to the fourth annual outage analysis survey by datacentre resiliency think-tank Uptime Institute, which says outage rates are rising despite “strong investment” from operators in technologies designed to prevent downtime events.
“The overall impact and cost of outages is not shrinking – as might have been hoped – but is, in fact, growing,” said the organisation in its 23-page Annual outage analysis. “Investment in cloud-based and distributed resiliency may have helped reduce the impact of site-level failures, but it has also introduced error-prone complexity. Better management and staff training would help to reduce these failures.”
The report’s insights are based on an analysis of publicly available reports about datacentre outages, as well as data gathered by Uptime Institute through its own industry surveys and member feedback.
It said its findings acknowledge that although datacentres are far more reliable than they used to be, thanks to “decades of innovation, investment and better management”, society’s growing reliance on them means “major failures seem more common”.
It continued: “Despite this, it is clear from Uptime’s extensive research that outages in 2021 and 2022 continue to occur at a rate that is not measurably down from previous years. The evidence suggests that the disruption and costs of outages are, in fact, rising.
“In short, the critical infrastructure industry is struggling to achieve the high standards that customers expect – and that are embodied in service-level agreements.”
Its data revealed that one in five organisations reported suffering a “serious” or “severe” outage in the past three years, which constitutes a “slight upward trend in the prevalence of major outages”.
At the same time, the proportion of outages that cost the affected company more than $100,000 has soared in recent years, with more than 60% of failures now resulting in at least $100,000 in total losses, up markedly from 39% in 2019.
The share of outages that cost upwards of $1m increased from 11% to 15% over the same period.
Outages are also lasting longer, said the report. “The gap between the beginning of a major public outage and full recovery has stretched significantly over the last five years,” it said. “Nearly 30% of these outages in 2021 lasted more than 24 hours – a disturbing increase from just 8% in 2017.”
Power supply issues have historically been the most common cause of datacentre outages, but Uptime Institute predicted in its 2021 report that networking issues are set to become the most common source of server farm downtime events.
The 2022 report backs this view, and said outages are increasingly attributed to network, software and systems issues, as the scale and complexity of the digital infrastructure underpinning enterprise cloud deployments increase.
“The increasing use of cloud services has changed the characteristics of outages in recent years,” said the report. “Failures are more likely to be due to software, systems or configuration errors – a reflection of the growing complexity of the IT and associated networking.
“These outages are also more likely to affect many IT services and organisations, reflecting system interdependency and the concentration of customers using single providers, often in single availability zones.”
Uptime Institute Intelligence founding member and executive director Andy Lawrence, who co-authored the report, said the situation will improve in time, but for now, outages will persist.
On this point, the organisation predicts – based on past public datacentre downtime data – that there will be at least 20 serious, high-profile IT downtime incidents worldwide each year.
“In time, both the technology and operational practices will improve,” said Lawrence. “But at present, outages remain a top concern for customers, investors and regulators. Operators will be best able to meet the challenge with rigorous staff training and operational procedures to mitigate the human error behind many of these failures.”