There are many reasons to perform Chaos Engineering, including reducing the number of high-severity incidents, reducing the risk of downtime and outages, training teams for incident management and disaster recovery, and improving customer satisfaction and trust. According to the State of Chaos Engineering 2021 report, teams that frequently run Chaos Engineering experiments often increase availability to more than 99.9% and decrease their mean time to resolution (MTTR) to under an hour.
Practicing Chaos Engineering lets you detect issues before they become high-severity incidents. 80% of teams experience 1-10 high-severity (SEV0 and SEV1) incidents per month. Catching issues early lets you fix them before they affect customers. It also lets your engineers spend less time fighting fires and more time innovating.
Companies lose an average of $336,000 per hour of downtime. For Internet-scale companies like Amazon, that number skyrockets to more than $13 million per hour. And that doesn’t include high-traffic events like Black Friday, Cyber Monday, major sales, and new launches.
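To make these figures concrete, the short sketch below converts an availability percentage into expected downtime per year and multiplies it by an hourly cost. It assumes a 365-day year and uses the $336,000 average quoted above as a flat hourly rate. The function names are illustrative, and real costs vary with traffic and time of day, so treat the output as a rough estimate only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours per year a service is down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Rough yearly downtime cost, assuming a flat hourly cost."""
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    # 336_000 is the average hourly downtime cost quoted in the text.
    for pct in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(pct)
        cost = annual_downtime_cost(pct, 336_000)
        print(f"{pct}% availability: {hours:.2f} h/year, about ${cost:,.0f}")
```

Under these assumptions, 99.9% availability still allows roughly 8.76 hours of downtime per year, which is close to $3 million at the average hourly rate.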
Chaos Engineering can reproduce situations that teams are likely to face in production, such as unplanned outages and disasters. Running experiments gives your team the chance to practice being on call and responding to incidents in a safe, controlled way. This helps your teams build muscle memory so that when an incident does occur, they already know exactly what to do.
Customers are more likely to abandon websites that have downtime. Sites that go down experience a permanent abandonment rate of 9%, and sites that perform slowly have a permanent abandonment rate of 28%. If customers can’t trust your services, they’ll switch to a competitor. Using Chaos Engineering to improve system resilience doesn’t just protect your revenue; it also reduces customer churn.