[gremlin.com] Chaos Engineering: the history, principles, and practice

March 9, 2018

With the rise of microservices and distributed cloud architectures, the web has grown increasingly complex. As a result, seemingly “random” failures have become harder to predict. At the same time, our dependence on these systems has only increased.

These failures cause costly outages for companies, and the outages hurt customers who are trying to shop, transact business, and get work done. Even brief issues can hit a company's bottom line, so the cost of downtime is becoming a KPI for many engineering teams. For example, in 2017, 98% of organizations said a single hour of downtime would cost their business over $100,000. One outage can cost a single company millions of dollars. The CEO of British Airways (BA) recently explained that a technological failure that stranded tens of thousands of BA passengers in May 2017 cost the company 80 million pounds ($102.19 million USD).

Companies need a solution to this challenge, because responding only after the next incident occurs is too late. To meet this challenge head on, companies are turning to chaos engineering.

Read the full article at: www.gremlin.com