Chaos Engineering

This expanding field has hundreds of resources, and pointers to many of them are collected in this repo.

Introduction

Thousands of companies of all shapes and sizes, in all verticals, have adopted Chaos Engineering as a core practice to make their products and services safer and more reliable.

“Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” This definition frames Chaos Engineering as a form of experimentation, which sets it apart from testing.

The definition also highlights that Chaos Engineering is not about creating chaos; it is about making the chaos already inherent in the system visible.

There are five principles that govern Chaos Engineering:

  1. Build a hypothesis around steady-state behavior
  2. Vary real-world events
  3. Run experiments in production
  4. Automate experiments to run continuously
  5. Minimize blast radius
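To make the five principles concrete, here is a minimal simulated sketch in Python. Everything in it is hypothetical: `handle_request` stands in for a service with a primary dependency and an optional fallback, and the thresholds (`blast_radius`, `tolerance`, `abort_below`) are illustrative values, not Netflix's tooling or any real chaos framework's API. Simulated traffic replaces real production traffic. The steady state is the request success rate, the real-world event is an outage of the primary dependency, a small random slice of traffic forms the experiment group (minimizing blast radius), and an automated check aborts the run if that group degrades badly.

```python
# Hypothetical, simulated chaos experiment; not a real framework's API.
import random
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    control_rate: float
    experiment_rate: float
    aborted: bool
    hypothesis_held: bool


def handle_request(rng: random.Random, primary_down: bool, has_fallback: bool) -> bool:
    """Hypothetical service: a primary dependency plus an optional fallback."""
    # The primary dependency normally fails 0.1% of requests.
    if not primary_down and rng.random() >= 0.001:
        return True
    # Hypothetical fallback (e.g. a cache) serves 99.5% of the remaining requests.
    return has_fallback and rng.random() < 0.995


def run_experiment(
    n_requests: int = 20_000,
    blast_radius: float = 0.05,  # fraction of traffic exposed to the fault
    tolerance: float = 0.02,     # allowed drop in success rate vs. control
    abort_below: float = 0.90,   # stop the run if the experiment rate falls this low
    min_samples: int = 50,       # experiment requests seen before the abort check applies
    has_fallback: bool = True,
    seed: int = 42,
) -> ExperimentResult:
    rng = random.Random(seed)
    control = [0, 0]     # [successes, total]
    experiment = [0, 0]  # [successes, total]
    aborted = False
    for _ in range(n_requests):
        # Minimize blast radius: only a small random slice of traffic sees the fault.
        in_experiment = rng.random() < blast_radius
        group = experiment if in_experiment else control
        # Vary a real-world event: the primary dependency goes down.
        ok = handle_request(rng, primary_down=in_experiment, has_fallback=has_fallback)
        group[0] += int(ok)
        group[1] += 1
        # Automated safety check: halt the experiment if users are being hurt.
        if experiment[1] >= min_samples and experiment[0] / experiment[1] < abort_below:
            aborted = True
            break
    control_rate = control[0] / control[1]
    experiment_rate = experiment[0] / experiment[1]
    # Steady-state hypothesis: the experiment group's success rate stays
    # within `tolerance` of the control group's.
    held = not aborted and control_rate - experiment_rate <= tolerance
    return ExperimentResult(control_rate, experiment_rate, aborted, held)
```

With the fallback enabled, `run_experiment()` should report the hypothesis as held, since the experiment group's success rate stays close to the control group's. With `run_experiment(has_fallback=False)`, every faulted request fails, so the safety check aborts the run once the experiment group reaches 50 requests and the hypothesis is reported as not held. In practice such a function would be scheduled to run continuously rather than invoked by hand.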

Next, we go through the Chaos Engineering manifesto, which breaks down each of these principles further.

Netflix and Chaos Engineering

Chaos Engineering was born at Netflix. The following presentation covers the main reasons Netflix adopted it, as well as the company's journey with it.

Casey Rosenthal Chaos Engineering presentation - Netflix