Lineage Driven Fault Injection (LDFI) is a state-of-the-art technique for selecting chaos engineering experiments. As SREs, we would like to run the chaos experiments that reveal the bugs customers are most likely to hit first. In this talk, we present new improvements to LDFI that order its experiment suggestions.
In the first half of the talk, we introduce LDFI as a technique that can be widely used within an enterprise: it harnesses the observability infrastructure to model the redundancy of the system. We also highlight how ordering is a general-purpose technique for encoding the peculiarities of a heterogeneous microservices architecture.
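To make the idea of ordered suggestions concrete, here is a minimal sketch rather than the implementation presented in the talk. It assumes each observed trace has been reduced to the set of services a successful request touched, treats a fault hypothesis as a minimal set of services that intersects every such path, and ranks hypotheses with a hypothetical per-service traffic share. The function names, the brute-force search, and the scoring rule are all illustrative assumptions; LDFI implementations typically use a SAT solver for this step.

```python
from itertools import combinations


def minimal_hitting_sets(paths, max_size=3):
    """Return minimal sets of services that intersect every observed path.

    `paths` is a list of sets, one per successful request path taken from
    traces. Brute force over small sizes; for illustration only.
    """
    services = sorted(set().union(*paths))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(services, size):
            candidate = set(combo)
            # A superset of an earlier, smaller solution is not minimal.
            if any(prev <= candidate for prev in found):
                continue
            # Keep the candidate only if it breaks every known path.
            if all(candidate & path for path in paths):
                found.append(candidate)
    return found


def order_experiments(hypotheses, traffic):
    """Order hypotheses: fewer faults first, then higher combined traffic.

    `traffic` is a hypothetical map from service name to traffic share.
    """
    def key(hypothesis):
        share = sum(traffic.get(s, 0.0) for s in hypothesis)
        return (len(hypothesis), -share, sorted(hypothesis))

    return [sorted(h) for h in sorted(hypotheses, key=key)]


# Two redundant ways to serve a request, both passing through a gateway.
paths = [{"gateway", "cache"}, {"gateway", "db"}]
traffic = {"gateway": 1.0, "cache": 0.7, "db": 0.3}
ordered = order_experiments(minimal_hitting_sets(paths), traffic)
print(ordered)
```

In this toy input, failing the gateway alone breaks both paths, so it is suggested before the two-service experiment that fails the cache and the database together. A different scoring rule would slot into `order_experiments` without changing the search.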
Next, we present experiments conducted within eBay using ordered LDFI, along with some preliminary results. We show examples of services where we discovered bugs, and how carefully controlling the order of experiments allowed LDFI to avoid running unnecessary ones.
Finally, we discuss open problems and future directions for LDFI.
Key takeaways:
- Understand how LDFI can be integrated in the enterprise by harnessing the observability infrastructure
- Learn the limitations of LDFI when its solutions are unordered, and why ordering matters for chaos experiments
- See preliminary results of prioritized LDFI and a future direction for the community
No prior knowledge of LDFI is required.