Resilience testing, which measures the ability to minimize service
degradation caused by unexpected failures, is crucial for microservice systems.
The current practice for resilience testing relies on manually defining rules
for different microservice systems. Due to the diverse business logic of
microservices, there are no one-size-fits-all microservice resilience testing
rules. As the quantity and dynamic of microservices and failures largely
increase, manual configuration exhibits its scalability and adaptivity issues.
To overcome the two issues, we empirically compare the impacts of common
failures in the resilient and unresilient deployments of a benchmark
microservice system. Our study demonstrates that the resilient deployment can
block the propagation of degradation from system performance metrics (e.g.,
memory usage) to business metrics (e.g., response latency). In this paper, we
propose AVERT, the first AdaptiVE Resilience Testing framework for microservice
systems. AVERT first injects failures into microservices and collects available
monitoring metrics. Then AVERT ranks all the monitoring metrics according to
their contributions to the overall service degradation caused by the injected
failures. Lastly, AVERT produces a resilience index by how much the degradation
in system performance metrics propagates to the degradation in business
metrics. The higher the degradation propagation, the lower the resilience of
the microservice system. We evaluate AVERT on two open-source benchmark
microservice systems. The experimental results show that AVERT can accurately
and efficiently test the resilience of microservice systems