Cloud platforms, under the hood, consist of a complex inter-connected stack
of hardware and software components. Each of these components can fail which
may lead to an outage. Our goal is to improve the quality of Cloud services
through early detection of such failures by analyzing resource utilization
metrics. We tested Gated-Recurrent-Unit-based autoencoder with a likelihood
function to detect anomalies in various multi-dimensional time series and
achieved high performance.Comment: Accepted for publication in Proceedings of the IEEE International
Conference on Cloud Computing (CLOUD 2020). Fix dataset descriptio