Diffusion-based Time Series Data Imputation for Microsoft 365
Reliability is extremely important for large-scale cloud systems like
Microsoft 365. Cloud failures, such as disk failures and node failures,
threaten service reliability, resulting in online service interruptions and
economic loss. Existing works focus on predicting cloud failures and
proactively taking action before failures happen. However, they suffer from
poor data quality, such as missing data, in model training and prediction,
which limits their performance. In this paper, we focus on enhancing data
quality through data imputation with the proposed Diffusion+, a
sample-efficient diffusion model that imputes missing data efficiently based
on the observed data. Our experiments and application practice show that our
model improves the performance of the downstream failure prediction task.
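The conditioning idea behind diffusion-based imputation can be sketched in a few lines: run the reverse diffusion over the whole series while clamping the observed entries to their (appropriately noised) true values at every step, so the generated values for the gaps stay consistent with what was observed. The sketch below is a toy illustration under strong assumptions, not the Diffusion+ model itself; in particular the "denoiser" is a placeholder that pulls samples toward the observed mean, where a real system would use a trained network.

```python
import numpy as np

def diffusion_impute(series, mask, n_steps=50, seed=0):
    """Toy sketch of diffusion-style imputation: reverse-diffuse the
    missing entries while clamping observed entries to their noised
    ground truth at each step. The denoiser below is a placeholder
    (pulls toward the observed mean); a real model is a trained network."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.1, n_steps)
    alpha_bar = np.cumprod(1.0 - betas)          # cumulative signal level

    obs_mean = series[mask].mean()               # crude prior from observed data
    x = rng.normal(size=series.shape)            # start from pure noise
    for t in reversed(range(n_steps)):
        # placeholder "denoiser": nudge the sample toward the observed mean
        x_hat = obs_mean + 0.5 * (x - obs_mean)
        noise = rng.normal(size=x.shape) if t > 0 else 0.0
        x = np.sqrt(alpha_bar[t]) * x_hat + np.sqrt(1.0 - alpha_bar[t]) * noise
        # condition on observations: clamp them to their noised true values
        obs_noise = rng.normal(size=x.shape) if t > 0 else 0.0
        x = np.where(mask,
                     np.sqrt(alpha_bar[t]) * series
                     + np.sqrt(1.0 - alpha_bar[t]) * obs_noise,
                     x)
    # observed entries are returned exactly; missing entries are imputed
    return np.where(mask, series, x)
```

The key design point this illustrates is that the observed data are never predicted, only re-injected at each reverse step, which keeps the imputed values conditioned on them throughout the trajectory.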
Interpretable Outlier Summarization
Outlier detection is critical in real applications to prevent financial
fraud, defend against network intrusions, or detect imminent device failures.
To reduce the human effort in evaluating outlier detection results and to
effectively turn the outliers into actionable insights, users often expect a
system to automatically produce interpretable summarizations of subgroups of
outlier detection results. Unfortunately, to date no such system exists. To
fill this gap, we propose STAIR, which learns a compact set of
human-understandable rules to summarize and explain anomaly detection
results. Rather than using classical decision tree algorithms to produce
these rules, STAIR introduces a new optimization objective that yields a
small number of rules with minimal complexity, and hence strong
interpretability, while accurately summarizing the detection results. STAIR's
learning algorithm produces a rule set by iteratively splitting large rules,
and it is optimal in maximizing this objective at each iteration. Moreover,
to effectively handle high-dimensional, highly complex datasets that are hard
to summarize with simple rules, we propose a localized STAIR approach, called
L-STAIR. Taking data locality into consideration, it simultaneously
partitions the data and learns a set of localized rules for each partition.
Our experimental study on many outlier benchmark datasets shows that,
compared to decision tree methods, STAIR significantly reduces the complexity
of the rules required to summarize the outlier detection results, making them
more amenable to human understanding and evaluation.
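The iterative-splitting idea can be made concrete with a minimal sketch. The code below is illustrative and does not use STAIR's actual objective: it represents each rule as an axis-aligned bounding box over a subset of flagged outliers and greedily splits the rule that leaks the most inliers along its widest dimension, stopping once a rule budget is reached. The function name and the purity criterion are assumptions for the sketch.

```python
import numpy as np

def summarize_outliers(X, labels, max_rules=4):
    """Illustrative rule-based summarization (not STAIR's objective):
    each rule is a bounding box over some outliers (labels == 1); the
    rule capturing the largest fraction of inliers is split along its
    widest dimension until max_rules is reached or all rules are pure."""
    outliers = X[labels == 1]
    rules = [(outliers.min(0), outliers.max(0), outliers)]

    def inlier_leakage(lo, hi):
        # fraction of points inside the box that are actually inliers
        inside = np.all((X >= lo) & (X <= hi), axis=1)
        covered = labels[inside]
        return float((covered == 0).mean()) if covered.size else 0.0

    while len(rules) < max_rules:
        worst = max(range(len(rules)),
                    key=lambda i: inlier_leakage(rules[i][0], rules[i][1]))
        lo, hi, pts = rules[worst]
        if inlier_leakage(lo, hi) == 0.0 or len(pts) < 2:
            break                                  # all rules already pure
        d = int(np.argmax(hi - lo))                # widest dimension
        thresh = np.median(pts[:, d])
        left, right = pts[pts[:, d] <= thresh], pts[pts[:, d] > thresh]
        if len(left) == 0 or len(right) == 0:
            break
        rules[worst:worst + 1] = [(left.min(0), left.max(0), left),
                                  (right.min(0), right.max(0), right)]
    return [(lo, hi) for lo, hi, _ in rules]
```

Each returned `(lo, hi)` pair reads directly as a conjunction of interval predicates, one per dimension, which is what makes such rule sets easy for humans to inspect.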
PANDA: Human-in-the-Loop Anomaly Detection and Explanation
The paper addresses the tasks of anomaly detection and explanation simultaneously, in a human-in-the-loop paradigm that integrates end-user expertise. It first proposes to exploit two complementary data representations to identify anomalies: the description induced by the raw features and the description induced by a user-defined vocabulary. These representations respectively lead to identifying so-called data-driven and knowledge-driven anomalies. The paper then proposes to confront these two sets of instances so as to improve the detection step and to provide tools for anomaly explanation. It distinguishes and discusses three cases, underlining how the two description spaces can benefit from one another in terms of accuracy and interpretability.
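The confrontation of the two description spaces can be sketched as simple set operations over two detectors' outputs. The sketch below is an assumption-laden illustration, not the paper's method: it uses a plain z-score detector on both representations (the paper's detectors are unspecified here) and partitions the flagged instances into the three cases the paper discusses.

```python
import numpy as np

def zscore_anomalies(X, thresh=3.0):
    """Flag rows whose largest feature-wise |z-score| exceeds thresh."""
    z = np.abs((X - X.mean(0)) / (X.std(0) + 1e-9))
    return set(np.where(z.max(1) > thresh)[0])

def confront(raw_X, vocab_X, thresh=3.0):
    """Illustrative confrontation of the two description spaces:
    detect anomalies in the raw-feature space and in a user-defined
    vocabulary space, then split them into three cases."""
    data_driven = zscore_anomalies(raw_X, thresh)       # raw features
    knowledge_driven = zscore_anomalies(vocab_X, thresh)  # user vocabulary
    return {
        "both": data_driven & knowledge_driven,           # mutually confirmed
        "data_only": data_driven - knowledge_driven,      # no vocabulary-level explanation yet
        "knowledge_only": knowledge_driven - data_driven, # visible only through expert terms
    }
```

Instances in "both" are the strongest candidates, while the two asymmetric cases are where one space can sharpen or explain the other's findings.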
Human-in-the-loop Outlier Detection
Outlier detection is critical to a large number of applications, from financial fraud detection to health care. Although numerous approaches have been proposed to automatically detect outliers, outliers detected based on statistical rarity do not necessarily correspond to the true outliers of interest to the application. In this work, we propose a human-in-the-loop outlier detection approach, HOD, that effectively leverages human intelligence to discover the true outliers. There are two main challenges in HOD. The first is to design human-friendly questions such that humans can easily understand them even if they know nothing about outlier detection techniques. The second is to minimize the number of questions. To address the first challenge, we design a clustering-based method to effectively discover a small number of objects that are unlikely to be outliers (i.e., inliers) and yet effectively represent the typical characteristics of the given dataset. HOD then leverages this set of inliers (called context inliers) to help humans understand the context in which the outliers occur. This ensures humans are able to easily identify the true outliers from the outlier candidates produced by machine-based outlier detection techniques. To address the second challenge, we propose a bipartite-graph-based question selection strategy that is theoretically proven to minimize the number of questions needed to cover all outlier candidates. Our experimental results on real datasets show that HOD significantly outperforms the state-of-the-art methods in both human effort and the quality of the discovered outliers.
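The question-selection problem has the shape of a set cover: each question covers some subset of outlier candidates, and we want few questions covering all of them. HOD's bipartite-graph strategy is proven to minimize the number of questions; the sketch below instead uses the classic greedy set-cover heuristic (a stand-in, not HOD's algorithm) to make the covering structure concrete. The `coverage` mapping from questions to the candidates they cover is an assumed input.

```python
def select_questions(candidates, coverage):
    """Greedy set-cover stand-in for HOD's (optimal) bipartite-graph
    question selection: repeatedly pick the question covering the most
    still-uncovered outlier candidates.

    candidates: iterable of candidate ids
    coverage:   dict mapping question -> set of candidate ids it covers
    """
    uncovered = set(candidates)
    chosen = []
    while uncovered:
        best = max(coverage, key=lambda q: len(uncovered & coverage[q]))
        gain = uncovered & coverage[best]
        if not gain:
            break          # remaining candidates covered by no question
        chosen.append(best)
        uncovered -= gain
    return chosen
```

Greedy achieves a logarithmic approximation of the optimal cover size in general, which is why an exact formulation such as HOD's matters when human questions are the scarce resource.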