Diffusion-based Time Series Data Imputation for Microsoft 365
Reliability is extremely important for large-scale cloud systems like
Microsoft 365. Cloud failures, such as disk failures and node failures,
threaten service reliability, resulting in online service interruptions and
economic loss. Existing works focus on predicting cloud failures and
proactively taking action before failures happen. However, they suffer from
poor data quality, such as missing data, in model training and prediction,
which limits their performance. In this paper, we focus on enhancing data
quality through data imputation with the proposed Diffusion+, a
sample-efficient diffusion model that imputes missing data efficiently based
on the observed data. Our experiments and application practice show that our
model improves the performance of the downstream failure prediction task.
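The conditioning idea behind diffusion-based imputation can be sketched in a few lines: run the reverse diffusion over the whole series while clamping the observed entries to their (appropriately noised) true values at every step, so the generated values for the gaps stay consistent with what was observed. The sketch below is a toy illustration under strong assumptions, not the Diffusion+ model itself; in particular the "denoiser" is a placeholder that pulls samples toward the observed mean, where a real system would use a trained network.

```python
import numpy as np

def diffusion_impute(series, mask, n_steps=50, seed=0):
    """Toy sketch of diffusion-style imputation: reverse-diffuse the
    missing entries while clamping observed entries to their noised
    ground truth at each step. The denoiser below is a placeholder
    (pulls toward the observed mean); a real model is a trained network."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.1, n_steps)
    alpha_bar = np.cumprod(1.0 - betas)          # cumulative signal level

    obs_mean = series[mask].mean()               # crude prior from observed data
    x = rng.normal(size=series.shape)            # start from pure noise
    for t in reversed(range(n_steps)):
        # placeholder "denoiser": nudge the sample toward the observed mean
        x_hat = obs_mean + 0.5 * (x - obs_mean)
        noise = rng.normal(size=x.shape) if t > 0 else 0.0
        x = np.sqrt(alpha_bar[t]) * x_hat + np.sqrt(1.0 - alpha_bar[t]) * noise
        # condition on observations: clamp them to their noised true values
        obs_noise = rng.normal(size=x.shape) if t > 0 else 0.0
        x = np.where(mask,
                     np.sqrt(alpha_bar[t]) * series
                     + np.sqrt(1.0 - alpha_bar[t]) * obs_noise,
                     x)
    # observed entries are returned exactly; missing entries are imputed
    return np.where(mask, series, x)
```

The key design point this illustrates is that the observed data are never predicted, only re-injected at each reverse step, which keeps the imputed values conditioned on them throughout the trajectory.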
Interpretable Outlier Summarization
Outlier detection is critical in real applications to prevent financial
fraud, defend against network intrusions, or detect imminent device failures.
To reduce the human effort in evaluating outlier detection results and to
effectively turn the outliers into actionable insights, users often expect a
system to automatically produce interpretable summarizations of subgroups of
outlier detection results. Unfortunately, to date no such system exists. To
fill this gap, we propose STAIR, which learns a compact set of
human-understandable rules to summarize and explain anomaly detection
results. Rather than using classical decision tree algorithms to produce
these rules, STAIR introduces a new optimization objective that yields a
small number of rules with minimal complexity, and hence strong
interpretability, while accurately summarizing the detection results. STAIR's
learning algorithm produces a rule set by iteratively splitting large rules,
and it is optimal in maximizing this objective at each iteration. Moreover,
to effectively handle high-dimensional, highly complex datasets that are hard
to summarize with simple rules, we propose a localized STAIR approach, called
L-STAIR. Taking data locality into consideration, it simultaneously
partitions the data and learns a set of localized rules for each partition.
Our experimental study on many outlier benchmark datasets shows that,
compared to decision tree methods, STAIR significantly reduces the complexity
of the rules required to summarize the outlier detection results, making them
more amenable to human understanding and evaluation.
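The iterative-splitting idea can be made concrete with a minimal sketch. The code below is illustrative and does not use STAIR's actual objective: it represents each rule as an axis-aligned bounding box over a subset of flagged outliers and greedily splits the rule that leaks the most inliers along its widest dimension, stopping once a rule budget is reached. The function name and the purity criterion are assumptions for the sketch.

```python
import numpy as np

def summarize_outliers(X, labels, max_rules=4):
    """Illustrative rule-based summarization (not STAIR's objective):
    each rule is a bounding box over some outliers (labels == 1); the
    rule capturing the largest fraction of inliers is split along its
    widest dimension until max_rules is reached or all rules are pure."""
    outliers = X[labels == 1]
    rules = [(outliers.min(0), outliers.max(0), outliers)]

    def inlier_leakage(lo, hi):
        # fraction of points inside the box that are actually inliers
        inside = np.all((X >= lo) & (X <= hi), axis=1)
        covered = labels[inside]
        return float((covered == 0).mean()) if covered.size else 0.0

    while len(rules) < max_rules:
        worst = max(range(len(rules)),
                    key=lambda i: inlier_leakage(rules[i][0], rules[i][1]))
        lo, hi, pts = rules[worst]
        if inlier_leakage(lo, hi) == 0.0 or len(pts) < 2:
            break                                  # all rules already pure
        d = int(np.argmax(hi - lo))                # widest dimension
        thresh = np.median(pts[:, d])
        left, right = pts[pts[:, d] <= thresh], pts[pts[:, d] > thresh]
        if len(left) == 0 or len(right) == 0:
            break
        rules[worst:worst + 1] = [(left.min(0), left.max(0), left),
                                  (right.min(0), right.max(0), right)]
    return [(lo, hi) for lo, hi, _ in rules]
```

Each returned `(lo, hi)` pair reads directly as a conjunction of interval predicates, one per dimension, which is what makes such rule sets easy for humans to inspect.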
PANDA: Human-in-the-Loop Anomaly Detection and Explanation
The paper addresses the tasks of anomaly detection and explanation simultaneously, in a human-in-the-loop paradigm that integrates end-user expertise. It first proposes to exploit two complementary data representations to identify anomalies: the description induced by the raw features and the description induced by a user-defined vocabulary. These representations respectively lead to identifying so-called data-driven and knowledge-driven anomalies. The paper then proposes to confront these two sets of instances so as to improve the detection step and to provide tools for anomaly explanation. It distinguishes and discusses three cases, underlining how the two description spaces can benefit from one another in terms of accuracy and interpretability.
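The confrontation of the two description spaces can be sketched as simple set operations over two detectors' outputs. The sketch below is an assumption-laden illustration, not the paper's method: it uses a plain z-score detector on both representations (the paper's detectors are unspecified here) and partitions the flagged instances into the three cases the paper discusses.

```python
import numpy as np

def zscore_anomalies(X, thresh=3.0):
    """Flag rows whose largest feature-wise |z-score| exceeds thresh."""
    z = np.abs((X - X.mean(0)) / (X.std(0) + 1e-9))
    return set(np.where(z.max(1) > thresh)[0])

def confront(raw_X, vocab_X, thresh=3.0):
    """Illustrative confrontation of the two description spaces:
    detect anomalies in the raw-feature space and in a user-defined
    vocabulary space, then split them into three cases."""
    data_driven = zscore_anomalies(raw_X, thresh)       # raw features
    knowledge_driven = zscore_anomalies(vocab_X, thresh)  # user vocabulary
    return {
        "both": data_driven & knowledge_driven,           # mutually confirmed
        "data_only": data_driven - knowledge_driven,      # no vocabulary-level explanation yet
        "knowledge_only": knowledge_driven - data_driven, # visible only through expert terms
    }
```

Instances in "both" are the strongest candidates, while the two asymmetric cases are where one space can sharpen or explain the other's findings.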
Human-in-the-loop Outlier Detection
Outlier detection is critical to a large number of applications, from financial fraud detection to health care. Although numerous approaches have been proposed to automatically detect outliers, outliers detected based on statistical rarity do not necessarily correspond to the true outliers of interest to the application. In this work, we propose a human-in-the-loop outlier detection approach, HOD, that effectively leverages human intelligence to discover the true outliers. There are two main challenges in HOD. The first is to design human-friendly questions such that humans can easily understand them even if they know nothing about outlier detection techniques. The second is to minimize the number of questions. To address the first challenge, we design a clustering-based method to effectively discover a small number of objects that are unlikely to be outliers (i.e., inliers) and yet effectively represent the typical characteristics of the given dataset. HOD then leverages this set of inliers (called context inliers) to help humans understand the context in which the outliers occur. This ensures humans are able to easily identify the true outliers from the outlier candidates produced by machine-based outlier detection techniques. To address the second challenge, we propose a bipartite-graph-based question selection strategy that is theoretically proven to minimize the number of questions needed to cover all outlier candidates. Our experimental results on real datasets show that HOD significantly outperforms the state-of-the-art methods in both human effort and the quality of the discovered outliers.
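The question-selection problem has the shape of a set cover: each question covers some subset of outlier candidates, and we want few questions covering all of them. HOD's bipartite-graph strategy is proven to minimize the number of questions; the sketch below instead uses the classic greedy set-cover heuristic (a stand-in, not HOD's algorithm) to make the covering structure concrete. The `coverage` mapping from questions to the candidates they cover is an assumed input.

```python
def select_questions(candidates, coverage):
    """Greedy set-cover stand-in for HOD's (optimal) bipartite-graph
    question selection: repeatedly pick the question covering the most
    still-uncovered outlier candidates.

    candidates: iterable of candidate ids
    coverage:   dict mapping question -> set of candidate ids it covers
    """
    uncovered = set(candidates)
    chosen = []
    while uncovered:
        best = max(coverage, key=lambda q: len(uncovered & coverage[q]))
        gain = uncovered & coverage[best]
        if not gain:
            break          # remaining candidates covered by no question
        chosen.append(best)
        uncovered -= gain
    return chosen
```

Greedy achieves a logarithmic approximation of the optimal cover size in general, which is why an exact formulation such as HOD's matters when human questions are the scarce resource.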