Data Discovery and Anomaly Detection Using Atypicality: Theory

Clayton Yates (572584); Jason White (146854); Jennifer Myers (4241683); Kaixian Yu (2836718); Karin Vallega (4241680); Qing-Xiang Sang (3461384)


Publication date: 10 September 2017

Abstract

A central question in the era of 'big data' is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that in some applications most of the value in the information is in the parts that deviate from the average, that are unusual, atypical. We define what we mean by 'atypical' in an axiomatic way as data that can be encoded with fewer bits in itself than with the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real-world data sets.

Comment: 40 pages
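The codelength comparison in the definition above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes binary data, a typical model that is i.i.d. Bernoulli(p) with known p, and an "in itself" code that is a simple two-part code (empirical entropy plus a (1/2) log2 n parameter cost) rather than a full universal source coder. The function names and the `overhead_bits` parameter (a hypothetical fixed cost for signalling that the atypical code is in use) are assumptions made for illustration.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of a Bernoulli(q) source; 0 at the endpoints."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def typical_codelength(bits, p: float) -> float:
    """Ideal codelength in bits under the assumed typical model Bernoulli(p), 0 < p < 1."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -ones * math.log2(p) - zeros * math.log2(1 - p)


def atypical_codelength(bits, overhead_bits: float = 1.0) -> float:
    """Two-part codelength of the sequence 'in itself' (illustrative assumption).

    Cost = n * H(p_hat) for the data, 0.5 * log2(n) for the estimated
    parameter, plus a hypothetical fixed signalling overhead.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    p_hat = sum(bits) / n
    return n * binary_entropy(p_hat) + 0.5 * math.log2(n) + overhead_bits


def is_atypical(bits, p: float, overhead_bits: float = 1.0) -> bool:
    """Flag the sequence when its own code is shorter than the typical code."""
    return atypical_codelength(bits, overhead_bits) < typical_codelength(bits, p)


if __name__ == "__main__":
    p_typical = 0.5  # assumed typical model: fair coin flips
    print(is_atypical([1] * 64, p_typical))     # constant run
    print(is_atypical([0, 1] * 32, p_typical))  # balanced sequence
```

Under these assumptions, a run of 64 ones costs 64 bits with the fair-coin code but only a few bits with the two-part code, so it is flagged. A balanced sequence is not flagged, because this memoryless code cannot exploit its alternating pattern and the parameter and signalling costs make its own code longer than the typical one.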

Available Versions

FigShare

oai:figshare.com:article/51976...

Last time updated on 12/02/2018