Local Subspace-Based Outlier Detection using Global Neighbourhoods
Outlier detection in high-dimensional data is a challenging yet important
task, as it has applications in, e.g., fraud detection and quality control.
State-of-the-art density-based algorithms perform well because they 1) take the
local neighbourhoods of data points into account and 2) consider feature
subspaces. In highly complex and high-dimensional data, however, existing
methods are likely to overlook important outliers because they do not
explicitly take into account that the data is often a mixture distribution of
multiple components.
We therefore introduce GLOSS, an algorithm that performs local subspace
outlier detection using global neighbourhoods. Experiments on synthetic data
demonstrate that GLOSS more accurately detects local outliers in mixed data
than its competitors. Moreover, experiments on real-world data show that our
approach identifies relevant outliers overlooked by existing methods,
confirming that one should keep an eye on the global perspective even when
doing local outlier detection.
Comment: Short version accepted at IEEE BigData 201
On Achieving Diversity in the Presence of Outliers in Participatory Camera Sensor Networks
This paper addresses the problem of collection and
delivery of a representative subset of pictures, in participatory camera networks, to maximize coverage when a significant portion of the pictures may be redundant or irrelevant. Consider, for example, a rescue mission where volunteers and survivors of a large-scale disaster scout a wide area to capture pictures of
damage in distressed neighborhoods, using handheld cameras, and report them to a rescue station. In this participatory camera network, a significant number of pictures may be redundant (i.e., similar pictures may be reported by many) or irrelevant (i.e., may not document an event of interest). Given this pool of pictures, we aim to build a protocol to store and deliver a smaller subset of pictures, among all those taken, that minimizes redundancy and eliminates irrelevant objects and outliers. While previous work addressed removal of redundancy alone, doing so in the presence of outliers is tricky: outliers, by their very nature, are different from other objects, causing redundancy-minimizing algorithms to favor their inclusion, which is at odds with the goal of finding a representative subset. Eliminating outliers and minimizing redundancy at the same time thus requires meeting two seemingly opposite objectives together. The contribution of this
paper lies in a new prioritization technique (and its in-network
implementation) that minimizes redundancy among delivered
pictures, while also reducing outliers.
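As an illustration of the tension the abstract describes (and only that; this is not the paper's prioritization technique or its in-network implementation), a minimal greedy sketch might first discard likely outliers, the pictures least similar to the rest of the pool on average, and then pick representatives that are maximally dissimilar to those already chosen. All names and parameters here are invented for the example, and it assumes m does not exceed the number of retained pictures:

```python
import numpy as np

def select_representatives(features, m, density_quantile=0.2):
    """Greedy sketch: drop likely outliers (pictures whose average cosine
    similarity to the pool is lowest), then pick m pictures that are
    maximally dissimilar to those already chosen (redundancy minimisation).
    Assumes m <= number of pictures kept after outlier filtering."""
    norms = np.linalg.norm(features, axis=1)
    sim = (features @ features.T) / (norms[:, None] * norms[None, :])
    avg_sim = sim.mean(axis=1)
    # outlier filter: keep pictures at or above the density_quantile of avg similarity
    keep = np.where(avg_sim >= np.quantile(avg_sim, density_quantile))[0]
    chosen = [keep[np.argmax(avg_sim[keep])]]  # seed with the most central picture
    while len(chosen) < m:
        cand = [i for i in keep if i not in chosen]
        # farthest-first: candidate whose closest chosen picture is most dissimilar
        best = max(cand, key=lambda i: -max(sim[i, j] for j in chosen))
        chosen.append(best)
    return chosen
```

Note how a plain farthest-first selection without the filtering step would do the opposite of what the abstract wants: the outlier, being maximally dissimilar, would be chosen first.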
Rank Based Anomaly Detection Algorithms
Anomaly or outlier detection problems are of considerable importance, arising frequently in diverse real-world applications such as finance and cyber-security. Several algorithms have been formulated for such problems, usually based on a problem-dependent heuristic or distance metric. This dissertation proposes anomaly detection algorithms that exploit the notion of ``rank'', expressing the relative outlierness of different points in the relevant space, and exploit asymmetry in nearest-neighbor relations between points: a data point is ``more anomalous'' if it is not the nearest neighbor of its nearest neighbors. Although rank is computed using distance, it is a more robust and higher-level abstraction that is particularly helpful in problems characterized by significant variations of data point density, when distance alone is inadequate.
We begin by proposing a rank-based outlier detection algorithm, and then discuss how this may be extended by also considering clustering-based approaches. We show that the use of rank significantly improves anomaly detection performance in a broad range of problems.
We then consider the problem of identifying the most anomalous among a set of time series, e.g., the stock price of a company that exhibits significantly different behavior than its peer group of other companies. In such problems, different characteristics of time series are captured by different metrics, and we show that the best performance is obtained by combining several such metrics, along with the use of rank-based algorithms for anomaly detection.
In practical scenarios, it is of interest to identify when a time series begins to diverge from the behavior of its peer group. We address this problem as well, using an online version of the anomaly detection algorithm developed earlier.
Finally, we address the task of detecting the occurrence of anomalous sub-sequences within a single time series. This is accomplished by refining the multiple-distance combination approach, which succeeds when other algorithms (based on a single distance measure) fail.
The algorithms developed in this dissertation can be applied in a large variety of application areas, and can assist in solving many practical problems
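The nearest-neighbor-rank asymmetry described in the first paragraph can be sketched as follows. This is a minimal illustration of the rank idea only, not any of the dissertation's actual algorithms, and the function name is invented:

```python
import numpy as np

def rank_based_outlier_scores(X, k=3):
    """Score each point by the ranks it holds in its neighbours' own
    nearest-neighbour orderings: a point ranked far down those lists is
    'not the nearest neighbour of its nearest neighbours'. Assumes
    distinct points and k < len(X) - 1."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)[:, 1:]          # drop self (column 0)
    # rank[q, p] = 1-based position of p in q's sorted neighbour list
    rank = np.empty((n, n), dtype=int)
    for q in range(n):
        rank[q, order[q]] = np.arange(1, n)
    scores = np.empty(n)
    for p in range(n):
        neighbours = order[p, :k]                  # p's k nearest neighbours
        scores[p] = rank[neighbours, p].mean()     # how those neighbours rank p
    return scores
```

A cluster point ranks its fellow cluster points early, so their scores stay low; an isolated point is ranked last by every one of its neighbours, regardless of the absolute distances involved, which is what makes rank robust to density variation.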
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers has not received much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
by providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of the
contrastive contexts in which outliers are located, as well as the relation between
outliers and their contexts, is usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.
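To make the three aspects of interpretability concrete, here is a deliberately simplified toy that only loosely mirrors the idea, using an outlier's nearest normal instances as its context and per-attribute deviation from that context as the attribute contribution. It is not the COIN method itself, and the function name and parameters are invented:

```python
import numpy as np

def explain_outlier(X_normal, x_out, k=5):
    """Toy contextual explanation: take the outlier's k nearest normal
    instances as its context, then report (1) an outlierness score and
    (2) attributes ranked by deviation from that context."""
    d = np.linalg.norm(X_normal - x_out, axis=1)
    context = X_normal[np.argsort(d)[:k]]          # contextual neighbourhood
    mu = context.mean(axis=0)
    sigma = context.std(axis=0) + 1e-9             # avoid division by zero
    deviation = np.abs(x_out - mu) / sigma         # per-attribute z-score
    outlierness = deviation.max()
    return outlierness, np.argsort(deviation)[::-1]  # score, ranked attributes
```

The point of contrasting against a local context rather than the whole dataset is that an attribute value can be globally unremarkable yet highly abnormal within the outlier's neighbourhood.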
Plane-extraction from depth-data using a Gaussian mixture regression model
We propose a novel algorithm for unsupervised extraction of piecewise planar
models from depth-data. Among other applications, such models are a good way of
enabling autonomous agents (robots, cars, drones, etc.) to effectively perceive
their surroundings and to navigate in three dimensions. We propose to do this
by fitting the data with a piecewise-linear Gaussian mixture regression model
whose components are skewed over planes, making them flat in appearance rather
than being ellipsoidal, by embedding an outlier-trimming process that is
formally incorporated into the proposed expectation-maximization algorithm, and
by selectively fusing contiguous, coplanar components. Part of our motivation
is to obtain more accurate plane extraction by allowing each model
component to make use of all available data through probabilistic clustering.
The algorithm is thoroughly evaluated against a standard benchmark and is shown
to rank among the best of the existing state-of-the-art methods.
Comment: 11 pages, 2 figures, 1 table
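The outlier-trimming idea can be illustrated in isolation, decoupled from the Gaussian mixture regression and expectation-maximization machinery of the paper, as a single-plane least-squares fit that iteratively discards the worst residuals. The helper name and parameters are invented, and this is a crude stand-in for the formally incorporated trimming step, not the proposed algorithm:

```python
import numpy as np

def fit_plane_trimmed(points, trim_frac=0.1, iters=5):
    """Fit z = a*x + b*y + c by least squares, repeatedly discarding the
    trim_frac of points with the largest residuals so that gross outliers
    stop biasing the fit. Assumes enough inliers survive the trimming."""
    pts = np.asarray(points, dtype=float)
    for _ in range(iters):
        A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
        coef, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
        resid = np.abs(A @ coef - pts[:, 2])
        keep = resid <= np.quantile(resid, 1.0 - trim_frac)
        pts = pts[keep]                    # trim the worst-fitting points
    return coef                            # (a, b, c) of z = a*x + b*y + c
```

A plain least-squares fit over depth data with spurious returns would be pulled toward the outliers; trimming between refits lets the plane lock onto the dominant planar structure.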