Local Subspace-Based Outlier Detection using Global Neighbourhoods
Outlier detection in high-dimensional data is a challenging yet important
task, as it has applications in, e.g., fraud detection and quality control.
State-of-the-art density-based algorithms perform well because they 1) take the
local neighbourhoods of data points into account and 2) consider feature
subspaces. In highly complex and high-dimensional data, however, existing
methods are likely to overlook important outliers because they do not
explicitly take into account that the data is often a mixture distribution of
multiple components.
We therefore introduce GLOSS, an algorithm that performs local subspace
outlier detection using global neighbourhoods. Experiments on synthetic data
demonstrate that GLOSS more accurately detects local outliers in mixed data
than its competitors. Moreover, experiments on real-world data show that our
approach identifies relevant outliers overlooked by existing methods,
confirming that one should keep an eye on the global perspective even when
doing local outlier detection.
Comment: Short version accepted at IEEE BigData 201
On Achieving Diversity in the Presence of Outliers in Participatory Camera Sensor Networks
This paper addresses the problem of collection and
delivery of a representative subset of pictures, in participatory camera networks, to maximize coverage when a significant portion of the pictures may be redundant or irrelevant. Consider, for example, a rescue mission where volunteers and survivors of a large-scale disaster scout a wide area to capture pictures of
damage in distressed neighborhoods, using handheld cameras, and report them to a rescue station. In this participatory camera network, a significant number of pictures may be redundant (i.e., similar pictures may be reported by many) or irrelevant (i.e., may not document an event of interest). Given this pool of pictures, we aim to build a protocol to store and deliver a smaller subset of pictures, among all those taken, that minimizes redundancy and eliminates irrelevant objects and outliers. While previous work addressed removal of redundancy alone, doing so in the presence of outliers is tricky: outliers, by their very nature, are different from other objects, causing redundancy-minimizing algorithms to favor their inclusion, which is at odds with the goal of finding a representative subset. Eliminating outliers and minimizing redundancy at the same time thus requires meeting two seemingly opposite objectives together. The contribution of this
paper lies in a new prioritization technique (and its in-network
implementation) that minimizes redundancy among delivered
pictures, while also reducing outliers.
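As an illustration of the tension the abstract describes (and only that; this is not the paper's prioritization technique or its in-network implementation), a minimal greedy sketch might first discard likely outliers, the pictures least similar to the rest of the pool on average, and then pick representatives that are maximally dissimilar to those already chosen. All names and parameters here are invented for the example, and it assumes m does not exceed the number of retained pictures:

```python
import numpy as np

def select_representatives(features, m, density_quantile=0.2):
    """Greedy sketch: drop likely outliers (pictures whose average cosine
    similarity to the pool is lowest), then pick m pictures that are
    maximally dissimilar to those already chosen (redundancy minimisation).
    Assumes m <= number of pictures kept after outlier filtering."""
    norms = np.linalg.norm(features, axis=1)
    sim = (features @ features.T) / (norms[:, None] * norms[None, :])
    avg_sim = sim.mean(axis=1)
    # outlier filter: keep pictures at or above the density_quantile of avg similarity
    keep = np.where(avg_sim >= np.quantile(avg_sim, density_quantile))[0]
    chosen = [keep[np.argmax(avg_sim[keep])]]  # seed with the most central picture
    while len(chosen) < m:
        cand = [i for i in keep if i not in chosen]
        # farthest-first: candidate whose closest chosen picture is most dissimilar
        best = max(cand, key=lambda i: -max(sim[i, j] for j in chosen))
        chosen.append(best)
    return chosen
```

Note how a plain farthest-first selection without the filtering step would do the opposite of what the abstract wants: the outlier, being maximally dissimilar, would be chosen first.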
Rank Based Anomaly Detection Algorithms
Anomaly or outlier detection problems are of considerable importance, arising frequently in diverse real-world applications such as finance and cyber-security. Several algorithms have been formulated for such problems, usually based on a problem-dependent heuristic or distance metric. This dissertation proposes anomaly detection algorithms that exploit the notion of ``rank'', expressing the relative outlierness of different points in the relevant space, and exploit asymmetry in nearest-neighbor relations between points: a data point is ``more anomalous'' if it is not the nearest neighbor of its nearest neighbors. Although rank is computed using distance, it is a more robust and higher-level abstraction that is particularly helpful in problems characterized by significant variations of data point density, when distance alone is inadequate.
We begin by proposing a rank-based outlier detection algorithm, and then discuss how this may be extended by also considering clustering-based approaches. We show that the use of rank significantly improves anomaly detection performance in a broad range of problems.
We then consider the problem of identifying the most anomalous among a set of time series, e.g., the stock price of a company that exhibits significantly different behavior than its peer group of other companies. In such problems, different characteristics of time series are captured by different metrics, and we show that the best performance is obtained by combining several such metrics, along with the use of rank-based algorithms for anomaly detection.
In practical scenarios, it is of interest to identify when a time series begins to diverge from the behavior of its peer group. We address this problem as well, using an online version of the anomaly detection algorithm developed earlier.
Finally, we address the task of detecting the occurrence of anomalous sub-sequences within a single time series. This is accomplished by refining the multiple-distance combination approach, which succeeds when other algorithms (based on a single distance measure) fail.
The algorithms developed in this dissertation can be applied in a large variety of application areas, and can assist in solving many practical problems
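The nearest-neighbor-rank asymmetry described in the first paragraph can be sketched as follows. This is a minimal illustration of the rank idea only, not any of the dissertation's actual algorithms, and the function name is invented:

```python
import numpy as np

def rank_based_outlier_scores(X, k=3):
    """Score each point by the ranks it holds in its neighbours' own
    nearest-neighbour orderings: a point ranked far down those lists is
    'not the nearest neighbour of its nearest neighbours'. Assumes
    distinct points and k < len(X) - 1."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)[:, 1:]          # drop self (column 0)
    # rank[q, p] = 1-based position of p in q's sorted neighbour list
    rank = np.empty((n, n), dtype=int)
    for q in range(n):
        rank[q, order[q]] = np.arange(1, n)
    scores = np.empty(n)
    for p in range(n):
        neighbours = order[p, :k]                  # p's k nearest neighbours
        scores[p] = rank[neighbours, p].mean()     # how those neighbours rank p
    return scores
```

A cluster point ranks its fellow cluster points early, so their scores stay low; an isolated point is ranked last by every one of its neighbours, regardless of the absolute distances involved, which is what makes rank robust to density variation.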
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers has not received much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
by providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of the
contrastive contexts in which outliers are located, as well as the relation between
outliers and their contexts, is usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.
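To make the three aspects of interpretability concrete, here is a deliberately simplified toy that only loosely mirrors the idea, using an outlier's nearest normal instances as its context and per-attribute deviation from that context as the attribute contribution. It is not the COIN method itself, and the function name and parameters are invented:

```python
import numpy as np

def explain_outlier(X_normal, x_out, k=5):
    """Toy contextual explanation: take the outlier's k nearest normal
    instances as its context, then report (1) an outlierness score and
    (2) attributes ranked by deviation from that context."""
    d = np.linalg.norm(X_normal - x_out, axis=1)
    context = X_normal[np.argsort(d)[:k]]          # contextual neighbourhood
    mu = context.mean(axis=0)
    sigma = context.std(axis=0) + 1e-9             # avoid division by zero
    deviation = np.abs(x_out - mu) / sigma         # per-attribute z-score
    outlierness = deviation.max()
    return outlierness, np.argsort(deviation)[::-1]  # score, ranked attributes
```

The point of contrasting against a local context rather than the whole dataset is that an attribute value can be globally unremarkable yet highly abnormal within the outlier's neighbourhood.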
Plane-extraction from depth-data using a Gaussian mixture regression model
We propose a novel algorithm for unsupervised extraction of piecewise planar
models from depth-data. Among other applications, such models are a good way of
enabling autonomous agents (robots, cars, drones, etc.) to effectively perceive
their surroundings and to navigate in three dimensions. We propose to do this
by fitting the data with a piecewise-linear Gaussian mixture regression model
whose components are skewed over planes, making them flat in appearance rather
than being ellipsoidal, by embedding an outlier-trimming process that is
formally incorporated into the proposed expectation-maximization algorithm, and
by selectively fusing contiguous, coplanar components. Part of our motivation
is to obtain more accurate plane extraction by allowing each model
component to make use of all available data through probabilistic clustering.
The algorithm is thoroughly evaluated against a standard benchmark and is shown
to rank among the best of the existing state-of-the-art methods.
Comment: 11 pages, 2 figures, 1 table
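The outlier-trimming idea can be illustrated in isolation, decoupled from the Gaussian mixture regression and expectation-maximization machinery of the paper, as a single-plane least-squares fit that iteratively discards the worst residuals. The helper name and parameters are invented, and this is a crude stand-in for the formally incorporated trimming step, not the proposed algorithm:

```python
import numpy as np

def fit_plane_trimmed(points, trim_frac=0.1, iters=5):
    """Fit z = a*x + b*y + c by least squares, repeatedly discarding the
    trim_frac of points with the largest residuals so that gross outliers
    stop biasing the fit. Assumes enough inliers survive the trimming."""
    pts = np.asarray(points, dtype=float)
    for _ in range(iters):
        A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
        coef, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
        resid = np.abs(A @ coef - pts[:, 2])
        keep = resid <= np.quantile(resid, 1.0 - trim_frac)
        pts = pts[keep]                    # trim the worst-fitting points
    return coef                            # (a, b, c) of z = a*x + b*y + c
```

A plain least-squares fit over depth data with spurious returns would be pulled toward the outliers; trimming between refits lets the plane lock onto the dominant planar structure.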