One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
Recovery of Low-Rank Plus Compressed Sparse Matrices with Application to Unveiling Traffic Anomalies
Given the superposition of a low-rank matrix plus the product of a known fat
compression matrix times a sparse matrix, the goal of this paper is to
establish deterministic conditions under which exact recovery of the low-rank
and sparse components becomes possible. This fundamental identifiability issue
arises with traffic anomaly detection in backbone networks, and subsumes
compressed sensing as well as the timely low-rank plus sparse matrix recovery
tasks encountered in matrix decomposition problems. Leveraging the ability of
ℓ1- and nuclear norms to recover sparse and low-rank matrices, a convex
program is formulated to estimate the unknowns. Analysis and simulations
confirm that the said convex program can recover the unknowns for sufficiently
low-rank and sparse enough components, along with a compression matrix
possessing an isometry property when restricted to operate on sparse vectors.
When the low-rank, sparse, and compression matrices are drawn from certain
random ensembles, it is established that exact recovery is possible with high
probability. First-order algorithms are developed to solve the nonsmooth convex
optimization problem with provable iteration complexity guarantees. Insightful
tests with synthetic and real network data corroborate the effectiveness of the
novel approach in unveiling traffic anomalies across flows and time, and its
ability to outperform existing alternatives. Comment: 38 pages, submitted to the IEEE Transactions on Information Theory
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers has not received much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
by providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of
contrastive contexts where outliers are located, as well as the relation between
outliers and contexts, are usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.
Effective anomaly detection in sensor networks data streams
This paper addresses a major challenge in data mining applications where full information about the underlying processes, such as sensor networks or large online databases, cannot practically be obtained due to physical limitations such as low bandwidth or limited memory, storage, or computing power. Motivated by the recent theory of direct information sampling called compressed sensing (CS), we propose a framework for detecting anomalies in these large-scale data mining applications. Exploiting the fact that the intrinsic dimension of the data in these applications is typically small relative to the raw dimension, and the fact that compressed sensing can capture most of the information with few measurements, our work shows that spectral methods used for volume anomaly detection can be applied directly to the CS data with guarantees on performance. Our theoretical contributions are supported by extensive experimental results on large datasets, which show satisfactory performance.