8,130 research outputs found
DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN
Recently, the introduction of the generative adversarial network (GAN) and
its variants has enabled the generation of realistic synthetic samples, which
has been used for enlarging training sets. Previous work primarily focused on
data augmentation for semi-supervised and supervised tasks. In this paper, we
instead focus on unsupervised anomaly detection and propose a novel generative
data augmentation framework optimized for this task. In particular, we propose
to oversample infrequent normal samples - normal samples that occur with small
probability, e.g., rare normal events. We show that these samples are
responsible for false positives in anomaly detection. However, oversampling of
infrequent normal samples is challenging for real-world high-dimensional data
with multimodal distributions. To address this challenge, we propose to use a
GAN variant known as the adversarial autoencoder (AAE) to transform the
high-dimensional multimodal data distributions into low-dimensional unimodal
latent distributions with well-defined tail probability. Then, we
systematically oversample at the `edge' of the latent distributions to increase
the density of infrequent normal samples. We show that our oversampling
pipeline is a unified one: it is generally applicable to datasets with
different complex data distributions. To the best of our knowledge, our method
is the first data augmentation technique focused on improving performance in
unsupervised anomaly detection. We validate our method by demonstrating
consistent improvements across several real-world datasets.Comment: Published as a conference paper at ICDM 2018 (IEEE International
Conference on Data Mining
Reducing the Effects of Detrimental Instances
Not all instances in a data set are equally beneficial for inducing a model
of the data. Some instances (such as outliers or noise) can be detrimental.
However, at least initially, the instances in a data set are generally
considered equally in machine learning algorithms. Many current approaches for
handling noisy and detrimental instances make a binary decision about whether
an instance is detrimental or not. In this paper, we 1) extend this paradigm by
weighting the instances on a continuous scale and 2) present a methodology for
measuring how detrimental an instance may be for inducing a model of the data.
We call our method of identifying and weighting detrimental instances reduced
detrimental instance learning (RDIL). We examine RIDL on a set of 54 data sets
and 5 learning algorithms and compare RIDL with other weighting and filtering
approaches. RDIL is especially useful for learning algorithms where every
instance can affect the classification boundary and the training instances are
considered individually, such as multilayer perceptrons trained with
backpropagation (MLPs). Our results also suggest that a more accurate estimate
of which instances are detrimental can have a significant positive impact for
handling them.Comment: 6 pages, 5 tables, 2 figures. arXiv admin note: substantial text
overlap with arXiv:1403.189
Unsupervised spectral sub-feature learning for hyperspectral image classification
Spectral pixel classification is one of the principal techniques used in hyperspectral image (HSI) analysis. In this article, we propose an unsupervised feature learning method for classification of hyperspectral images. The proposed method learns a dictionary of sub-feature basis representations from the spectral domain, which allows effective use of the correlated spectral data. The learned dictionary is then used in encoding convolutional samples from the hyperspectral input pixels to an expanded but sparse feature space. Expanded hyperspectral feature representations enable linear separation between object classes present in an image. To evaluate the proposed method, we performed experiments on several commonly used HSI data sets acquired at different locations and by different sensors. Our experimental results show that the proposed method outperforms other pixel-wise classification methods that make use of unsupervised feature extraction approaches. Additionally, even though our approach does not use any prior knowledge, or labelled training data to learn features, it yields either advantageous, or comparable, results in terms of classification accuracy with respect to recent semi-supervised methods
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by user-defined leakage
level, which controls maximal inconsistency between initial categorization and
resulting clustering. Our method can be implemented as a module in practical
expert systems to detect clusters, which combine expert knowledge with true
distribution of data. Moreover, it can be used for improving the results of
less flexible clustering techniques, such as projection pursuit clustering. The
paper presents extensive theoretical analysis of the model and fast algorithm
for its efficient optimization. Experimental results show that C3L finds high
quality clustering model, which can be applied in discovering meaningful groups
in partially classified data
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.Comment: 10 pages, 19 figure
- …