398 research outputs found
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining the class boundary using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
the algorithms used and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
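As a rough illustration of learning a class boundary from positive samples alone, the following minimal sketch fits a centroid and a radius to the positive class and rejects anything outside that radius. It is not a method taken from the survey; the function names `fit_one_class` and `predict` and the `quantile` parameter are hypothetical choices made for this example.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a centroid-and-radius boundary from positive samples only.

    `quantile` (an assumed parameter) is the fraction of training
    positives that the radius should enclose.
    """
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in positives)
    # Index of the distance that covers `quantile` of the training positives.
    idx = min(len(dists) - 1, max(0, math.ceil(quantile * len(dists)) - 1))
    return centroid, dists[idx]

def predict(model, x):
    """Return True if x falls inside the learned boundary (target class)."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Toy usage: no negative samples are ever seen during fitting.
model = fit_one_class([(0, 0), (1, 0), (0, 1), (1, 1)], quantile=1.0)
inside = predict(model, (0.5, 0.5))
outside = predict(model, (5, 5))
```

No negative example is needed at fit time; the trade-off between rejecting true positives and accepting outliers is controlled entirely by how tightly the boundary is drawn around the positive class.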
Active Authentication using an Autoencoder regularized CNN-based One-Class Classifier
Active authentication refers to the process in which users are unobtrusively
monitored and authenticated continuously throughout their interactions with
mobile devices. Generally, an active authentication problem is modelled as a
one-class classification problem due to the unavailability of data from the
impostor users. Normally, the enrolled user is considered as the target class
(genuine) and the unauthorized users are considered as unknown classes
(impostor). We propose a convolutional neural network (CNN) based approach for
one-class classification in which zero-centered Gaussian noise and an
autoencoder are used to model the pseudo-negative class and to regularize the
network to learn meaningful feature representations for one-class data,
respectively. The overall network is trained using a combination of the
cross-entropy and the reconstruction error losses. A key feature of the
proposed approach is that any pre-trained CNN can be used as the base network
for one-class classification. The effectiveness of the proposed framework is
demonstrated using three publicly available face-based active authentication
datasets, and it is shown that the proposed method achieves superior performance
compared to the traditional one-class classification methods. The source code
is available at: github.com/otkupjnoz/oc-acnn. Comment: Accepted and to appear at AFGR 201
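The combination of a cross-entropy loss and a reconstruction error loss described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the weighting factor `lam`, the noise scale `sigma`, the use of mean squared error for reconstruction, and all function names are hypothetical, and the abstract does not say at which stage of the network the noise is injected.

```python
import math
import random

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mse(x, x_hat):
    """Mean squared reconstruction error between a vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def pseudo_negative(dim, sigma=1.0, rng=random):
    """Draw a zero-centered Gaussian vector to stand in for the unseen negative class."""
    return [rng.gauss(0.0, sigma) for _ in range(dim)]

def combined_loss(p_target, p_noise, feat, feat_recon, lam=1.0):
    """Cross-entropy on (target = 1, noise = 0) plus weighted reconstruction error.

    p_target: classifier output for a genuine sample (should approach 1)
    p_noise:  classifier output for a Gaussian pseudo-negative (should approach 0)
    lam:      assumed weight on the reconstruction term
    """
    ce = binary_cross_entropy(p_target, 1) + binary_cross_entropy(p_noise, 0)
    return ce + lam * mse(feat, feat_recon)

# An undecided classifier with a perfect reconstruction pays only the
# cross-entropy cost, 2 * ln(2).
loss = combined_loss(0.5, 0.5, [1.0, 2.0], [1.0, 2.0])
```

The reconstruction term acts as the regularizer: it penalizes representations from which the genuine sample cannot be recovered, independently of how well the classifier separates target data from noise.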
Enhanced Industrial Machinery Condition Monitoring Methodology based on Novelty Detection and Multi-Modal Analysis
This paper presents a condition-based monitoring methodology based on novelty detection applied to industrial machinery. The proposed approach includes both the classical classification of multiple a priori known scenarios and the innovative capability to detect new operating modes not previously available. The development of condition-based monitoring methodologies that can isolate unexpected scenarios is currently a trending topic, one able to answer the demanding requirements of future industrial process monitoring systems. First, the method is based on the temporal segmentation of the available physical magnitudes, and the estimation of a set of time-based statistical features. Then, a double feature reduction stage based on Principal Component Analysis and Linear Discriminant Analysis is applied in order to optimize the classification and novelty detection performances. The subsequent combination of a Feed-forward Neural Network and a One-Class Support Vector Machine allows the proper interpretation of known and unknown operating conditions. The effectiveness of this novel condition monitoring scheme has been verified by experimental results obtained from an automotive industry machine. Postprint (published version)
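The first stage of the methodology above, temporal segmentation followed by time-based statistical features, might look roughly like the sketch below. The abstract does not list which features are used, so the four computed here (mean, RMS, peak and standard deviation), the non-overlapping windows and the function names are all assumptions made for illustration. The later PCA, LDA, neural network and one-class SVM stages are not shown.

```python
import math

def segment(signal, window):
    """Split a signal into consecutive, non-overlapping windows of equal length.

    Trailing samples that do not fill a whole window are dropped.
    """
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, window)]

def time_features(seg):
    """Compute a few common time-domain statistics for one segment."""
    n = len(seg)
    mean = sum(seg) / n
    rms = math.sqrt(sum(v * v for v in seg) / n)
    peak = max(abs(v) for v in seg)
    std = math.sqrt(sum((v - mean) ** 2 for v in seg) / n)  # population std
    return {"mean": mean, "rms": rms, "peak": peak, "std": std}

# Toy signal: an alternating segment followed by a constant one.
signal = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0, 9.0]
feature_rows = [time_features(s) for s in segment(signal, 4)]
```

Each row of `feature_rows` would then serve as one observation for the feature reduction and classification stages.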
Cost-Quality Trade-Offs in One-Class Active Learning
Active learning is a paradigm to involve users in a machine learning process. The core idea of active learning is to ask a user to annotate a specific observation to improve the classification performance. One important application of active learning is detecting outliers, i.e., unusual observations that deviate from the regular ones in a data set. Applying active learning for outlier detection in practice requires designing a system that consists of several components: the data, the classifier that discerns between inliers and outliers, the query strategy that selects the observations for feedback collection, and an oracle, e.g., the human expert that annotates the queries. Each of these components and their interplay influences the classification quality. Naturally, there are cost budgets limiting certain parts of the system, e.g., the number of queries one can ask a human. Thus, to configure efficient active learning systems, one must decide on several trade-offs between costs and quality. The existing literature on active learning systems provides neither an overview nor a formal description of the cost-quality trade-offs of active learning. All this makes the configuration of efficient active learning systems in practice difficult.
In this thesis, we study different cost-quality trade-offs that are pivotal for configuring an active learning system for outlier detection. We first provide an overview of the costs of an active learning system. Then, we analyze three important trade-offs and propose ways to model and quantify them. In our first contribution, we study how one can reduce classification training costs by training only on a sample of the data set. We formalize the sampling trade-off between classifier training costs and resulting quality as an optimization problem and propose an efficient algorithm to solve it. Compared to the existing sampling methods in the literature, our approach guarantees that a classifier trained on our sample makes the same predictions as if trained on the complete data set. We can therefore reduce the classification training costs without a loss of classification quality. In our second contribution, we investigate how selecting multiple queries allows trading off costs against quality. So-called batch queries reduce classifier training costs because the system only updates the classifier once for each batch. But the annotation of a batch may give redundant information, which reduces the achievable quality with a fixed query budget. We are the first to consider batch queries for outlier detection, a generalization of the more common case of querying sequentially. We formalize batch active learning and propose several strategies to construct batches by modeling the expected utility of a batch. In our third contribution, we propose query synthesis for outlier detection. Query synthesis allows queries to be generated artificially at any point in the data space without being restricted by a pool of query candidates. We propose a framework to efficiently synthesize queries and develop a novel query strategy to improve the generalization of a classifier beyond a biased data set with active learning.
For all contributions, we derive recommendations for the cost-quality trade-offs from formal investigations and empirical studies to facilitate the configuration of robust and efficient active learning systems for outlier detection.
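The cost side of the batch-query trade-off described above can be made concrete with a small pool-based loop in which the classifier is updated once per batch. This is a minimal sketch and not the thesis's algorithm: the one-dimensional threshold "classifier", the closest-to-boundary query strategy, and every name below (`batch_active_learning`, `toy_retrain`, `toy_score`, `toy_oracle`) are hypothetical stand-ins.

```python
def batch_active_learning(pool, oracle, score, retrain, budget, batch_size):
    """Pool-based active learning that retrains once per annotated batch.

    Returns the final model, the collected labels, and the number of
    classifier updates (a proxy for training cost).
    """
    labeled = {}
    model = retrain(labeled)          # initial model, not counted as an update
    unlabeled = list(pool)
    updates = 0
    while budget > 0 and unlabeled:
        k = min(batch_size, budget, len(unlabeled))
        # Query the k candidates the strategy considers most informative.
        unlabeled.sort(key=lambda x: score(model, x), reverse=True)
        batch, unlabeled = unlabeled[:k], unlabeled[k:]
        for x in batch:
            labeled[x] = oracle(x)    # ask the annotator
        budget -= k
        model = retrain(labeled)      # one update per batch
        updates += 1
    return model, labeled, updates

def toy_retrain(labeled):
    """Toy 1-D 'classifier': a threshold midway between the classes seen so far."""
    inliers = [x for x, y in labeled.items() if y == "inlier"]
    outliers = [x for x, y in labeled.items() if y == "outlier"]
    if inliers and outliers:
        return (max(inliers) + min(outliers)) / 2
    return 0.0

def toy_score(threshold, x):
    """Prefer points close to the current decision threshold."""
    return -abs(x - threshold)

def toy_oracle(x):
    """Stand-in for the human expert."""
    return "outlier" if x > 5 else "inlier"

pool = [float(v) for v in range(10)]
_, labels_b3, updates_b3 = batch_active_learning(
    pool, toy_oracle, toy_score, toy_retrain, budget=6, batch_size=3)
_, labels_b1, updates_b1 = batch_active_learning(
    pool, toy_oracle, toy_score, toy_retrain, budget=6, batch_size=1)
```

With the same budget of six queries, the batch run updates the classifier twice while the sequential run updates it six times. The quality side of the trade-off, redundancy within a batch, is not modeled here.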
- …