41,912 research outputs found
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Wearable Sensor Data Based Human Activity Recognition using Machine Learning: A new approach
Recent years have witnessed the rapid development of human activity
recognition (HAR) based on wearable sensor data. One can find many practical
applications in this area, especially in the field of health care. Many machine
learning algorithms such as Decision Trees, Support Vector Machine, Naive
Bayes, K-Nearest Neighbor, and Multilayer Perceptron are successfully used in
HAR. Although these methods are fast and easy for implementation, they still
have some limitations due to poor performance in a number of situations. In
this paper, we propose a novel method based on the ensemble learning to boost
the performance of these machine learning methods for HAR
Detection of Dispersed Radio Pulses: A machine learning approach to candidate identification and classification
Searching for extraterrestrial, transient signals in astronomical data sets
is an active area of current research. However, machine learning techniques are
lacking in the literature concerning single-pulse detection. This paper
presents a new, two-stage approach for identifying and classifying dispersed
pulse groups (DPGs) in single-pulse search output. The first stage identified
DPGs and extracted features to characterize them using a new peak
identification algorithm which tracks sloping tendencies around local maxima in
plots of signal-to-noise ratio vs. dispersion measure. The second stage used
supervised machine learning to classify DPGs. We created four benchmark data
sets: one unbalanced and three balanced versions using three different
imbalance treatments.We empirically evaluated 48 classifiers by training and
testing binary and multiclass versions of six machine learning algorithms on
each of the four benchmark versions. While each classifier had advantages and
disadvantages, all classifiers with imbalance treatments had higher recall
values than those with unbalanced data, regardless of the machine learning
algorithm used. Based on the benchmarking results, we selected a subset of
classifiers to classify the full, unlabelled data set of over 1.5 million DPGs
identified in 42,405 observations made by the Green Bank Telescope. Overall,
the classifiers using a multiclass ensemble tree learner in combination with
two oversampling imbalance treatments were the most efficient; they identified
additional known pulsars not in the benchmark data set and provided six
potential discoveries, with significantly less false positives than the other
classifiers.Comment: 13 pages, accepted for publication in MNRAS, ref. MN-15-1713-MJ.R
Machine Learning Classification of SDSS Transient Survey Images
We show that multiple machine learning algorithms can match human performance
in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS)
supernova survey into real objects and artefacts. This is a first step in any
transient science pipeline and is currently still done by humans, but future
surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate
fully machine-enabled solutions. Using features trained from eigenimage
analysis (principal component analysis, PCA) of single-epoch g, r and
i-difference images, we can reach a completeness (recall) of 96 per cent, while
only incorrectly classifying at most 18 per cent of artefacts as real objects,
corresponding to a precision (purity) of 84 per cent. In general, random
forests performed best, followed by the k-nearest neighbour and the SkyNet
artificial neural net algorithms, compared to other methods such as na\"ive
Bayes and kernel support vector machine. Our results show that PCA-based
machine learning can match human success levels and can naturally be extended
by including multiple epochs of data, transient colours and host galaxy
information which should allow for significant further improvements, especially
at low signal-to-noise.Comment: 14 pages, 8 figures. In this version extremely minor adjustments to
the paper were made - e.g. Figure 5 is now easier to view in greyscal
- …