Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach
Computer vision tasks are traditionally defined and evaluated using semantic
categories. However, it is well known in the field that a semantic class does
not necessarily correspond to a single visual class (e.g. the inside and
outside of a car), and many of the feasible learning techniques at hand cannot
model a visual class that appears consistent to the human eye. These problems
have motivated three lines of work: 1) unsupervised or supervised clustering as
a preprocessing step to identify the visual subclasses used in a
mixture-of-experts learning regime; 2) latent mixture assignments optimized
during learning, as in the part model of Felzenszwalb et al.; and 3) highly
non-linear classifiers that are inherently capable of modelling a multi-modal
input space but are inefficient at test time. In this work, we promote an
incremental view of recognizing semantic classes with varied appearances. We
propose an optimization technique that incrementally finds maximal visual
subclasses in a regularized risk minimization framework, unifying the
clustering and classification steps in a single algorithm. The key advantage of
this approach is that, unlike pre-processing clustering methods, it does not
require a priori knowledge of the number of clusters or of the representation
and similarity measures to be used. Following this approach we show significant
results, both qualitatively and quantitatively. We show that the visual
subclasses follow a long-tailed distribution. Finally, we show that
state-of-the-art object detection methods (e.g. DPM) are unable to use the tail
of this distribution, which comprises 50% of the training samples; in fact, DPM
performance slightly increases on average when this half of the data is
removed.
Comment: Updated ICCV 2013 submission
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the one-century old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono- or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We emphasize the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies.
Comment: Open Access paper.
http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract.
DOI: 10.3389/fspas.2015.00003
Learning with Clustering Structure
We study supervised learning problems using clustering constraints to impose
structure on either features or samples, seeking to help both prediction and
interpretation. The problem of clustering features arises naturally in text
classification for instance, to reduce dimensionality by grouping words
together and identify synonyms. The sample clustering problem, on the other
hand, applies to multiclass problems where we are allowed to make multiple
predictions and the performance of the best answer is recorded. We derive a
unified optimization formulation highlighting the common structure of these
problems and produce algorithms whose core iteration complexity amounts to a
k-means clustering step, which can be approximated efficiently. We extend these
results to combine sparsity and clustering constraints, and develop a new
projection algorithm on the set of clustered sparse vectors. We prove
convergence of our algorithms on random instances, based on a union of
subspaces interpretation of the clustering structure. Finally, we test the
robustness of our methods on artificial data sets as well as real data
extracted from movie reviews.
Comment: Completely rewritten. New convergence proofs in the clustered and
sparse clustered case. New projection algorithm on sparse clustered vectors.
Automated Classification of Periodic Variable Stars detected by the Wide-field Infrared Survey Explorer
We describe a methodology to classify periodic variable stars identified
using photometric time-series measurements constructed from the Wide-field
Infrared Survey Explorer (WISE) full-mission single-exposure Source Databases.
This will assist in the future construction of a WISE Variable Source Database
that assigns variables to specific science classes as constrained by the WISE
observing cadence with statistically meaningful classification probabilities.
We have analyzed the WISE light curves of 8273 variable stars identified in
previous optical variability surveys (MACHO, GCVS, and ASAS) and show that
Fourier decomposition techniques can be extended into the mid-IR to assist with
their classification. Combined with other periodic light-curve features, this
sample is then used to train a machine-learned classifier based on the random
forest (RF) method. Consistent with previous classification studies of variable
stars in general, the RF machine-learned classifier is superior to other
methods in terms of accuracy, robustness against outliers, and relative
immunity to features that carry little or redundant class information. For the
three most common classes identified by WISE: Algols, RR Lyrae, and W Ursae
Majoris type variables, we obtain classification efficiencies of 80.7%, 82.7%,
and 84.5%, respectively, using cross-validation analyses, with 95% confidence
intervals of approximately +/-2%. These accuracies are achieved at purity (or
reliability) levels of 88.5%, 96.2%, and 87.8%, respectively, similar to those
achieved in previous automated classification studies of periodic variable
stars.
Comment: 48 pages, 17 figures, 1 table, accepted by A
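The Fourier decomposition mentioned above fits a light curve, folded on its period, with a low-order harmonic series; harmonic amplitudes and their ratios then serve as classification features. A minimal sketch on a synthetic, evenly sampled curve (illustrative names, not the WISE pipeline; real survey data is unevenly sampled and would need a least-squares fit instead):

```python
# Fourier decomposition of a folded light curve: project the magnitudes
# onto harmonics of the base frequency and keep the amplitudes A_k.
# Amplitude ratios such as A2/A1 are classic variable-star features.
import math

def fourier_amplitudes(phases, mags, n_harmonics=3):
    """Return harmonic amplitudes A_k for k = 1..n_harmonics."""
    n = len(phases)
    amps = []
    for k in range(1, n_harmonics + 1):
        a = 2.0 / n * sum(m * math.cos(2 * math.pi * k * p)
                          for p, m in zip(phases, mags))
        b = 2.0 / n * sum(m * math.sin(2 * math.pi * k * p)
                          for p, m in zip(phases, mags))
        amps.append(math.hypot(a, b))
    return amps

# toy light curve: a fundamental plus a weaker first overtone
phases = [i / 200.0 for i in range(200)]
mags = [1.0 * math.sin(2 * math.pi * p) + 0.3 * math.sin(4 * math.pi * p)
        for p in phases]
A = fourier_amplitudes(phases, mags)
r21 = A[1] / A[0]   # amplitude ratio of second to first harmonic
```

Features like `A` and `r21`, combined with the period and other light-curve statistics, are what a random forest classifier of the kind described above would be trained on.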
Boundary conditions for the application of machine learning based monitoring systems for supervised anomaly detection in machining
Monitoring systems may contribute to increasing the availability of machine tools and to detecting process deviations in time. In the past, machine learning has been used to solve a variety of monitoring problems in machining. However, boundary conditions for assessing the principal applicability of machine learning approaches for supervised anomaly detection in machining have not been exhaustively described in the literature. In this paper, objectives as well as deficits of literature approaches are identified, and influencing factors on the monitoring quality are described. As a result, we derive boundary conditions and discuss challenges for the successful implementation of machine learning based monitoring systems for supervised anomaly detection in industrial practice.
Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering
This study introduces a new method for detecting and sorting spikes from multiunit recordings. The method combines the wavelet transform, which localizes distinctive spike features, with superparamagnetic clustering, which allows automatic classification of the data without assumptions such as low variance or Gaussian distributions. Moreover, an improved method for setting amplitude thresholds for spike detection is proposed. We describe several criteria for implementation that render the algorithm unsupervised and fast. The algorithm is compared to other conventional methods using several simulated data sets whose characteristics closely resemble those of in vivo recordings. For these data sets, we found that the proposed algorithm outperformed conventional methods.
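A common way to set such an amplitude threshold robustly is to estimate the noise level from the median absolute value of the signal rather than its standard deviation, since the median is barely inflated by the spikes themselves. The sketch below illustrates that idea on synthetic data; it is a generic robust-threshold rule of this kind, and the exact rule in the paper may differ in its details:

```python
# Robust amplitude threshold for spike detection: estimate the noise
# standard deviation from the median absolute value of the signal.
# Synthetic data for illustration only.
import random
from statistics import median

def noise_threshold(signal, k=4.0):
    """Threshold = k * sigma_n with sigma_n = median(|x|) / 0.6745.

    For Gaussian noise, 0.6745 is the 75th percentile of the standard
    normal, so median(|x|) / 0.6745 estimates the noise std deviation."""
    sigma_n = median(abs(x) for x in signal) / 0.6745
    return k * sigma_n

random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(5000)]
for i in range(0, 5000, 500):      # inject ten large "spikes"
    signal[i] += 15.0

thr = noise_threshold(signal)
crossings = sum(1 for x in signal if abs(x) > thr)
```

Because only 10 of the 5000 samples are spikes, the median-based estimate stays close to the true noise level, and the threshold lands near 4 standard deviations of the noise despite the outliers.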
Signature extension using transformed cluster statistics and related techniques
There are no author-identified significant results in this report.
A balanced approach to the multi-class imbalance problem
The multi-class class-imbalance problem is a subset of supervised machine learning tasks where the classification variable of interest consists of three or more categories with unequal sample sizes. In the fields of manufacturing and business, common machine learning classification tasks such as failure mode, fraud, and threat detection often exhibit class imbalance due to the infrequent occurrence of one or more event states. Though machine learning as a discipline is well established, the study of class imbalance with respect to multi-class learning does not yet have the same deep, rich history. In its current state, the class imbalance literature leverages biased sampling and increased model complexity to improve predictive performance, and while some have made advances, there are still no standard model evaluation criteria with which to compare performance. In the presence of substantial multi-class distributional skew, many of the evaluation criteria that can scale beyond the binary case become invalid due to their over-emphasis on the majority class observations.
Going a step further, the evaluation criteria utilized in practice vary significantly across the class imbalance literature, and so far no single measure has been able to galvanize consensus, due not only to implementation complexity but also to the existence of undesirable properties. Therefore, the focus of this research is to introduce a new performance measure, Class Balance Accuracy, designed specifically for model validation in the presence of multi-class imbalance. This paper begins with the definition of Class Balance Accuracy and provides an intuitive proof of its interpretation as a simultaneous lower bound for the average per-class recall and average per-class precision. Results from comparison studies show that models chosen by maximizing the training class balance accuracy consistently yield both high overall accuracy and high per-class recall on the test sets compared to models chosen by other criteria. Simulation studies were then conducted to highlight specific scenarios where model selection based on class balance accuracy outperforms selection based on regular accuracy. The measure is then invoked in two novel applications: as the maximization criterion in the instance selection biased sampling technique, and as a model selection tool in a multiple classifier system prediction algorithm. In the case of instance selection, the use of class balance accuracy shows improvement over traditional accuracy on multi-class class-imbalance data sets with low separability between the majority and minority classes. Likewise, the use of CBA in the multiple classifier system resulted in improved predictions over state-of-the-art methods such as AdaBoost for some of the U.C.I. machine learning repository test data sets. The paper concludes with a discussion of the climbR package, a repository of functions designed to aid in model evaluation and prediction for class imbalance machine learning problems.
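One formulation consistent with the lower-bound property stated in the abstract scores each class by its diagonal confusion-matrix count divided by the larger of that class's row and column sums; since that quotient is at most both the class's recall and its precision, the average lower-bounds macro recall and macro precision. A sketch (the confusion matrix and helper names are illustrative, not from the paper):

```python
# Class Balance Accuracy from a confusion matrix C
# (rows = true class, columns = predicted class).
# Each class contributes C[i][i] / max(row_sum_i, col_sum_i), which is
# at most both that class's recall (C[i][i] / row_sum_i) and its
# precision (C[i][i] / col_sum_i), so the average lower-bounds both
# macro recall and macro precision.

def class_balance_accuracy(C):
    k = len(C)
    total = 0.0
    for i in range(k):
        row = sum(C[i])                       # true instances of class i
        col = sum(C[j][i] for j in range(k))  # predictions of class i
        total += C[i][i] / max(row, col)
    return total / k

def macro_recall(C):
    return sum(C[i][i] / sum(C[i]) for i in range(len(C))) / len(C)

def macro_precision(C):
    k = len(C)
    return sum(C[i][i] / sum(C[j][i] for j in range(k)) for i in range(k)) / k

# imbalanced 3-class example: the majority class dominates the counts
C = [[90, 5, 5],
     [10, 8, 2],
     [4, 1, 5]]
cba = class_balance_accuracy(C)
```

On this example the plain accuracy is 103/130, dominated by the majority class, while the class balance accuracy is pulled down by the weak minority classes, which is exactly the behaviour the abstract argues a multi-class imbalance measure should have.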