5,459 research outputs found
Supervised Classification: Quite a Brief Overview
The original problem of supervised classification considers the task of
automatically assigning objects to their respective classes on the basis of
numerical measurements derived from these objects. Classifiers are the tools
that implement the actual functional mapping from these measurements---also
called features or inputs---to the so-called class label---or output. The
fields of pattern recognition and machine learning study ways of constructing
such classifiers. The main idea behind supervised methods is that of learning
from examples: given a number of example input-output relations, to what extent
can the general mapping be learned that takes any new and unseen feature vector
to its correct class? This chapter provides a basic introduction to the
underlying ideas of how to come to a supervised classification problem. In
addition, it provides an overview of some specific classification techniques,
delves into the issues of object representation and classifier evaluation, and
(very) briefly covers some variations on the basic supervised classification
task that may also be of interest to the practitioner
A recurrent neural network for classification of unevenly sampled variable stars
Astronomical surveys of celestial sources produce streams of noisy time
series measuring flux versus time ("light curves"). Unlike in many other
physical domains, however, large (and source-specific) temporal gaps in data
arise naturally due to intranight cadence choices as well as diurnal and
seasonal constraints. With nightly observations of millions of variable stars
and transients from upcoming surveys, efficient and accurate discovery and
classification techniques on noisy, irregularly sampled data must be employed
with minimal human-in-the-loop involvement. Machine learning for inference
tasks on such data traditionally requires the laborious hand-coding of
domain-specific numerical summaries of raw data ("features"). Here we present a
novel unsupervised autoencoding recurrent neural network (RNN) that makes
explicit use of sampling times and known heteroskedastic noise properties. When
trained on optical variable star catalogs, this network produces supervised
classification models that rival other best-in-class approaches. We find that
autoencoded features learned on one time-domain survey perform nearly as well
when applied to another survey. These networks can continue to learn from new
unlabeled observations and may be used in other unsupervised tasks such as
forecasting and anomaly detection.Comment: 23 pages, 14 figures. The published version is at Nature Astronomy
(https://www.nature.com/articles/s41550-017-0321-z). Source code for models,
experiments, and figures at
https://github.com/bnaul/IrregularTimeSeriesAutoencoderPaper (Zenodo Code
DOI: 10.5281/zenodo.1045560
A Meta-Learning Approach to One-Step Active Learning
We consider the problem of learning when obtaining the training labels is
costly, which is usually tackled in the literature using active-learning
techniques. These approaches provide strategies to choose the examples to label
before or during training. These strategies are usually based on heuristics or
even theoretical measures, but are not learned as they are directly used during
training. We design a model which aims at \textit{learning active-learning
strategies} using a meta-learning setting. More specifically, we consider a
pool-based setting, where the system observes all the examples of the dataset
of a problem and has to choose the subset of examples to label in a single
shot. Experiments show encouraging results
A machine learning pipeline for supporting differentiation of glioblastomas from single brain metastases
Machine learning has provided, over the last decades, tools for knowledge extraction in complex medical domains. Most of these tools, though, are ad hoc solutions and lack the systematic approach that would be required to become mainstream in medical practice. In this brief paper, we define a machine learning-based analysis pipeline for helping in a difficult problem in the field of neuro-oncology, namely the discrimination of brain glioblastomas from single brain metastases. This pipeline involves source extraction using k-Meansinitialized Convex Non-negative Matrix Factorization and a collection of classifiers, including Logistic Regression, Linear Discriminant Analysis, AdaBoost, and Random Forests.Peer ReviewedPostprint (published version
- …