Supervised Classification: Quite a Brief Overview
The original problem of supervised classification considers the task of
automatically assigning objects to their respective classes on the basis of
numerical measurements derived from these objects. Classifiers are the tools
that implement the actual functional mapping from these measurements---also
called features or inputs---to the so-called class label---or output. The
fields of pattern recognition and machine learning study ways of constructing
such classifiers. The main idea behind supervised methods is that of learning
from examples: given a number of example input-output relations, to what extent
can the general mapping be learned that takes any new and unseen feature vector
to its correct class? This chapter provides a basic introduction to the
underlying ideas of how to approach a supervised classification problem. In
addition, it provides an overview of some specific classification techniques,
delves into the issues of object representation and classifier evaluation, and
(very) briefly covers some variations on the basic supervised classification
task that may also be of interest to the practitioner.
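The learning-from-examples idea sketches easily in code. Below is a minimal, self-contained illustration (mine, not the chapter's) using scikit-learn, with synthetic Gaussian data standing in for real measurements; the choice of logistic regression as the classifier is an arbitrary assumption.

```python
# Minimal sketch of supervised classification: learn the mapping from
# feature vectors (inputs) to class labels (outputs) from labeled examples,
# then predict labels for new, unseen feature vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for real measurements: two Gaussian classes in 2-D.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Hold out some examples to measure generalization to unseen inputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)  # learn from examples
print("held-out accuracy:", clf.score(X_test, y_test))
```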
Optimal cross-validation in density estimation with the $L^2$-loss
We analyze the performance of cross-validation (CV) in the density estimation
framework with two purposes: (i) risk estimation and (ii) model selection. The
main focus is given to the so-called leave-$p$-out CV procedure (Lpo), where $p$
denotes the cardinality of the test set. Closed-form expressions are
established for the Lpo estimator of the risk of projection estimators. These
expressions provide a great improvement upon $V$-fold cross-validation in terms
of variability and computational complexity. From a theoretical point of view,
the closed-form expressions also enable one to study the Lpo performance in
terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo
with $p=1$, is proved among CV procedures used for risk estimation. Two model
selection frameworks are also considered: estimation, as opposed to
identification. For estimation with finite sample size $n$, optimality is
achieved for $p$ large enough [with $p/n = o(1)$] to balance the overfitting
resulting from the structure of the model collection. For identification, model
selection consistency is established for Lpo as long as $p/n$ is conveniently
related to the rate of convergence of the best estimator in the collection:
(i) $p/n \to 1$ as $n \to +\infty$ with a parametric rate, and (ii) $p/n \leq 0.5$ with some
nonparametric estimators. These theoretical results are validated by simulation
experiments.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1240 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
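For intuition about what Lpo computes, here is a naive sketch (mine, not the paper's closed-form expressions) for a fixed-bin histogram, one of the simplest projection estimators: the $L^2$ CV score is averaged over every one of the $\binom{n}{p}$ test sets of size $p$. The brute-force enumeration below is exactly the combinatorial cost that the closed-form expressions remove; all names and the data are illustrative.

```python
# Naive leave-p-out (Lpo) CV for a fixed-bin histogram density estimator
# under the L2 loss. Enumerating all C(n, p) test sets is the combinatorial
# cost that closed-form Lpo expressions avoid; this is an illustrative
# sketch only.
from itertools import combinations
import numpy as np

def histogram_density(train, edges):
    """Histogram estimate: per-bin probability divided by bin width."""
    counts, _ = np.histogram(train, bins=edges)
    widths = np.diff(edges)
    return counts / (counts.sum() * widths)

def l2_cv_score(train, test, edges):
    """CV estimate of the L2 risk (up to a constant): int f^2 - 2 E[f(X)]."""
    f = histogram_density(train, edges)
    widths = np.diff(edges)
    quad = np.sum(f ** 2 * widths)  # integral of f_hat squared
    # Map each test point to its bin (clipped to the support).
    idx = np.clip(np.searchsorted(edges, test, side="right") - 1,
                  0, len(f) - 1)
    return quad - 2.0 * f[idx].mean()

def lpo_risk(x, p, edges):
    """Average the CV score over every size-p test set: O(C(n, p)) work."""
    scores = [l2_cv_score(np.delete(x, idx), x[list(idx)], edges)
              for idx in combinations(range(len(x)), p)]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
x = rng.normal(size=30)
edges = np.linspace(-4, 4, 9)  # 8 equal-width bins
print("Lpo risk estimate, p=2:", lpo_risk(x, 2, edges))
```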
On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data
With the coming data deluge from synoptic surveys, there is a growing need
for frameworks that can quickly and automatically produce calibrated
classification probabilities for newly-observed variables based on a small
number of time-series measurements. In this paper, we introduce a methodology
for variable-star classification, drawing from modern machine-learning
techniques. We describe how to homogenize the information gleaned from light
curves by selection and computation of real-numbered metrics ("features"),
detail methods to robustly estimate periodic light-curve features, introduce
tree-ensemble methods for accurate variable star classification, and show how
to rigorously evaluate the classification results using cross-validation. On a
25-class data set of 1542 well-studied variable stars, we achieve a 22.8%
overall classification error using the random forest classifier; this
represents a 24% improvement over the best previous classifier on these data.
This methodology is effective for identifying samples of specific science
classes: for pulsational variables used in Milky Way tomography we obtain a
discovery efficiency of 98.2% and for eclipsing systems we find an efficiency
of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is
superior to other machine-learned methods in terms of accuracy, speed, and
relative immunity to features with no useful class information; the RF
classifier can also be used to estimate the importance of each feature in
classification. Additionally, we present the first astronomical use of
hierarchical classification methods to incorporate a known class taxonomy in
the classifier, which further reduces the catastrophic error rate to 7.8%.
Excluding low-amplitude sources, our overall error rate improves to 14%, with a
catastrophic error rate of 3.5%.
Comment: 23 pages, 9 figures
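The evaluation pipeline the abstract describes (a random forest, scored by cross-validation, plus per-feature importances) can be sketched with scikit-learn. The snippet below uses synthetic data, and the feature names are hypothetical stand-ins for the paper's periodic light-curve features; it is not the authors' code.

```python
# Sketch of the described pipeline: random forest classification, evaluated
# with cross-validation, plus feature-importance estimates. Synthetic data;
# feature names are illustrative stand-ins for light-curve features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

feature_names = ["period", "amplitude", "skew", "color"]  # hypothetical
X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=1, n_classes=3, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0)

# Cross-validation gives the honest error estimate the abstract calls for.
scores = cross_val_score(rf, X, y, cv=10)
print(f"CV error: {1 - scores.mean():.3f}")

# The forest also ranks features by how much class information they carry;
# uninformative features receive low importance, which is the basis of the
# abstract's point about immunity to useless features.
rf.fit(X, y)
for name, imp in zip(feature_names, rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```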
Predicting trend reversals using market instantaneous state
Collective behaviours taking place in financial markets reveal strongly
correlated states especially during a crisis period. A natural hypothesis is
that trend reversals are also driven by mutual influences between the different
stock exchanges. Using a maximum entropy approach, we find coordinated
behaviour during trend reversals dominated by the pairwise component. In
particular, these events are predicted with highly significant accuracy by the
ensemble's instantaneous state.
Comment: 18 pages, 15 figures
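A maximum entropy model with a pairwise component corresponds to an Ising-like distribution, $P(s) \propto \exp(\sum_i h_i s_i + \frac{1}{2}\sum_{i \neq j} J_{ij} s_i s_j)$, over binarized returns. Below is a toy fit by exact gradient ascent on the log-likelihood, tractable only for a handful of assets since it enumerates all $2^N$ states; the data are synthetic and the whole sketch is an assumption about the setup, not the paper's implementation.

```python
# Toy pairwise maximum-entropy (Ising-like) model of binarized market states,
# fit by exact moment matching / gradient ascent on the log-likelihood.
# Enumerating the 2^N states keeps the fit exact but limits N; all data here
# are synthetic placeholders for binarized index returns.
import itertools
import numpy as np

N = 5
rng = np.random.default_rng(0)
# Binarized "returns" of N exchanges: s_i = +1 (up) or -1 (down).
data = rng.choice([-1, 1], size=(2000, N))

states = np.array(list(itertools.product([-1, 1], repeat=N)))  # 2^N states

def model_moments(h, J):
    """Exact <s_i> and <s_i s_j> under the pairwise model."""
    energy = states @ h + 0.5 * np.einsum("ki,ij,kj->k", states, J, states)
    p = np.exp(energy - energy.max())
    p /= p.sum()
    m = p @ states                # <s_i>
    c = (states.T * p) @ states   # <s_i s_j>
    return m, c

# Empirical moments the maximum-entropy model must match.
m_data = data.mean(axis=0)
c_data = data.T @ data / len(data)

h = np.zeros(N)
J = np.zeros((N, N))
for _ in range(500):              # gradient ascent on the log-likelihood
    m, c = model_moments(h, J)
    h += 0.1 * (m_data - m)
    J += 0.1 * (c_data - c)
    np.fill_diagonal(J, 0.0)      # no self-couplings

# The fitted (h, J) assign a probability to any instantaneous market state,
# the quantity used above to flag likely trend reversals.
```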