Generalization bounds for averaged classifiers
We study a simple learning algorithm for binary classification. Instead of
predicting with the best hypothesis in the hypothesis class, that is, the
hypothesis that minimizes the training error, our algorithm predicts with a
weighted average of all hypotheses, weighted exponentially with respect to
their training error. We show that the prediction of this algorithm is much
more stable than the prediction of an algorithm that predicts with the best
hypothesis. By allowing the algorithm to abstain from predicting on some
examples, we show that the predictions it makes when it does not abstain are
very reliable. Finally, we show that the probability that the algorithm
abstains is comparable to the generalization error of the best hypothesis in
the class.
Comment: Published by the Institute of Mathematical Statistics
(http://www.imstat.org) in the Annals of Statistics
(http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000005
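The averaging-with-abstention idea in this abstract can be illustrated with a minimal sketch. It is not the paper's exact procedure: the function name `averaged_predict`, the parameters `eta` (how sharply training error is penalized) and `delta` (the abstention margin), and the toy threshold hypotheses are all assumptions made for illustration. Each hypothesis gets weight exp(-eta * training error), and the sketch abstains when the weighted vote is too close to call.

```python
import math

def averaged_predict(hypotheses, train, x, eta=10.0, delta=0.1):
    """Return +1 or -1, or 0 to abstain (hypothetical helper).

    Each hypothesis is weighted by exp(-eta * training_error). The sketch
    abstains when the scaled log-ratio of positive to negative weight
    lies within delta of zero.
    """
    pos = neg = 0.0
    for h in hypotheses:
        err = sum(h(xi) != yi for xi, yi in train) / len(train)
        weight = math.exp(-eta * err)
        if h(x) == 1:
            pos += weight
        else:
            neg += weight
    if neg == 0.0:   # every hypothesis votes +1
        return 1
    if pos == 0.0:   # every hypothesis votes -1
        return -1
    log_ratio = math.log(pos / neg) / eta
    if abs(log_ratio) <= delta:
        return 0     # the weighted vote is too close: abstain
    return 1 if log_ratio > 0 else -1

# Toy hypothesis class (an assumption): thresholds on the real line.
hypotheses = [lambda x, t=t: 1 if x >= t else -1 for t in range(11)]
train = [(1, -1), (2, -1), (3, -1), (7, 1), (8, 1), (9, 1)]

for query in (1, 5, 9):
    print(query, averaged_predict(hypotheses, train, query))
```

In this toy setup the query at 5 falls in the gap between the two training clusters, where the low-error hypotheses disagree, so the sketch abstains there and commits to a label only near the training points.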
CIFAR-10: KNN-based Ensemble of Classifiers
In this paper, we study the performance of different classifiers on the
CIFAR-10 dataset, and build an ensemble of classifiers to reach a better
performance. We show that, on CIFAR-10, K-Nearest Neighbors (KNN) and
Convolutional Neural Network (CNN) classifiers are, on some classes, mutually
exclusive, and thus yield higher accuracy when combined. We reduce KNN
overfitting using Principal Component Analysis (PCA), and ensemble it with a
CNN to increase its accuracy. Our approach improves our best CNN model from
93.33% to 94.03%.
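The abstract does not say how the KNN and CNN outputs are combined. One common choice, assumed here purely for illustration, is soft voting: a weighted average of the two models' class-probability vectors. The function `soft_vote`, the weight `knn_weight`, and the three-class probabilities below are hypothetical and are not taken from the paper.

```python
def soft_vote(cnn_probs, knn_probs, knn_weight=0.3):
    """Blend two class-probability vectors and return (label, blended).

    knn_weight is a hypothetical mixing weight in [0, 1]; the paper's
    actual combination rule is not given in the abstract.
    """
    if len(cnn_probs) != len(knn_probs):
        raise ValueError("probability vectors must have the same length")
    blended = [
        (1.0 - knn_weight) * c + knn_weight * k
        for c, k in zip(cnn_probs, knn_probs)
    ]
    label = max(range(len(blended)), key=blended.__getitem__)
    return label, blended

# Made-up example: the CNN narrowly prefers class 0, the KNN is
# confident in class 1, and the blend follows the KNN.
cnn_probs = [0.45, 0.40, 0.15]
knn_probs = [0.10, 0.80, 0.10]
label, blended = soft_vote(cnn_probs, knn_probs)
print(label, blended)
```

A blend like this can only help when the two models err on different examples, which is the complementarity the abstract reports for some classes.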
Automated supervised classification of variable stars I. Methodology
The fast classification of new variable stars is an important step in making
them available for further research. Selection of science targets from large
databases is much more efficient if they have been classified first. Defining
the classes in terms of physical parameters is also important to get an
unbiased statistical view on the variability mechanisms and the borders of
instability strips. Our goal is twofold: to provide an overview of the stellar
variability classes that are presently known, in terms of some relevant stellar
parameters; and to use the class descriptions obtained as the basis for an
automated 'supervised classification' of large databases. Such automated classification
will compare and assign new objects to a set of pre-defined variability
training classes. For every variability class, a literature search was
performed to find as many well-known member stars as possible, or a
considerable subset if too many were present. Next, we searched on-line and
private databases for their light curves in the visible band and performed
period analysis and harmonic fitting. The derived light curve parameters are
used to describe the classes and define the training classifiers. We compared
the performance of different classifiers in terms of percentage of correct
identification, of confusion among classes and of computation time. We describe
how well the classes can be separated using the proposed set of parameters and
how future improvements can be made, based on new large databases such as the
light curves to be assembled by the CoRoT and Kepler space missions.
Comment: This paper has been accepted for publication in Astronomy and
Astrophysics (reference AA/2007/7638) Number of pages: 27 Number of figures:
1
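The harmonic-fitting step mentioned in this abstract can be sketched as a linear least-squares fit of a truncated Fourier series at a known period. This is a generic illustration, not the authors' pipeline: the helper names `solve_linear` and `fit_harmonics`, the number of harmonics, and the synthetic light curve are assumptions, and the period is taken as already known from a separate period analysis.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_harmonics(times, mags, period, n_harmonics=2):
    """Fit mean + sum_k [a_k sin(2 pi k t / P) + b_k cos(2 pi k t / P)].

    Returns [mean, a_1, b_1, a_2, b_2, ...] from the normal equations.
    """
    def design_row(t):
        row = [1.0]
        for k in range(1, n_harmonics + 1):
            phase = 2.0 * math.pi * k * t / period
            row += [math.sin(phase), math.cos(phase)]
        return row

    rows = [design_row(t) for t in times]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, mags)) for i in range(p)]
    return solve_linear(ata, aty)

# Synthetic, noise-free light curve with irregular sampling (made-up values).
period = 2.5
times = [0.37 * i + 0.05 * (i % 3) for i in range(40)]
mags = [
    12.0
    + 0.3 * math.sin(2.0 * math.pi * t / period)
    + 0.1 * math.cos(4.0 * math.pi * t / period)
    for t in times
]
coeffs = fit_harmonics(times, mags, period)
print([round(c, 6) for c in coeffs])
```

The fitted coefficients (or amplitudes and phases derived from them) are the kind of light curve parameters that could then feed a classifier; real data would add noise, measurement weights, and uncertainty in the period.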
- …