2,916 research outputs found

Highly comparative feature-based time-series classification

Author: Fulcher Ben D.
Jones Nick S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/05/2014

A highly comparative, feature-based approach to time-series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are drawn from across the scientific time-series analysis literature and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After thousands of features are computed for each time series in a training set, those most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties and circumvent the need to calculate distances between time series. Representing time series in this way reduces dimensionality by orders of magnitude, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distances and dynamic time warping. Most importantly, the selected features provide an understanding of the properties of the dataset, insight that can guide further scientific investigation.
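
The selection step described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the three-feature library, the nearest-centroid rule standing in for the linear classifier, the use of training accuracy as the selection score, and the synthetic white-noise and AR(1) series.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def lag1_autocorr(xs):
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs)
    if var == 0:
        return 0.0
    return sum((a - m) * (b - m) for a, b in zip(xs, xs[1:])) / var


# Hypothetical three-feature library; the abstract describes thousands.
FEATURES = {"mean": mean, "std": std, "lag1_autocorr": lag1_autocorr}


def extract(series):
    # Each series becomes a fixed-size feature row, whatever its length.
    return {name: fn(series) for name, fn in FEATURES.items()}


def centroid_accuracy(rows, labels, names):
    # Training accuracy of a nearest-centroid rule (assumed stand-in for
    # the linear classifier) restricted to the features in `names`.
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [mean([r[n] for r in members]) for n in names]
    correct = 0
    for r, y in zip(rows, labels):
        point = [r[n] for n in names]
        pred = min(
            classes,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y
    return correct / len(rows)


def greedy_forward_selection(rows, labels, max_features=2):
    # Add one feature at a time, keeping the one that raises the score most;
    # stop when no remaining feature improves it.
    selected, best_score = [], 0.0
    remaining = list(FEATURES)
    while remaining and len(selected) < max_features:
        score, name = max(
            (centroid_accuracy(rows, labels, selected + [n]), n) for n in remaining
        )
        if score <= best_score:
            break
        selected.append(name)
        remaining.remove(name)
        best_score = score
    return selected, best_score


def make_series(rng, phi, length):
    # AR(1) process; phi = 0 gives white noise.
    x, out = 0.0, []
    for _ in range(length):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(0)
series, labels = [], []
for _ in range(20):
    series.append(make_series(rng, 0.0, rng.randint(80, 120)))
    labels.append("noise")
    series.append(make_series(rng, 0.9, rng.randint(80, 120)))
    labels.append("ar1")

rows = [extract(s) for s in series]
selected, score = greedy_forward_selection(rows, labels)
print(selected, score)
```

Because each series is reduced to a fixed-length feature row, the synthetic series can have different lengths and no pairwise distance between raw series is ever computed. A more careful version would score candidates with cross-validation rather than training accuracy.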

arXiv.org e-Print Archive

CiteSeerX

Classifier selection with permutation tests

Author: Arias Vicente Marta
Arratia Quesada Argimiro Alejandro
Duarte López Ariel
Publication venue: 'IOS Press'
Publication date: 01/01/2017

This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, it recommends the classifier likely to perform best, based on classifier performance over similar known data sets. Similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels; this approach is compared with the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we conducted extensive experiments covering 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations. Peer reviewed. Postprint (author's final draft).
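
As a rough illustration of the permutation-test idea in this abstract, the sketch below asks whether a classifier's score on the true labels is better than its scores on shuffled labels. The leave-one-out nearest-class-mean classifier, the accuracy score, and the synthetic one-dimensional data are assumptions made for the example; the paper's own classifiers, metrics, and data sets are not reproduced here.

```python
import random


def loo_accuracy(xs, ys):
    # Leave-one-out accuracy of a 1-D nearest-class-mean classifier.
    correct = 0
    for i in range(len(xs)):
        sums, counts = {}, {}
        for j, (x, y) in enumerate(zip(xs, ys)):
            if j == i:
                continue
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        pred = min(sums, key=lambda c: abs(xs[i] - sums[c] / counts[c]))
        correct += pred == ys[i]
    return correct / len(xs)


def permutation_p_value(xs, ys, n_permutations=200, seed=0):
    # Shuffling the labels breaks any link between attributes and classes,
    # so the shuffled scores approximate the "no signal" distribution.
    rng = random.Random(seed)
    observed = loo_accuracy(xs, ys)
    shuffled = list(ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loo_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(20)] + [
    data_rng.gauss(4.0, 1.0) for _ in range(20)
]
ys = [0] * 20 + [1] * 20

observed, p_value = permutation_p_value(xs, ys)
print(observed, p_value)
```

A small p-value suggests the classifier is exploiting real structure in the attributes rather than chance agreement with the labels.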

UPCommons. Portal del coneixement obert de la UPC

Basics of Feature Selection and Statistical Learning for High Energy Physics

Author: Vossen Anselm
Publication date: 16/03/2008

This document introduces the basics of data preparation, feature selection, and statistical learning for high energy physics tasks. The emphasis is on feature selection by principal component analysis, information gain, and significance measures for features. As examples of basic statistical learning algorithms, the maximum a posteriori and maximum likelihood classifiers are shown. Furthermore, a simple rule-based classification is introduced as a means of automated cut finding. Finally, two toolboxes for applying statistical learning techniques are introduced. Comment: 12 pages, 8 figures. Part of the proceedings of the Track 'Computational Intelligence for HEP Data Analysis' at iCSC 200
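
Information gain and cut finding, both mentioned in this abstract, can be sketched together: a cut on one feature is scored by how much it reduces the entropy of the class labels. The function names, the toy feature values, and the "signal" and "background" labels are hypothetical and not taken from the document.

```python
import math


def entropy(labels):
    # Shannon entropy (in bits) of the class distribution.
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result


def information_gain(values, labels, cut):
    # Entropy before the cut minus the weighted entropy of the two sides.
    below = [y for x, y in zip(values, labels) if x < cut]
    above = [y for x, y in zip(values, labels) if x >= cut]
    n = len(labels)
    remainder = len(below) / n * entropy(below) + len(above) / n * entropy(above)
    return entropy(labels) - remainder


def best_cut(values, labels):
    # Candidate cuts are midpoints between neighbouring distinct values.
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda c: information_gain(values, labels, c))


# Toy feature values, chosen so that one cut separates the classes perfectly.
values = [0.2, 0.5, 0.9, 1.1, 2.8, 3.1, 3.5, 4.0]
labels = ["background"] * 4 + ["signal"] * 4

cut = best_cut(values, labels)
gain = information_gain(values, labels, cut)
print(cut, gain)
```

With balanced classes and a perfectly separating cut, the gain equals the full one bit of label entropy; on real data the classes overlap and the best cut recovers only part of it.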

arXiv.org e-Print Archive

CERN Document Server

Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles

Author: AP Dawid
C Genest
DH Wolpert
ED Sontag
F Pedregosa
GB Giannakis
I Zezula
J Kittler
L Breiman
L Xu
LK Hansen
M Wozniak
OP Faugeras
S Deerwester
TK Ho
V Tresp
Y Freund
Y Koren
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/03/2019

We examine a network of learners that address the same classification task but must learn from different data sets. The learners cannot share data; instead, they share their models. Models are shared only once to limit the network load. We introduce DELCO (Decentralized Ensemble Learning with COpulas), a new approach for aggregating the predictions of the classifiers trained by each learner. The proposed method aggregates the base classifiers using a probabilistic model relying on Gaussian copulas. Experiments on logistic regressor ensembles demonstrate competitive accuracy and increased robustness in the case of dependent classifiers. A companion Python implementation can be downloaded at https://github.com/john-klein/DELC
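
The abstract does not spell out how DELCO uses Gaussian copulas, so the sketch below shows only the generic building block: the density of a bivariate Gaussian copula, which measures how far two dependent quantities depart from independence. The function name and the example correlation are assumptions, and this is not the DELCO aggregation rule itself.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_copula_density(u1, u2, rho):
    # u1, u2 are values in (0, 1); rho is the copula correlation in (-1, 1).
    # Map each uniform value to a standard normal quantile.
    z1 = _STD_NORMAL.inv_cdf(u1)
    z2 = _STD_NORMAL.inv_cdf(u2)
    one_minus_r2 = 1.0 - rho * rho
    # Ratio of the bivariate normal density to the product of its marginals.
    exponent = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (
        2.0 * one_minus_r2
    )
    return math.exp(exponent) / math.sqrt(one_minus_r2)


independent = gaussian_copula_density(0.3, 0.8, 0.0)
center = gaussian_copula_density(0.5, 0.5, 0.6)
concordant = gaussian_copula_density(0.9, 0.9, 0.6)
discordant = gaussian_copula_density(0.9, 0.1, 0.6)
print(independent, center, concordant, discordant)
```

With zero correlation the density is 1 everywhere, which corresponds to treating the two quantities as independent. With positive correlation, jointly high values receive density above 1 and opposing values below 1, which is the kind of dependence a copula-based aggregation can account for.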

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

UCL Discovery

Hal-Diderot