Search CORE

29,690 research outputs found

Evaluation methods and decision theory for classification of streaming data with temporal dependence

Author: Bifet Albert
Holmes Geoffrey
Pfahringer Bernhard
Read Jesse
Žliobaitė Indrė
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution is often evolving over time, and models that update themselves during operation are becoming the state-of-the-art. This paper formalizes a learning and evaluation scheme of such predictive models. We theoretically analyze evaluation of classifiers on streaming data with temporal dependence. Our findings suggest that the commonly accepted data stream classification measures, such as classification accuracy and Kappa statistic, fail to diagnose cases of poor performance when temporal dependence is present, therefore they should not be used as sole performance indicators. Moreover, classification accuracy can be misleading if used as a proxy for evaluating change detectors with datasets that have temporal dependence. We formulate the decision theory for streaming data classification with temporal dependence and develop a new evaluation methodology for data stream classification that takes temporal dependence into account. We propose a combined measure for classification performance, that takes into account temporal dependence, and we recommend using it as the main performance measure in classification of streaming data

Crossref

Research Commons@Waikato

Feature and Variable Selection in Classification

Author: Karper Aaron
Publication venue
Publication date: 10/02/2014
Field of study

The amount of information in the form of features and variables avail- able to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions, high di- mensional models do not lend themselves to interpretable results, and the CPU and memory resources necessary to run on high-dimensional datasets severly limit the applications of the approaches. Variable and feature selection aim to remedy this by finding a subset of features that in some way captures the information provided best. In this paper we present the general methodology and highlight some specific approaches.Comment: Part of master seminar in document analysis held by Marcus Eichenberger-Liwick

arXiv.org e-Print Archive

CiteSeerX

Exploring Two Novel Features for EEG-based Brain-Computer Interfaces: Multifractal Cumulants and Predictive Complexity

Author: Brodu Nicolas
Lotte Fabien
Lécuyer Anatole
Publication venue
Publication date: 18/09/2010
Field of study

In this paper, we introduce two new features for the design of electroencephalography (EEG) based Brain-Computer Interfaces (BCI): one feature based on multifractal cumulants, and one feature based on the predictive complexity of the EEG time series. The multifractal cumulants feature measures the signal regularity, while the predictive complexity measures the difficulty to predict the future of the signal based on its past, hence a degree of how complex it is. We have conducted an evaluation of the performance of these two novel features on EEG data corresponding to motor-imagery. We also compared them to the most successful features used in the BCI field, namely the Band-Power features. We evaluated these three kinds of features and their combinations on EEG signals from 13 subjects. Results obtained show that our novel features can lead to BCI designs with improved classification performance, notably when using and combining the three kinds of feature (band-power, multifractal cumulants, predictive complexity) together.Comment: Updated with more subjects. Separated out the band-power comparisons in a companion article after reviewer feedback. Source code and companion article are available at http://nicolas.brodu.numerimoire.net/en/recherche/publication

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

Author: A Ivshina
Anne-Claire Haury
C Ambroise
C Fan
C Lai
C Sotiriou
C Sotiriou
F Reyal
G Abraham
H Zou
I Guyon
I Guyon
J Bi
J Mairal
J Wang
Jean-Philippe Vert
JPA Ioannidis
L Ein-Dor
L Ein-Dor
M Dai
Muy-Teck Teh
N Meinshausen
P Wirapati
Pierre Gestraud
R Kohavi
R Shen
R Simon
R Tibshirani
RA Irizarry
S Michiels
T Abeel
T Barrett
T Iwamoto
W Shi
Y Benjamini
Y Pawitan
Y Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 23/06/2011
Field of study

Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HAL Descartes

HAL-MINES ParisTech