Reducing the Effects of Detrimental Instances

Author: Martinez Tony
Smith Michael R.
Publication date: 14/10/2014

Not all instances in a data set are equally beneficial for inducing a model of the data. Some instances (such as outliers or noise) can be detrimental. However, at least initially, machine learning algorithms generally treat all instances in a data set equally. Many current approaches for handling noisy and detrimental instances make a binary decision about whether an instance is detrimental or not. In this paper, we 1) extend this paradigm by weighting the instances on a continuous scale and 2) present a methodology for measuring how detrimental an instance may be for inducing a model of the data. We call our method of identifying and weighting detrimental instances reduced detrimental instance learning (RDIL). We examine RDIL on a set of 54 data sets and 5 learning algorithms and compare RDIL with other weighting and filtering approaches. RDIL is especially useful for learning algorithms where every instance can affect the classification boundary and the training instances are considered individually, such as multilayer perceptrons trained with backpropagation (MLPs). Our results also suggest that a more accurate estimate of which instances are detrimental can have a significant positive impact on handling them.
Comment: 6 pages, 5 tables, 2 figures. arXiv admin note: substantial text overlap with arXiv:1403.189
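The contrast the abstract draws between a binary keep-or-drop decision and a continuous instance weight can be illustrated with a minimal sketch. The abstract does not say how RDIL measures detrimentality, so the score used here (the fraction of a hypothetical set of reference classifiers that misclassify an instance) and all function names are assumptions for illustration only, not the authors' method.

```python
def detrimental_scores(predictions, labels):
    """Assumed score: fraction of reference classifiers that misclassify each instance.

    predictions[m][i] is the label that hypothetical reference model m assigns
    to instance i; labels[i] is the recorded label of instance i.
    """
    n_models = len(predictions)
    return [
        sum(1 for model_preds in predictions if model_preds[i] != y) / n_models
        for i, y in enumerate(labels)
    ]


def continuous_weights(scores):
    """Continuous weighting: the more detrimental an instance looks, the less it counts."""
    return [1.0 - s for s in scores]


def binary_filter(scores, threshold=0.5):
    """Binary filtering: an instance is either kept (1.0) or dropped (0.0)."""
    return [0.0 if s > threshold else 1.0 for s in scores]


def weighted_error(model_preds, labels, weights):
    """Error rate in which each instance contributes in proportion to its weight."""
    total = sum(weights)
    wrong = sum(w for p, y, w in zip(model_preds, labels, weights) if p != y)
    return wrong / total if total else 0.0


labels = [0, 1, 1, 0]
reference_predictions = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
scores = detrimental_scores(reference_predictions, labels)
soft = continuous_weights(scores)
hard = binary_filter(scores)
```

In this toy data the third instance is misclassified by three of the four reference models. The filter discards it entirely, while the continuous scheme keeps it with a small weight, so a learner that accepts per-instance weights can still use it.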

arXiv.org e-Print Archive

Crossref

Over-optimism in bioinformatics: an illustration

Author: Anne-Laure Boulesteix
Arthur Tenenhaus
Korbinian Strimmer
Monika Jelizarow
Vincent Guillemot
Publication date: 03/05/2010

In statistical bioinformatics research, different optimization mechanisms potentially lead to "over-optimism" in published papers. The present empirical study illustrates these mechanisms through a concrete example from an active research field. The investigated sources of over-optimism include the optimization of the data sets, of the settings, of the competing methods and, most importantly, of the method’s characteristics. We consider a "promising" new classification algorithm that turns out to yield disappointing results in terms of error rate, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. We quantitatively demonstrate that this disappointing method can artificially seem superior to existing approaches if we "fish for significance". We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should be validated using "fresh" validation data sets.
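The "fishing for significance" mechanism can be illustrated with a small simulation. This is a sketch under stated assumptions, not the study's experiment: every "method variant" below is a random guesser on labels that carry no signal, so each has a true error rate of 0.5. The function names, the number of variants, and the sample sizes are all hypothetical.

```python
import random


def error_rate(preds, labels):
    """Fraction of predictions that disagree with the labels."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


def simulate_selection_bias(n_variants=50, n_samples=40, seed=0):
    """Pick the best of many no-signal variants and re-check it on fresh data.

    Returns (best_tuning_error, mean_tuning_error, fresh_error_of_best).
    """
    rng = random.Random(seed)
    # Labels are coin flips, so no variant can truly beat an error rate of 0.5.
    tuning_labels = [rng.randint(0, 1) for _ in range(n_samples)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_samples)]

    variants = []
    for _ in range(n_variants):
        tuning_preds = [rng.randint(0, 1) for _ in range(n_samples)]
        fresh_preds = [rng.randint(0, 1) for _ in range(n_samples)]
        variants.append(
            (error_rate(tuning_preds, tuning_labels),
             error_rate(fresh_preds, fresh_labels))
        )

    # Reporting only the variant with the lowest tuning error is the "fishing" step.
    best_tuning_error, fresh_error_of_best = min(variants, key=lambda v: v[0])
    mean_tuning_error = sum(v[0] for v in variants) / n_variants
    return best_tuning_error, mean_tuning_error, fresh_error_of_best


best, mean, fresh = simulate_selection_bias()
```

The selected variant's tuning error can never exceed the average over all variants, even though no variant has any real skill. Its error on the fresh labels is an unbiased estimate because those labels played no part in the selection, which is the reason the abstract gives for validating on fresh data sets.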

HAL-CentraleSupelec

Open Access LMU

HAL Descartes

The University of Manchester - Institutional Repository

HAL-CEA

HAL-Rennes 1

Support vector machine for functional data classification

Author: Fabrice Rossi
Nathalie Villa
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

In many applications, input data are sampled functions taking their values in infinite-dimensional spaces rather than standard vectors. This fact has complex consequences for data analysis algorithms and motivates modifying them. In fact, most of the traditional data analysis tools for regression, classification and clustering have been adapted to functional inputs under the general name of Functional Data Analysis (FDA). In this paper, we investigate the use of Support Vector Machines (SVMs) for functional data analysis and we focus on the problem of curve discrimination. SVMs are large-margin classifiers based on implicit nonlinear mappings of the considered data into high-dimensional spaces by means of kernels. We show how to define simple kernels that take into account the functional nature of the data and lead to consistent classification. Experiments conducted on real-world data emphasize the benefit of taking into account some functional aspects of the problems.
Comment: 13 page
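One simple way to build a kernel that respects the functional nature of sampled curves is to apply a Gaussian kernel to an approximation of the L2 distance between the underlying functions, optionally after a functional transformation such as differentiation. The sketch below makes that idea concrete. It is an illustration under assumptions, not necessarily the kernels defined in the paper: the function names, the trapezoidal quadrature, and the finite-difference derivative are choices made here.

```python
import math


def l2_distance_sq(f, g, grid):
    """Trapezoidal approximation of the integral of (f(t) - g(t))^2 over the grid.

    f and g hold the values of two curves sampled at the same points in grid.
    """
    total = 0.0
    for k in range(len(grid) - 1):
        step = grid[k + 1] - grid[k]
        left = (f[k] - g[k]) ** 2
        right = (f[k + 1] - g[k + 1]) ** 2
        total += 0.5 * step * (left + right)
    return total


def functional_gaussian_kernel(f, g, grid, gamma=1.0):
    """Gaussian kernel on the approximate L2 distance between two sampled curves."""
    return math.exp(-gamma * l2_distance_sq(f, g, grid))


def finite_difference(f, grid):
    """First-derivative estimate of f, one value per interval of the grid."""
    return [
        (f[k + 1] - f[k]) / (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    ]


def midpoints(grid):
    """Interval midpoints, used as the sampling grid for the derivative values."""
    return [0.5 * (grid[k] + grid[k + 1]) for k in range(len(grid) - 1)]


grid = [0.0, 0.5, 1.0]
flat_low = [0.0, 0.0, 0.0]
flat_high = [1.0, 1.0, 1.0]

k_values = functional_gaussian_kernel(flat_low, flat_high, grid)
k_slopes = functional_gaussian_kernel(
    finite_difference(flat_low, grid),
    finite_difference(flat_high, grid),
    midpoints(grid),
)
```

The two constant curves differ by a vertical shift, so the kernel on raw values treats them as distinct while the kernel on their derivatives treats them as identical. Unlike a plain Euclidean distance on the sample vector, the quadrature weights account for the spacing of the sampling grid.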

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

Network measures for protein folding state discrimination

Author: Fariselli Piero
Menichetti Giulia
Remondini Daniel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/12/2015

Proteins fold via two-state or multi-state kinetic mechanisms, but so far there is no first-principles model that explains this difference in behavior. We exploit the network properties of protein structures by introducing novel observables to address the problem of classifying the different types of folding kinetics. These observables have a clear physical meaning in terms of vibrational modes, possible configurations compatible with the native protein structure, and folding cooperativity. The relevance of these observables is supported by a classification performance of up to 90%, even with simple classifiers such as discriminant analysis.

arXiv.org e-Print Archive

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Padova

Institutional Research Information System University of Turin

Differences in intention to use educational RSS feeds between Lebanese and British students: A multi‑group analysis based on the technology acceptance model

Author: Abbasi MS
Scott M
Sharma S
Tarhini A
Publication venue: 'Academic Conferences and Publishing International - ACPIL'
Publication date: 01/01/2015

Really Simple Syndication (RSS) offers a means for university students to receive timely updates from virtual learning environments. However, despite its utility, only 21% of home students surveyed at a university in Lebanon claim to have ever used the technology. To investigate whether national culture could be an influence on intention to use RSS, the survey was extended to British students in the UK. Using the Technology Acceptance Model (TAM) as a research framework, 437 students responded to a questionnaire containing four constructs: behavioral intention to use; attitude towards benefit; perceived usefulness; and perceived ease of use. Principal components analysis and structural equation modelling were used to explore the psychometric qualities and utility of TAM in both contexts. The results show that adoption was significantly higher, but still modest, in the British context at 36%. Configural and metric invariance were fully supported, while scalar and factorial invariance were partially supported. Further analysis shows significant differences between perceived usefulness and perceived ease of use across the two contexts studied. Therefore, it is recommended that faculty demonstrate to students how educational RSS feeds can be used effectively to increase awareness and emphasize usefulness in both contexts.

Falmouth University Research Repository (FURR)

Brunel University Research Archive