PhysicsGP: A Genetic Programming Approach to Event Selection

Author: Cousins
Cranmer
Field
Kishore
Koza
Kyle Cranmer
Luke
R. Sean Bowman
Rumelhart
Scott
Sontag
Vaiciulis
Vapnik
Werbos
Publication venue: 'Elsevier BV'
Publication date: 05/02/2004

We present a novel multivariate classification technique based on Genetic Programming. The technique is distinct from Genetic Algorithms and offers several advantages compared to Neural Networks and Support Vector Machines. The technique optimizes a set of human-readable classifiers with respect to some user-defined performance measure. We calculate the Vapnik-Chervonenkis dimension of this class of learning machines and consider a practical example: the search for the Standard Model Higgs Boson at the LHC. The resulting classifier is very fast to evaluate, human-readable, and easily portable. The software may be downloaded at: http://cern.ch/~cranmer/PhysicsGP.html

Comment: 16 pages, 9 figures, 1 table. Submitted to Comput. Phys. Commu

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Linear Time Feature Selection for Regularized Least-Squares

Author: Airola Antti
Pahikkala Tapio
Salakoski Tapio
Publication date: 01/01/2010

We propose a novel algorithm for greedy forward feature selection for regularized least-squares (RLS) regression and classification, also known as the least-squares support vector machine or ridge regression. The algorithm, which we call greedy RLS, starts from the empty feature set and, on each iteration, adds the feature whose addition provides the best leave-one-out cross-validation performance. Our method is considerably faster than previously proposed ones, since its time complexity is linear in the number of training examples, the number of features in the original data set, and the desired size of the set of selected features. Therefore, as a side effect, we obtain a new training algorithm for learning sparse linear RLS predictors that can be used for large-scale learning. This speed is possible thanks to matrix-calculus-based shortcuts for leave-one-out and feature addition. We experimentally demonstrate the scalability of our algorithm and its ability to find good-quality feature sets.

Comment: 17 pages, 15 figures
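The selection loop described in this abstract can be sketched naively as follows. This is only an illustration of greedy forward selection scored by leave-one-out error, not the authors' linear-time greedy RLS: it refits a ridge model for every candidate feature, so it is far slower than the method the abstract describes. The function names, the `lam` regularization parameter, and the use of squared error as the leave-one-out criterion are assumptions made for this sketch.

```python
import numpy as np


def loo_squared_error(X, y, lam):
    """Exact leave-one-out squared error of ridge regression (no intercept).

    Uses the hat-matrix shortcut e_i = (y_i - yhat_i) / (1 - H_ii),
    so no model is actually refitted n times.
    """
    n, d = X.shape
    A = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(A, X.T)          # hat matrix, shape (n, n)
    residuals = y - H @ y
    loo_residuals = residuals / (1.0 - np.diag(H))
    return float(np.sum(loo_residuals ** 2))


def greedy_forward_selection(X, y, k, lam=1.0):
    """Start from the empty set; add the feature with the best LOO error, k times."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = min(
            remaining,
            key=lambda j: loo_squared_error(X[:, selected + [j]], y, lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `greedy_forward_selection(X, y, k)` returns the indices of the chosen features in the order they were added. Each round costs one ridge fit per remaining feature here, which is exactly the overhead the paper's shortcuts are said to avoid.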

arXiv.org e-Print Archive

CiteSeerX

A Fast Algorithm for Robust Regression with Penalised Trimmed Squares

Author: A Giloni
AC Atkinson
AS Hadi
C Agostinelli
CW Coakley
D Gervini
D Peña
DM Hawkins
DM Sebert
G Zioutas
G. Zioutas
J Agulló
JF Gentleman
L. Pitsoulis
LM Li
LS Pitsoulis
M Salibian-Barrera
MS Bazaraa
N Billor
O Hössjer
PJ Rousseeuw
RJ Rousseeuw
TA Feo
VJ Yohai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

The presence of groups containing high leverage outliers makes linear regression a difficult problem due to the masking effect. The available high breakdown estimators based on Least Trimmed Squares (LTS) often do not succeed in detecting masked high leverage outliers in finite samples. An alternative to the LTS estimator, called the Penalised Trimmed Squares (PTS) estimator, was introduced by the authors in \cite{ZiouAv:05,ZiAvPi:07}, and it appears to be less sensitive to the masking problem. This estimator is defined by a Quadratic Mixed Integer Programming (QMIP) problem whose objective function includes a penalty cost for each observation, which serves as an upper bound on the residual error for any feasible regression line. Since the PTS does not require presetting the number of outliers to delete from the data set, it has better efficiency with respect to other estimators. However, due to the high computational complexity of the resulting QMIP problem, solving moderately large regression problems exactly is infeasible. In this paper we further establish the theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm called Fast-PTS to compute the PTS estimator for large data sets efficiently. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high leverage outliers in reasonable computational time.

Comment: 27 pages
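To make the role of the per-observation penalty concrete, here is a minimal sketch under stated assumptions. It assumes the objective can be written as the sum over observations of min(squared residual, penalty), so an observation is deleted exactly when its squared residual exceeds its penalty. The abstract does not give the formula, so this form, the function names, and the user-supplied `penalties` array are all hypothetical. The alternating refit below is a simple local heuristic for illustration; it is not the Fast-PTS algorithm and makes no claim to handle masked high leverage outliers.

```python
import numpy as np


def pts_objective(X, y, beta, penalties):
    """Assumed PTS-style cost: each point pays its squared residual or its penalty."""
    r2 = (y - X @ beta) ** 2
    return float(np.sum(np.minimum(r2, penalties)))


def alternating_trimmed_fit(X, y, penalties, n_iter=50):
    """Alternate between deleting points whose squared residual exceeds
    the penalty and refitting least squares on the points that are kept."""
    keep = np.ones(len(y), dtype=bool)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from ordinary LS
    for _ in range(n_iter):
        r2 = (y - X @ beta) ** 2
        new_keep = r2 <= penalties
        # Stop if too few points remain to fit, or if nothing changed.
        if new_keep.sum() < X.shape[1] or np.array_equal(new_keep, keep):
            break
        keep = new_keep
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return beta, keep
```

The returned `keep` mask marks the observations the final fit was computed on. Because the number of deleted points is decided by the penalties rather than fixed in advance, the sketch mirrors the property the abstract highlights: no preset outlier count is needed.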

arXiv.org e-Print Archive

CiteSeerX

Crossref