Search CORE

2,541 research outputs found

Statistical Significance of the Netflix Challenge

Author: Feuerverger Andrey
He Yu
Khatri Shashi
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 24/07/2012
Field of study

Inspired by the legacy of the Netflix contest, we provide an overview of what has been learned---from our own efforts, and those of others---concerning the problems of collaborative filtering and recommender systems. The data set consists of about 100 million movie ratings (from 1 to 5 stars) involving some 480 thousand users and some 18 thousand movies; the associated ratings matrix is about 99% sparse. The goal is to predict ratings that users will give to movies; systems which can do this accurately have significant commercial applications, particularly on the world wide web. We discuss, in some detail, approaches to "baseline" modeling, singular value decomposition (SVD), as well as kNN (nearest neighbor) and neural network models; temporal effects, cross-validation issues, ensemble methods and other considerations are discussed as well. We compare existing models in a search for new models, and also discuss the mission-critical issues of penalization and parameter shrinkage which arise when the dimensions of a parameter space reaches into the millions. Although much work on such problems has been carried out by the computer science and machine learning communities, our goal here is to address a statistical audience, and to provide a primarily statistical treatment of the lessons that have been learned from this remarkable set of data.Comment: Published in at http://dx.doi.org/10.1214/11-STS368 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Support vector machine for functional data classification

Author: Aronszajn
Besse
Biau
Cardot
Cardot
Cristianini
Dauxois
Deville
Evgeniou
Fabrice Rossi
Ferré
Francois
Frank
Hastie
Hastie
Hastie
Hastie
Hoerl
James
Leurgans
Lin
Mallat
Marx
Nathalie Villa
Pezzulli
Ramsay
Ramsay
Rossi
Rossi
Rossi
Rossi
Sandberg
Sandberg
Smola
Smola
Steinwart
Steinwart
Stinchcombe
Vapnik
Vert
Villa
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

In many applications, input data are sampled functions taking their values in infinite dimensional spaces rather than standard vectors. This fact has complex consequences on data analysis algorithms that motivate modifications of them. In fact most of the traditional data analysis tools for regression, classification and clustering have been adapted to functional inputs under the general name of functional Data Analysis (FDA). In this paper, we investigate the use of Support Vector Machines (SVMs) for functional data analysis and we focus on the problem of curves discrimination. SVMs are large margin classifier tools based on implicit non linear mappings of the considered data into high dimensional spaces thanks to kernels. We show how to define simple kernels that take into account the unctional nature of the data and lead to consistent classification. Experiments conducted on real world data emphasize the benefit of taking into account some functional aspects of the problems.Comment: 13 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

Iteratively-Reweighted Least-Squares Fitting of Support Vector Machines: A Majorization--Minimization Algorithm Approach

Author: McLachlan Geoffrey J.
Nguyen Hien D.
Publication venue
Publication date: 01/01/2017
Field of study

Support vector machines (SVMs) are an important tool in modern data analysis. Traditionally, support vector machines have been fitted via quadratic programming, either using purpose-built or off-the-shelf algorithms. We present an alternative approach to SVM fitting via the majorization--minimization (MM) paradigm. Algorithms that are derived via MM algorithm constructions can be shown to monotonically decrease their objectives at each iteration, as well as be globally convergent to stationary points. We demonstrate the construction of iteratively-reweighted least-squares (IRLS) algorithms, via the MM paradigm, for SVM risk minimization problems involving the hinge, least-square, squared-hinge, and logistic losses, and 1-norm, 2-norm, and elastic net penalizations. Successful implementations of our algorithms are presented via some numerical examples

arXiv.org e-Print Archive

University of Queensland eSpace