Search CORE

3,902 research outputs found

Recommended from our members

Sparse kernel density estimation technique based on zero-norm constraint

Authors: Chen S, Harris C J, Hong Xia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2010

A sparse kernel density estimator is derived based on the zero-norm constraint, in which the zero-norm of the kernel weights is incorporated to enhance model sparsity. The classical Parzen window estimate is adopted as the desired response for density estimation, and an approximation of the zero-norm is used to achieve mathematical tractability and algorithmic efficiency. Under the mild condition of a positive definite design matrix, the kernel weights of the proposed density estimator based on the zero-norm approximation can be obtained using the multiplicative nonnegative quadratic programming algorithm. Using the D-optimality based selection algorithm as preprocessing to select a small significant subset design matrix, the proposed zero-norm based approach offers an effective means of constructing very sparse kernel density estimates with excellent generalisation performance.

Central Archive at the University of Reading

Southampton (e-Prints Soton)

Crossref
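The abstract describes fitting nonnegative kernel weights to the Parzen window estimate with the multiplicative nonnegative quadratic programming (MNQP) algorithm. Below is a minimal NumPy sketch of that idea, assuming Gaussian kernels, a unit-sum constraint on the weights, and the multiplicative update c_i = beta_i / (B beta)_i, beta_i <- c_i (v_i + h). It omits the zero-norm approximation term, and the first 20 samples merely stand in for the D-optimality preselected subset. All function names and data are hypothetical, and the clipping step is our own safeguard, not part of the paper.

```python
import numpy as np


def gaussian_kernel_matrix(X, centers, sigma):
    """K[k, i] = isotropic Gaussian density N(x_k; c_i, sigma^2 I); X is (n, d)."""
    X = np.atleast_2d(X)
    centers = np.atleast_2d(centers)
    d = X.shape[1]
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm


def parzen_estimate(X, sigma):
    """Classical Parzen window estimate evaluated at the training points."""
    return gaussian_kernel_matrix(X, X, sigma).mean(axis=1)


def mnqp_weights(Phi, y, n_iter=500):
    """Multiplicative updates for min 0.5 b'Bb - v'b  s.t.  b >= 0, sum(b) = 1."""
    B = Phi.T @ Phi
    v = Phi.T @ y
    m = Phi.shape[1]
    beta = np.full(m, 1.0 / m)              # feasible start: uniform weights
    for _ in range(n_iter):
        c = beta / (B @ beta)               # B has positive entries for Gaussian kernels
        h = (1.0 - c @ v) / c.sum()         # multiplier that keeps sum(beta) = 1
        beta = np.maximum(c * (v + h), 0.0) # clip at zero (our safeguard, an assumption)
        beta /= beta.sum()                  # renormalise after clipping
    return beta


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))    # hypothetical 1-D training sample
sigma = 0.5                      # hypothetical kernel width
y = parzen_estimate(X, sigma)    # desired response: Parzen estimate at the samples
centers = X[:20]                 # stand-in for a D-optimality preselected subset
Phi = gaussian_kernel_matrix(X, centers, sigma)
beta = mnqp_weights(Phi, y)
print("non-negligible weights:", int((beta > 1e-4).sum()), "of", beta.size)
```

Working with a candidate subset matters here: with uniform weights over all training points the model reproduces the Parzen estimate exactly, so that starting point is already a fixed point of the update and nothing would be pruned.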


Elastic net prefiltering for two class classification

Authors: Chen Sheng, Harris Chris J., Hong Xia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/07/2012

A two-stage linear-in-the-parameter model construction algorithm is proposed for noisy two-class classification problems. The first stage produces a prefiltered signal that is used as the desired output for the second stage, which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing the model's generalization capability: at the lower level, a new elastic-net model identification algorithm using singular value decomposition is employed; at the upper level, two regularization parameters are optimized by a particle-swarm-optimization algorithm that minimizes the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be computed analytically without splitting the data set, and that the associated computational cost is minimal owing to orthogonality. The second stage, sparse classifier construction, is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations on noisy data sets illustrate the competitiveness of this approach for the classification of noisy data.

Central Archive at the University of Reading

CiteSeerX

Southampton (e-Prints Soton)

Crossref
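The claim that the LOO misclassification rate needs no data splitting rests on the fact that, for a regularized least-squares fit, every LOO residual follows from the full-data fit. The sketch below illustrates this for plain ridge regression only (an elastic net without its L1 term), using one SVD and the identity e_i^(-i) = (y_i - yhat_i) / (1 - H_ii), where H is the hat matrix. It does not reproduce the authors' elastic-net algorithm or their prefiltered signal, and a small grid search stands in for particle swarm optimization; function names and data are hypothetical.

```python
import numpy as np


def ridge_loo_predictions(Phi, y, lam):
    """LOO predictions of ridge regression from one thin SVD (no refitting)."""
    U, s, _ = np.linalg.svd(Phi, full_matrices=False)
    shrink = s ** 2 / (s ** 2 + lam)      # nonzero eigenvalues of the hat matrix H
    y_hat = U @ (shrink * (U.T @ y))      # in-sample fit H y
    h_diag = (U ** 2) @ shrink            # diagonal of H = U diag(shrink) U^T
    loo_resid = (y - y_hat) / (1.0 - h_diag)
    return y - loo_resid


def loo_misclassification_rate(Phi, y, lam):
    """Fraction of points whose LOO prediction has the wrong sign (labels are +/-1)."""
    return float(np.mean(np.sign(ridge_loo_predictions(Phi, y, lam)) != y))


rng = np.random.default_rng(1)
Phi = rng.normal(size=(60, 8))                              # hypothetical regressors
y = np.sign(Phi @ rng.normal(size=8) + 0.5 * rng.normal(size=60))
grid = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]                 # grid search in place of PSO
rates = {lam: loo_misclassification_rate(Phi, y, lam) for lam in grid}
best_lam = min(rates, key=rates.get)
print(rates, best_lam)
```

The SVD is computed once per data set in spirit; each new regularization value only changes the `shrink` factors, which is what keeps the search over regularization parameters cheap.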

An optimal experimental design perspective on radial basis function regression

Authors: Fokoue Ernest, Goel Prem
Publication venue: RIT Scholar Works
Publication date: 15/03/2010

This paper provides a new look at radial basis function regression that reveals striking similarities with the traditional optimal experimental design framework. We show theoretically and computationally that the so-called relevant vectors derived through the relevance vector machine (RVM), which correspond to the centers of the radial basis function network, are very similar and often identical to the support points obtained through various optimal experimental design criteria such as D-optimality. This allows us to give a statistical meaning to the relevant centers in the context of radial basis function regression, and also opens the door to a variety of ways of approaching optimal experimental design in multivariate settings.

RIT Scholar Works
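To make the D-optimality criterion concrete in the RBF setting, here is a hedged sketch of greedy forward selection of Gaussian RBF centres that maximises det(Phi_S^T Phi_S) over the chosen columns S. Greedy selection is a common heuristic for D-optimal subset selection, not the paper's method (which compares RVM relevant vectors with optimal design support points); the kernel width, data, and function names are assumptions.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Gaussian RBF design matrix Phi[k, j] = exp(-||x_k - c_j||^2 / (2 width^2))."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def greedy_d_optimal(Phi, k):
    """Greedily choose k columns of Phi, each step maximising det(Phi_S^T Phi_S).

    Adding column j multiplies the determinant by ||r_j||^2, where r_j is the part
    of column j orthogonal to the columns already chosen, so each step picks the
    column with the largest residual norm (Gram-Schmidt style).
    """
    R = np.array(Phi, dtype=float)        # residual columns (a copy of Phi)
    selected = []
    for _ in range(k):
        energy = (R ** 2).sum(axis=0)
        if selected:
            energy[selected] = -np.inf    # never re-pick a chosen column
        j = int(np.argmax(energy))
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)           # remove the new direction from every column
    return selected


rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=(25, 2))  # hypothetical inputs; candidates = inputs
Phi = rbf_design(X, X, width=1.0)
centres = greedy_d_optimal(Phi, 5)
print("selected centres:", X[centres])
```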

A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

Authors: Balcan Maria-Florina, Bellet Aurélien, Garakani Alireza Bagheri, Liang Yingyu, Sha Fei
Publication date: 12/01/2015

Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error $\epsilon$ and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an $\epsilon$-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.

Comment: Extended version of the SIAM Data Mining 2015 paper

arXiv.org e-Print Archive

Crossref
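A single-process sketch of the idea behind dFW, assuming the sparse-combination problem is least squares over an L1 ball of radius `radius` and that the atoms (columns of A) are partitioned across simulated nodes. In each iteration every node nominates its best local atom and only the global winner enters the Frank-Wolfe update. This follows the abstract only loosely: real message passing, communication-cost accounting, and the paper's exact protocol are not modelled, and all names and data are hypothetical.

```python
import numpy as np


def distributed_frank_wolfe_l1(A, y, radius, n_iter=300, n_nodes=4):
    """Frank-Wolfe for min 0.5*||A x - y||^2 subject to ||x||_1 <= radius.

    The columns (atoms) of A are split across n_nodes simulated nodes. At each
    step every node proposes its local atom with the largest |gradient| and only
    the winning atom is used, mimicking one round of communication.
    """
    n = A.shape[1]
    parts = [p for p in np.array_split(np.arange(n), n_nodes) if p.size]
    x = np.zeros(n)
    for t in range(n_iter):
        grad = A.T @ (A @ x - y)
        proposals = [p[np.argmax(np.abs(grad[p]))] for p in parts]  # local step
        j = max(proposals, key=lambda k: abs(grad[k]))              # global step
        s = np.zeros(n)
        s[j] = -radius * np.sign(grad[j])  # L1-ball vertex minimising <grad, s>
        gamma = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
        x = (1.0 - gamma) * x + gamma * s
    return x


rng = np.random.default_rng(3)
A = rng.normal(size=(40, 100))             # hypothetical data: 100 atoms
x_true = np.zeros(100)
x_true[[3, 57, 88]] = [1.0, -0.5, 0.5]     # sparse ground truth, ||x_true||_1 = 2
y = A @ x_true
x_hat = distributed_frank_wolfe_l1(A, y, radius=2.0)
print("objective:", 0.5 * np.sum((A @ x_hat - y) ** 2))
```

In this sketch each iteration exchanges one candidate per node plus the winning atom, independent of how many atoms each node holds; every iterate is a convex combination of L1-ball vertices, so it stays feasible and has at most t + 1 nonzero entries after t steps.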

Sharp Oracle Inequalities for Aggregation of Affine Estimators

Authors: Dalalyan Arnak, Salmon Joseph
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in a nonparametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in discrete but also in continuous settings. The framework is general enough to cover combinations of various procedures such as least squares regression, kernel ridge regression, shrinking estimators and many other estimators used in the literature on statistical inverse problems. As a consequence, we show that the proposed aggregate provides an adaptive estimator in the exact minimax sense without either discretizing the range of tuning parameters or splitting the set of observations. We also illustrate numerically the good performance achieved by the exponentially weighted aggregate.

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech
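A minimal sketch of an exponentially weighted aggregate over a finite family of affine (here linear ridge) smoothers A_lam y, assuming homoscedastic noise with known variance sigma^2, a uniform prior, and the unbiased risk estimate ||y - A y||^2 + 2 sigma^2 tr(A) - n sigma^2. The paper's setting is more general (heteroscedastic noise, possibly continuous families), and the temperature beta = 8 sigma^2 below is an illustrative choice rather than a value taken from the paper. Function names and data are hypothetical.

```python
import numpy as np


def ridge_smoother(Phi, lam):
    """Hat matrix A = Phi (Phi^T Phi + lam I)^{-1} Phi^T: a linear (affine) estimator."""
    m = Phi.shape[1]
    return Phi @ np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T)


def exp_weighted_aggregate(y, smoothers, sigma2, beta, prior=None):
    """Aggregate fits A_k y with weights proportional to prior_k * exp(-risk_k / beta)."""
    n, K = y.shape[0], len(smoothers)
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)
    fits = np.array([A @ y for A in smoothers])
    # Unbiased risk estimate (Mallows' C_p / Stein) for each linear smoother
    risks = np.array([np.sum((y - f) ** 2) + 2.0 * sigma2 * np.trace(A) - n * sigma2
                      for A, f in zip(smoothers, fits)])
    log_w = np.log(prior) - risks / beta
    w = np.exp(log_w - log_w.max())        # subtract the max for numerical stability
    w /= w.sum()
    return w @ fits, w


rng = np.random.default_rng(4)
n, sigma = 80, 0.3
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + sigma * rng.normal(size=n)         # hypothetical data
centres = np.linspace(0.0, 1.0, 15)
Phi = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * 0.1 ** 2))
lams = [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
f_agg, weights = exp_weighted_aggregate(
    y, [ridge_smoother(Phi, lam) for lam in lams], sigma ** 2, beta=8 * sigma ** 2)
print(dict(zip(lams, np.round(weights, 3))))
```

Averaging over the tuning grid with data-driven weights, rather than picking one value, is what lets the aggregate avoid a separate validation split in this sketch.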