3,902 research outputs found
Recommended from our members
Sparse kernel density estimation technique based on zero-norm constraint
A sparse kernel density estimator is derived based on the zero-norm constraint, in which the zero-norm of the kernel weights is incorporated to enhance model sparsity. The classical Parzen window estimate is adopted as the desired response for density estimation, and an approximate function of the zero-norm is used for achieving mathemtical tractability and algorithmic efficiency. Under the mild condition of the positive definite design matrix, the kernel weights of the proposed density estimator based on the zero-norm approximation can be obtained using the multiplicative nonnegative quadratic programming algorithm. Using the -optimality based selection algorithm as the preprocessing to select a small significant subset design matrix, the proposed zero-norm based approach offers an effective means for constructing very sparse kernel density estimates with excellent generalisation performance
Recommended from our members
Elastic net prefiltering for two class classification
A two-stage linear-in-the-parameter model construction algorithm is proposed aimed at noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing a model’s generalization capability, in which a new elastic-net model identification algorithm using singular value decomposition is employed at the lower level, and then, two regularization parameters are optimized using a particle-swarm-optimization algorithm at the upper level by minimizing the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be analytically computed without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations of this approach for noisy data sets illustrate the competitiveness of this approach to classification of noisy data problems
An optimal experimental design perspective on redial basis function regression
This paper provides a new look at radial basis function regression that reveals striking similarities with the traditional optimal experimental design framework. We show theoreti- cally and computationally that the so-called relevant vectors derived through the relevance vector machine (RVM) and corresponding to the centers of the radial basis function net- work, are very similar and often identical to the support points obtained through various optimal experimental design criteria like D-optimality. This allows us to provide a sta- tistical meaning to the relevant centers in the context of radial basis function regression, but also opens the door to a variety of ways of approach optimal experimental design in multivariate settings
A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning
Learning sparse combinations is a frequent theme in machine learning. In this
paper, we study its associated optimization problem in the distributed setting
where the elements to be combined are not centrally located but spread over a
network. We address the key challenges of balancing communication costs and
optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW)
algorithm. We obtain theoretical guarantees on the optimization error
and communication cost that do not depend on the total number of
combining elements. We further show that the communication cost of dFW is
optimal by deriving a lower-bound on the communication cost required to
construct an -approximate solution. We validate our theoretical
analysis with empirical studies on synthetic and real-world data, which
demonstrate that dFW outperforms both baselines and competing methods. We also
study the performance of dFW when the conditions of our analysis are relaxed,
and show that dFW is fairly robust.Comment: Extended version of the SIAM Data Mining 2015 pape
Sharp Oracle Inequalities for Aggregation of Affine Estimators
We consider the problem of combining a (possibly uncountably infinite) set of
affine estimators in non-parametric regression model with heteroscedastic
Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a
PAC-Bayesian type inequality that leads to sharp oracle inequalities in
discrete but also in continuous settings. The framework is general enough to
cover the combinations of various procedures such as least square regression,
kernel ridge regression, shrinking estimators and many other estimators used in
the literature on statistical inverse problems. As a consequence, we show that
the proposed aggregate provides an adaptive estimator in the exact minimax
sense without neither discretizing the range of tuning parameters nor splitting
the set of observations. We also illustrate numerically the good performance
achieved by the exponentially weighted aggregate
- …