Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
Nonlinear kernels can be approximated using finite-dimensional feature maps
for efficient risk minimization. Due to the inherent trade-off between the
dimension of the (mapped) feature space and the approximation accuracy, the key
problem is to identify promising (explicit) features leading to a satisfactory
out-of-sample performance. In this work, we tackle this problem by efficiently
choosing such features from multiple kernels in a greedy fashion. Our method
sequentially selects these explicit features from a set of candidate features
using a correlation metric. We establish an out-of-sample error bound capturing
the trade-off between the error in terms of explicit features (approximation
error) and the error due to spectral properties of the best model in the
Hilbert space associated to the combined kernel (spectral error). The result
verifies that when the (best) underlying data model is sparse enough, i.e., the
spectral error is negligible, one can control the test error with a small
number of explicit features, which can scale poly-logarithmically with the data size. Our
empirical results show that given a fixed number of explicit features, the
method can achieve a lower test error with a smaller time cost, compared to the
state-of-the-art in data-dependent random features.
Comment: Proc. of 2018 Advances in Neural Information Processing Systems (NIPS 2018)
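
As an illustration of this style of procedure (not the authors' implementation), the sketch below builds a pool of random Fourier features from several Gaussian kernels and greedily selects the ones most correlated with the current residual, refitting by least squares after each pick. The candidate construction, the exact correlation score, and all function names are assumptions made for the example.

```python
# Minimal sketch: greedy selection of explicit features from multiple kernels
# via a correlation criterion (OMP-style). Illustrative only; the paper's
# candidate features and selection metric may differ.
import numpy as np

rng = np.random.default_rng(0)

def candidate_features(X, bandwidths, n_per_kernel=200):
    """Pool of random cosine (Fourier) features, one block per Gaussian bandwidth."""
    n, d = X.shape
    blocks = []
    for s in bandwidths:
        W = rng.normal(scale=1.0 / s, size=(d, n_per_kernel))
        b = rng.uniform(0, 2 * np.pi, size=n_per_kernel)
        blocks.append(np.sqrt(2.0 / n_per_kernel) * np.cos(X @ W + b))
    return np.hstack(blocks)                      # shape (n, len(bandwidths) * n_per_kernel)

def greedy_select(Phi, y, n_features=50):
    """Pick columns of Phi most correlated with the residual, refit, repeat."""
    selected, residual, coef = [], y.copy(), np.zeros(0)
    for _ in range(n_features):
        corr = np.abs(Phi.T @ residual)           # correlation with current residual
        corr[selected] = -np.inf                  # never re-pick a feature
        selected.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)
        residual = y - Phi[:, selected] @ coef
    return selected, coef

# toy usage
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
Phi = candidate_features(X, bandwidths=[0.5, 1.0, 2.0])
idx, coef = greedy_select(Phi, y, n_features=40)
print(len(idx), np.linalg.norm(y - Phi[:, idx] @ coef) / np.sqrt(len(y)))
```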
Nonparametric Basis Pursuit via Sparse Kernel-based Learning
Signal processing tasks as fundamental as sampling, reconstruction, minimum
mean-square error interpolation and prediction can be viewed under the prism of
reproducing kernel Hilbert spaces. Endowing this vantage point with
contemporary advances in sparsity-aware modeling and processing promotes the
nonparametric basis pursuit advocated in this paper as the overarching
framework for the confluence of kernel-based learning (KBL) approaches
leveraging sparse linear regression, nuclear-norm regularization, and
dictionary learning. The novel sparse KBL toolbox goes beyond translating
sparse parametric approaches to their nonparametric counterparts, to
incorporate new possibilities such as multi-kernel selection and matrix
smoothing. The impact of sparse KBL on signal processing applications is
illustrated through test cases from cognitive radio sensing, microarray data
imputation, and network traffic prediction.
Comment: IEEE Signal Processing Magazine, 2013 (to appear)
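
For intuition, here is a minimal sketch of one ingredient of such a sparse-KBL toolbox: an l1-penalized (basis-pursuit-style) kernel expansion f(x) = sum_i a_i k(x, x_i), fit with plain ISTA. The Gaussian kernel, the step size, and the regularization weight are illustrative assumptions; the paper's framework is much broader (nuclear-norm regularization, dictionary learning, multi-kernel selection).

```python
# Illustrative sketch (not the paper's algorithm): l1-penalized kernel expansion
# solved with ISTA, i.e. min_a 0.5*||y - K a||^2 + lam*||a||_1.
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kernel_basis_pursuit(K, y, lam=0.1, n_iter=500):
    step = 1.0 / np.linalg.norm(K, 2) ** 2        # 1 / Lipschitz constant of the gradient
    a = np.zeros(K.shape[1])
    for _ in range(n_iter):
        z = a - step * (K.T @ (K @ a - y))        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return a

# toy usage: sparse expansion for a noisy sinc function
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.normal(size=200)
a = kernel_basis_pursuit(gaussian_kernel(X, X), y, lam=0.05)
print("nonzero expansion coefficients:", int((np.abs(a) > 1e-8).sum()))
```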
Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features
Despite their success, kernel methods suffer from a massive computational
cost in practice. In this paper, in lieu of commonly used kernel expansion with
respect to inputs, we develop a novel optimal design maximizing the entropy
among kernel features. This procedure results in a kernel expansion with
respect to entropic optimal features (EOF), improving the data representation
dramatically due to the dissimilarity among features. Under mild technical assumptions,
our generalization bound shows that with only O(√n log n) features
(disregarding logarithmic factors), we can achieve the optimal statistical
accuracy (i.e., O(1/√n)). The salient feature of our design is its
sparsity, which significantly reduces the time and space cost. Our numerical
experiments on benchmark datasets verify the superiority of EOF over the
state-of-the-art in kernel approximation.
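
One hypothetical way to read "maximizing the entropy among kernel features" is a greedy log-determinant criterion (the differential entropy of a Gaussian with the corresponding kernel covariance) over landmark points, followed by a Nystrom-style explicit feature map. The sketch below implements only that reading; it is not the EOF construction from the paper, and all names and parameters are assumptions.

```python
# Hypothetical illustration: greedily pick landmarks whose kernel submatrix has
# maximal log-det, then form Nystrom-style explicit features from them.
import numpy as np

def rbf(X, Z, bandwidth=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def greedy_entropy_landmarks(X, n_landmarks=20, bandwidth=1.0, jitter=1e-8):
    K = rbf(X, X, bandwidth) + jitter * np.eye(len(X))
    chosen = []
    for _ in range(n_landmarks):
        best_j, best_val = None, -np.inf
        for j in range(len(X)):
            if j in chosen:
                continue
            sign, logdet = np.linalg.slogdet(K[np.ix_(chosen + [j], chosen + [j])])
            if sign > 0 and logdet > best_val:
                best_j, best_val = j, logdet
        chosen.append(best_j)
    return chosen

def nystrom_features(X, landmarks_X, bandwidth=1.0, jitter=1e-8):
    Kmm = rbf(landmarks_X, landmarks_X, bandwidth) + jitter * np.eye(len(landmarks_X))
    L = np.linalg.cholesky(np.linalg.inv(Kmm))    # so that Phi @ Phi.T ~ full kernel matrix
    return rbf(X, landmarks_X, bandwidth) @ L

# toy usage
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
landmarks = greedy_entropy_landmarks(X, n_landmarks=15)
print(nystrom_features(X, X[landmarks]).shape)    # (300, 15) explicit features
```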
Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
We consider regularized risk minimization in a large dictionary of reproducing kernel Hilbert spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the non-parametric extension of group sparsity in linear models. While the two dominant algorithmic strands of sparse learning, namely convex relaxations using the l1 norm (e.g., Lasso) and greedy methods (e.g., OMP), have both been rigorously extended for group sparsity, the sparse MKL literature has so far mainly adopted the former with mild empirical success. In this paper, we close this gap by proposing a Group-OMP-based framework for sparse MKL. Unlike l1-MKL, our approach decouples the sparsity regularizer (via a direct l0 constraint) from the smoothness regularizer (via RKHS norms), which leads to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver. The algorithmic development and empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds and sparse recovery conditions analogous to those for OMP [27] and Group-OMP [16].
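
To make the black-box aspect concrete, the sketch below runs a short Group-OMP-style loop over candidate kernels, using kernel ridge regression as the single-kernel solver: each round scores the unused kernels by how well they explain the current residual, adds the best one, and refits on the (unweighted) sum of selected kernels. The scoring rule, the kernel combination, and the fixed number of rounds are simplifications assumed for this example rather than the paper's exact procedure.

```python
# Minimal Group-OMP-style sketch for sparse MKL with a black-box single-kernel
# solver (kernel ridge regression). Illustrative simplification only.
import numpy as np

def krr_fit_predict(K, y, reg=1e-2):
    """Black-box single-kernel solver: kernel ridge regression on kernel matrix K."""
    return K @ np.linalg.solve(K + reg * np.eye(len(y)), y)

def group_omp_mkl(kernels, y, n_select=2, reg=1e-2):
    selected, residual, fit = [], y.copy(), np.zeros_like(y)
    for _ in range(n_select):
        # score each unused kernel by how well it explains the current residual
        scores = [np.inf if k in selected
                  else np.linalg.norm(residual - krr_fit_predict(K, residual, reg))
                  for k, K in enumerate(kernels)]
        selected.append(int(np.argmin(scores)))
        fit = krr_fit_predict(sum(kernels[k] for k in selected), y, reg)
        residual = y - fit
    return selected, fit

# toy usage with three candidate Gaussian kernels of different bandwidths
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=150)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [np.exp(-d2 / (2 * s ** 2)) for s in (0.5, 1.0, 4.0)]
print("selected kernels:", group_omp_mkl(kernels, y, n_select=2)[0])
```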