Spectral Norm of Random Kernel Matrices with Applications to Privacy
Kernel methods are an extremely popular set of techniques used for many
important machine learning and data analysis applications. In addition to
having good practical performance, these methods are supported by a
well-developed theory. Kernel methods use an implicit mapping of the input data
into a high dimensional feature space defined by a kernel function, i.e., a
function returning the inner product between the images of two data points in
the feature space. Central to any kernel method is the kernel matrix, which is
built by evaluating the kernel function on a given sample dataset.
In this paper, we initiate the study of non-asymptotic spectral theory of
random kernel matrices. These are n x n random matrices whose (i,j)th entry is
obtained by evaluating the kernel function on $x_i$ and $x_j$, where
$x_1, \ldots, x_n$ are a set of n independent random high-dimensional vectors. Our
main contribution is to obtain tight upper bounds on the spectral norm (largest
eigenvalue) of random kernel matrices constructed by commonly used kernel
functions based on polynomials and Gaussian radial basis.
As an application of these results, we provide lower bounds on the distortion
needed for releasing the coefficients of kernel ridge regression under
attribute privacy, a general privacy notion which captures a large class of
privacy definitions. Kernel ridge regression is a standard method for performing
non-parametric regression that regularly outperforms traditional regression
approaches in various domains. Our privacy distortion lower bounds are the
first for any kernel technique, and our analysis assumes realistic scenarios
for the input, unlike all previous lower bounds for other release problems
which only hold under very restrictive input settings.
Comment: 16 pages, 1 Figure
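As a rough illustration of the object studied in the abstract above, the following sketch builds a Gaussian kernel matrix from independent random vectors and estimates its spectral norm by power iteration. The helper names, the sample sizes, and the bandwidth choice `sigma = sqrt(d)` are assumptions made for this example only; they are not taken from the paper.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(points, kernel):
    """n x n matrix whose (i, j)th entry is kernel(points[i], points[j])."""
    return [[kernel(xi, xj) for xj in points] for xi in points]

def spectral_norm(K, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    n = len(K)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(c * c for c in w))
        if lam == 0.0:
            return 0.0
        v = [c / lam for c in w]
    return lam

# Hypothetical setup: n independent standard Gaussian vectors in d dimensions.
rng = random.Random(0)
n, d = 20, 50
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# Assumed bandwidth; sigma = sqrt(d) keeps off-diagonal entries non-negligible.
K = kernel_matrix(points, lambda x, y: gaussian_kernel(x, y, sigma=math.sqrt(d)))
norm_K = spectral_norm(K)
print(f"spectral norm of the {n} x {n} Gaussian kernel matrix: {norm_K:.4f}")
```

Because a Gaussian kernel matrix is positive semidefinite with ones on the diagonal, its largest eigenvalue always lies between 1 and n, which gives a quick sanity check on the estimate.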
The Degrees of Freedom of Partial Least Squares Regression
The derivation of statistical properties for Partial Least Squares regression
can be a challenging task. The reason is that the construction of latent
components from the predictor variables also depends on the response variable.
While this typically leads to good performance and interpretable models in
practice, it makes the statistical analysis more involved. In this work, we
study the intrinsic complexity of Partial Least Squares Regression. Our
contribution is an unbiased estimate of its Degrees of Freedom. It is defined
as the trace of the first derivative of the fitted values, seen as a function
of the response. We establish two equivalent representations that rely on the
close connection of Partial Least Squares to matrix decompositions and Krylov
subspace techniques. We show that the Degrees of Freedom depend on the
collinearity of the predictor variables: The lower the collinearity is, the
higher the Degrees of Freedom are. In particular, they are typically higher
than the value given by the naive approach, which defines the Degrees of
Freedom as the number of
components. Further, we illustrate how the Degrees of Freedom approach can be
used for the comparison of different regression methods. In the experimental
section, we show that our Degrees of Freedom estimate in combination with
information criteria is useful for model selection.
Comment: to appear in the Journal of the American Statistical Association
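The definition used in the abstract above can be written out as follows. The symbols $y$, $\hat{y}$, and $H$ are notation assumed for this sketch, and the linear special case is included only for comparison; it is a standard fact about linear smoothers, not a result of the paper.

```latex
% Degrees of Freedom as the trace of the first derivative of the fitted
% values \hat{y} = \hat{y}(y), seen as a function of the response y \in R^n:
\mathrm{DoF}(\hat{y})
  = \operatorname{trace}\!\left( \frac{\partial \hat{y}}{\partial y} \right)
  = \sum_{i=1}^{n} \frac{\partial \hat{y}_i}{\partial y_i}.

% Special case for comparison: a linear method \hat{y} = H y with H not
% depending on y has \partial \hat{y} / \partial y = H, hence
\mathrm{DoF}(\hat{y}) = \operatorname{trace}(H).
```

In Partial Least Squares the latent components depend on the response, so the fitted values are not a linear function of $y$ and the derivative has to be evaluated at the observed response rather than read off a fixed matrix.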
Early stopping and non-parametric regression: An optimal data-dependent stopping rule
The strategy of early stopping is a regularization technique based on
choosing a stopping time for an iterative algorithm. Focusing on non-parametric
regression in a reproducing kernel Hilbert space, we analyze the early stopping
strategy for a form of gradient-descent applied to the least-squares loss
function. We propose a data-dependent stopping rule that does not involve
hold-out or cross-validation data, and we prove upper bounds on the squared
error of the resulting function estimate, measured in either the $L^2(P)$ or
$L^2(P_n)$ norm. These upper bounds lead to minimax-optimal rates for various
kernel classes, including Sobolev smoothness classes and other forms of
reproducing kernel Hilbert spaces. We show through simulation that our stopping
rule compares favorably to two other stopping rules, one based on hold-out data
and the other based on Stein's unbiased risk estimate. We also establish a
tight connection between our early stopping strategy and the solution path of a
kernel ridge regression estimator.
Comment: 29 pages, 4 figures
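The kind of iteration described in the abstract above can be sketched as gradient descent on the least-squares loss over functions of the form f = sum_i alpha_i k(x_i, .). This is only a minimal illustration: it stops after a fixed, hand-picked number of steps and does not implement the paper's data-dependent stopping rule. The kernel, bandwidth, step size, and synthetic data are all assumptions made for the example.

```python
import math
import random

def rbf(x, y, sigma=0.5):
    """Gaussian kernel on scalars (assumed kernel for this sketch)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def kernel_gd_path(xs, ys, steps, eta=1.0):
    """Run `steps` gradient steps on (1/2n) * sum_i (f(x_i) - y_i)^2.

    Returns the final coefficients alpha and the training RMS residual
    recorded before each step.
    """
    n = len(xs)
    K = [[rbf(a, b) for b in xs] for a in xs]
    alpha = [0.0] * n  # start from the zero function
    residual_norms = []
    for _ in range(steps):
        fitted = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        resid = [f - y for f, y in zip(fitted, ys)]
        residual_norms.append(math.sqrt(sum(r * r for r in resid) / n))
        # Functional gradient is (1/n) * sum_i resid_i * k(x_i, .),
        # so each coefficient moves by -(eta / n) * resid_i.
        alpha = [a - (eta / n) * r for a, r in zip(alpha, resid)]
    return alpha, residual_norms

# Hypothetical noisy samples of a sine curve on [0, 1].
rng = random.Random(1)
xs = [i / 29 for i in range(30)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]

# Stopping after 50 steps is an arbitrary choice, standing in for a stopping rule.
alpha_early, path = kernel_gd_path(xs, ys, steps=50)
print(f"training RMS residual: {path[0]:.3f} at start, {path[-1]:.3f} before the last step")
```

With a step size of 1 and a kernel bounded by 1, the training residual cannot increase from one step to the next, so running longer always fits the training data more closely. The stopping time therefore acts as the regularization parameter, playing a role loosely analogous to the penalty in kernel ridge regression.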
Statistical inference in mechanistic models: time warping for improved gradient matching
Inference in mechanistic models of non-linear differential equations is a challenging problem in current computational statistics. Due to the high computational costs of numerically solving the differential equations in every step of an iterative parameter adaptation scheme, approximate methods based on gradient matching have become popular. However, these methods critically depend on the smoothing scheme for function interpolation. The present article adapts an idea from manifold learning and demonstrates that a time warping approach aiming to homogenize intrinsic length scales can lead to a significant improvement in parameter estimation accuracy. We demonstrate the effectiveness of this scheme on noisy data from two dynamical systems with periodic limit cycles, a biopathway, and an application from soft-tissue mechanics. Our study also provides a comparative evaluation on a wide range of signal-to-noise ratios.
- …