352,493 research outputs found
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable to a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimension, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future
Spectral Norm of Random Kernel Matrices with Applications to Privacy
Kernel methods are an extremely popular set of techniques used for many
important machine learning and data analysis applications. In addition to
having good practical performances, these methods are supported by a
well-developed theory. Kernel methods use an implicit mapping of the input data
into a high dimensional feature space defined by a kernel function, i.e., a
function returning the inner product between the images of two data points in
the feature space. Central to any kernel method is the kernel matrix, which is
built by evaluating the kernel function on a given sample dataset.
In this paper, we initiate the study of non-asymptotic spectral theory of
random kernel matrices. These are n x n random matrices whose (i,j)th entry is
obtained by evaluating the kernel function on and , where
are a set of n independent random high-dimensional vectors. Our
main contribution is to obtain tight upper bounds on the spectral norm (largest
eigenvalue) of random kernel matrices constructed by commonly used kernel
functions based on polynomials and Gaussian radial basis.
As an application of these results, we provide lower bounds on the distortion
needed for releasing the coefficients of kernel ridge regression under
attribute privacy, a general privacy notion which captures a large class of
privacy definitions. Kernel ridge regression is standard method for performing
non-parametric regression that regularly outperforms traditional regression
approaches in various domains. Our privacy distortion lower bounds are the
first for any kernel technique, and our analysis assumes realistic scenarios
for the input, unlike all previous lower bounds for other release problems
which only hold under very restrictive input settings.Comment: 16 pages, 1 Figur
Asymptotic Normality of Support Vector Machine Variants and Other Regularized Kernel Methods
In nonparametric classification and regression problems, regularized kernel
methods, in particular support vector machines, attract much attention in
theoretical and in applied statistics. In an abstract sense, regularized kernel
methods (simply called SVMs here) can be seen as regularized M-estimators for a
parameter in a (typically infinite dimensional) reproducing kernel Hilbert
space. For smooth loss functions, it is shown that the difference between the
estimator, i.e.\ the empirical SVM, and the theoretical SVM is asymptotically
normal with rate . That is, the standardized difference converges
weakly to a Gaussian process in the reproducing kernel Hilbert space. As common
in real applications, the choice of the regularization parameter may depend on
the data. The proof is done by an application of the functional delta-method
and by showing that the SVM-functional is suitably Hadamard-differentiable
- …