A Primer on Reproducing Kernel Hilbert Spaces
Reproducing kernel Hilbert spaces are elucidated without assuming prior
familiarity with Hilbert spaces. Compared with extant pedagogic material,
greater care is placed on motivating the definition of reproducing kernel
Hilbert spaces and explaining when and why these spaces are efficacious. The
novel viewpoint is that reproducing kernel Hilbert space theory studies
extrinsic geometry, associating with each geometric configuration a canonical
overdetermined coordinate system. This coordinate system varies continuously
with changing geometric configurations, making it well-suited for studying
problems whose solutions also vary continuously with changing geometry. This
primer can also serve as an introduction to infinite-dimensional linear algebra
because reproducing kernel Hilbert spaces have more properties in common with
Euclidean spaces than do more general Hilbert spaces.
Comment: Revised version submitted to Foundations and Trends in Signal
Processing
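
The defining feature of these spaces, the reproducing property, can be checked numerically. The sketch below is illustrative only and is not taken from the primer: it assumes a Gaussian kernel on the real line and a function given as a finite kernel expansion, and the names `gaussian_kernel`, `evaluate`, and `inner_product` are hypothetical helpers. It shows that evaluating f at x agrees with the inner product of f with k(x, .), and that the squared norm of f is the quadratic form of its coefficients in the Gram matrix.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line (an assumed choice of kernel)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def evaluate(coeffs, centers, x):
    """Evaluate f = sum_i coeffs[i] * k(centers[i], .) at the point x."""
    return sum(a * gaussian_kernel(c, x) for a, c in zip(coeffs, centers))


def inner_product(coeffs_f, centers_f, coeffs_g, centers_g):
    """RKHS inner product of two finite kernel expansions."""
    return sum(
        a * b * gaussian_kernel(cf, cg)
        for a, cf in zip(coeffs_f, centers_f)
        for b, cg in zip(coeffs_g, centers_g)
    )


# Hypothetical example function with three kernel centers.
centers = [-1.0, 0.0, 2.0]
coeffs = [0.5, -1.0, 2.0]
x = 0.7

# Pointwise evaluation of f at x.
point_value = evaluate(coeffs, centers, x)

# k(x, .) is itself a kernel expansion with one center and coefficient 1,
# so <f, k(x, .)> can be computed with the same inner product routine.
reproduced = inner_product(coeffs, centers, [1.0], [x])

# Squared RKHS norm of f, i.e. coeffs^T K coeffs for the Gram matrix K.
norm_squared = inner_product(coeffs, centers, coeffs, centers)
```

Because k(x, x) = 1 for this kernel, the Cauchy-Schwarz inequality bounds |f(x)| by the norm of f, which is one way to see that point evaluation is continuous in such a space.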
One-Class Support Measure Machines for Group Anomaly Detection
We propose one-class support measure machines (OCSMMs) for group anomaly
detection, which aims to recognize anomalous aggregate behaviors of data
points. OCSMMs generalize the well-known one-class support vector machines
(OCSVMs) to a space of probability measures. By formulating the problem as
quantile estimation on distributions, we can establish an interesting
connection to the OCSVMs and variable kernel density estimators (VKDEs) over
the input space on which the distributions are defined, bridging the gap
between large-margin methods and kernel density estimators. In particular, we
show that various types of VKDEs can be considered as solutions to a class of
regularization problems studied in this paper. Experiments on the Sloan Digital Sky
Survey dataset and a High Energy Particle Physics dataset demonstrate the
benefits of the proposed framework in real-world applications.
Comment: Conference on Uncertainty in Artificial Intelligence (UAI2013)
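
One way to picture a kernel method operating on a space of probability measures is to compare groups of points through their empirical kernel mean embeddings. The sketch below is a minimal illustration under that assumption and is not the authors' implementation: the Gaussian base kernel, the toy one-dimensional groups, and the helper names `mean_embedding_kernel` and `group_gram_matrix` are all hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar data points (assumed base kernel)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_embedding_kernel(group_a, group_b, bandwidth=1.0):
    """Inner product of the empirical kernel mean embeddings of two groups.

    This is the average of the base kernel over all cross pairs of points.
    """
    total = sum(
        gaussian_kernel(a, b, bandwidth) for a in group_a for b in group_b
    )
    return total / (len(group_a) * len(group_b))


def group_gram_matrix(groups, bandwidth=1.0):
    """Gram matrix whose (i, j) entry compares group i with group j."""
    return [
        [mean_embedding_kernel(p, q, bandwidth) for q in groups]
        for p in groups
    ]


# Toy data: the first two groups are tightly clustered around zero; the third
# has a similar mean but a very different spread, so it is anomalous as a group.
groups = [
    [0.1, -0.2, 0.3, 0.0],
    [0.2, 0.1, -0.1, -0.3],
    [3.0, -3.0, 2.5, -2.8],
]
gram = group_gram_matrix(groups)
```

A Gram matrix of this kind could then be handed to any one-class solver that accepts precomputed kernels, for example scikit-learn's `OneClassSVM` with `kernel="precomputed"`; that pairing is a suggestion for experimentation, not something stated in the abstract.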
Topics In Multivariate Statistics
Multivariate statistics concerns the study of dependence relations among multiple variables of interest. Distinct from widely studied regression problems where one of the variables is singled out as a response, in multivariate analysis all variables are treated symmetrically and the dependency structures are examined, either for interest in their own right or for further analyses such as regressions. This thesis includes the study of three independent research problems in multivariate statistics.
The first part of the thesis studies additive principal components (APCs for short), a nonlinear method useful for exploring additive relationships among a set of variables. We propose a shrinkage regularization approach for estimating APC transformations by casting the problem in the framework of reproducing kernel Hilbert spaces. To formulate the kernel APC problem, we introduce the Null Comparison Principle, a principle that ties the constraint in a multivariate problem to its criterion in a way that makes the goal of the multivariate method under study transparent. In addition to providing a detailed formulation and exposition of the kernel APC problem, we study asymptotic theory of kernel APCs. Our theory also motivates an iterative algorithm for computing kernel APCs.
The second part of the thesis investigates the estimation of precision matrices in high dimensions when the data is corrupted in a cellwise manner and the uncontaminated data follows a multivariate normal distribution. It is known that in the setting of Gaussian graphical models, the conditional independence relations among variables are captured by the precision matrix of a multivariate normal distribution, and estimating the support of the precision matrix is equivalent to graphical model selection. In this work, we analyze the theoretical properties of robust estimators for precision matrices in high dimensions. The estimators we analyze are formed by plugging appropriately chosen robust covariance matrix estimators into the graphical Lasso and CLIME, two existing methods for high-dimensional precision matrix estimation. We establish error bounds for the precision matrix estimators that reveal the interplay between the dimensionality of the problem and the degree of contamination permitted in the observed distribution, and also analyze the breakdown point of both estimators. We also discuss implications of our work for Gaussian graphical model estimation in the presence of cellwise contamination.
The third part of the thesis studies the problem of optimal estimation of a quadratic functional under the Gaussian two-sequence model. Quadratic functional estimation has been well studied under the Gaussian sequence model, and close connections between the problem of quadratic functional estimation and that of signal detection have been noted. Focusing on the estimation problem in the Gaussian two-sequence model, in this work we propose optimal estimators of the quadratic functional for different regimes and establish the minimax rates of convergence over a family of parameter spaces. The optimal rates exhibit an interesting phase transition in this family. We also discuss the implications of our estimation results on the associated simultaneous signal detection problem.
Conditional expectation using compactification operators
The separate tasks of denoising, conditional expectation and manifold
learning can often be posed in a common setting of finding the conditional
expectations arising from a product of two random variables. This paper focuses
on this more general problem and describes an operator theoretic approach to
estimating the conditional expectation. Kernel integral operators are used as a
compactification tool, to set up the estimation problem as a linear inverse
problem in a reproducing kernel Hilbert space. This equation is shown to have
solutions that are stable to numerical approximation, thus guaranteeing the
convergence of data-driven implementations. The overall technique is easy to
implement, and its successful application to some real-world problems is
also shown.