Minimax Estimation of Kernel Mean Embeddings
In this paper, we study the minimax estimation of the Bochner integral
$\mu_k(P) := \int_{\mathcal{X}} k(\cdot, x)\, dP(x)$, also called the kernel
mean embedding, based on random samples drawn i.i.d. from $P$, where
$k : \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$ is a positive definite
kernel. Various estimators $\hat{\theta}_n$ of $\mu_k(P)$ (including the
empirical estimator) are studied in the literature, all of which satisfy
$\|\hat{\theta}_n - \mu_k(P)\|_{\mathcal{H}_k} = O_P(n^{-1/2})$, with
$\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The
main contribution of the paper is in showing that the above-mentioned rate of
$n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_k}$ and
$\|\cdot\|_{L^2(\mathbb{R}^d)}$ norms over the class of discrete measures and
the class of measures that have an infinitely differentiable density, with
$k$ being a continuous translation-invariant kernel on $\mathbb{R}^d$. The
interesting aspect of this result is that the minimax rate is independent of
the smoothness of the kernel and the density of $P$ (if it exists). This result
has practical consequences in statistical applications as the mean embedding
has been widely employed in non-parametric hypothesis testing, density
estimation, causal inference and feature selection, through its relation to
energy distance (and distance covariance).
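As an illustration of the rate discussed in this abstract, the following is a minimal sketch of the empirical estimator and its RKHS error, not the paper's own code. It assumes a one-dimensional Gaussian kernel with bandwidth SIGMA and P = N(0, 1), chosen only because the true embedding then has a closed form; all function names are hypothetical.

```python
import math
import random

SIGMA = 1.0  # kernel bandwidth (an illustrative assumption)

def gauss_kernel(x, y, sigma=SIGMA):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2)) on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def true_embedding(y, sigma=SIGMA):
    """mu_k(P)(y) = E k(X, y) for X ~ N(0, 1), available in closed form."""
    s2 = sigma ** 2
    return math.sqrt(s2 / (s2 + 1.0)) * math.exp(-(y ** 2) / (2.0 * (s2 + 1.0)))

def true_embedding_norm_sq(sigma=SIGMA):
    """||mu_k(P)||^2 = E k(X, X') for independent X, X' ~ N(0, 1)."""
    s2 = sigma ** 2
    return math.sqrt(s2 / (s2 + 2.0))

def rkhs_error_sq(samples, sigma=SIGMA):
    """Squared RKHS distance between the empirical embedding
    (1/n) sum_i k(., x_i) and the true embedding mu_k(P)."""
    n = len(samples)
    gram = sum(gauss_kernel(a, b, sigma) for a in samples for b in samples) / n ** 2
    cross = sum(true_embedding(a, sigma) for a in samples) / n
    return gram - 2.0 * cross + true_embedding_norm_sq(sigma)

def mean_sq_error(n, trials=100, seed=0):
    """Monte Carlo average of the squared RKHS error for sample size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += rkhs_error_sq(samples)
    return total / trials

if __name__ == "__main__":
    # n times the mean squared error should stay roughly constant in n.
    for n in (10, 40, 160):
        print(n, n * mean_sq_error(n))
```

For the empirical estimator the expected squared error equals (E k(X, X) - ||mu_k(P)||^2) / n, so under these assumptions n times the averaged squared error should hover around 1 - sqrt(1/3), which is consistent with the n^{-1/2} rate for the norm itself.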
Pinsker estimators for local helioseismology
A major goal of helioseismology is the three-dimensional reconstruction of
the three velocity components of convective flows in the solar interior from
sets of wave travel-time measurements. For small amplitude flows, the forward
problem is described in good approximation by a large system of convolution
equations. The input observations are highly noisy random vectors with a known
dense covariance matrix. This leads to a large statistical linear inverse
problem.
Whereas for deterministic linear inverse problems several computationally
efficient minimax-optimal regularization methods exist, only one
minimax-optimal linear estimator exists for statistical linear inverse
problems: the Pinsker estimator. However, it is often computationally
inefficient because it requires a singular value decomposition of the forward
operator or it is not applicable because of an unknown noise covariance matrix,
so it is rarely used for real-world problems. These limitations do not apply in
helioseismology. We present a simplified proof of the optimality properties of
the Pinsker estimator and show that it yields significantly better
reconstructions than traditional inversion methods used in helioseismology,
i.e. Regularized Least Squares (Tikhonov regularization) and SOLA (approximate
inverse) methods.
Moreover, we discuss the incorporation of the mass conservation constraint in
the Pinsker scheme using staggered grids. With this improvement we can
reconstruct not only horizontal, but also vertical velocity components, which
are much smaller in amplitude.
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way.
Minimax distance measures correspond to building a dendrogram with the
single-linkage criterion and defining specific forms of a level function and a
distance function over it. We therefore extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features, in the spirit of multi-layered
architectures, to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies.
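The correspondence between Minimax distances and single-linkage dendrograms mentioned in this abstract can be sketched as follows. This is a minimal illustration of that special case only, not the paper's generalized framework: the level and distance functions for arbitrary dendrograms are not modeled, and all function names are hypothetical. The first routine computes the Minimax distance directly (over all paths, the smallest possible largest edge); the second reads the same values off as single-linkage merge heights.

```python
import math
from itertools import combinations

def pairwise(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def minimax_by_paths(d):
    """Minimax distance via a Floyd-Warshall-style recursion:
    m[i][j] = min over paths from i to j of the largest edge on the path."""
    n = len(d)
    m = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], max(m[i][k], m[k][j]))
    return m

def minimax_by_single_linkage(d):
    """Merge clusters in order of increasing pairwise distance (single linkage).
    The height at which two objects first share a cluster is recorded as
    their distance."""
    n = len(d)
    members = {i: [i] for i in range(n)}  # cluster id -> objects in it
    label = list(range(n))                # object -> current cluster id
    m = [[0.0] * n for _ in range(n)]
    edges = sorted((d[i][j], i, j) for i, j in combinations(range(n), 2))
    for w, i, j in edges:
        a, b = label[i], label[j]
        if a == b:
            continue  # already merged at a lower height
        for u in members[a]:
            for v in members[b]:
                m[u][v] = m[v][u] = w
        for v in members[b]:
            label[v] = a
        members[a].extend(members.pop(b))
    return m

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    d = pairwise(pts)
    print(minimax_by_paths(d)[0][2], minimax_by_single_linkage(d)[0][2])
```

On the four collinear points in the usage example, objects 0 and 2 are at Euclidean distance 2 but both routines assign them distance 1, because the path through object 1 never uses an edge longer than 1.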