Two-stage Sampled Learning Theory on Distributions
We focus on the distribution regression problem: regressing to a real-valued
response from a probability distribution. Although there exist a large number
of similarity measures between distributions, very little is known about their
generalization performance in specific learning tasks. Learning problems
formulated on distributions have an inherent two-stage sampled difficulty: in
practice only samples from sampled distributions are observable, and one has to
build an estimate on similarities computed between sets of points. To the best
of our knowledge, the only existing method with consistency guarantees for
distribution regression requires kernel density estimation as an intermediate
step (which suffers from slow convergence issues in high dimensions), and the
domain of the distributions to be compact Euclidean. In this paper, we provide
theoretical guarantees for a remarkably simple algorithmic alternative to solve
the distribution regression problem: embed the distributions to a reproducing
kernel Hilbert space, and learn a ridge regressor from the embeddings to the
outputs. Our main contribution is to prove the consistency of this technique in
the two-stage sampled setting under mild conditions (on separable, topological
domains endowed with kernels). For a given total number of observations, we
derive convergence rates as an explicit function of the problem difficulty. As
a special case, we answer a 15-year-old open question: we establish the
consistency of the classical set kernel [Haussler, 1999; Gärtner et al., 2002]
in regression, and cover more recent kernels on distributions, including those
due to [Christmann and Steinwart, 2010].

Comment: v6: accepted at AISTATS-2015 for oral presentation; final version;
code: https://bitbucket.org/szzoli/ite/; extension to the misspecified and
vector-valued case: http://arxiv.org/abs/1411.206
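The algorithm analyzed in this abstract (embed each distribution into an RKHS, then run ridge regression on the embeddings) amounts to kernel ridge regression with a set kernel: the inner product of two empirical mean embeddings is the average of all pairwise base-kernel values between the two bags. Below is a minimal NumPy sketch of this two-stage sampled pipeline, assuming a Gaussian base kernel and a linear second-level kernel; the bag sizes, bandwidth sigma, and regularization lam are illustrative choices, not values taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between point sets X (n, d) and Y (m, d).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def set_kernel(bag_a, bag_b, sigma=1.0):
    # Inner product of empirical mean embeddings = mean of all pairwise kernel values.
    return gaussian_kernel(bag_a, bag_b, sigma).mean()

def fit_merr(bags, y, lam=1e-3, sigma=1.0):
    # Second stage: kernel ridge regression on the mean-embedded bags.
    n = len(bags)
    G = np.array([[set_kernel(bags[i], bags[j], sigma) for j in range(n)] for i in range(n)])
    return np.linalg.solve(G + lam * np.eye(n), y)

def predict_merr(train_bags, alpha, test_bags, sigma=1.0):
    k = np.array([[set_kernel(b_te, b_tr, sigma) for b_tr in train_bags] for b_te in test_bags])
    return k @ alpha

# Toy usage: each bag is a sample from N(mu, 1); the real-valued response is mu itself.
rng = np.random.default_rng(0)
mus = rng.uniform(-2, 2, size=30)
bags = [rng.normal(mu, 1.0, size=(50, 1)) for mu in mus]
alpha = fit_merr(bags, mus, lam=1e-2, sigma=1.0)
test_mus = rng.uniform(-2, 2, size=5)
test_bags = [rng.normal(mu, 1.0, size=(50, 1)) for mu in test_mus]
print(np.c_[test_mus, predict_merr(bags, alpha, test_bags)])
```

Only finite samples from each distribution enter the computation, which is exactly the two-stage sampled setting the abstract refers to: the bags approximate the underlying distributions, and the set-kernel Gram matrix approximates the Gram matrix of the true mean embeddings.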
Learning Graphs from Linear Measurements: Fundamental Trade-offs and Applications
We consider a specific graph learning task: reconstructing a symmetric matrix that represents an underlying graph using linear measurements. We present a sparsity characterization for distributions of random graphs (that are allowed to contain high-degree nodes), based on which we study fundamental trade-offs between the number of measurements, the complexity of the graph class, and the probability of error. We first derive a necessary condition on the number of measurements. Then, by considering a three-stage recovery scheme, we give a sufficient condition for recovery. Furthermore, assuming the measurements are Gaussian IID, we prove upper and lower bounds on the (worst-case) sample complexity for both noisy and noiseless recovery. In the special cases of the uniform distribution on trees with n nodes and the Erdős-Rényi (n,p) class, the fundamental trade-offs are tight up to multiplicative factors with noiseless measurements. In addition, for practical applications, we design and implement a polynomial-time (in n) algorithm based on the three-stage recovery scheme. Experiments show that the heuristic algorithm outperforms basis pursuit on star graphs. We apply the heuristic algorithm to learn admittance matrices in electric grids. Simulations for several canonical graph classes and IEEE power system test cases demonstrate the effectiveness and robustness of the proposed algorithm for parameter reconstruction.
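For context, here is a minimal sketch of the basis-pursuit-style baseline mentioned in the abstract (ℓ1-regularized least squares, i.e., the Lasso, applied to recover a sparse symmetric matrix from Gaussian IID linear measurements). It is not the paper's three-stage heuristic; the graph size, number of measurements, noise level, and regularization weight are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical toy setup: a sparse symmetric "admittance-like" matrix on n nodes.
n, m = 12, 60                       # nodes, number of linear measurements
iu = np.triu_indices(n, k=1)        # free parameters: strictly upper-triangular entries
p = len(iu[0])

w_true = np.zeros(p)
support = rng.choice(p, size=10, replace=False)
w_true[support] = rng.normal(size=10)

A = rng.normal(size=(m, p))         # Gaussian IID measurement matrix
y = A @ w_true + 0.01 * rng.normal(size=m)

# Basis-pursuit-style recovery: l1-regularized least squares over the free entries.
lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(A, y)
w_hat = lasso.coef_

# Reassemble the symmetric matrix estimate from the recovered upper triangle.
W_hat = np.zeros((n, n))
W_hat[iu] = w_hat
W_hat = W_hat + W_hat.T

print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
```

The symmetry constraint is handled by optimizing only over the upper-triangular entries and mirroring them back, so the measurement matrix acts on n(n-1)/2 unknowns rather than n^2.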