81 research outputs found

    Discussion of: Brownian distance covariance

    Discussion on "Brownian distance covariance" by Gábor J. Székely and Maria L. Rizzo [arXiv:1010.0297]. Comment: Published at http://dx.doi.org/10.1214/09-AOAS312F in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Optimal Regression on Sets


    Consistent Vector-valued Regression on Probability Measures

    I will focus on the distribution regression problem (DRP): our goal is to regress from probability measures to vector-valued outputs, in the two-stage sampled setup where only samples from the distributions are available. The studied DRP framework incorporates several important machine learning and statistical tasks, including multi-instance learning and point estimation problems without analytical solution (such as hyperparameter estimation). Obtaining theoretical guarantees (bounds on the generalization error of the estimated predictor) is challenging due to the two-stage sampled characteristic of the task. To the best of our knowledge, among the vast number of heuristic approaches in the literature, the only theoretically justified technique tackling the DRP requires that the domain of the distributions be compact Euclidean, and uses density estimation (which often performs poorly in practice). We present a simple, analytically tractable alternative: we embed the probability measures into a reproducing kernel Hilbert space, and perform ridge regression from the embedded distributions to the outputs. We prove that this method is consistent under mild conditions, on separable topological domains endowed with kernels. Specifically, we establish the consistency of the traditional set kernel in regression, which was a 15-year-old open question. We demonstrate the efficiency of our method in supervised entropy learning and aerosol prediction based on multispectral satellite images
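
The two-stage scheme described in this abstract (embed each sampled bag, then run ridge regression on the embeddings) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian base kernel, the linear kernel on the embeddings, the bandwidth, the regularization constant `lam`, and the toy task (predicting the standard deviation of a 1-D Gaussian from a sample) are all choices made here for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def set_kernel_gram(bags_a, bags_b, bandwidth=1.0):
    """Inner products of empirical mean embeddings (average pairwise kernel)."""
    G = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            G[i, j] = gaussian_kernel(A, B, bandwidth).mean()
    return G

def fit_ridge(bags, y, lam=1e-3, bandwidth=1.0):
    """Kernel ridge regression coefficients on the bag-level Gram matrix."""
    G = set_kernel_gram(bags, bags, bandwidth)
    n = len(bags)
    return np.linalg.solve(G + n * lam * np.eye(n), y)

def predict(train_bags, alpha, test_bags, bandwidth=1.0):
    """Predict outputs for new bags from the fitted coefficients."""
    return set_kernel_gram(test_bags, train_bags, bandwidth) @ alpha

# Toy data (an assumption of this sketch): each bag is a sample from N(0, sigma^2),
# and the label is sigma, which is never observed directly at test time.
rng = np.random.default_rng(0)

def make_bags(n_bags, bag_size):
    sigmas = rng.uniform(0.5, 2.0, size=n_bags)
    bags = [rng.normal(0.0, s, size=(bag_size, 1)) for s in sigmas]
    return bags, sigmas

train_bags, train_y = make_bags(60, 100)
test_bags, test_y = make_bags(20, 100)

alpha = fit_ridge(train_bags, train_y)
pred = predict(train_bags, alpha, test_bags)
rmse = float(np.sqrt(np.mean((pred - test_y) ** 2)))
print("test RMSE:", rmse)
```

The first stage of sampling is the draw of the bags (distributions with labels); the second stage is the draw of points inside each bag, which is why only an empirical embedding of each distribution is available to the regressor.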

    Geometrical Insights for Implicit Generative Modeling

    Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences. In particular, we can establish surprising approximate global convergence guarantees for the 1-Wasserstein distance, even when the parametric generator has a nonconvex parametrization. Comment: this version fixes a typo in a definitio
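
As a concrete instance of one of the criteria named in this abstract, the sketch below estimates the energy distance between two samples. It uses the plug-in (V-statistic) form 2 E|X - Y| - E|X - X'| - E|Y - Y'| with Euclidean norms, which some authors call the squared energy distance; the function names and the toy data are assumptions of this example.

```python
import numpy as np

def mean_pairwise_distance(X, Y):
    """Average Euclidean distance over all pairs (row of X, row of Y)."""
    diffs = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diffs**2).sum(axis=-1)).mean()

def energy_distance(X, Y):
    """Plug-in estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (
        2.0 * mean_pairwise_distance(X, Y)
        - mean_pairwise_distance(X, X)
        - mean_pairwise_distance(Y, Y)
    )

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))           # stand-in for the data sample
model = rng.normal(size=(200, 2)) + 3.0    # stand-in for generator output

print("same sample:   ", energy_distance(data, data))
print("shifted sample:", energy_distance(data, model))
```

The statistic vanishes when both arguments are the same sample and grows as the two samples move apart, which is the behavior a generator would exploit when this criterion is used as a training loss.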

    Optimal Rates for Random Fourier Feature Approximations

    Kernel methods represent one of the most powerful tools in machine learning to tackle problems expressed in terms of function values and derivatives. While these methods show good versatility, they are computationally intensive and have poor scalability to large data as they require operations on Gram matrices. In order to mitigate this serious computational limitation, randomized methods have recently been proposed in the literature, which allow the application of fast linear algorithms. Random Fourier features (RFF) are among the most popular and widely applied constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this talk, I am going to present the main ideas and results of a detailed finite-sample theoretical analysis about the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension, and growing set size) performance guarantees in uniform norm, and (ii) providing guarantees in L^r (1 ≤ r < ∞) norms. I will also propose an RFF approximation to derivatives of a kernel with a theoretical study on its approximation quality
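
The RFF construction discussed here can be illustrated for the Gaussian kernel, whose spectral measure is itself Gaussian. The sketch below is a standard textbook version, not code from the talk: the bandwidth, the number of features, and the helper names are assumptions of this example. It compares the approximate Gram matrix with the exact one in the maximum (uniform) norm over a finite sample.

```python
import numpy as np

def rff_features(X, W, b):
    """Map rows of X to random Fourier features z(x) = sqrt(2/D) cos(W^T x + b)."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def exact_gaussian_gram(X, bandwidth):
    """Exact Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 bandwidth^2))."""
    sq = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

rng = np.random.default_rng(0)
bandwidth = 1.0
n_features = 5000

X = rng.normal(size=(50, 3))
# Frequencies drawn from the spectral measure of the Gaussian kernel: N(0, I / bandwidth^2).
W = rng.normal(0.0, 1.0 / bandwidth, size=(X.shape[1], n_features))
b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

Z = rff_features(X, W, b)
approx_gram = Z @ Z.T
max_err = float(np.max(np.abs(approx_gram - exact_gaussian_gram(X, bandwidth))))
print("max abs error:", max_err)
```

Once `Z` is computed, any linear method applied to `Z` acts as an approximate kernel method, with cost governed by the feature dimension rather than by the size of the Gram matrix.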

    Distribution Regression with Minimax-Optimal Guarantee

    We focus on the distribution regression problem (DRP): we regress from probability measures to Hilbert-space valued outputs, where the input distributions are only available through samples (this is the 'two-stage sampled' setting). Several important statistical and machine learning problems can be phrased within this framework including point estimation tasks without analytical solution (such as hyperparameter or entropy estimation) and multi-instance learning. However, due to the two-stage sampled nature of the problem, the theoretical analysis becomes quite challenging: to the best of our knowledge the only existing method with performance guarantees to solve the DRP task requires density estimation (which often performs poorly in practice) and the distributions to be defined on a compact Euclidean domain. We present a simple, analytically tractable alternative to solve the DRP task: we embed the distributions into a reproducing kernel Hilbert space and perform ridge regression from the embedded distributions to the outputs. Our main contribution is to prove that this scheme is consistent in the two-stage sampled setup under mild conditions (on separable topological domains enriched with kernels): we present an exact computational-statistical efficiency tradeoff analysis showing that the studied estimator is able to match the one-stage sampled minimax-optimal rate. This result answers a 17-year-old open question, by establishing the consistency of the classical set kernel [Haussler, 1999; Gaertner et al., 2002] in regression. We also cover consistency for more recent kernels on distributions, including those due to [Christmann and Steinwart, 2010]. The practical efficiency of the studied technique is illustrated in supervised entropy learning and aerosol prediction using multispectral satellite images

    Distribution Regression - the Set Kernel Heuristic is Consistent

    Bag-of-features (BoF) representations are omnipresent in machine learning; for example, an image can be described by a bag of visual features, a document might be considered as a bag of words, or a molecule can be handled as a bag of its different configurations. Set kernels (also called multi-instance or ensemble kernels; Gaertner 2002), which define the similarity of two bags as the average pairwise point similarity between the sets, are among the most widely applied tools to handle problems based on such BoF representations. Despite the wide applicability of set kernels, even the most fundamental theoretical questions, such as their consistency in specific learning tasks, are unanswered. In my talk, I am going to focus on the distribution regression problem: regressing from a probability distribution to a real-valued response. By considering the mean embeddings of the distributions, this is a natural generalization of set kernels to the infinite sample limit: the bags can be seen as i.i.d. (independent identically distributed) samples from a distribution. We will propose an algorithmically simple ridge regression based solution for distribution regression and prove its consistency under fairly mild conditions (for probability distributions defined on locally compact Polish spaces). As a special case, we give a positive answer to a 12-year-old open question, the consistency of set kernels in regression. We demonstrate the efficiency of the studied ridge regression technique on (i) supervised entropy learning, and (ii) aerosol prediction based on satellite images
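
The link between set kernels and mean embeddings stated in this abstract can be written out explicitly. The notation below is an assumption introduced for illustration (it does not appear in the listing): $k$ is a kernel on the points with reproducing kernel Hilbert space $H$, and $\hat{P} = \{x_i\}_{i=1}^{N}$, $\hat{Q} = \{y_j\}_{j=1}^{M}$ are two bags.

```latex
% Empirical mean embedding of a bag
\mu_{\hat{P}} = \frac{1}{N} \sum_{i=1}^{N} k(\cdot, x_i) \in H

% Set kernel: average pairwise similarity, equal to the inner product of the embeddings
K(\hat{P}, \hat{Q})
  = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} k(x_i, y_j)
  = \left\langle \mu_{\hat{P}}, \mu_{\hat{Q}} \right\rangle_{H}

% Infinite-sample limit: bags replaced by the distributions they were drawn from
K(P, Q) = \left\langle \mu_{P}, \mu_{Q} \right\rangle_{H},
\qquad
\mu_{P} = \int k(\cdot, x) \, \mathrm{d}P(x)
```

The second equality follows from the reproducing property $\langle k(\cdot, x), k(\cdot, y) \rangle_{H} = k(x, y)$ and bilinearity of the inner product.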

    Vector-valued distribution regression: a simple and consistent approach

    We address the distribution regression problem (DRP): regressing on the domain of probability measures, in the two-stage sampled setup when only samples from the distributions are given. The DRP formulation offers a unified framework for several important tasks in statistics and machine learning including multi-instance learning (MIL), or point estimation problems without analytical solution. Despite the large number of MIL heuristics, there is essentially no theoretically grounded approach to tackle the DRP in the two-stage sampled case. To the best of our knowledge, the only existing technique with consistency guarantees requires kernel density estimation as an intermediate step (which often scales poorly in practice), and the domain of the distributions to be compact Euclidean. We analyse a simple (analytically computable) ridge regression alternative to DRP: we embed the distributions into a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs. We show that this scheme is consistent in the two-stage sampled setup under mild conditions, for probability measure inputs defined on separable, topological domains endowed with kernels, with vector-valued outputs belonging to an arbitrary separable Hilbert space. Specifically, choosing the kernel on the space of embedded distributions to be linear and the output space to be the real line, we get the consistency of set kernels in regression, which was a 15-year-old open question. In our talk we are going to present (i) the main ideas and results of consistency, (ii) concrete kernel constructions on mean embedded distributions, and (iii) two applications (supervised entropy learning, aerosol prediction based on multispectral satellite images) demonstrating the efficiency of our approach

    Regression on Probability Measures: A Simple and Consistent Algorithm

    We address the distribution regression problem: we regress from probability measures to Hilbert-space valued outputs, where only samples are available from the input distributions. Many important statistical and machine learning problems can be phrased within this framework including point estimation tasks without analytical solution, or multi-instance learning. However, due to the two-stage sampled nature of the problem, the theoretical analysis becomes quite challenging: to the best of our knowledge the only existing method with performance guarantees requires density estimation (which often performs poorly in practice) and the distributions to be defined on a compact Euclidean domain. We present a simple, analytically tractable alternative to solve the distribution regression problem: we embed the distributions into a reproducing kernel Hilbert space and perform ridge regression from the embedded distributions to the outputs. We prove that this scheme is consistent under mild conditions (for distributions on separable topological domains endowed with kernels), and construct explicit finite sample bounds on the excess risk as a function of the sample numbers and the problem difficulty, which hold with high probability. Specifically, we establish the consistency of set kernels in regression, which was a 15-year-old open question, and also present new kernels on embedded distributions. The practical efficiency of the studied technique is illustrated in supervised entropy learning and aerosol prediction using multispectral satellite images