    Nonparametric Independent Process Analysis

    Linear dynamical systems are widely used tools to model stochastic time processes, but they have severe limitations: they assume linear dynamics with Gaussian driving noise. Independent component analysis (ICA) aims to weaken these limitations by allowing independent, non-Gaussian sources in the model. Independent subspace analysis (ISA), an important generalization of ICA, has proven successful in many source separation applications. Still, the general ISA problem of separating sources with nonparametric dynamics has hardly been touched in the literature. The goal of this paper is to extend ISA to the case of (i) nonparametric, asymptotically stationary source dynamics and (ii) unknown source component dimensions. We make use of functional autoregressive (fAR) processes to model the temporal evolution of the hidden sources. An extension of the well-known ISA separation principle is derived for the solution of the introduced fAR independent process analysis (fAR-IPA) task: by applying fAR identification we reduce the problem to ISA. The Nadaraya-Watson kernel regression technique is adapted to obtain strongly consistent fAR estimation. We illustrate the efficiency of the fAR-IPA approach by numerical examples and demonstrate that in this framework our method is superior to standard linear dynamical system based estimators.
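    The fAR identification step mentioned in the abstract rests on Nadaraya-Watson kernel regression, which estimates an unknown dynamics function as a kernel-weighted average of observed transitions. A minimal numpy sketch of this idea on a scalar fAR(1) process (the sine dynamics, the sample size and the bandwidth are illustrative choices, not the paper's construction):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] with a Gaussian kernel:
    a kernel-weighted average of the observed responses."""
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (w * y_train[None, :]).sum(axis=1) / w.sum(axis=1)

# Toy scalar fAR(1) process x_{t+1} = f(x_t) + noise, here with f = sin
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(499):
    x[t + 1] = np.sin(x[t]) + 0.1 * rng.standard_normal()

# Regress x_{t+1} on x_t to recover the dynamics f nonparametrically
f_hat = nadaraya_watson(x[:-1], x[1:], np.array([0.0, 0.3]), bandwidth=0.15)
```

Once the dynamics estimate is subtracted, the residuals can be passed on to an ISA routine, which is the reduction the abstract describes.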

    Consistent Distribution Regression via Mean Embedding

    In a standard regression model we need to predict a real-valued response based on a vector input. Recently, there has been significant interest in extending the prediction problem from finite-dimensional Euclidean input spaces to other domains, such as distributions. In my talk I am going to present a general, consistent and at the same time computationally very simple mean embedding based approach to the corresponding distribution regression task for the case when only samples from the distributions are available. I also demonstrate the efficiency of our method on (i) entropy and skewness estimation of distributions, and (ii) aerosol prediction based on satellite images.
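    The mean embedding used here maps a distribution to the average of a kernel's feature map, and distances between embeddings can be computed purely from samples. A small illustrative sketch (the Gaussian kernel, the sample sizes and `gamma` are arbitrary choices) computing the squared maximum mean discrepancy (MMD), the RKHS distance between two empirical mean embeddings:

```python
import numpy as np

def mmd2(X, Y, gamma=0.5):
    """Squared maximum mean discrepancy between two samples, i.e. the
    squared RKHS distance of their empirical Gaussian mean embeddings:
    ||mu_X - mu_Y||^2 = E k(X,X') - 2 E k(X,Y) + E k(Y,Y')."""
    def k(a, b):
        return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)
    return k(X, X).mean() - 2.0 * k(X, Y).mean() + k(Y, Y).mean()

rng = np.random.default_rng(0)
X = rng.standard_normal(2000)
Y = rng.standard_normal(2000)        # same distribution as X: MMD near 0
Z = 2.0 + rng.standard_normal(2000)  # shifted distribution: MMD clearly positive
```

Because everything reduces to pairwise kernel evaluations, no density estimation is needed, which is what makes the approach computationally simple.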

    Separation Theorem for Independent Subspace Analysis

    Here, a separation theorem about Independent Subspace Analysis (ISA), a generalization of Independent Component Analysis (ICA), is proven. According to the theorem, ISA estimation can be executed in two steps under certain conditions. In the first step, 1-dimensional ICA estimation is performed. In the second step, the optimal permutation of the ICA elements is searched for. We show that elliptically symmetric sources, among others, satisfy the conditions of the theorem.
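    The second step of the theorem, finding the right permutation of the 1-dimensional ICA outputs, amounts to grouping components that remain statistically dependent. A toy numpy sketch of that grouping step (the disc-shaped sources, the surrogate dependence measure based on correlating absolute values, and all sizes are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def disc_sample(n, rng):
    """n points uniform on the unit disc: the two coordinates are
    uncorrelated but statistically dependent."""
    r = np.sqrt(rng.uniform(size=n))
    a = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([r * np.cos(a), r * np.sin(a)])

rng = np.random.default_rng(1)
n = 20000
# Two hidden 2-dimensional subspaces, independent of each other
s = np.vstack([disc_sample(n, rng), disc_sample(n, rng)])  # shape (4, n)
y = s[[0, 2, 1, 3]]  # a shuffling that a 1-dimensional ICA step might output

# Group components that remain dependent; a cheap surrogate measure,
# the absolute correlation of absolute values, already suffices here
dep = np.abs(np.corrcoef(np.abs(y)))
np.fill_diagonal(dep, 0.0)
partner = dep.argmax(axis=1)  # most dependent partner of each component
```

For the disc sources, large |first coordinate| forces small |second coordinate|, so the within-subspace dependence is easily detected, while cross-subspace correlations vanish.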

    Simple consistent distribution regression on compact metric domains

    In a standard regression model, one assumes that both the inputs and outputs are finite-dimensional vectors. We address a variant of the regression problem, the distribution regression task, where the inputs are probability measures. Many important machine learning tasks fit naturally into this framework, including multi-instance learning, point estimation problems of statistics without closed-form analytical solutions, and tasks where simulation-based results are computationally expensive. Learning problems formulated on distributions have an inherent two-stage sampled challenge: only samples from sampled distributions are available for observation, and one has to construct estimates based on these sets of samples. We propose an algorithmically simple and parallelizable ridge regression based technique to solve the distribution regression problem: we embed the distributions into a reproducing kernel Hilbert space and learn the regressor from the embeddings to the outputs. We show that under mild conditions (for probability measures on compact metric domains with characteristic kernels) this solution scheme is consistent in the two-stage sampled setup. Specifically, we establish the consistency of set kernels in regression (a 15-year-old open question) and offer an efficient alternative to existing distribution regression methods, which focus on compact domains of Euclidean spaces and apply density estimation (which suffers from slow convergence in high dimensions).
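    The two-stage scheme can be sketched in a few lines: embed each bag by its empirical mean embedding, take inner products between embeddings (the set kernel), and run ridge regression with the resulting Gram matrix. In this toy sketch the distributions are one-dimensional Gaussians, the response is each distribution's mean, and all constants (`gamma`, the ridge parameter, bag counts and sizes) are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

def set_kernel(A, B, gamma=0.5):
    """Average pairwise Gaussian kernel value between two bags; equals
    the inner product of the bags' empirical mean embeddings."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-gamma * d2).mean()

rng = np.random.default_rng(0)
# Two-stage sampled data: the distributions N(m, 1) are never observed,
# only a bag of 100 draws from each; the response is the mean m
means = rng.uniform(-2.0, 2.0, size=60)
bags = [m + rng.standard_normal(100) for m in means]

# Kernel ridge regression with the set kernel Gram matrix
K = np.array([[set_kernel(a, b) for b in bags] for a in bags])
alpha = np.linalg.solve(K + 0.1 * np.eye(len(bags)), means)

# Predict the mean of an unseen distribution from a fresh bag
test_bag = 1.0 + rng.standard_normal(100)
k_test = np.array([set_kernel(test_bag, b) for b in bags])
pred = k_test @ alpha
```

The Gram-matrix construction is embarrassingly parallel over bag pairs, which is the parallelizability the abstract refers to.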

    Distribution Regression - the Set Kernel Heuristic is Consistent

    Bag-of-features (BoF) representations are omnipresent in machine learning: an image can be described by a bag of visual features, a document can be considered a bag of words, and a molecule can be handled as a bag of its different configurations. Set kernels (also called multi-instance or ensemble kernels; Gaertner 2002), which define the similarity of two bags as the average pairwise point similarity between the sets, are among the most widely applied tools for problems based on such BoF representations. Despite their wide applicability, even the most fundamental theoretical questions about set kernels, such as their consistency in specific learning tasks, have remained open. In my talk, I am going to focus on the distribution regression problem: regressing from a probability distribution to a real-valued response. By considering the mean embeddings of the distributions, this is a natural generalization of set kernels to the infinite sample limit: the bags can be seen as i.i.d. (independent identically distributed) samples from a distribution. We propose an algorithmically simple ridge regression based solution for distribution regression and prove its consistency under fairly mild conditions (for probability distributions defined on locally compact Polish spaces). As a special case, we give a positive answer to a 12-year-old open question, the consistency of set kernels in regression. We demonstrate the efficiency of the studied ridge regression technique on (i) supervised entropy learning, and (ii) aerosol prediction based on satellite images.
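    The link between set kernels and mean embeddings is an exact identity: the average pairwise kernel value between two bags equals the inner product of the bags' mean feature vectors. This is easiest to check with a kernel whose feature map is finite-dimensional; the polynomial kernel below is an illustrative choice, not one from the talk:

```python
import numpy as np

def phi(x):
    """Explicit feature map of the polynomial kernel k(u, v) = (1 + u*v)**2,
    so that k(u, v) = phi(u) @ phi(v)."""
    return np.stack([np.ones_like(x), np.sqrt(2.0) * x, x ** 2], axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal(50)   # bag 1
B = rng.standard_normal(80)   # bag 2

# Set kernel: average pairwise point similarity between the two bags
set_k = ((1.0 + A[:, None] * B[None, :]) ** 2).mean()

# Mean embedding view: inner product of the bags' mean feature vectors
mean_k = phi(A).mean(axis=0) @ phi(B).mean(axis=0)
```

As the bags grow, the mean feature vectors converge to the distributions' mean embeddings, which is the infinite-sample-limit view described above.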

    Learning on Distributions

    Problems formulated in terms of distributions have recently gained widespread attention. An important task that belongs to this family is distribution regression: regressing to a real-valued response from a probability distribution. One particularly challenging difficulty of the task is its two-stage sampled nature: in practice we only have samples from sampled distributions. In my presentation I am going to talk about two (intimately related) directions to tackle this difficulty. Firstly, I am going to present a recently released open-source toolkit of information theoretical estimators, capable of estimating numerous dependency and similarity measures on distributions in a nonparametric way. Next, I will propose an algorithmically very simple approach to tackle distribution regression: embed the distributions into a reproducing kernel Hilbert space, and learn a ridge regressor from the embeddings to the outputs. I will show that (i) this technique is consistent in the two-stage sampled setting under fairly mild conditions, and (ii) it gives state-of-the-art results on supervised entropy learning and the prediction of aerosol optical depth based on satellite images. Preprint: http://arxiv.org/pdf/1402.1754; ITE toolbox: https://bitbucket.org/szzoli/ite
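    One family of nonparametric estimators that toolkits of this kind implement is nearest-neighbour based. As an illustration (not code from the ITE toolbox), a minimal Kozachenko-Leonenko style estimate of differential entropy for a one-dimensional sample; the digamma term of the usual formula is approximated here by log n, which is accurate for large samples:

```python
import numpy as np

def kl_entropy_1d(x):
    """Kozachenko-Leonenko 1-nearest-neighbour estimate of the
    differential entropy (in nats) of a 1-D sample."""
    n = len(x)
    xs = np.sort(x)
    gaps = np.diff(xs)
    eps = np.empty(n)                      # distance to the nearest neighbour
    eps[0], eps[-1] = gaps[0], gaps[-1]
    eps[1:-1] = np.minimum(gaps[:-1], gaps[1:])
    # H ~ psi(n) - psi(1) + log(V_1) + mean(log eps), with psi(n) ~ log(n),
    # psi(1) = -(Euler-Mascheroni constant), and V_1 = 2 (1-D unit ball)
    return np.log(n) + np.euler_gamma + np.log(2.0) + np.log(eps).mean()

rng = np.random.default_rng(0)
h = kl_entropy_1d(rng.standard_normal(5000))
# true differential entropy of N(0, 1) is 0.5 * log(2 * pi * e) ~ 1.4189
```

No density estimate ever appears: the estimator works directly from nearest-neighbour distances, which is what makes such methods usable in the two-stage sampled setting.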