11 research outputs found

    Robust Spectral Clustering via Sparse Representation

    Clustering high-dimensional data has been a challenging problem in data mining and machine learning. Spectral clustering via sparse representation has been proposed for clustering high-dimensional data. A critical step in spectral clustering is to construct an effective weight matrix by assessing the proximity between each pair of objects. While sparse representation has proven effective for compressing high-dimensional signals, existing spectral clustering algorithms based on sparse representation use the sparse coefficients directly. We believe that a similarity measure exploiting more global information from the coefficient vectors provides a more faithful similarity between data objects. The intuition is that the sparse coefficient vectors of two similar objects are themselves similar, while those of two dissimilar objects are dissimilar. In particular, we propose two approaches to weight matrix construction based on the similarity of the sparse coefficient vectors. Experimental results on several real-world high-dimensional data sets demonstrate that spectral clustering based on the proposed similarity matrices outperforms existing spectral clustering algorithms via sparse representation.
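    As a rough illustration of this pipeline (not the paper's exact constructions), the sketch below codes each sample as a sparse combination of the remaining samples, builds the weight matrix from the cosine similarity of whole coefficient vectors, and then runs spectral clustering; the Lasso penalty alpha and the use of scikit-learn are assumptions made for the example.

        # Sketch: weight matrix from similarity of sparse coefficient vectors,
        # followed by spectral clustering. Illustrative only.
        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.cluster import SpectralClustering
        from sklearn.metrics.pairwise import cosine_similarity

        def sparse_coefficient_matrix(X, alpha=0.05):
            """Code each sample as a sparse combination of the other samples."""
            n = X.shape[0]
            C = np.zeros((n, n))
            for i in range(n):
                others = np.delete(X, i, axis=0)          # dictionary = remaining samples
                lasso = Lasso(alpha=alpha, max_iter=5000)
                lasso.fit(others.T, X[i])                 # X[i] ~ others.T @ coef
                C[i, np.arange(n) != i] = lasso.coef_
            return C

        def cluster_via_sparse_similarity(X, n_clusters, alpha=0.05):
            C = sparse_coefficient_matrix(X, alpha)
            # Similarity from whole coefficient vectors (global information),
            # rather than using the raw coefficients directly as edge weights.
            W = np.abs(cosine_similarity(C))
            np.fill_diagonal(W, 0.0)
            W = 0.5 * (W + W.T)                           # symmetrize
            model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
            return model.fit_predict(W)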

    Learning and inference with Wasserstein metrics

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 131-143).

    This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions, by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data.

    In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, which is applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that doesn't use the metric.

    In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models. Exact inference for diffusion processes is generally intractable. In this work, we describe a novel approximate inference method, which is based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than several dimensions. In the current work, we propose a novel algorithm for computing a Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle.

    Finally, we study the ecological inference problem, which is that of reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often when dealing with data from economics and political sciences, such as when attempting to infer the demographic breakdown of votes for each political party, given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed, and requires prior information to distinguish a unique solution. We propose a novel, general framework for ecological inference that allows for a variety of priors and enables efficient computation of the most probable solution. Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions.

    by Charles Frogner. Ph.D.
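    To make the entropic-regularization idea concrete, here is a minimal sketch of a Sinkhorn-style Wasserstein loss between two histograms over a discrete output space; the regularization strength, iteration count, and toy data are assumptions chosen for illustration, not the thesis's actual learning algorithm.

        # Sketch: entropic-regularized optimal transport (Sinkhorn iteration) used
        # as a Wasserstein-style loss between two histograms. Illustrative only.
        import numpy as np

        def sinkhorn_loss(p, q, cost, reg=0.1, n_iter=200):
            """Approximate transport cost between histograms p and q under `cost`."""
            K = np.exp(-cost / reg)              # Gibbs kernel from the ground cost
            u = np.ones_like(p)
            v = np.ones_like(q)
            for _ in range(n_iter):              # alternating marginal projections
                u = p / (K @ v)
                v = q / (K.T @ u)
            T = u[:, None] * K * v[None, :]      # approximate optimal transport plan
            return float(np.sum(T * cost))

        # Toy usage: prediction and ground truth over four ordered labels.
        labels = np.arange(4)
        cost = np.abs(labels[:, None] - labels[None, :]).astype(float)
        p = np.array([0.7, 0.2, 0.1, 0.0]) + 1e-9
        q = np.array([0.0, 0.1, 0.2, 0.7]) + 1e-9
        print(sinkhorn_loss(p / p.sum(), q / q.sum(), cost))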

    Matrix factorisation and the interpretation of geodesic distance

    Given a graph or similarity matrix, we consider the problem of recovering a notion of true distance between the nodes, and so their true positions. We show that this can be accomplished in two steps: matrix factorisation, followed by nonlinear dimension reduction. This combination is effective because the point cloud obtained in the first step lives close to a manifold in which latent distance is encoded as geodesic distance. Hence, a nonlinear dimension reduction tool, approximating geodesic distance, can recover the latent positions, up to a simple transformation. We give a detailed account of the case where spectral embedding is used, followed by Isomap, and provide encouraging experimental evidence for other combinations of techniques.
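    A minimal sketch of that two-step recipe (spectral embedding of the similarity matrix, then Isomap) is given below; the embedding dimension, number of neighbours, and the use of scikit-learn's Isomap are illustrative assumptions, not the paper's exact settings.

        # Sketch: adjacency spectral embedding followed by Isomap. Illustrative only.
        import numpy as np
        from sklearn.manifold import Isomap

        def spectral_embedding(A, d):
            """Embed nodes using the top-d eigenpairs of the similarity matrix A."""
            vals, vecs = np.linalg.eigh(A)
            idx = np.argsort(np.abs(vals))[::-1][:d]     # largest-magnitude eigenvalues
            return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

        def latent_positions(A, d=3, out_dim=2, n_neighbors=10):
            X = spectral_embedding(A, d)                 # point cloud near a manifold
            # Isomap approximates geodesic distance on that manifold, recovering
            # the latent positions up to a simple transformation.
            return Isomap(n_neighbors=n_neighbors, n_components=out_dim).fit_transform(X)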

    Underdetermined convolutive source separation using two dimensional non-negative factorization techniques

    PhD Thesis. In this thesis the underdetermined audio source separation problem is considered, that is, estimating the original audio sources from the observed mixture when the number of audio sources is greater than the number of channels. The separation is carried out using two approaches: blind audio source separation and informed audio source separation. The blind approach depends on the mixture signal only and assumes that the separation is accomplished with no (or as little as possible) prior information about the sources. The informed approach uses an exemplar, in addition to the mixture signal, to emulate the targeted speech signal to be separated. Both approaches are based on two-dimensional factorization techniques that decompose the signal into two tensors convolved in both the temporal and spectral directions, and both are applied to the convolutive mixture and the highly reverberant convolutive mixture, which are more realistic than the instantaneous mixture. In this work a novel algorithm based on nonnegative matrix factor two-dimensional deconvolution (NMF2D) with adaptive sparsity is proposed to separate audio sources that have been mixed in an underdetermined convolutive mixture. Additionally, a novel Gamma Exponential Process is proposed for estimating the convolutive parameters and the number of components of the NMF2D/NTF2D, and for initializing the NMF2D parameters. The effects of different window lengths are also investigated to determine the model that best fits the characteristics of the audio signal. Furthermore, a novel algorithm, the fusion of K models of full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D), is proposed. The K-wNTF2D is developed for its ability to model both spectral and temporal changes, together with a spatial covariance matrix that addresses the high-reverberation problem. A variable sparsity term derived from the Gibbs distribution is optimized under the Itakura-Saito divergence and adapted into the K-wNTF2D model. The tensors of this algorithm are initialized by a novel initialization method, the SVD two-dimensional deconvolution (SVD2D). Finally, two novel informed source separation algorithms, the semi-exemplar-based algorithm and the exemplar-based algorithm, are proposed. These algorithms are based on the NMF2D model and the proposed two-dimensional nonnegative matrix partial co-factorization (2DNMPCF) model. The idea of incorporating the exemplar is to inform the separation algorithms about the targeted signal by initializing their parameters and guiding the separation. The adaptive sparsity is derived for both of the proposed algorithms, and a multistage version of the proposed exemplar-based algorithm is developed to further enhance the separation performance. Results show that the proposed separation algorithms are promising, more flexible, and offer an alternative to conventional methods.
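    For orientation, the sketch below shows plain (non-convolutive) NMF with Itakura-Saito multiplicative updates on a power spectrogram. It is a simplified analogue, not the thesis's NMF2D/NTF2D models, which additionally convolve the factors along time and frequency; the component count and iteration budget are arbitrary choices for the example.

        # Sketch: Itakura-Saito NMF with multiplicative updates. Illustrative only.
        import numpy as np

        def is_nmf(V, n_components=8, n_iter=200, eps=1e-12):
            """Factor V (freq x time, nonnegative) as W @ H under the IS divergence."""
            F, T = V.shape
            rng = np.random.default_rng(0)
            W = rng.random((F, n_components)) + eps
            H = rng.random((n_components, T)) + eps
            for _ in range(n_iter):
                WH = W @ H + eps
                W *= ((V / WH**2) @ H.T) / ((1.0 / WH) @ H.T + eps)
                WH = W @ H + eps
                H *= (W.T @ (V / WH**2)) / (W.T @ (1.0 / WH) + eps)
            return W, H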

    Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge

    In 1931-1932, Erwin Schrödinger studied a hot gas Gedankenexperiment (an instance of large deviations of the empirical distribution). Schrödinger's problem represents an early example of a fundamental inference method, the so-called maximum entropy method, having roots in Boltzmann's work and being developed in subsequent years by Jaynes, Burg, Dempster, and Csiszár. The problem, known as the Schrödinger bridge problem (SBP) with "uniform" prior, was more recently recognized as a regularization of the Monge-Kantorovich optimal mass transport (OMT) problem, leading to effective computational schemes for the latter. Specifically, OMT with quadratic cost may be viewed as a zero-temperature limit of the problem posed by Schrödinger in the early 1930s. The latter amounts to minimization of Helmholtz's free energy over probability distributions that are constrained to possess two given marginals. The problem features a delicate compromise, mediated by a temperature parameter, between minimizing the internal energy and maximizing the entropy. These concepts are central to a rapidly expanding area of modern science dealing with the so-called Sinkhorn algorithm, which appears as a special case of an algorithm first studied in the more challenging continuous space setting by the French analyst Robert Fortet in 1938-1940 specifically for Schrödinger bridges. Due to the constraint on end-point distributions, dynamic programming is not a suitable tool to attack these problems. Instead, Fortet's iterative algorithm and its discrete counterpart, the Sinkhorn iteration, permit computation of the optimal solution by iteratively solving the so-called Schrödinger system. Convergence of the iteration is guaranteed by contraction along the steps in suitable metrics, such as Hilbert's projective metric. In both the continuous as well as the discrete time and space settings, stochastic control provides a reformulation of and a context for the dynamic versions of general Schrödinger bridge problems and of their zero-temperature limit, the OMT problem. These problems, in turn, naturally lead to steering problems for flows of one-time marginals which represent a new paradigm for controlling uncertainty. The zero-temperature problem in the continuous-time and space setting turns out to be the celebrated Benamou-Brenier characterization of the McCann displacement interpolation flow in OMT. The formalism and techniques behind these control problems on flows of probability distributions have attracted significant attention in recent years as they lead to a variety of new applications in spacecraft guidance, control of robot or biological swarms, sensing, active cooling, and network routing as well as in computer and data science. This multifaceted and versatile framework, intertwining SBP and OMT, provides the substrate for the historical and technical overview of the field given in this paper. A key motivation has been to highlight links between the classical early work in both topics and the more recent stochastic control viewpoint, which naturally lends itself to efficient computational schemes and interesting generalizations.
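    For reference, in the discrete setting the Schrödinger system mentioned above can be written (in notation chosen here for illustration, not taken from the paper) as

        \pi^\star(x,y) = \hat{\varphi}(x)\, K(x,y)\, \varphi(y), \qquad
        \hat{\varphi}(x) \sum_{y} K(x,y)\, \varphi(y) = \rho_0(x), \qquad
        \varphi(y) \sum_{x} K(x,y)\, \hat{\varphi}(x) = \rho_1(y),

    where K is the prior (e.g. heat) kernel and \rho_0, \rho_1 are the prescribed marginals; the Fortet/Sinkhorn iteration alternately solves the second equation for \hat{\varphi} and the third for \varphi, and its convergence follows from contraction in Hilbert's projective metric.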

    Connected Attribute Filtering Based on Contour Smoothness
