3 research outputs found

    Learning and inference with Wasserstein metrics

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 131-143). This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions, by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data. In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, which is applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that doesn't use the metric. In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models.
Exact inference for diffusion processes is generally intractable. In this work, we describe a novel approximate inference method, which is based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than several dimensions. In the current work, we propose a novel algorithm for computing a Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle. Finally, we study the ecological inference problem, which is that of reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often when dealing with data from economics and political sciences, such as when attempting to infer the demographic breakdown of votes for each political party, given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed, and requires prior information to distinguish a unique solution. We propose a novel, general framework for ecological inference that allows for a variety of priors and enables efficient computation of the most probable solution. Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions. By Charles Frogner. Ph. D.
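
The entropic regularization of optimal transport mentioned in the abstract above is commonly solved with Sinkhorn's matrix-scaling iteration. The following is a minimal pure-Python sketch of that standard iteration, offered as an illustration rather than as the thesis's own algorithm. The function name `sinkhorn`, the parameters `reg` and `n_iters`, and the toy histograms in the usage note are assumptions made for this example.

```python
import math


def sinkhorn(a, b, cost, reg=0.1, n_iters=500):
    """Approximate entropic-regularized optimal transport between histograms.

    a, b  : lists of positive weights, each summing to 1 (assumed inputs)
    cost  : len(a) x len(b) nested list of ground-metric costs
    reg   : strength of the entropic regularization (hypothetical default)
    Returns (transport_plan, transport_cost).
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total
```

For instance, `sinkhorn([0.9, 0.1], [0.1, 0.9], [[0.0, 1.0], [1.0, 0.0]])` returns a plan whose rows and columns approximately match the two histograms. Smaller values of `reg` track the unregularized distance more closely but make the kernel entries very small, so a log-domain variant is usually preferred in practice.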

    Analysis and resynthesis of polyphonic music

    This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music. I then describe and compare the possible hardware and software platforms. After this I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments.
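
Additive synthesis, the basis of the prototype system described above, builds a sound by summing sinusoidal partials. The sketch below shows only that general idea and makes no claim about the thesis's implementation. The function name `additive_synth`, the fixed (time-invariant) amplitudes, and the default sample rate are assumptions chosen for brevity.

```python
import math


def additive_synth(partials, duration, sample_rate=8000):
    """Sum constant-amplitude sinusoidal partials into one waveform.

    partials : list of (frequency_hz, amplitude) pairs (hypothetical format)
    duration : length of the output in seconds
    Returns a list of float samples.
    """
    n_samples = round(duration * sample_rate)
    return [
        sum(amp * math.sin(2.0 * math.pi * freq * n / sample_rate)
            for freq, amp in partials)
        for n in range(n_samples)
    ]
```

A call such as `additive_synth([(440.0, 0.5), (880.0, 0.25)], duration=0.01)` yields 80 samples of a tone with a fundamental and one overtone. A practical resynthesis system would vary each partial's amplitude and frequency over time.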

    Combined-channel instantaneous frequency analysis for audio source separation based on comodulation

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008. Includes bibliographical references (p. 295-303). Normal human listeners have a remarkable ability to focus on a single sound or speaker of interest and to block out competing sound sources. Individuals with hearing impairments, on the other hand, often experience great difficulty in noisy environments. The goal of our research is to develop novel signal processing methods inspired by neural auditory processing that can improve current speech separation systems. These could potentially be of use as assistive devices for the hearing impaired, and in many other communications applications. Our focus is the monaural case where spatial information is not available. Much perceptual evidence indicates that detecting common amplitude and frequency variation in acoustic signals plays an important role in the separation process. The physical mechanisms of sound generation in many sources cause common onsets/offsets and correlated increases/decreases in both amplitude and frequency among the spectral components of an individual source, which can potentially serve as a distinct signature. However, harnessing these common modulation patterns is difficult because when spectral components of competing sources overlap within the bandwidth of a single auditory filter, the modulation envelope of the resultant waveform resembles that of neither source. To overcome this, for the coherent, constant-frequency AM case, we derive a set of matrix equations which describes the mixture, and we prove that there exists a unique factorization under certain constraints. These constraints provide insight into the importance of onset cues in source separation. We develop algorithms for solving the system in those cases in which a unique solution exists. This work has direct bearing on the general theory of non-negative matrix factorization which has recently been applied to various problems in biology and learning.
For the general, incoherent, AM and FM case, the situation is far more complex because constructive and destructive interference between sources causes amplitude fluctuations within channels that obscure the modulation patterns of individual sources. Motivated by the importance of temporal processing in the auditory system, and specifically, the use of extrema, we explore novel methods for estimating instantaneous amplitude, frequency, and phase of mixtures of sinusoids by comparing the location of local maxima of waveforms from various frequency channels. By using an overlapping exponential filter bank model with properties resembling the cochlea, and combining information from multiple frequency bands, we are able to achieve extremely high frequency and time resolution. This allows us to isolate and track the behavior of individual spectral components which can be compared and grouped with others of like type. Our work includes both computational and analytic approaches to the general problem. Two suites of tests were performed. The first were comparative evaluations of three filter-bank-based algorithms on sets of harmonic-like signals with constant frequencies. One of these algorithms was selected for further performance tests on more complex waveforms, including AM and FM signals of various types, harmonic sets in noise, and actual recordings of male and female speakers, both individual and mixed. For the frequency-varying case, initial results of signal analysis with our methods appear to resolve individual sidebands of single harmonics on short time scales, and raise interesting conceptual questions on how to define, use and interpret the concept of instantaneous frequency.
Based on our results, we revisit a number of questions in current auditory research, including the need for both rate and place coding, the asymmetrical shapes of auditory filters, and a possible explanation for the deficit of the hearing impaired in noise. By Barry David Jacobson. Ph.D.
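
The abstract above describes estimating frequency from the locations of local maxima. As a much simplified, single-channel illustration of that idea (not the combined-channel method of the thesis), the spacing between successive waveform peaks can be turned into a frequency estimate. The function names `local_maxima` and `frequency_from_maxima` are hypothetical, and the sketch assumes a clean, single-sinusoid input.

```python
import math


def local_maxima(samples):
    """Return indices of samples that are larger than both neighbours."""
    return [
        i for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] >= samples[i + 1]
    ]


def frequency_from_maxima(samples, sample_rate):
    """Estimate the frequency in Hz from the mean spacing between peaks.

    Assumes one dominant sinusoid; mixtures or noise would need the
    multi-channel comparison that the abstract refers to.
    """
    peaks = local_maxima(samples)
    if len(peaks) < 2:
        raise ValueError("need at least two local maxima")
    mean_period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return sample_rate / mean_period
```

Applied to a 100 Hz sine sampled at 8000 Hz, the estimate lands close to 100 Hz. Its resolution is limited by the sample spacing, and interpolating around each peak is one common way to refine it.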