Search CORE

12 research outputs found

Learning and inference with Wasserstein metrics

Author: Frogner Charles (Charles Albert)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2018
Field of study

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018.Cataloged from PDF version of thesis.Includes bibliographical references (pages 131-143).This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions, by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data. In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, which is applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that doesn't use the metric. In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models. Exact inference for diffusion processes is generally intractable. In this work, we describe a novel approximate inference method, which is based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than several dimensions. In the current work, we propose a novel algorithm for computing a Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle. Finally, we study the ecological inference problem, which is that of reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often when dealing with data from economics and political sciences, such as when attempting to infer the demographic breakdown of votes for each political party, given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed, and requires prior information to distinguish a unique solution. We propose a novel, general framework for ecological inference that allows for a variety of priors and enables efficient computation of the most probable solution. Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions.by Charles Frogner.Ph. D

DSpace@MIT

Low Complexity Image Recognition Algorithms for Handheld devices

Author: Ayyalasomayajula Pradyumna
Publication venue: Lausanne, EPFL
Publication date: 21/10/2013
Field of study

Content Based Image Retrieval (CBIR) has gained a lot of interest over the last two decades. The need to search and retrieve images from databases, based on information (“features”) extracted from the image itself, is becoming increasingly important. CBIR can be useful for handheld image recognition devices in which the image to be recognized is acquired with a camera, and thus there is no additional metadata associated to it. However, most CBIR systems require large computations, preventing their use in handheld devices. In this PhD work, we have developed low-complexity algorithms for content based image retrieval in handheld devices for camera acquired images. Two novel algorithms, ‘Color Density Circular Crop’ (CDCC) and ‘DCT-Phase Match’ (DCTPM), to perform image retrieval along with a two-stage image retrieval algorithm that combines CDCC and DCTPM, to achieve the low complexity required in handheld devices are presented. The image recognition algorithms run on a handheld device over a large database with fast retrieval time besides having high accuracy, precision and robustness to environment variations. Three algorithms for Rotation, Scale, and Translation (RST) compensation for images were also developed in this PhD work to be used in conjunction with the two-stage image retrieval algorithm. The developed algorithms are implemented, using a commercial fixed-point Digital Signal Processor (DSP), into a device, called ‘PictoBar’, in the domain of Alternative and Augmentative Communication (AAC). The PictoBar is intended to be used in the field of electronic aid for disabled people, in areas like speech rehabilitation therapy, education etc. The PictoBar is able to recognize pictograms and pictures contained in a database. Once an image is found in the database, a corresponding associated speech message is played. A methodology for optimal implementation and systematic testing of the developed image retrieval algorithms on a fixed point DSP is also established as part of this PhD work

Infoscience - École polytechnique fédérale de Lausanne

Recommended from our members

Wasserstein β-Diversity Metrics over Graphs: Derivation, Efficient Computation and Applications

Author: McClelland Jason
Publication venue: 'Oregon State University'
Publication date
Field of study

Microbial ecology has been transformed by metagenomics, the study of the genetic in-formation in entire communities of organisms. In the following we develop metagenomic tools arising from the classic Wasserstein metric as applied to questions regarding the diversity between microbial communities. We provide a novel proof of the characteriza-tion of the successful UniFrac metric as the Wasserstein metric over a graph-theoretic tree, and use the proof to develop an extremely e°cient computational algorithm. The analytic framework we develop is then leveraged to provide formulations for the distri-bution of this Wasserstein based metric. We implement these ideas and demonstrate their utility on realworld datasets. We next turn to applying the Wasserstein metric as a reference-free diversity metric by utilizing de Bruijn graphs, mathematical structures at the heart of genome assembly techniques. We show how these techniques are related to established phylogenetically-aware diversity metrics. We then implement our results using newly developed approximation techniques for the computation of the Wasserstein metric and demonstrate this novel metric's utility in comparison to established metrics

ScholarsArchive@OSU

Document image analysis and recognition: a survey

Author: Arlazarov V.V.
Bulatov K.B.
Nikolaev D.P.
Petrova O.O.
Savelev B.I.
Slavin O.A.
Publication venue: 'Samara State National Research University'
Publication date: 01/08/2022
Field of study

This paper analyzes the problems of document image recognition and the existing solutions. Document recognition algorithms have been studied for quite a long time, but despite this, currently, the topic is relevant and research continues, as evidenced by a large number of associated publications and reviews. However, most of these works and reviews are devoted to individual recognition tasks. In this review, the entire set of methods, approaches, and algorithms necessary for document recognition is considered. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with text and handwritten contents, with a fixed template and flexible structure, and digitalized via different ways: scanning, photographing, video recording. Here, we consider methods of document recognition and analysis applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. The groups of methods necessary for the recognition of a single page image are examined: the classical computer vision algorithms, i.e., keypoints, local feature descriptors, Fast Hough Transforms, image binarization, and modern neural network models for document boundary detection, document classification, document structure analysis, i.e., text blocks and tables localization, extraction and recognition of the details, post-processing of recognition results. The review provides a description of publicly available experimental data packages for training and testing recognition algorithms. Methods for optimizing the performance of document image analysis and recognition methods are described.The reported study was funded by RFBR, project number 20-17-50177. The authors thank Sc. D. Vladimir L. Arlazarov (FRC CSC RAS), Pavel Bezmaternykh (FRC CSC RAS), Elena Limonova (FRC CSC RAS), Ph. D. Dmitry Polevoy (FRC CSC RAS), Daniil Tropin (LLC “Smart Engines Service”), Yuliya Chernysheva (LLC “Smart Engines Service”), Yuliya Shemyakina (LLC “Smart Engines Service”) for valuable comments and suggestions

Samara University

3D object retrieval and segmentation: various approaches including 2D poisson histograms and 3D electrical charge distributions.

Author: Alizadeh Fattah
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2014
Field of study

Nowadays 3D models play an important role in many applications: viz. games, cultural heritage, medical imaging etc. Due to the fast growth in the number of available 3D models, understanding, searching and retrieving such models have become interesting fields within computer vision. In order to search and retrieve 3D models, we present two different approaches: one is based on solving the Poisson Equation over 2D silhouettes of the models. This method uses 60 different silhouettes, which are automatically extracted from different viewangles. Solving the Poisson equation for each silhouette assigns a number to each pixel as its signature. Accumulating these signatures generates a final histogram-based descriptor for each silhouette, which we call a SilPH (Silhouette Poisson Histogram). For the second approach, we propose two new robust shape descriptors based on the distribution of charge density on the surface of a 3D model. The Finite Element Method is used to calculate the charge density on each triangular face of each model as a local feature. Then we utilize the Bag-of-Features and concentric sphere frameworks to perform global matching using these local features. In addition to examining the retrieval accuracy of the descriptors in comparison to the state-of-the-art approaches, the retrieval speeds as well as robustness to noise and deformation on different datasets are investigated. On the other hand, to understand new complex models, we have also utilized distribution of electrical charge for proposing a system to decompose models into meaningful parts. Our robust, efficient and fully-automatic segmentation approach is able to specify the segments attached to the main part of a model as well as locating the boundary parts of the segments. The segmentation ability of the proposed system is examined on the standard datasets and its timing and accuracy are compared with the existing state-of-the-art approaches

Irish Universities

DCU Online Research Access Service