68 research outputs found

    Learning and inference with Wasserstein metrics

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 131-143). This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions, by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data. In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, which is applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that doesn't use the metric. In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models.
Exact inference for diffusion processes is generally intractable. In this work, we describe a novel approximate inference method, which is based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than several dimensions. In the current work, we propose a novel algorithm for computing a Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle. Finally, we study the ecological inference problem, which is that of reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often when dealing with data from economics and political sciences, such as when attempting to infer the demographic breakdown of votes for each political party, given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed, and requires prior information to distinguish a unique solution. We propose a novel, general framework for ecological inference that allows for a variety of priors and enables efficient computation of the most probable solution. Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions. By Charles Frogner. Ph. D.
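
To make the phrase "entropic regularization of the optimal transport problem" concrete, here is a minimal sketch of the standard Sinkhorn matrix-scaling iteration, which is the usual way such a regularized problem is solved. The abstract does not say which solver the thesis uses, so treating it as Sinkhorn is an assumption, and the names `sinkhorn`, `reg`, and `n_iters` are hypothetical.

```python
import math

def sinkhorn(a, b, cost, reg=0.1, n_iters=500):
    """Entropy-regularized optimal transport between histograms a and b.

    a, b  : lists of non-negative weights with equal total mass
    cost  : cost[i][j] is the ground distance between bin i of a and bin j of b
    reg   : regularization strength (illustrative default, an assumption)
    Returns the transport plan and the transport cost of that plan.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches each marginal.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total
```

Each iteration costs only a pair of matrix-vector products, which is why this kind of regularization is commonly used to sidestep the cost of solving the exact transport problem.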

    Matching sets of features for efficient retrieval and recognition

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 145-153). In numerous domains it is useful to represent a single example by the collection of local features or parts that comprise it. In computer vision in particular, local image features are a powerful way to describe images of objects and scenes. Their stability under variable image conditions is critical for success in a wide range of recognition and retrieval applications. However, many conventional similarity measures and machine learning algorithms assume vector inputs. Comparing and learning from images represented by sets of local features is therefore challenging, since each set may vary in cardinality and its elements lack a meaningful ordering. In this thesis I present computationally efficient techniques to handle comparisons, learning, and indexing with examples represented by sets of features. The primary goal of this research is to design and demonstrate algorithms that can effectively accommodate this useful representation in a way that scales with both the representation size as well as the number of images available for indexing or learning. I introduce the pyramid match algorithm, which efficiently forms an implicit partial matching between two sets of feature vectors. The matching has a linear time complexity, naturally forms a Mercer kernel, and is robust to clutter or outlier features, a critical advantage for handling images with variable backgrounds, occlusions, and viewpoint changes. I provide bounds on the expected error relative to the optimal partial matching. For very large databases, even extremely efficient pairwise comparisons may not offer adequately responsive query times. I show how to perform sub-linear time retrievals under the matching measure with randomized hashing techniques, even when input sets have varying numbers of features.
My results are focused on several important vision tasks, including applications to content-based image retrieval, discriminative classification for object recognition, kernel regression, and unsupervised learning of categories. I show how the dramatic increase in performance enables accurate and flexible image comparisons to be made on large-scale data sets, and removes the need to artificially limit the number of local descriptions used per image when learning visual categories. By Kristen Lorraine Grauman. Ph. D.
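
The idea of an "implicit partial matching" computed from multi-resolution histograms can be sketched in a few lines. The example below is a simplified one-dimensional version that assumes non-negative integer feature values and bin widths that double at each level. The function names are hypothetical, and the thesis itself works with sets of multi-dimensional feature vectors.

```python
from collections import Counter

def histogram(points, bin_width):
    """Count how many points fall into each bin of the given width."""
    return Counter(p // bin_width for p in points)

def intersection(h1, h2):
    """Histogram intersection: number of points that can be paired within bins."""
    return sum(min(count, h2[key]) for key, count in h1.items())

def pyramid_match(xs, ys, num_levels):
    """Weighted count of matches first formed at each pyramid level."""
    score = 0.0
    previous = 0
    for level in range(num_levels):
        width = 2 ** level
        matched = intersection(histogram(xs, width), histogram(ys, width))
        # Only matches that are new at this level count, and matches found
        # in coarser (wider) bins receive proportionally less weight.
        score += (matched - previous) / width
        previous = matched
    return score
```

No explicit pairing of features is ever searched for: the score comes from histogram intersections alone, and features left unmatched in the larger set simply contribute nothing, which is how sets of different cardinality are accommodated.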

    Probabilistic approaches to matching and modelling shapes

    Generation and optimisation of real-world static and dynamic location-allocation problems with application to the telecommunications industry.

    The location-allocation (LA) problem concerns the location of facilities and the allocation of demand, to minimise or maximise a particular function such as cost, profit or a measure of distance. Many formulations of LA problems have been presented in the literature to capture and study the unique aspects of real-world problems. However, some real-world aspects, such as resilience, are still lacking in the literature. Resilience ensures uninterrupted supply of demand and enhances the quality of service. Due to changes in population shift, market size, and the economic and labour markets - which often cause demand to be stochastic - a reasonable LA problem formulation should consider some aspect of future uncertainties. Almost all LA problem formulations in the literature that capture some aspect of future uncertainties fall in the domain of dynamic optimisation problems, where new facilities are located every time the environment changes. However, considering the substantial cost associated with locating a new facility, it becomes infeasible to locate facilities each time the environment changes. In this study, we propose and investigate variations of LA problem formulations. Firstly, we develop and study new LA formulations, which extend the location of facilities and the allocation of demand to add a layer of resilience. We apply the population-based incremental learning algorithm for the first time in the literature to solve the new LA formulations. Secondly, we propose and study a new dynamic formulation of the LA problem where facilities are opened once at the start of a defined period and are expected to be satisfactory in servicing customers' demands irrespective of changes in customer distribution. The problem is based on the idea that customers will change locations over a defined period and that these changes have to be taken into account when establishing facilities to service changing customers' distributions.
Thirdly, we employ a simulation-based optimisation approach to tackle the new dynamic formulation. Owing to the high computational costs associated with simulation-based optimisation, we investigate the concept of Racing, an approach used in model selection, to reduce the high computational cost by employing the minimum number of simulations for solution selection.
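
For readers unfamiliar with population-based incremental learning (PBIL), the following is a minimal sketch of the basic algorithm on bit strings. How the thesis encodes facility locations and which PBIL variant it uses are not stated in the abstract, so the encoding (one bit per candidate site would be one possibility), the parameter defaults, and the name `pbil` are all assumptions.

```python
import random

def pbil(fitness, num_bits, rng, generations=50, pop_size=20, learning_rate=0.1):
    """Maximize `fitness` over bit strings with basic PBIL."""
    probs = [0.5] * num_bits  # probability that each bit is 1
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population from the current probability vector.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        winner = max(population, key=fitness)
        if fitness(winner) > best_fit:
            best, best_fit = winner, fitness(winner)
        # Shift the probability vector toward this generation's best sample.
        probs = [(1 - learning_rate) * p + learning_rate * bit
                 for p, bit in zip(probs, winner)]
    return best, best_fit, probs
```

Unlike a genetic algorithm, PBIL keeps no population between generations. Only the probability vector carries information forward.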

    Highly efficient low-level feature extraction for video representation and retrieval.

    PhD. Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and the robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining the high precision and recall of the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.

    Vision based localization of mobile robots

    Mobile robotics is an active and exciting sub-field of Computer Science. Its importance is easily witnessed in a variety of undertakings from DARPA's Grand Challenge to NASA's Mars exploration program. The field is relatively young, and still many challenges face roboticists across the board. One important area of research is localization, which concerns itself with granting a robot the ability to discover and continually update an internal representation of its position. Vision based sensor systems have been investigated [8,22,27], but to a much lesser extent than other popular techniques [4,6,7,9,10]. A custom mobile platform has been constructed on top of which a monocular vision based localization system has been implemented. The rigorous gathering of empirical data across a large group of parameters germane to the problem has led to various findings about monocular vision based localization and the fitness of the custom robot platform. The localization component is based on a probabilistic technique called Monte-Carlo Localization (MCL) that tolerates a variety of different sensors and effectors, and has further proven to be adept at localization in diverse circumstances. Both a motion model and sensor model that drive the particle filter at the algorithm's core have been carefully derived. The sensor model employs a simple correlation process that leverages color histograms and edge detection to filter robot pose estimations via the on board vision. This algorithm relies on image matching to tune position estimates based on a priori knowledge of its environment in the form of a feature library. It is believed that leveraging different computationally inexpensive features can lead to efficient and robust localization with MCL. The central goal of this thesis is to implement and arrive at such a conclusion through the gathering of empirical data.
Section 1 presents a brief introduction to mobile robot localization and robot architectures, while section 2 covers MCL itself in more depth. Section 3 elaborates on the localization strategy, modeling and implementation that form the basis of the trials that are presented toward the end of that section. Section 4 presents a revised implementation that attempts to address shortcomings identified during localization trials. Finally, in section 5, conclusions are drawn about the effectiveness of the localization implementation and a path to improved localization with monocular vision is posited.
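
The interplay of motion model, sensor model, and resampling in Monte-Carlo Localization can be illustrated with a one-dimensional sketch. Everything specific here is an assumption made for illustration: the robot moves along a line, the sensor reports position directly with Gaussian noise (a stand-in for the thesis's vision-based sensor model built on color histograms and edge detection), and the names `mcl_step` and `estimate` are hypothetical.

```python
import math
import random

def mcl_step(particles, control, measurement, rng,
             motion_sigma=0.1, sensor_sigma=0.5):
    """One predict-weight-resample cycle of a 1-D particle filter."""
    # Motion model: shift each particle by the commanded displacement plus noise.
    moved = [p + control + rng.gauss(0.0, motion_sigma) for p in particles]
    # Sensor model: weight each particle by the likelihood of the measurement.
    weights = [math.exp(-0.5 * ((measurement - p) / sensor_sigma) ** 2)
               for p in moved]
    # Resampling: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))

def estimate(particles):
    """Position estimate: the mean of the particle set."""
    return sum(particles) / len(particles)
```

Starting from particles spread uniformly over the environment, repeated calls concentrate the set around positions consistent with both the commanded motion and the observations.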

    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. During the past years, significant efforts have been made to develop various datasets or present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small scale of scene classes and the image numbers, the lack of image variations and diversity, and the saturation of accuracy. These limitations severely limit the development of new approaches especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale on the scene classes and the total image number, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset and the results are reported as a useful baseline for future research. Comment: This manuscript is the accepted version for Proceedings of the IEE

    Enhancing Face Recognition with Deep Learning Architectures: A Comprehensive Review

    The progression of information discernment via facial identification and the emergence of innovative frameworks has exhibited remarkable strides in recent years. This phenomenon has been particularly pronounced within the realm of verifying individual credentials, a practice prominently harnessed by law enforcement agencies to advance the field of forensic science. A multitude of scholarly endeavors have been dedicated to the application of deep learning techniques within machine learning models. These endeavors aim to facilitate the extraction of distinctive features and subsequent classification, thereby elevating the precision of unique individual recognition. In the context of this scholarly inquiry, the focal point resides in the exploration of deep learning methodologies tailored for the realm of facial recognition and its subsequent matching processes. This exploration centers on the augmentation of accuracy through the meticulous process of training models with expansive datasets. Within the confines of this research paper, a comprehensive survey is conducted, encompassing an array of diverse strategies utilized in facial recognition. This survey, in turn, delves into the intricacies and challenges that underlie facial recognition within imagery analysis.