
    Learning and inference with Wasserstein metrics

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. By Charles Frogner. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 131-143).

    This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data.

    In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and the ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, which is applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that does not use the metric.

    In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models. Exact inference for diffusion processes is generally intractable. In this work, we describe a novel approximate inference method based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than a few dimensions. Here we propose a novel algorithm for computing a Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle.

    Finally, we study the ecological inference problem: reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often with data from economics and political science, such as when attempting to infer the demographic breakdown of votes for each political party given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed and requires prior information to identify a unique solution. We propose a novel, general framework for ecological inference that allows for a variety of priors and enables efficient computation of the most probable solution. Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed-point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions.
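    The entropic regularization mentioned above is typically computed with Sinkhorn-style iterations. The following is a minimal sketch of that general technique, not the thesis code; the cost matrix, marginals, regularization strength, and iteration count are illustrative placeholders.

```python
# Minimal sketch of entropic-regularized optimal transport via Sinkhorn iterations.
# The cost matrix, histograms, and hyperparameters are illustrative placeholders.
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Approximate the entropic-regularized OT cost between histograms a and b."""
    K = np.exp(-C / eps)              # Gibbs kernel from the ground cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # alternating projections onto the marginals
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return np.sum(P * C)              # regularized transport cost

# Toy example: two histograms on a 1-D grid with squared-distance ground cost.
x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :]) ** 2
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
print(sinkhorn(a, b, C))
```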

    Deformable Voxel Grids for Shape Comparisons

    We present Deformable Voxel Grids (DVGs) for 3D shape comparison and processing. A DVG is a voxel grid that is deformed to approximate the silhouette of a shape via energy minimization. Interpreted as a local coordinate system, it provides a better embedding space than a regular voxel grid, since it is adapted to the geometry of the shape. It also allows the shape to be deformed by moving the control points of the DVG, in a manner similar to Free-Form Deformation but with more easily interpretable control point positions. After proposing a scheme for computing the energies that is compatible with both meshes and point clouds, we demonstrate the use of DVGs in a variety of applications: correspondences via cubification, style transfer, shape retrieval, and PCA deformations. The first two require no learning and can readily be run on any shapes in a matter of minutes on modest hardware. The last two require first optimizing DVGs on a collection of shapes, which amounts to a pre-processing step; determining PCA coordinates is then straightforward and yields a small set of parameters for deforming a shape.
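    As a rough illustration of the energy-minimization idea (not the paper's actual energies), the sketch below fits a small lattice of control points to a point cloud by balancing a data-fit term against a smoothness term; the grid size, energy definitions, and weights are assumptions made for this example.

```python
# Rough sketch: deform a lattice of control points toward a point cloud by
# minimizing a fit energy plus a smoothness energy. Energies and weights are
# illustrative assumptions, not the paper's formulation.
import torch

K = 8
axis = torch.linspace(0.0, 1.0, K)
rest = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)  # (K,K,K,3) rest lattice
grid = torch.nn.Parameter(rest.clone())
points = torch.rand(2000, 3)            # stand-in for surface samples of the shape
opt = torch.optim.Adam([grid], lr=1e-2)

def laplacian(g):
    # difference between each control point and its axis neighbours (periodic wrap for brevity)
    lap = torch.zeros_like(g)
    for dim in range(3):
        lap += g.roll(1, dims=dim) + g.roll(-1, dims=dim) - 2 * g
    return lap

for _ in range(300):
    opt.zero_grad()
    flat = grid.reshape(-1, 3)
    e_fit = torch.cdist(flat, points).min(dim=1).values.mean()   # pull the lattice onto the shape
    e_smooth = laplacian(grid).pow(2).mean()                     # keep the lattice regular
    (e_fit + 0.5 * e_smooth).backward()
    opt.step()
```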

    Enhancing Face Recognition with Deep Learning Architectures: A Comprehensive Review

    Facial identification and the frameworks built around it have advanced remarkably in recent years, particularly for verifying individual identity, a capability widely used by law enforcement agencies in forensic science. Much of this work applies deep learning techniques to extract distinctive features and classify them, improving the precision with which individuals can be recognized. This paper focuses on deep learning methods for face recognition and matching, and on the accuracy gains obtained by training models on large datasets. It presents a comprehensive survey of the diverse strategies used in facial recognition and examines the challenges that underlie facial recognition in image analysis.

    Weather Image Generation using a Generative Adversarial Network

    This thesis investigates whether coupling a simple U-Net segmentation model with an image-to-image translation Generative Adversarial Network, CycleGAN, improves data augmentation results compared to CycleGAN alone. To evaluate the proposed method, a dataset consisting of weather images in different weather conditions and corresponding segmentation masks is used. Furthermore, we investigate the performance of different pre-trained CNNs in the encoder part of the U-Net model. The main goal is to provide a solution for generating data to be used in future data augmentation projects for real applications. Images from the proposed segmentation-plus-CycleGAN pipeline are evaluated with the Fréchet Inception Distance metric and compared to results from CycleGAN alone. The results indicate that coupling a segmentation model with the CycleGAN generator increases generated image quality, at least on the dataset used. Additional improvements might be achieved by adding an attention model to the pipeline or by changing the segmentation or generative adversarial network architectures.
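    For reference, here is a minimal sketch of how the Fréchet Inception Distance compares two sets of image features (for example, Inception activations of real and generated weather images); the random feature arrays below are placeholders for real activations.

```python
# Minimal sketch of the Fréchet Inception Distance between two feature sets.
# The random arrays stand in for Inception activations of real vs. generated images.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):        # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2 * covmean))

print(fid(np.random.randn(500, 64), np.random.randn(500, 64)))
```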

    A comprehensive survey on generative adversarial networks

    Generative Adversarial Networks (GANs) are a class of neural network architectures that have been used to generate a wide variety of realistic data, including images, videos, and audio. GANs consist of two main components: a generator network, which produces new data, and a discriminator network, which attempts to distinguish the generated data from real data. The two networks are trained in a competitive manner, with the generator trying to produce data that can fool the discriminator, and the discriminator trying to correctly identify the generated data. Since their introduction in 2014, GANs have been applied to a wide range of tasks, such as image synthesis, image-to-image translation, and text-to-image synthesis. GANs have also been used in various fields such as computer vision, natural language processing, and speech recognition. Despite their success, GANs have several limitations and challenges, including mode collapse, where the generator produces only a limited number of distinct samples, and instability during training. Several methods have been proposed to address these challenges, including regularization techniques, architectural modifications, and alternative training algorithms. Overall, GANs have proven to be a powerful tool for generating realistic data, and research on GANs remains an active area of study in machine learning. This survey aims to provide an overview of the GAN architecture and its variants, its applications and challenges, and recent developments in GANs.
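    The adversarial game described above can be summarized in a short schematic training loop; the toy 1-D data, network sizes, and hyperparameters below are illustrative, not taken from any particular GAN paper.

```python
# Schematic adversarial training loop: a generator maps noise to samples that
# should resemble a toy 1-D Gaussian, while a discriminator learns to tell
# real from generated samples. All sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator (logits)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator step: push real samples toward label 1, generated toward 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()
```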

    SkinCAN AI: A deep learning-based skin cancer classification and segmentation pipeline designed along with a generative model

    Because melanoma skin cancer is rare, the datasets collected for it are limited and highly skewed, and benign moles can easily mimic the appearance of melanoma-affected areas. Such an imbalanced dataset makes training any deep learning classifier network harder by affecting training stability. Our intuition is that synthesizing skin lesion medical images could help address overfitting when training networks and assist in preserving the anonymity of actual patients. Despite multiple previous attempts, none of the models were practical for the fast-paced clinical environment. In this thesis, we propose a novel pipeline named SkinCAN AI, inspired by StyleGAN but designed explicitly around the limitations of the skin lesion dataset and the need for a fast, optimized diagnostic tool that can be easily run and integrated in a clinical environment. Our SkinCAN AI model is equipped with an adaptive discriminator augmentation module that enables the limited target data distribution to be learned and artificial data points to be sampled, which further assists the classifier network in learning semantic features. A further novelty of the SkinCAN AI pipeline is the soft attention module integrated into the classifier network. This module yields an attention mask that lets DenseNet201 focus on learning relevant semantic features from skin lesion images without the heavy computational burden of artifact removal software. The SkinGAN model achieves an FID score of 0.622 while allowing its synthetic samples to train the DenseNet201 model with an accuracy of 0.9494, AUC of 0.938, specificity of 0.969, and sensitivity of 0.695. We provide evidence in this thesis that the proposed pipelines outperform other state-of-the-art networks developed for this early-diagnosis task.
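    A generic soft-attention block of the kind described above can be sketched as follows; this is an illustrative stand-in, not the authors' exact module, and the channel count simply matches DenseNet201's final feature map.

```python
# Generic soft-attention block: produce a spatial attention mask from a feature
# map and reweight the features with it. Illustrative stand-in, not the authors'
# exact module; 1920 channels matches DenseNet201's final feature map.
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)   # 1-channel attention logits

    def forward(self, x):                                    # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        logits = self.score(x).view(b, -1)
        mask = torch.softmax(logits, dim=1).view(b, 1, h, w) # spatial soft-attention mask
        return x * mask, mask                                # reweighted features and the mask

feats = torch.randn(2, 1920, 7, 7)                           # e.g. DenseNet201 final features
attended, mask = SoftAttention(1920)(feats)
```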

    Flow pattern analysis for magnetic resonance velocity imaging

    Blood flow in the heart is highly complex. Although blood flow patterns have been investigated by both computational modelling and invasive/non-invasive imaging techniques, their evolution and intrinsic connection with cardiovascular disease have yet to be explored. Magnetic resonance (MR) velocity imaging provides comprehensive multi-directional in vivo flow measurements, so detailed quantitative analysis of flow patterns is now possible. However, direct visualisation or quantification of vector fields is of little clinical use, especially for inter-subject or serial comparison of changes in flow patterns due to the progression of disease or in response to therapeutic measures. In order to achieve a comprehensive and integrated description of flow in health and disease, it is necessary to characterise and model both normal and abnormal flows and their effects.

    To accommodate the diversity of flow patterns in relation to morphological and functional changes, this thesis describes an approach that detects salient topological features prior to analytical assessment of dynamical indices of the flow patterns. To improve the accuracy of quantitative analysis of the evolution of topological flow features, it is essential to restore the original flow fields so that critical points associated with salient flow features can be detected more reliably. We propose a novel framework for the restoration, abstraction, extraction and tracking of flow features such that their dynamic indices can be accurately tracked and quantified.

    The restoration method is formulated as a constrained optimisation problem to remove the effects of noise and to improve the consistency of the MR velocity data. A computational scheme is derived from the First Order Lagrangian Method for solving the optimisation problem. After restoration, flow abstraction is applied to partition the entire flow field into clusters, each of which is represented by a local linear expansion of its velocity components. This process not only greatly reduces the amount of data required to encode the velocity distribution but also permits an analytical representation of the flow field from which critical points associated with salient flow features can be accurately extracted. Once the critical points are extracted, phase portrait theory can be applied to classify them as attracting/repelling foci, attracting/repelling nodes, planar vortices, or saddles. This thesis focuses on vortical flow features formed in diastole. To track the movement of the vortices within a cardiac cycle, a tracking algorithm based on relaxation labelling is employed, with constraints and parameters designed using the characteristics of the vortices.

    The proposed framework is validated with both simulated and in vivo data acquired from patients with sequential MR examinations following myocardial infarction. The main contribution of the thesis is the new vector field restoration and flow feature abstraction methods proposed. They allow accurate tracking and quantification of dynamic indices associated with salient features, so that inter- and intra-subject comparisons can be made more easily. This provides further insight into the evolution of blood flow patterns and permits the establishment of links between blood flow patterns and the localised genesis and progression of cardiovascular disease.
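    The phase-portrait classification step can be illustrated with a simplified 2-D sketch: the eigenvalues of the velocity Jacobian at a critical point determine whether it is a node, a focus, a planar vortex (centre), or a saddle. The tolerance and the restriction to 2-D below are simplifications for this example.

```python
# Simplified sketch of phase-portrait classification: categorize a 2-D critical
# point from the eigenvalues of the velocity Jacobian at that point.
import numpy as np

def classify_critical_point(J, tol=1e-9):
    """J is the 2x2 Jacobian of the velocity field at a critical point."""
    eig = np.linalg.eigvals(J)
    re, im = eig.real, eig.imag
    if re[0] * re[1] < 0:                              # real eigenvalues of opposite sign
        return "saddle"
    if np.all(np.abs(im) > tol):                       # complex pair -> rotation
        if np.all(np.abs(re) < tol):
            return "planar vortex (centre)"
        return "attracting focus" if re[0] < 0 else "repelling focus"
    return "attracting node" if np.all(re < 0) else "repelling node"

print(classify_critical_point(np.array([[0.0, -1.0], [1.0, 0.0]])))  # pure rotation -> vortex
```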

    Connected Attribute Filtering Based on Contour Smoothness

    A new attribute measuring the contour smoothness of 2-D objects is presented in the context of morphological attribute filtering. The attribute is based on the ratio of circularity and non-compactness, and has a maximum of 1 for a perfect circle; it decreases as the object boundary becomes irregular. Computation on hierarchical image representation structures relies on five auxiliary data members and is rapid. Contour smoothness is a suitable descriptor for detecting and discriminating man-made structures from other image features. An example is demonstrated on a very-high-resolution satellite image using connected pattern spectra and the switchboard platform.
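    For intuition, the standard isoperimetric circularity 4πA/P² behaves in the way described: it equals 1 for a perfect circle and drops as the boundary becomes irregular. The sketch below computes it for a polygonal contour; it is a stand-in for the idea, not the paper's exact attribute, which is computed incrementally from five auxiliary data members on the hierarchy.

```python
# Illustrative contour-shape measure: isoperimetric circularity 4*pi*A / P^2,
# equal to 1 for a perfect circle and lower for irregular boundaries. This is a
# stand-in for intuition, not the paper's exact attribute.
import numpy as np

def circularity(contour):
    """contour: (N, 2) array of boundary points in order (closed implicitly)."""
    x, y = contour[:, 0], contour[:, 1]
    area = 0.5 * np.abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))  # shoelace formula
    edges = np.diff(contour, axis=0, append=contour[:1])                        # closing segment included
    perim = np.sum(np.linalg.norm(edges, axis=1))
    return 4 * np.pi * area / perim ** 2

t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
print(circularity(circle))                                           # close to 1
print(circularity(circle * np.random.uniform(0.8, 1.2, (400, 1))))   # jagged boundary, lower value
```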