48 research outputs found

    Learning and inference with Wasserstein metrics

    Get PDF
    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 131-143). This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions, by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data. In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, which is applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that doesn't use the metric. In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models. Exact inference for diffusion processes is generally intractable. In this work, we describe a novel approximate inference method, which is based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than several dimensions. In the current work, we propose a novel algorithm for computing a Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle. Finally, we study the ecological inference problem, which is that of reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often when dealing with data from economics and political sciences, such as when attempting to infer the demographic breakdown of votes for each political party, given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed, and requires prior information to distinguish a unique solution. We propose a novel, general framework for ecological inference that allows for a variety of priors and enables efficient computation of the most probable solution.
Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions. By Charles Frogner. Ph.D.
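
    For intuition, the entropic regularization mentioned in this abstract is typically computed with Sinkhorn iterations. The sketch below is a minimal illustration of a regularized optimal transport cost between a predicted histogram and a ground-truth histogram, not the thesis's implementation; the toy label set, ground metric, and regularization strength are assumptions.

```python
import numpy as np

def sinkhorn_distance(p, q, M, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport between histograms p and q.

    p, q : 1-D arrays summing to 1 (prediction and ground truth).
    M    : pairwise ground-metric cost matrix on the output labels.
    reg  : regularization strength (smaller = closer to exact OT).
    """
    K = np.exp(-M / reg)              # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):          # Sinkhorn fixed-point iterations
        v = q / (K.T @ u)
        u = p / (K @ v)
    T = u[:, None] * K * v[None, :]   # approximate optimal transport plan
    return np.sum(T * M)              # regularized Wasserstein cost

# Toy example: three ordered labels with |i - j| as the ground metric.
M = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
pred = np.array([0.7, 0.2, 0.1])
truth = np.array([0.0, 0.1, 0.9])
print(sinkhorn_distance(pred, truth, M))
```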

    Hand tracking and bimanual movement understanding

    Get PDF
    Bimanual movements are a subset of human movements in which the two hands move together in order to perform a task or convey a meaning. A bimanual movement appearing in a sequence of images must be understood in order to enable computers to interact with humans in a natural way. This problem includes two main phases: hand tracking and movement recognition. We approach the problem of hand tracking from a neuroscience point of view. First, the hands are extracted and labelled by colour detection and blob analysis algorithms. In the presence of the two hands, one hand may occasionally occlude the other. Therefore, hand occlusions must be detected in an image sequence. A dynamic model is proposed to model the movement of each hand separately. Using this model in a Kalman filtering process, the exact starting and end points of hand occlusions are detected. We exploit neuroscience phenomena to understand the behaviour of the hands during occlusion periods. Based on this, we propose a general hand tracking algorithm to track and reacquire the hands over a movement including hand occlusion. The advantages of the algorithm and its generality are demonstrated in the experiments. In order to recognise the movements, we first recognise the movement of a single hand. Using statistical pattern recognition methods (such as Principal Component Analysis and Nearest Neighbour), the static shape of each hand appearing in an image is recognised. A graph-matching algorithm and Discrete Hidden Markov Models (DHMM), as two spatio-temporal pattern recognition techniques, are investigated for recognising a dynamic hand gesture. For recognising bimanual movements, we consider two general forms of these movements: single and concatenated periodic. We introduce three Bayesian networks for recognising the movements. The networks are designed to recognise and combine the gestures of the hands in order to understand the whole movement. Experiments on different types of movement demonstrate the advantages and disadvantages of each network.
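
    As a rough illustration of the Kalman filtering step described above, here is a minimal constant-velocity filter for one hand's image-plane centroid; it is my own sketch rather than the thesis's dynamic model, and the noise covariances and measurement values are assumptions. A large innovation (Mahalanobis distance) is the usual cue that the tracked blob has been lost or occluded.

```python
import numpy as np

# Constant-velocity Kalman filter for one hand's centroid.
# State: [x, y, vx, vy]; measurement: blob centroid [x, y].
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)   # process noise (assumed)
R = 4.0 * np.eye(2)    # measurement noise (assumed)

def kalman_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Innovation: a large value suggests the blob is occluded or mislabelled.
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    # Update
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P, float(y @ np.linalg.solve(S, y))  # squared Mahalanobis distance

x, P = np.zeros(4), 10.0 * np.eye(4)
x, P, gate = kalman_step(x, P, np.array([120.0, 80.0]))
print(x[:2], gate)
```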

    VISUAL OBJECT DETECTION BY COLOR, SHAPE AND DIMENSION (DETECCIÓN VISUAL DE OBJETOS POR COLOR, FORMA Y DIMENSIÓN)

    Get PDF
    3D neural object detection by color, shape and dimension (3DOD-CSD) is a novel and powerful tool that can be used to build object detection systems in environments with uncontrolled illumination. Inspired by the global structure of the human visual system, it uses a neural network in the classification stage and determines the physical dimension of the object's features using commercial digital cameras calibrated in stereo configuration. This permits the analysis of images of objects with the same form and color but different dimensions, such as scaled replicas or photographs of a photograph of a 3D object. With this method, a fixed distance from the camera to the object to be analyzed is not necessary, which is essential for a dynamic recognition system in changing conditions. The results show strong discrimination between desired and undesired objects. This system has many possible applications, including face identification and object selection in varying environments, utility pole detection, coin detection, and more. Keywords: color, dimensional features, invariant features, neural network, stereo vision, 3D object recognition.
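
    The step that distinguishes a scaled replica from the real object is recovering physical dimension from the calibrated stereo pair. A minimal sketch of that geometry is shown below, assuming rectified cameras and illustrative calibration numbers; it is not the paper's implementation.

```python
# Recover metric depth and feature size from a calibrated, rectified stereo pair.
# The focal length and baseline below are illustrative assumptions.
focal_px = 800.0      # focal length in pixels
baseline_m = 0.12     # distance between the two cameras, metres

def depth_from_disparity(disparity_px):
    # Standard rectified-stereo relation: Z = f * B / d
    return focal_px * baseline_m / disparity_px

def metric_length(pixel_length, disparity_px):
    # A feature spanning `pixel_length` pixels at depth Z has
    # physical length = pixel_length * Z / f.
    Z = depth_from_disparity(disparity_px)
    return pixel_length * Z / focal_px

# A 40-pixel-wide feature observed with a 32 px disparity:
print(metric_length(40.0, 32.0))   # 0.15 m at Z = 3 m
```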

    Graphical models for visual object recognition and tracking

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 277-301). We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories. By Erik B. Sudderth. Ph.D.
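
    To give a flavour of the particle-based message updates described above, here is a toy sketch reduced to the single-chain (particle-filter) special case of nonparametric BP: samples from an incoming message are pushed through a pairwise potential, reweighted by the local evidence, and resampled. The Gaussian dynamics and observation model are my assumptions, not the thesis's hand model.

```python
import numpy as np
rng = np.random.default_rng(0)

def propagate(particles, noise=0.5):
    # Pairwise potential psi(x_t, x_{t+1}): Gaussian random-walk dynamics (assumed).
    return particles + rng.normal(0.0, noise, size=particles.shape)

def likelihood(particles, observation, sigma=1.0):
    # Local evidence phi(x, y): Gaussian observation model (assumed).
    return np.exp(-0.5 * ((particles - observation) / sigma) ** 2)

particles = rng.normal(0.0, 2.0, size=500)     # samples from the incoming message
particles = propagate(particles)               # push through the pairwise potential
w = likelihood(particles, observation=1.3)     # weight by the local evidence
w /= w.sum()
idx = rng.choice(len(particles), size=len(particles), p=w)
belief_particles = particles[idx]              # particle approximation of the belief
print(belief_particles.mean())
```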

    Reconnaissance Biométrique par Fusion Multimodale de Visages (Biometric Recognition by Multimodal Fusion of Faces)

    Get PDF
    Biometric systems are considered one of the most effective methods of protecting and securing private or public life against all types of theft. Facial recognition is one of the most widely used methods, not because it is the most efficient and reliable, but rather because it is natural, non-intrusive, and relatively well accepted compared to other biometrics such as fingerprints and the iris. Developing biometric applications, such as facial recognition, has recently become important in smart cities. Over the past decades, many techniques have been proposed to recognize a face in a 2D or 3D image, with applications including videoconferencing systems, facial reconstruction, and security. Generally, changes in lighting, variations in pose, and facial expressions make 2D facial recognition less than reliable. However, 3D models may be able to overcome these constraints, except that most 3D facial recognition methods still treat the human face as a rigid object. This means that these methods are not able to handle facial expressions. In this thesis, we propose a new approach for automatic face verification by encoding the local information of 2D and 3D facial images as a high-order tensor. First, the histograms of two local multiscale descriptors (LPQ and BSIF) are used to characterize both 2D and 3D facial images. Next, a tensor-based facial representation is designed to combine all the features extracted from 2D and 3D faces. Moreover, to improve the discrimination of the proposed tensor face representation, we use two multilinear subspace methods (MWPCA, and MDA combined with WCCN). In addition, the WCCN technique is applied to face tensors to reduce the effect of intra-class directions using a normalization transform, as well as to improve the discriminating power of MDA. Our experiments were carried out on three large databases (FRGC v2.0, Bosphorus, and CASIA 3D) under different facial expressions, variations in pose, and occlusions. The experimental results show the superiority of the proposed approach in terms of verification rate compared to recent state-of-the-art methods.
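
    As a minimal sketch of the within-class covariance normalization (WCCN) step mentioned above, the code below estimates the within-class covariance of some features and whitens the intra-class directions; it is my own illustration under assumed data and dimensions, not the thesis's pipeline.

```python
import numpy as np

def wccn_projection(features, labels, eps=1e-6):
    """Within-class covariance normalization.

    Returns a projection B (Cholesky factor of the inverted within-class
    covariance); applying x -> B.T @ x whitens intra-class directions.
    """
    d = features.shape[1]
    W = np.zeros((d, d))
    classes = np.unique(labels)
    for c in classes:
        Xc = features[labels == c]
        Xc = Xc - Xc.mean(axis=0)
        W += Xc.T @ Xc / len(Xc)
    W = W / len(classes) + eps * np.eye(d)   # average within-class covariance
    return np.linalg.cholesky(np.linalg.inv(W))

# Toy usage with random 3-class, 16-dimensional "tensor face" features (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))
y = np.repeat([0, 1, 2], 20)
B = wccn_projection(X, y)
X_wccn = X @ B            # normalized features, e.g. for cosine scoring
```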

    Computational methods for the analysis of functional 4D-CT chest images.

    Get PDF
    Medical imaging is an important emerging technology that has been intensively used in the last few decades for disease diagnosis and monitoring as well as for the assessment of treatment effectiveness. Medical images provide a very large amount of valuable information, too much to be fully exploited by radiologists and physicians. Therefore, the design of computer-aided diagnostic (CAD) systems, which can be used as assistive tools for the medical community, is of great importance. This dissertation deals with the development of a complete CAD system for patients with lung cancer, which remains the leading cause of cancer-related death in the USA. In 2014, there were approximately 224,210 new cases of lung cancer and 159,260 related deaths. The process begins with the detection of lung cancer, which is identified through the diagnosis of lung nodules (a manifestation of lung cancer). These nodules are approximately spherical regions of primarily high-density tissue that are visible in computed tomography (CT) images of the lung. The treatment of these lung cancer nodules is complex; nearly 70% of lung cancer patients require radiation therapy as part of their treatment. Radiation-induced lung injury is a limiting toxicity that may decrease cure rates and increase treatment-related morbidity and mortality. Finding ways to accurately detect lung injury at an early stage, and hence prevent it, would have significant positive consequences for lung cancer patients. The ultimate goal of this dissertation is to develop a clinically usable CAD system that can improve the sensitivity and specificity of early detection of radiation-induced lung injury, based on the hypothesis that irradiated lung tissues may be affected and suffer a decrease in functionality as a side effect of radiation therapy. This hypothesis has been validated by demonstrating that automatic segmentation of the lung regions and registration of consecutive respiratory phases can be used to estimate elasticity, ventilation, and texture features that provide discriminatory descriptors for early detection of radiation-induced lung injury. The proposed methodologies will lead to novel indexes for distinguishing normal/healthy and injured lung tissues in clinical decision-making. To achieve this goal, a CAD system for accurate detection of radiation-induced lung injury has been developed, comprising three basic components: lung field segmentation, lung registration, and feature extraction with tissue classification. This dissertation starts with an exploration of the available medical imaging modalities to present the importance of medical imaging in today’s clinical applications. Secondly, the methodologies, challenges, and limitations of recent CAD systems for lung cancer detection are covered. This is followed by an accurate segmentation methodology for the lung parenchyma, with a focus on pathological lungs, to extract the volume of interest (VOI) to be analyzed for potential lung injuries stemming from the radiation therapy. After the segmentation of the VOI, a lung registration framework is introduced to perform a crucial step that ensures the co-alignment of the intra-patient scans. This step eliminates the effects of orientation differences, motion, breathing, heartbeats, and differences in scanning parameters, so that the functionality features of the lung fields can be extracted accurately.
The developed registration framework also helps in the evaluation and gated control of the radiotherapy, through motion estimation analysis before and after the therapy dose. Finally, the radiation-induced lung injury detection framework is introduced, which combines the previous two medical image processing and analysis steps with the feature estimation and classification step. This framework estimates and combines both texture and functional features. The texture features are modeled using a novel 7th-order Markov Gibbs random field (MGRF) model that accurately models the texture of healthy and injured lung tissues by simultaneously accounting for both vertical and horizontal relative dependencies between voxel-wise signals. The functionality features are calculated from the deformation fields, obtained from the 4D-CT lung registration, that map lung voxels between successive CT scans in the respiratory cycle. These functionality features describe the ventilation (air flow rate) of the lung tissues using the Jacobian of the deformation field, and the tissues’ elasticity using the strain components calculated from the gradient of the deformation field. Finally, these features are combined in the classification model to detect the injured parts of the lung at an early stage and enable earlier intervention.
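
    The Jacobian-based ventilation surrogate mentioned above can be sketched as follows: the deformation is phi(x) = x + u(x), so its Jacobian is I plus the displacement gradient, and the voxel-wise determinant measures local expansion or compression. This is a generic illustration under assumed array shapes, not the dissertation's code.

```python
import numpy as np

def jacobian_determinant(disp):
    """Voxel-wise Jacobian determinant of a displacement field.

    disp : array of shape (3, Z, Y, X) with the z/y/x displacement components
           (in voxels). The deformation is phi(x) = x + disp, so J = I + grad(disp);
           det(J) > 1 indicates local expansion (a common ventilation surrogate),
           det(J) < 1 indicates compression.
    """
    grads = np.stack([np.stack(np.gradient(disp[i]), axis=0) for i in range(3)], axis=0)
    # grads[i, j] = d disp_i / d axis_j ; build J = I + grad(disp) at every voxel
    J = np.eye(3)[:, :, None, None, None] + grads
    J = np.moveaxis(J, (0, 1), (-2, -1))        # shape (..., 3, 3)
    return np.linalg.det(J)

# Toy example: a uniform 1% expansion along every axis gives det ~ 1.03.
z, y, x = np.meshgrid(np.arange(20), np.arange(20), np.arange(20), indexing="ij")
disp = 0.01 * np.stack([z, y, x]).astype(float)
print(jacobian_determinant(disp).mean())
```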

    Hidden Markov Models

    Get PDF
    Hidden Markov Models (HMMs), although known for decades, have become widely used in recent years and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environment protection and engineering. I hope that the reader will find this book useful and helpful for their own research.
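
    As a concrete example of the basic machinery behind HMMs, here is a minimal forward algorithm (with scaling) for evaluating the log-likelihood of an observation sequence under a discrete HMM; the toy parameters are my own and are not taken from the book.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Forward algorithm (with scaling) for a discrete HMM.

    pi  : (N,)   initial state distribution
    A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    obs : sequence of observation indices
    """
    alpha = pi * B[:, obs[0]]
    log_likelihood = np.log(alpha.sum())
    alpha /= alpha.sum()                      # scale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_likelihood += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_likelihood

# Toy 2-state, 2-symbol HMM.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_log_likelihood(pi, A, B, [0, 1, 1, 0]))
```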

    Decision making with reciprocal chains and binary neural network models

    Get PDF
    Automated decision making systems are relied on in increasingly diverse and critical settings. Human users expect such systems to improve or augment their own decision making in complex scenarios, in real time, often across distributed networks of devices. This thesis studies binary decision making systems of two forms. The first system is built from a reciprocal chain, a statistical model able to capture the intentional behaviour of targets moving through a state space, such as moving towards a destination state. The first part of the thesis questions the utility of this higher level information in a tracking problem where the system must decide whether a target exists or not. The contributions of this study characterise the benefits to be expected from reciprocal chains for tracking, using statistical tools and a novel simulation environment that provides relevant numerical experiments. Real world decision making systems often combine statistical models, such as the reciprocal chain, with the second type of system studied in this thesis, a neural network. In the tracking context, a neural network typically forms the object detection system. However, the power consumption and memory usage of state of the art neural networks make their use on small devices infeasible. This motivates the study of binary neural networks in the second part of the thesis. Such networks use less memory and are efficient to run, compared to standard full precision networks. However, their optimisation is difficult, due to the non-differentiable functions involved. Several algorithms elect to optimise surrogate networks that are differentiable and correspond in some way to the original binary network. Unfortunately, the many choices involved in the algorithm design are poorly understood. The second part of the thesis questions the role of parameter initialisation in the optimisation of binary neural networks. Borrowing analytic tools from statistical physics, it is possible to characterise the typical behaviour of a range of algorithms at initialisation precisely, by studying how input signals propagate through these networks on average. This theoretical development also yields practical outcomes, providing scales that limit network depth and suggesting new initialisation methods for binary neural networks. Thesis (Ph.D.) -- University of Adelaide, School of Electrical & Electronic Engineering, 202
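
    As a rough illustration of the signal-propagation analysis described above, the toy experiment below (my own construction, not the thesis's analysis) pushes two correlated inputs through a deep network with random binary weights and sign activations and tracks how their correlation evolves with depth; the rate at which correlations collapse gives a practical depth scale of the kind the abstract refers to.

```python
import numpy as np
rng = np.random.default_rng(0)

width, depth = 512, 30
x1 = rng.normal(size=width)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=width)  # input correlation 0.9

def sign(z):
    # Deterministic sign that maps 0 to +1, so activations stay in {-1, +1}.
    return np.where(z >= 0.0, 1.0, -1.0)

def layer(h, W):
    # Scale by 1/sqrt(width) so pre-activations stay O(1) before binarizing.
    return sign(W @ h / np.sqrt(width))

for layer_idx in range(depth):
    W = rng.choice([-1.0, 1.0], size=(width, width))  # random binary weights at init
    x1, x2 = layer(x1, W), layer(x2, W)               # same weights for both inputs
    corr = np.mean(x1 * x2)   # activations are +/-1, so this is their correlation
    if layer_idx % 10 == 0:
        print(layer_idx, round(corr, 3))
```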

    3D FACE RECOGNITION USING LOCAL FEATURE BASED METHODS

    Get PDF
    Face recognition has attracted many researchers’ attention compared to other biometrics due to its non-intrusive and friendly nature. Although several methods for 2D face recognition have been proposed so far, there are still some challenges related to the 2D face, including illumination, pose variation, and facial expression. In the last few decades, the 3D face research area has become more interesting, since shape and geometry information can be used to handle the challenges of 2D faces. Existing algorithms for face recognition are divided into three different categories: holistic feature-based, local feature-based, and hybrid methods. According to the literature, local features have shown better performance relative to holistic feature-based methods under expression and occlusion challenges. In this dissertation, local feature-based methods for 3D face recognition have been studied and surveyed. In the survey, local methods are classified into three broad categories: keypoint-based, curve-based, and local surface-based methods. Inspired by keypoint-based methods, which are effective at handling partial occlusion, a structural context descriptor on pyramidal shape maps and texture images has been proposed in a multimodal scheme. Score-level fusion is used to combine the keypoint matching scores in the texture and shape modalities. The survey shows that local surface-based methods are effective at handling facial expressions. Accordingly, a local derivative pattern is introduced in this work to extract distinct features from the depth map. In addition, the local derivative pattern is applied on surface normals. Most 3D face recognition algorithms focus on utilizing depth information to detect and extract features. Compared to depth maps, the surface normal at each point determines the facial surface orientation, which provides an efficient facial surface representation from which to extract distinct features for the recognition task. An Extreme Learning Machine (ELM)-based auto-encoder is used to make the feature space more discriminative. Expression- and occlusion-robust analysis using the information from the normal maps is investigated by dividing the facial region into patches. A novel hybrid classifier is proposed to combine a Sparse Representation Classifier (SRC) and an ELM classifier in a weighted scheme. The proposed algorithms have been evaluated on four widely used 3D face databases: FRGC, Bosphorus, Bu-3DFE, and 3D-TEC. The experimental results illustrate the effectiveness of the proposed approaches. The main contribution of this work lies in the identification and analysis of effective local features and a classification method for improving 3D face recognition performance.
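
    The surface-normal representation discussed above can be sketched simply: treating the depth map as a surface z = f(x, y), per-pixel normals follow from its finite-difference gradients. The code below is a generic illustration under assumed grid spacing, not the dissertation's implementation.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate unit surface normals from a depth map via finite differences.

    depth : (H, W) array of z values on a regular grid (unit pixel spacing assumed).
    Returns an (H, W, 3) array of unit normals.
    """
    dz_dy, dz_dx = np.gradient(depth)
    # A surface z = f(x, y) has (unnormalized) normal (-df/dx, -df/dy, 1).
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return n

# Toy example: a tilted plane z = 0.5 * x has a constant normal everywhere.
H, W = 64, 64
depth = np.tile(0.5 * np.arange(W, dtype=float), (H, 1))
print(normals_from_depth(depth)[32, 32])   # approximately [-0.447, 0, 0.894]
```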