60 research outputs found

    Rich probabilistic models for semantic labeling

    The aim of this monograph is to explore the methods and applications of semantic labeling. Our contributions to this rapidly developing topic concern particular aspects of modeling and inference in probabilistic models, and their applications in the interdisciplinary fields of computer vision, medical image processing, and remote sensing.

    Recent Progress in Image Deblurring

    This paper comprehensively reviews the recent development of image deblurring, including non-blind/blind, spatially invariant/variant deblurring techniques. Indeed, these techniques share the same objective of inferring a latent sharp image from one or several corresponding blurry images, while the blind deblurring techniques are also required to derive an accurate blur kernel. Considering the critical role of image restoration in modern imaging systems to provide high-quality images under complex environments such as motion, undesirable lighting conditions, and imperfect system components, image deblurring has attracted growing attention in recent years. From the viewpoint of how to handle the ill-posedness, which is a crucial issue in deblurring tasks, existing methods can be grouped into five categories: Bayesian inference framework, variational methods, sparse representation-based methods, homography-based modeling, and region-based methods. In spite of achieving a certain level of development, image deblurring, especially the blind case, is limited in its success by complex application conditions which make the blur kernel hard to obtain and spatially variant. We provide a holistic understanding and deep insight into image deblurring in this review. An analysis of the empirical evidence for representative methods, practical issues, as well as a discussion of promising future directions are also presented. Comment: 53 pages, 17 figures

    Event-based neuromorphic stereo vision


    Statistical models for natural scene data

    This thesis considers statistical modelling of natural image data. Advances in this field can have a significant impact both on engineering applications and on the understanding of the human visual system. Several recent advances in natural image modelling have been obtained with the use of unsupervised feature learning. We consider a class of such models, restricted Boltzmann machines (RBMs), used in many recent state-of-the-art image models. We develop extensions of these stochastic artificial neural networks, and use them as a basis for building more effective image models, and tools for computational vision. We first develop a novel framework for obtaining Boltzmann machines, in which the hidden unit activations co-transform with transformed input stimuli in a stable and predictable way throughout the network. We define such models to be transformation equivariant. Such properties have been shown useful for computer vision systems, and have been motivational for example in the development of steerable filters, a widely used classical feature extraction technique. Translation equivariant feature sharing has been the standard method for scaling image models beyond patch-sized data to large images. In our framework we extend shallow and deep models to account for other kinds of transformations as well, focusing on in-plane rotations. Motivated by the unsatisfactory results of current generative natural image models, we take a step back, and evaluate whether they are able to model a subclass of the data, natural image textures. This is a necessary subcomponent of any credible model for visual scenes. We assess the performance of a state-of-the-art model of natural images for texture generation, using a dataset and evaluation techniques from prior work. We also perform a dissection of the model architecture, uncovering the properties important for good performance.
Building on this, we develop structured extensions for more complicated data comprised of textures from multiple classes, using the single-texture model architecture as a basis. These models are shown to be able to produce state-of-the-art texture synthesis results quantitatively, and are also effective qualitatively. It is demonstrated empirically that the developed multiple-texture framework provides a means to generate images of differently textured regions, more generic globally varying textures, and can also be used for texture interpolation, where the approach is radically different from the others in the area. Finally we consider visual boundary prediction from natural images. The work aims to improve understanding of Boltzmann machines in the generation of image segment boundaries, and to investigate deep neural network architectures for learning the boundary detection problem. The developed networks (which avoid several hand-crafted model and feature designs commonly used for the problem) produce the fastest reported inference times in the literature, combined with state-of-the-art performance.

    Advanced Statistical Modeling for Model-Based Iterative Reconstruction for Single-Energy and Dual-Energy X-Ray CT

    Model-based iterative reconstruction (MBIR) has been increasingly widely applied as an improvement over traditional, analytical image reconstruction methods in X-ray CT, primarily due to its significant advantage in drastic dose reduction without diagnostic loss. Early success of the method in conventional CT has encouraged the extension to a wide range of applications that includes more advanced imaging modalities, such as dual-energy X-ray CT, and more challenging imaging conditions, such as low-dose and sparse-sampling scans, each requiring refined statistical models including the data model and the prior model. In this dissertation, we developed an MBIR algorithm for dual-energy CT that included a joint data-likelihood model to account for correlated data noise. Moreover, we developed a Gaussian-Mixture Markov random field (GM-MRF) image model that can be used as a very expressive prior model in MBIR for X-ray CT reconstruction. The GM-MRF model is formed by merging individual patch-based Gaussian-mixture models and therefore leads to an expressive MRF model with easily estimated parameters. Experimental results with phantom and clinical datasets have demonstrated the improvement in image quality due to the advanced statistical modeling.

    Contributions of Continuous Max-Flow Theory to Medical Image Processing

    Discrete graph cuts and continuous max-flow theory have created a paradigm shift in many areas of medical image processing. Whereas previous methods limited themselves to analytically solvable optimization problems or guaranteed only local optimizability for increasingly complex and non-convex functionals, current methods now rely on describing an optimization problem as a series of general yet simple functionals with global, but non-analytic, solution algorithms. This has been increasingly spurred on by the availability of these general-purpose algorithms in an open-source context. Thus, graph-cuts and max-flow have changed every aspect of medical image processing from reconstruction to enhancement to segmentation and registration. To wax philosophical, continuous max-flow theory in particular has the potential to bring a high degree of mathematical elegance to the field, bridging the conceptual gap between the discrete and continuous domains in which we describe different imaging problems, properties and processes. In Chapter 1, we use the notion of infinitely dense and infinitely densely connected graphs to transfer between the discrete and continuous domains, which has a certain sense of mathematical pedantry to it, but the resulting variational energy equations have a sense of elegance and charm. As with any application of the principle of duality, the variational equations have an enigmatic side that can only be decoded with time and patience. The goal of this thesis is to show the contributions of max-flow theory through image enhancement and segmentation, increasing incorporation of topological considerations and increasing the role played by user knowledge and interactivity. These methods will be rigorously grounded in calculus of variations, guaranteeing fuzzy optimality and providing multiple solution approaches to addressing each individual problem.

    Face Recognition and Facial Attribute Analysis from Unconstrained Visual Data

    Analyzing human faces from visual data has been one of the most active research areas in the computer vision community. However, it is a very challenging problem in unconstrained environments due to variations in pose, illumination, expression, occlusion and blur between training and testing images. The task becomes even more difficult when only a limited number of images per subject is available for modeling these variations. In this dissertation, different techniques for performing classification of human faces as well as other facial attributes such as expression, age, gender, and head pose in uncontrolled settings are investigated. In the first part of the dissertation, a method for reconstructing the virtual frontal view from a given non-frontal face image using Markov Random Fields (MRFs) and an efficient variant of the Belief Propagation (BP) algorithm is introduced. In the proposed approach, the input face image is divided into a grid of overlapping patches and a globally optimal set of local warps is estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade (LK) algorithm that can handle illumination variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it requires neither manually selected facial landmarks nor head pose estimation. In the second part, the task of face recognition in unconstrained settings is formulated as a domain adaptation problem.
The domain shift is accounted for by deriving a latent subspace or domain, which jointly characterizes the multifactor variations using appropriate image formation models for each factor. The latent domain is defined as a product of Grassmann manifolds based on the underlying geometry of the tensor space, and recognition is performed across domain shift using statistics consistent with the tensor geometry. More specifically, given a face image from the source or target domain, multiple images of that subject are first synthesized under different illuminations, blur conditions, and 2D perturbations to form a tensor representation of the face. The orthogonal matrices obtained from the decomposition of this tensor, where each matrix corresponds to a factor variation, are used to characterize the subject as a point on a product of Grassmann manifolds. For cases with only one image per subject in the source domain, the identity of target domain faces is estimated using the geodesic distance on product manifolds. When multiple images per subject are available, an extension of kernel discriminant analysis is developed using a novel kernel based on the projection metric on product spaces. Furthermore, a probabilistic approach to the problem of classifying image sets on product manifolds is introduced. Understanding attributes such as expression, age class, and gender from face images has many applications in multimedia processing including content personalization, human-computer interaction, and facial identification. To achieve good performance in these tasks, it is important to be able to extract pertinent visual structures from the input data. In the third part of the dissertation, a fully automatic approach for performing classification of facial attributes based on hierarchical feature learning using sparse coding is presented. The proposed approach is generative in the sense that it does not use label information in the process of feature learning. 
As a result, the same feature representation can be applied for different tasks such as expression, age, and gender classification. Final classification is performed by a linear SVM trained with the corresponding labels for each task. The last part of the dissertation presents an automatic algorithm for determining the head pose from a given face image. The face image is divided into a regular grid and represented by dense SIFT descriptors extracted from the grid points. Random Projection (RP) is then applied to reduce the dimension of the concatenated SIFT descriptor vector. Classification and regression using Support Vector Machine (SVM) are combined in order to obtain an accurate estimate of the head pose. The advantage of the proposed approach is that it does not require facial landmarks, such as the eye and mouth corners or the nose tip, to be extracted from the input face image as in many other methods.

    Learning based biological image analysis

    The fate of contemporary scientific research in biology and medicine is bound to the advancements in computational methods. The unprecedented data explosion in microscopy and the growing interest of life scientists in studying more complex and more subtle interactions stimulate the search for innovative computational solutions to challenging real-world applications. Extensions and novel formulations of generic and flexible methods based on learning/inference are necessary to cope with the large variety of the produced data and to avoid continuous reimplementation and heavy parameter tuning. This thesis exploits cutting-edge machine learning methods based on structured probabilistic models and weakly supervised learning to provide four novel solutions in the areas of large-scale microscopic imaging and multiple-object tracking. Chapter 2 introduces a weakly supervised learning framework to tackle the problem of detecting defect images while mining massive microscopic imagery databases. This thesis demonstrates accurate prediction with low user annotation effort. Chapter 3 presents a learning approach for counting overlapping objects in images based on local structured predictors. This problem has numerous applications in high-throughput microscopy screening such as cell counting for drug toxicity assays. Chapter 4 develops a deterministic graphical model to impose temporal consistency in object counts when dealing with a video sequence. This chapter shows that global (temporal and spatial) structural inference consistently improves over local (only spatial) predictions. The method developed in Chapter 4 is used in a novel downstream tracking algorithm which is introduced in Chapter 5. This chapter tackles, for the first time, the difficult problem of tracking heavily overlapping, translucent and indistinguishable objects.
The mutual occlusion event handling of such objects is formulated as a novel structured inference problem based on the minimization of a convex multi-commodity flow energy. The optimal weights of the energy terms are learned with partial user supervision using structured learning with latent variables. To support behavioral biologists, we apply this method to the problem of tracking a community of interacting Drosophila larvae.

    Detection and height estimation of buildings from SAR and optical images using conditional random fields

    [no abstract]