
    Statistical Approaches to Inferring Object Shape from Single Images

    Depth inference is a fundamental problem in computer vision with a broad range of potential applications. Monocular depth inference, particularly shape from shading, dates back to the 1940s, when it was first used to study the shape of the lunar surface. Since then there has been ample research into depth inference algorithms using monocular cues. Most are based on physical models of image formation and rely on a number of simplifying assumptions that do not hold for real-world and natural imagery; very few make use of the rich statistical information contained in real-world images and their 3D structure, though there have been a few notable exceptions. The study of natural-scene statistics has concentrated on cluttered outdoor scenes. The statistics of scenes of single objects have been studied less, even though such scenes are an essential part of daily human interaction with the environment. Inferring the shape of single objects is an important computer vision problem that has captured the interest of many researchers over the past few decades, with applications in object recognition, robotic grasping, fault detection and Content Based Image Retrieval (CBIR). This thesis studies the statistical properties of single objects and their range images, properties that can benefit shape inference techniques. I acquired two databases: the Single Object Range and HDR (SORH) database and the Eton Myers Database of single objects, including laser-acquired depth, binocular stereo, photometric stereo and High Dynamic Range (HDR) photography. Taking a data-driven approach, I studied the statistics of color and range images of real scenes of single objects, along with whole 3D objects, and uncovered some interesting trends in the data. The fractal structure of natural images was previously well known and thought to be a universal property.
    However, my research showed that the fractal structure of single objects and surfaces is governed by a wholly different set of rules. Classical computer vision problems, such as binocular and multi-view stereo, photometric stereo, shape from shading, and structure from motion, all rely on accurate and complete models of which 3D shapes and textures are plausible in nature in order to avoid producing unlikely outputs. Bayesian approaches are common for these problems, and the hope is that the findings on the statistics of single-object shape from this work and others will both inform new, more accurate Bayesian priors on shape and enable more efficient probabilistic inference procedures.
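    The fractal structure referred to above is usually quantified by the slope of the power spectrum in log-log coordinates (for natural images, power falls off roughly as 1/f^2). As a hedged illustration of that analysis, not of the thesis's data, the following sketch synthesizes a toy one-dimensional 1/f-amplitude signal and recovers the classic slope by a log-log fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
freqs = np.fft.rfftfreq(n)[1:-1]   # cycles/sample; skip DC and Nyquist bins

# Synthesize a toy signal whose amplitude spectrum falls off as 1/f,
# i.e. whose power spectrum follows the 1/f^2 law reported for natural images.
phases = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
spectrum = np.zeros(n // 2 + 1, dtype=complex)
spectrum[1:-1] = (1.0 / freqs) * np.exp(1j * phases)
signal = np.fft.irfft(spectrum, n)

# Recover the power-spectrum slope from the signal alone via a log-log fit.
power = np.abs(np.fft.rfft(signal)[1:-1]) ** 2
slope = np.polyfit(np.log(freqs), np.log(power), 1)[0]
print(round(float(slope), 1))  # -2.0
```

    The same slope-fitting procedure, applied to radially averaged 2-D spectra, is how departures from the universal power law (as found for single objects) would show up.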

    Road Detection and Recognition from Monocular Images Using Neural Networks

    Road recognition is an important aspect of autonomous navigation systems, which help autonomous vehicles and robots navigate on the ground. Road detection is also useful in related sub-tasks such as finding valid road paths along which a robot or vehicle can travel, supporting driverless vehicles, preventing collisions with obstacles, and detecting objects on the road. The goal of this thesis is to examine existing road detection and recognition techniques and to propose an alternative solution for the road classification and detection task. Our contribution consists of several parts. Firstly, we released a dataset of approximately 5,300 unlabeled road images. Secondly, we summarized the information about existing road image datasets. Thirdly, we proposed a convolutional LeNet-5-based neural network for road image classification across various environments. Finally, we presented an FCN-8-based model for pixel-wise image recognition.
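    The FCN-8 architecture mentioned above takes its name from fusing score maps tapped at strides 8, 16, and 32 of the backbone. A minimal NumPy sketch of that skip-fusion scheme, with dummy score maps and nearest-neighbour upsampling standing in for the learned deconvolutions of the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample(x, factor):
    """Nearest-neighbour upsampling; a stand-in for learned deconvolutions."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

# Dummy single-class score maps at the three FCN-8 tap points of a 32x32 input:
# pool3 (stride 8), pool4 (stride 16) and the final conv7 scores (stride 32).
h = w = 32
score_pool3 = rng.normal(size=(h // 8, w // 8))
score_pool4 = rng.normal(size=(h // 16, w // 16))
score_conv7 = rng.normal(size=(h // 32, w // 32))

# FCN-8 skip fusion: upsample the coarse scores, add the finer skip scores.
fused = upsample(score_conv7, 2) + score_pool4   # now at stride 16
fused = upsample(fused, 2) + score_pool3         # now at stride 8
segmentation = upsample(fused, 8)                # back to input resolution
print(segmentation.shape)  # (32, 32)
```

    In the real network each map has one channel per class and the fused scores are arg-maxed per pixel; the shape arithmetic above is the part that defines "FCN-8".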

    Prediction model of Colour Dryback


    Image segmentation and pigment mapping of cultural heritage based on spectral imaging

    The goal of the work reported in this dissertation is to develop methods for image segmentation and pigment mapping of paintings based on spectral imaging. To reach this goal it is necessary to achieve sufficient spectral and colorimetric accuracy in both the spectral imaging system and the pigment mapping. The output is a series of spatial distributions of pigments (or pigment maps) composing a painting. With these pigment maps, the change in the color appearance of the painting can be simulated when the optical properties of one or more pigments are altered. The pigment maps will also be beneficial for enriching the historical knowledge of the painting and for aiding conservators in determining the best course for retouching damaged areas of the painting when metamerism is a factor. First, a new spectral reconstruction algorithm was developed based on Wyszecki's hypothesis and the matrix R theory developed by Cohen and Kappauf. The method achieved both high spectral and colorimetric accuracy for a given combination of illuminant and observer. It was successfully tested with a practical spectral imaging system, developed in the Munsell Color Science Laboratory, that coupled a traditional color-filter-array camera with two optimized filters. The spectral imaging system was used to image test paintings, and the method was used to retrieve spectral reflectance factors for these paintings. Next, pigment mapping methods were developed based on Kubelka-Munk (K-M) turbid media theory, which predicts the spectral reflectance factor of a specimen from the optical properties of its constituent pigments. The K-M theory has achieved practical success for opaque materials through reduced mathematical complexity and the elimination of the need to control film thickness.
    The use of the general K-M theory for translucent samples was studied extensively, including the determination of the optical properties of pigments as functions of film thickness, and the prediction of the spectral reflectance factor of a specimen by selecting the right pigment combination. After that, an investigation was carried out to evaluate the impact of the opacity and layer configuration of a specimen on pigment mapping. The conclusions were drawn from comparisons of pigment mapping prediction accuracy between opaque and translucent assumptions, and between single-layer and bi-layer assumptions. Finally, spectral imaging and pigment mapping were applied to three paintings. Large images were first partitioned into several small images, and each small image was segmented into clusters using either an unsupervised or a supervised classification method. For each cluster, pigment mapping was performed pixel-wise with a limited number of pigments, or with a limited number of pixels and then extended to other pixels based on a similarity calculation. For the masterpiece The Starry Night, these pigment maps can provide historical knowledge about the painting, aid conservators in inpainting damaged areas, and digitally rejuvenate the original color appearance of the painting (e.g. when the lead white had not yet noticeably darkened).
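    For the opaque case mentioned above, Kubelka-Munk theory reduces to a closed-form relation between the reflectance of an infinitely thick film and the absorption-to-scattering ratio K/S, with mixtures handled by summing concentration-weighted K/S values per band. A minimal single-band sketch under that single-constant mixing assumption (the pigment reflectances below are hypothetical, not values from the dissertation):

```python
import math

def km_ks(r_inf):
    """K/S from the reflectance of an opaque (infinitely thick) film."""
    return (1.0 - r_inf) ** 2 / (2.0 * r_inf)

def km_reflectance(ks):
    """Inverse relation: opaque-film reflectance from K/S."""
    return 1.0 + ks - math.sqrt(ks ** 2 + 2.0 * ks)

# Single-constant mixing at one wavelength band: (K/S)_mix = sum_i c_i (K/S)_i.
# Hypothetical pigments: a bright one (R = 0.90) and a dark one (R = 0.10).
ks_mix = 0.7 * km_ks(0.90) + 0.3 * km_ks(0.10)
r_mix = km_reflectance(ks_mix)
print(round(r_mix, 3))
```

    Repeating this per wavelength band yields a predicted reflectance spectrum for a candidate pigment combination, which is what pigment mapping fits against the measured spectrum; the translucent and bi-layer cases studied in the dissertation need the thickness-dependent form instead.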

    Translational Functional Imaging in Surgery Enabled by Deep Learning

    Many clinical applications currently rely on several imaging modalities such as Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), Computed Tomography (CT), etc. All such modalities provide valuable patient data to the clinical staff to aid clinical decision-making and patient care. Despite the undeniable success of such modalities, most of them are limited to preoperative scans and focus on morphology analysis, e.g. tumor segmentation, radiation treatment planning, anomaly detection, etc. Even though the assessment of different functional properties such as perfusion is crucial in many surgical procedures, it remains highly challenging via simple visual inspection. Functional imaging techniques such as Spectral Imaging (SI) link the unique optical properties of different tissue types with metabolism changes, blood flow, chemical composition, etc. As such, SI is capable of providing much richer information that can improve patient treatment and care. In particular, perfusion assessment with functional imaging has become more relevant due to its involvement in the treatment and development of several diseases such as cardiovascular diseases. Current clinical practice relies on Indocyanine Green (ICG) injection to assess perfusion. Unfortunately, this method can only be used once per surgery and has been shown to trigger deadly complications in some patients (e.g. anaphylactic shock). This thesis addressed common roadblocks in the path to translating optical functional imaging modalities to clinical practice. The main challenges that were tackled are related to a) the slow recording and processing speed that SI devices suffer from, b) the errors introduced in functional parameter estimations under changing illumination conditions, c) the lack of medical data, and d) the high tissue inter-patient heterogeneity that is commonly overlooked. This framework follows a natural path to translation that starts with hardware optimization. 
    To overcome the limitations imposed by the lack of labeled clinical data and by the slow speed of current SI devices, a domain- and task-specific band selection component was introduced. The implementation of this component reduced the amount of data needed to monitor perfusion. Moreover, the method leverages large amounts of synthetic data, which, paired with unlabeled in vivo data, can generate highly accurate simulations of a wide range of domains. This approach was validated in vivo in a head and neck rat model and showed higher oxygenation contrast between normal and cancerous tissue in comparison to a baseline using all available bands. The need for translation to open surgical procedures was met by the implementation of an automatic light source estimation component. This method extracts specular reflections from low-exposure spectral images and processes them to obtain an estimate of the spectrum of the light source that generated the reflections. The benefits of light source estimation were demonstrated in silico, in ex vivo pig liver, and in vivo on human lips, where the oxygenation estimation error was reduced when utilizing the correct light source estimated with this method. These experiments also showed that the performance of the approach proposed in this thesis surpasses that of other baseline approaches. Video-rate functional property estimation was achieved by two main components: a regression component and an Out-of-Distribution (OoD) component. At the core of both is a compact SI camera paired with state-of-the-art deep learning models to achieve real-time functional estimation. The first component features a deep learning model based on a Convolutional Neural Network (CNN) architecture that was trained on highly accurate physics-based simulations of light-tissue interactions, thereby overcoming the lack of labeled in vivo data.
    This approach was validated in the task of perfusion monitoring in pig brain and in a clinical study involving human skin. It was shown to be capable of monitoring subtle perfusion changes in human skin in an arm clamping experiment, and of monitoring Spreading Depolarizations (SDs) (deoxygenation waves) on the surface of a pig brain. Although this method is well suited for perfusion monitoring in domains that are well represented by the physics-based simulations on which it was trained, its performance cannot be guaranteed for outlier domains. To handle such domains, the task of ischemia monitoring was rephrased as an OoD detection task. This new functional estimation component comprises an ensemble of Invertible Neural Networks (INNs) that requires only perfused-tissue data from individual patients to detect ischemic tissue as outliers. The first ever clinical study involving a video-rate-capable SI camera in laparoscopic partial nephrectomy was designed to validate this approach. The study revealed particularly high inter-patient tissue heterogeneity in the presence of pathologies (cancer), and demonstrated that this personalized approach can monitor ischemia at video rate with SI during laparoscopic surgery. In conclusion, this thesis addressed challenges related to slow image recording and processing during surgery. It also proposed a method for light source estimation to facilitate translation to open surgical procedures. The proposed methodology was validated in a wide range of domains: in silico, rat head and neck, pig liver and brain, and human skin and kidney. In particular, the first clinical trial with spectral imaging in minimally invasive surgery demonstrated that video-rate ischemia monitoring is now possible with deep learning.
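    The light source estimation step rests on the dichromatic reflection model: under a neutral-interface assumption, specular highlights reflect the illuminant spectrum almost unchanged, so bright specular pixels can vote for the light source. A toy NumPy sketch of that idea on synthetic spectra (the data and the brightest-pixel selection rule are illustrative assumptions, not the thesis's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
bands = 16
illuminant = np.linspace(0.5, 1.5, bands)   # hypothetical light-source spectrum

# Diffuse pixels: reflectance-modulated illuminant. Specular pixels: the
# illuminant spectrum, scaled but spectrally unchanged (neutral interface).
diffuse = rng.uniform(0.1, 0.6, (1000, bands)) * illuminant
specular = illuminant * rng.uniform(5.0, 8.0, (50, 1))
pixels = np.vstack([diffuse, specular])

# Estimate: average the brightest pixels (assumed specular) and normalize.
top = pixels[np.argsort(pixels.sum(axis=1))[-50:]]
estimate = top.mean(axis=0)
estimate /= np.linalg.norm(estimate)
truth = illuminant / np.linalg.norm(illuminant)
print(float(estimate @ truth) > 0.999)  # True: spectrum direction recovered
```

    The recovered spectrum can then replace a wrong default illuminant in the downstream oxygenation estimation, which is where the error reduction reported above comes from.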
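    The personalized OoD formulation can be illustrated with a much simpler density model: the thesis uses an ensemble of INNs, but the underlying logic, fit a model of the patient's own perfused tissue and flag low-likelihood pixels as outliers, can be sketched with a Gaussian and a Mahalanobis-distance threshold (toy data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one patient's perfused-tissue spectral features.
perfused = rng.normal(0.0, 1.0, (500, 4))
mu = perfused.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(perfused.T))

def outlier_score(x):
    """Squared Mahalanobis distance to the perfused-tissue model."""
    d = x - mu
    return np.einsum('ij,jk,ik->i', d, inv_cov, d)

# Threshold at the 99th percentile of the patient's own normal tissue.
threshold = np.quantile(outlier_score(perfused), 0.99)
ischemic = rng.normal(3.0, 1.0, (100, 4))  # hypothetical OoD pixels
flagged = float((outlier_score(ischemic) > threshold).mean())
print(flagged > 0.8)  # True: most out-of-distribution pixels are flagged
```

    No ischemic training data is needed: the detector is calibrated entirely on the individual patient's perfused tissue, which is what makes the approach robust to the inter-patient heterogeneity observed in the clinical study.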

    Face analysis and deepfake detection

    This thesis concerns deep-learning-based research topics in face analysis. We explore how to improve the performance of several face systems when confronting challenging variations. In Chapter 1, we provide an introduction and background information on the theme and list the main research questions of this dissertation. In Chapter 2, we present a synthetic face data generator with fully controlled variations and propose a detailed experimental comparison of the main characteristics that influence face detection performance. The results show that our synthetic dataset can complement real data to make face detectors more robust against specific variations in the real world. Our analysis also reveals that varied data augmentation is necessary to address the differences in performance. In Chapter 3, we propose an age estimation method that handles large pose variations in unconstrained face images. A Wasserstein-based GAN model is used to complete the full UV texture representation, and the proposed AgeGAN method simultaneously learns to capture the facial UV texture map and age characteristics. In Chapter 4, we propose a maximum mean discrepancy (MMD) based cross-domain face forgery detection method. Center and triplet losses are also incorporated to ensure that the learned features are shared by multiple domains and generalize better to unseen deepfake samples. In Chapter 5, we introduce an end-to-end framework to predict ages from face videos, using clustering-based transfer learning to provide proper predictions for imbalanced datasets.
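    The maximum mean discrepancy used for cross-domain forgery detection measures the distance between two feature distributions via their kernel mean embeddings; minimizing it pulls source- and target-domain features together. A minimal NumPy version of the standard (biased) Gaussian-kernel estimator on toy features, not the thesis's actual feature extractor:

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x, y (Gaussian kernel)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
same  = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
shift = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2)))
print(same < shift)  # True: MMD grows as the two feature domains diverge
```

    Used as a training loss on deep features, this term penalizes domain-specific structure, which is why it helps the detector generalize to unseen deepfake domains.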

    Connected Attribute Filtering Based on Contour Smoothness

    A new attribute measuring the contour smoothness of 2-D objects is presented in the context of morphological attribute filtering. The attribute is based on the ratio of circularity to non-compactness; it has a maximum of 1 for a perfect circle and decreases as the object boundary becomes irregular. Computation on hierarchical image representation structures relies on five auxiliary data members and is rapid. Contour smoothness is a suitable descriptor for detecting and discriminating man-made structures from other image features. An example is demonstrated on a very-high-resolution satellite image using connected pattern spectra and the switchboard platform.
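    The circularity factor underlying the attribute above has a standard closed form, 4*pi*A/P^2, which equals 1 for a perfect disc and shrinks as the boundary roughens; how the abstract's attribute combines it with non-compactness is not reproduced here, so the sketch below covers only the circularity half, as an assumption-laden illustration:

```python
import math

def circularity(area, perimeter):
    """4*pi*A / P^2: 1 for a perfect disc, smaller as the contour roughens."""
    return 4.0 * math.pi * area / perimeter ** 2

# Analytic check on continuous (not pixelated) shapes:
r = 3.0
disc = circularity(math.pi * r * r, 2.0 * math.pi * r)   # exactly 1.0
square = circularity(1.0, 4.0)                           # pi/4, about 0.785
print(round(disc, 3), round(square, 3))
```

    On a discrete component tree the same quantity would be maintained incrementally from per-node auxiliary data (area and boundary measures) rather than recomputed per object, which is what makes the attribute cheap to evaluate during filtering.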
