97 research outputs found

    Face Detection And Lip Localization

    Get PDF
    Integration of audio and video signals for automatic speech recognition has become an important field of study. The Audio-Visual Speech Recognition (AVSR) system is known to have accuracy higher than audio-only or visual-only system. The research focused on the visual front end and has been centered around lip segmentation. Experiments performed for lip feature extraction were mainly done in constrained environment with controlled background noise. In this thesis we focus our attention to a database collected in the environment of a moving car which hampered the quality of the imagery. We first introduce the concept of illumination compensation, where we try to reduce the dependency of light from over- or under-exposed images. As a precursor to lip segmentation, we focus on a robust face detection technique which reaches an accuracy of 95%. We have detailed and compared three different face detection techniques and found a successful way of concatenating them in order to increase the overall accuracy. One of the detection techniques used was the object detection algorithm proposed by Viola-Jones. We have experimented with different color spaces using the Viola-Jones algorithm and have reached interesting conclusions. Following face detection we implement a lip localization algorithm based on the vertical gradients of hybrid equations of color. Despite the challenging background and image quality, success rate of 88% was achieved for lip segmentation

    Robot Vision in Industrial Assembly and Quality Control Processes

    Get PDF

    Explainable Artificial Intelligence for Image Segmentation and for Estimation of Optical Aberrations

    Get PDF
    State-of-the-art machine learning methods such as convolutional neural networks (CNNs) are frequently employed in computer vision. Despite their high performance on unseen data, CNNs are often criticized for lacking transparency — that is, providing very limited if any information about the internal decision-making process. In some applications, especially in healthcare, such transparency of algorithms is crucial for end users, as trust in diagnosis and prognosis is important not only for the satisfaction and potential adherence of patients, but also for their health. Explainable artificial intelligence (XAI) aims to open up this “black box,” often perceived as a cryptic and inconceivable algorithm, to increase understanding of the machines’ reasoning.XAI is an emerging field, and techniques for making machine learning explainable are becoming increasingly available. XAI for computer vision mainly focuses on image classification, whereas interpretability in other tasks remains challenging. Here, I examine explainability in computer vision beyond image classification, namely in semantic segmentation and 3D multitarget image regression. This thesis consists of five chapters. In Chapter 1 (Introduction), the background of artificial intelligence (AI), XAI, computer vision, and optics is presented, and the definitions of the terminology for XAI are proposed. Chapter 2 is focused on explaining the predictions of U-Net, a CNN commonly used for semantic image segmentation, and variations of this architecture. To this end, I propose the gradient-weighted class activation mapping for segmentation (Seg-Grad-CAM) method based on the well-known Grad-CAM method for explainable image classification. In Chapter 3, I present the application of deep learning to estimation of optical aberrations in microscopy biodata by identifying the present Zernike aberration modes and their amplitudes. A CNN-based approach PhaseNet can accurately estimate monochromatic aberrations in images of point light sources. I extend this method to objects of complex shapes. In Chapter 4, an approach for explainable 3D multitarget image regression is reported. First, I visualize how the model differentiates the aberration modes using the local interpretable model-agnostic explanations (LIME) method adapted for 3D image classification. Then I “explain,” using LIME modified for multitarget 3D image regression (Image-Reg-LIME), the outputs of the regression model for estimation of the amplitudes. In Chapter 5, the results are discussed in a broader context. The contribution of this thesis is the development of explainability methods for semantic segmentation and 3D multitarget image regression of optical aberrations. The research opens the door for further enhancement of AI’s transparency.:Title Page i List of Figures xi List of Tables xv 1 Introduction 1 1.1 Essential Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.1 Artificial intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.2 Explainable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.3 Proposed definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2 Explainable Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.1 Aims and applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3 Computer Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3.2 Image classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3.3 Image regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.3.4 Image segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4 Optics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.4.1 Aberrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.4.2 Zernike polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.5 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.5.2 Dissertation outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2 Explainable Image Segmentation 23 2.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.3.1 CAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.3.2 Grad-CAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.3.3 U-Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.3.4 Seg-Grad-CAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.4 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.4.1 Circles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.4.2 TextureMNIST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.4.3 Cityscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.5.1 Circles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.5.2 TextureMNIST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.5.3 Cityscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 2.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 2.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3 Estimation of Aberrations 55 3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.3.1 PhaseNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.3.2 PhaseNet data generator . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 3.3.3 Retrieval of noise parameters . . . . . . . . . . . . . . . . . . . . . . . . 62 3.3.4 Data generator with phantoms . . . . . . . . . . . . . . . . . . . . . . . 62 3.3.5 Restoration via deconvolution . . . . . . . . . . . . . . . . . . . . . . . . 63 3.3.6 Convolution with the “zero” synthetic PSF . . . . . . . . . . . . . . . . 63 3.4 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.4.1 Astrocytes (synthetic data) . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.4.2 Fluorescent beads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.4.3 Drosophila embryo (live sample) . . . . . . . . . . . . . . . . . . . . . . 67 3.4.4 Neurons (fixed sample) . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.5.1 Astrocytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.5.2 Conclusions on the results for astrocytes . . . . . . . . . . . . . . . . . . 74 3.5.3 Fluorescent beads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 3.5.4 Conclusions on the results for fluorescent beads . . . . . . . . . . . . . . 81 3.5.5 Drosophila embryo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 3.5.6 Conclusions on the results for Drosophila embryo . . . . . . . . . . . . . 87 3.5.7 Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4 Explainable Multitarget Image Regression 99 4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 4.3.1 LIME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 4.3.2 Superpixel algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 4.3.3 LIME for 3D image classification . . . . . . . . . . . . . . . . . . . . . . 104 4.3.4 Image-Reg-LIME: LIME for 3D image regression . . . . . . . . . . . . . 107 4.4 Results: Classification of Aberrations . . . . . . . . . . . . . . . . . . . . . . . . 109 viii TABLE OF CONTENTS 4.4.1 Transforming the regression task into classification . . . . . . . . . . . . 110 4.4.2 Data augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.4.3 Parameter search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 4.4.4 Clustering of 3D images . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 4.4.5 Explanations of classification . . . . . . . . . . . . . . . . . . . . . . . . 114 4.4.6 Conclusions on the results for classification . . . . . . . . . . . . . . . . 117 4.5 Results: Explainable Regression of Aberrations . . . . . . . . . . . . . . . . . . 118 4.5.1 Explanations with a reference value . . . . . . . . . . . . . . . . . . . . 121 4.5.2 Validation of explanations . . . . . . . . . . . . . . . . . . . . . . . . . . 122 4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 5 Conclusions and Outlook 127 References 12

    Research on a modifeied RANSAC and its applications to ellipse detection from a static image and motion detection from active stereo video sequences

    Get PDF
    制度:新 ; 報告番号:甲3091号 ; 学位の種類:博士(国際情報通信学) ; 授与年月日:2010/2/24 ; 早大学位記番号:新535

    Entropy in Image Analysis II

    Get PDF
    Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for those applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In the process of reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas

    BBox-Free SAR Ship Instance Segmentation Method Based on Gaussian Heatmap

    Get PDF
    Recently, deep learning methods have been widely adopted for ship detection in synthetic aperture radar (SAR) images. However, many of the existing methods miss adjacent ship instances when detecting densely arranged ship targets in inshore scenes. Besides, they suffer from the lack of precision in the instance indication information and the confusion of multiple instances by a single mask head. In this paper, we propose a novel center point prediction algorithm, which detects the center points by finding a long distance variation relationship between two points. The whole prediction process is anchor-free and does not require additional bounding box (BBox) predictions for non-maximum suppression (NMS). Therefore, our algorithm is BBox-free and NMS-free, solving the problem of low recall rates when conducting NMS for densely arranged targets. Furthermore, to tackle the deficiency of position indication information in localization tasks, we introduce a feature fusion module with feature decoupling (FD). This module uses classification branch to provide guidance information for localization branch, while suppressing the influence of the gradient flow mixing, effectively improving the algorithm’s segmentation performance of ship contours. Finally, through principal component analysis (PCA) of the Gaussian distribution covariance matrix, we propose a loss function based on the distance between centroids and the difference of angle, called centroid and angle constraint (CAC). CAC guides the network in learning the criterion that a single dynamic mask head is only valid for a single instance. Experiments conducted on PSeg-SSDD and HRSID demonstrate the effectiveness and robustness of our method

    On Body Mass Index Analysis from Human Visual Appearance

    Get PDF
    In the past few decades, overweight and obesity are spreading widely like an epidemic. Generally, a person is considered overweight by body mass index (BMI). In addition to a body fat measurement, BMI is also a risk factor for many diseases, such as cardiovascular diseases, cancers and diabetes, etc. Therefore, BMI is important for personal health monitoring and medical research. Currently, BMI is measured in person with special devices. It is an urgent demand to explore conveniently preventive tools. This work investigates the feasibility of analyzing BMI from human visual appearances, including 2-dimensional (2D)/3-dimensional (3D) body and face data. Motivated by health science studies which have shown that anthropometric measures, such as waist-hip ratio, waist circumference, etc., are indicators for obesity, we analyze body weight from frontal view human body images. A framework is developed for body weight analysis from body images, along with the computation methods of five anthropometric features for body weight characterization. Then, we study BMI estimation from the 3D data by measuring the correlation between the estimated body volume and BMIs, and develop an efficient BMI computation method which consists of body weight and height estimation from normally dressed people in 3D space. We also intensively study BMI estimation from frontal view face images via two key aspects: facial representation extracting and BMI estimator learning. First, we investigate the visual BMI estimation problem from the aspect of the characteristics and performance of different facial representation extracting methods by three designed experiments. Then we study visual BMI estimation from facial images by a two-stage learning framework. BMI related facial features are learned in the first stage. To address the ambiguity of BMI labels, a label distribution based BMI estimator is proposed for the second stage. The experimental results show that this framework improves the performance step by step. Finally, to address the challenges caused by BMI data and labels, we integrate feature learning and estimator learning in one convolutional neural network (CNN). A label assignment matching scheme is proposed which successfully achieves an improvement in BMI estimation from face images
    corecore