55 research outputs found

    Pedestrian recognition and obstacle avoidance for autonomous vehicles using Raspberry Pi

    © Springer Nature Switzerland AG 2020. The aim of this paper is twofold: first, to use ultrasonic sensors to detect obstacles, and second, to present a comparison of machine learning and deep learning algorithms for pedestrian recognition in an autonomous vehicle. A mobility scooter was modified to be fully autonomous using a Raspberry Pi 3 as the controller: the scooter was disassembled and connected to the Raspberry Pi 3 together with ultrasonic sensors and a camera. Pedestrians were initially simulated with cardboard boxes and later replaced by a real pedestrian. Two computer vision algorithms, histogram of oriented gradients (HOG) descriptors and Haar classifiers, were trained and tested for pedestrian recognition and compared to deep learning using the single shot detection (SSD) method. The ultrasonic sensors were tested for time delay in obstacle avoidance and were found to be reliable at ranges between 100 cm and 500 cm at small angles from the acoustic axis, and at delay periods over two seconds. The HOG descriptor was found to be superior to the Haar classifier for detecting pedestrians, with an accuracy of around 83%, whereas deep learning outperformed both with an accuracy of around 88%. The work presented here will enable further tests on the autonomous vehicle to collect meaningful data for management of the vehicular cloud.
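    As a point of reference for the HOG-based approach compared in this abstract, the sketch below uses OpenCV's built-in HOG descriptor with its default pretrained pedestrian SVM, not the model trained in the paper; the file name frame.jpg and the window-stride, padding, and scale settings are illustrative assumptions.

```python
# Minimal HOG pedestrian detection sketch using OpenCV's pretrained
# people detector (not the paper's own trained model).
import cv2

# Initialise a HOG descriptor and load OpenCV's default pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.jpg")  # e.g. a frame grabbed from the Pi camera

# Slide the 64x128 detection window across the image at multiple scales.
boxes, weights = hog.detectMultiScale(
    frame,
    winStride=(8, 8),   # step of the sliding window, in pixels
    padding=(8, 8),
    scale=1.05,         # image pyramid scale factor
)

# Draw a rectangle around each detected pedestrian.
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```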

    Banknote Authentication and Medical Image Diagnosis Using Feature Descriptors and Deep Learning Methods

    Banknote recognition and medical image analysis have been foci of image processing and pattern recognition research. Counterfeiters have taken advantage of innovations in print media technology to reproduce fake money, hence the need for systems that can reassure citizens of the authenticity of banknotes in circulation. Similarly, many physicians must interpret medical images, but image analysis by humans is susceptible to error due to wide variation across interpreters, fatigue, and human subjectivity. Computer-aided diagnosis is vital to improvements in medical analysis, as it facilitates the identification of findings that need treatment and assists the expert's workflow. This thesis is therefore organized around three such problems related to banknote authentication and medical image diagnosis. In our first research problem, we proposed a new banknote recognition approach that classifies the principal components of extracted HOG features. We further experimented with computing HOG descriptors from cells created from image-patch vertices of SURF points and designed a feature-reduction approach based on a high-correlation and low-variance filter. In our second research problem, we developed a mobile app for banknote identification and counterfeit detection using the Unity 3D software and evaluated its performance with a cascaded ensemble approach. The algorithm was then extended to a client-server architecture using SIFT and SURF features reduced by Bag of Words and high-correlation-based HOG vectors. In our third research problem, experiments were conducted on a pre-trained mobile app for medical image diagnosis using three convolutional layers with an ensemble classifier comprising PCA and bagging of five base learners. We also implemented a Bidirectional Generative Adversarial Network to mitigate the effect of the binary cross-entropy loss, with a Deep Convolutional Generative Adversarial Network as the generator and encoder and a Capsule Network as the discriminator, while experimenting on images with random composition and translation inferences. Lastly, we proposed a variant of single-image super-resolution for medical analysis by redesigning the Super Resolution Generative Adversarial Network to increase the peak signal-to-noise ratio during image reconstruction, incorporating a loss function based on the mean square error of pixel space and Super Resolution Convolutional Neural Network layers.
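    To illustrate the kind of high-correlation / low-variance feature filter described above (a sketch, not the thesis's exact implementation), the code below drops near-constant HOG dimensions, prunes one member of each highly correlated pair, then projects onto principal components; the thresholds 1e-4 and 0.95 and the component count are assumed values.

```python
# Sketch of a high-correlation / low-variance feature filter followed by
# PCA, applied to a matrix of HOG feature vectors (one row per image).
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA

def reduce_hog_features(X, var_thresh=1e-4, corr_thresh=0.95, n_components=64):
    # 1) Low-variance filter: drop near-constant dimensions.
    X = VarianceThreshold(threshold=var_thresh).fit_transform(X)

    # 2) High-correlation filter: drop one of every highly correlated pair.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    upper = np.triu(corr, k=1)  # each pair considered once
    keep = [j for j in range(X.shape[1]) if not np.any(upper[:, j] > corr_thresh)]
    X = X[:, keep]

    # 3) Project the surviving dimensions onto principal components.
    pca = PCA(n_components=min(n_components, X.shape[1]))
    return pca.fit_transform(X)

# Usage sketch: X = np.vstack([hog_vector(img) for img in banknote_images])
# X_reduced = reduce_hog_features(X)
```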

    Facial expression recognition in the wild: from individual to group

    The progress in computing technology has increased the demand for smart systems capable of understanding human affect and emotional manifestations. One of the crucial factors in designing such systems is accurate automatic Facial Expression Recognition (FER). In computer vision, automatic facial expression analysis has been an active field of research for over two decades; however, many questions remain unanswered. The research presented in this thesis addresses some of the key issues of FER in challenging conditions: 1) creating a facial expressions database representing real-world conditions; 2) devising Head Pose Normalisation (HPN) methods that are independent of facial part locations; 3) creating automatic methods for analysing the mood of a group of people. The central hypothesis of the thesis is that extracting close-to-real-world data from movies and performing facial expression analysis on it is a stepping stone towards analysing faces in real-world, unconstrained conditions. A temporal facial expressions database, Acted Facial Expressions in the Wild (AFEW), is proposed. The database is constructed and labelled using a semi-automatic process based on a closed-caption-subtitle keyword search. Currently, AFEW is the largest facial expressions database representing challenging conditions available to the research community. To provide a common platform for researchers to evaluate and extend their state-of-the-art FER methods, the first Emotion Recognition in the Wild (EmotiW) challenge, based on AFEW, is proposed. An image-only facial expressions database, Static Facial Expressions in the Wild (SFEW), extracted from AFEW, is also proposed. Furthermore, the thesis focuses on HPN for real-world images. Earlier methods were based on fiducial points; however, as fiducial-point detection is an open problem for real-world images, such HPN can be error-prone. An HPN method based on response maps generated from part detectors is proposed. The proposed shape-constrained method requires neither fiducial points nor head pose information, which makes it suitable for real-world images. Data from movies and the internet, representing real-world conditions, poses another major challenge: the presence of multiple subjects. This defines another focus of this thesis, where a novel approach for modelling the perceived mood of a group of people in an image is presented. A new database is constructed from Flickr based on keywords related to social events. Three models are proposed: an averaging-based Group Expression Model (GEM), a Weighted Group Expression Model (GEM_w), and an Augmented Group Expression Model (GEM_LDA). GEM_w is based on social contextual attributes, which are used as weights on each person's contribution towards the overall group mood; GEM_LDA is based on topic modelling and feature augmentation. The proposed framework is applied to group candid-shot selection and event summarisation. The Structural SIMilarity (SSIM) index is also explored for finding similar facial expressions, and the resulting framework is applied to creating image albums based on facial expressions and to finding corresponding expressions for training facial performance transfer algorithms.
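    A minimal sketch of the SSIM-based expression-similarity step: ranking candidate face crops by structural similarity to a query face using scikit-image. The crop size, greyscale input, and the assumption that faces are already aligned are all illustrative; the thesis's exact pipeline is not reproduced here.

```python
# Sketch: ranking candidate faces by SSIM similarity to a query expression.
import numpy as np
from skimage.metrics import structural_similarity as ssim
from skimage.transform import resize

def rank_by_expression_similarity(query, candidates, size=(128, 128)):
    """Return candidate indices sorted by SSIM to the query face (greyscale)."""
    q = resize(query, size)  # resize() also converts to float in [0, 1]
    scores = []
    for face in candidates:
        f = resize(face, size)
        # SSIM compares local luminance, contrast, and structure; data_range
        # matches the [0, 1] intensity range produced by resize().
        scores.append(ssim(q, f, data_range=1.0))
    return np.argsort(scores)[::-1]  # most similar first
```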

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    Multi-Modal Ocular Recognition in the Presence of Occlusion in Mobile Devices

    Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2018. Dissertation advisor: Reza Derakhshani. Includes bibliographical references (pages 128-144).
    The existence of eyeglasses on human faces causes real challenges for ocular, facial, and soft-biometric (such as eyebrow) recognition due to glasses reflection, shadow, and frame occlusion. In this regard, two operations (eyeglasses detection and eyeglasses segmentation) have been proposed to mitigate the effect of eyeglass occlusion. Eyeglasses detection is an important initial step towards eyeglasses segmentation. Three schemes of eyeglasses detection have been proposed: non-learning-based, learning-based, and deep learning-based. The non-learning scheme, which consists of cascaded filters, achieved an overall accuracy of 99.0% on the VISOB dataset and 97.9% on the FERET dataset. The learning-based scheme consists of extracting Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG) features and fusing them, then applying classifiers (such as Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Linear Discriminant Analysis (LDA)) and fusing the outputs of these classifiers; it obtained a best overall accuracy of about 99.3% on FERET and 100% on VISOB. The deep learning-based scheme presents a comparative study of eyeglasses frame detection using different Convolutional Neural Network (CNN) structures applied to the frame-bridge region and the extended ocular region; the best CNN model obtained an overall accuracy of 99.96% for the ROI consisting of the frame bridge. Moreover, two schemes of eyeglasses segmentation have been introduced. The first was a cascaded CNN scheme, consisting of cascaded CNNs for eyeglasses detection, weight generation, and glasses segmentation, followed by mathematical and binarization operations; it showed 100% eyeglasses detection accuracy and 91% segmentation accuracy. The second was a convolutional-deconvolutional network, implemented with main convolutional layers, deconvolutional layers, and one custom (lambda) layer; it achieved better results than the cascaded approach, with 97% segmentation accuracy. Furthermore, two soft-biometric re-identification schemes with eyeglasses mitigation have been introduced. The first was eyebrow-based user authentication, consisting of local, global, and deep feature extraction with learning-based matching; the best result was an EER of 0.63% using score-level fusion of handcrafted descriptors (HOG and GIST) with the deep VGG16 descriptor. The second was eyeglasses-based user authentication, consisting of eyeglasses segmentation, morphological clean-up, feature extraction, and learning-based matching; the best result was an EER of 3.44% using the same score-level fusion. Also, an EER improvement of 2.51% for indoor vs. outdoor (In:Out) light settings was achieved for eyebrow-based authentication after eyeglasses segmentation and removal using the convolutional-deconvolutional approach followed by in-painting.
    Contents: Introduction -- Background in machine learning and computer vision -- Eyeglasses detection and segmentation -- User authentication using soft-biometrics -- Conclusion and future work -- Appendix
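    A minimal sketch of the feature side of the learning-based detection scheme: LBP and HOG features extracted from a periocular crop, concatenated, and fed to an SVM. The LBP/HOG parameters, the 64x64 greyscale crop size, and the single-classifier setup are assumptions; the score-level fusion of multiple classifiers described above is omitted for brevity.

```python
# Sketch: fused LBP + HOG features with an SVM for eyeglasses detection.
import numpy as np
from skimage.feature import local_binary_pattern, hog
from sklearn.svm import SVC

def lbp_hog_features(gray_crop):
    """Concatenate a uniform-LBP histogram with a HOG vector (assumed params)."""
    lbp = local_binary_pattern(gray_crop, P=8, R=1, method="uniform")
    # Uniform LBP with P=8 yields codes 0..9, hence 10 histogram bins.
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)
    hog_vec = hog(gray_crop, orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.concatenate([lbp_hist, hog_vec])

# Usage sketch: X stacks feature vectors from 64x64 periocular crops,
# y marks 1 = glasses present, 0 = no glasses.
# X = np.vstack([lbp_hog_features(crop) for crop in crops])
# clf = SVC(kernel="rbf").fit(X, y)
# pred = clf.predict(lbp_hog_features(new_crop).reshape(1, -1))
```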