
    Face recognition enhancement through the use of depth maps and deep learning

    Face recognition, despite being a popular area of research for over a decade, still has many open research challenges. These include the recognition of poorly illuminated faces, recognition under pose variations, and the difficulty of capturing sufficient training data to enable recognition under pose/viewpoint changes. The appearance of cheap and effective multimodal image capture hardware, such as the Microsoft Kinect device, has uncovered new research possibilities. One opportunity is to explore the depth maps generated by the Kinect as an additional data source for recognising human faces under low levels of scene illumination, and to generate new images by building a 3D model from the depth maps and visible-spectrum/RGB images, which can then be used to enhance face recognition accuracy by improving the training phase of a classification task. With the goal of enhancing face recognition, this research first investigated how depth maps, which are not affected by illumination, can improve face recognition when algorithms traditionally used in face recognition are applied. To this effect, a number of popular benchmark face recognition algorithms were tested. It is shown that algorithms based on LBP and Eigenfaces provide a high level of accuracy in face recognition, owing to the significantly higher resolution of the depth map images generated by the latest version of the Kinect device. To complement this work, a novel algorithm named the Dense Feature Detector is presented and shown to be effective in face recognition using depth map images, in particular under well-illuminated conditions. A further technique presented for enhancing face recognition reconstructs face images at different angles from a single frontal RGB image and the corresponding depth map captured by the Kinect, using a fast and effective 3D object reconstruction technique. Using the Overfeat network, based on Convolutional Neural Networks, for feature extraction and an SVM for classification, it is shown that a technically unlimited number of views can be rendered from the proposed 3D model, each containing the features of the face as if it had been captured from a similar angle in reality. These rendered images can therefore be used in place of real training images, removing the need to capture many examples of a face from different viewpoints when training the image classifier. The proposed 3D model thus saves a significant amount of the time and effort needed to capture the training data that is essential for recognising the human face under variations of pose/viewpoint. The thesis argues that the same approach can also serve as a novel approach to face recognition itself, promising significantly high levels of face recognition accuracy based on depth images. Finally, following the recent trend of replacing traditional face recognition algorithms with deep learning networks, the thesis investigates the use of four popular networks, VGG-16, VGG-19, VGG-S and GoogLeNet, for depth-map-based face recognition and proposes the effective use of Transfer Learning to enhance the performance of such deep learning networks.
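    As a rough illustration of the benchmark pipeline this abstract describes, the Python sketch below extracts uniform LBP histograms from depth maps and classifies them with a linear SVM. It is a minimal sketch under stated assumptions, not the thesis code: the data is synthetic, and the LBP parameters (8 sampling points, radius 1) are common defaults rather than values reported in the work.

        import numpy as np
        from skimage.feature import local_binary_pattern
        from sklearn.svm import SVC

        P, R = 8, 1      # LBP sampling points / radius: common defaults, not thesis values
        N_BINS = P + 2   # the 'uniform' LBP method yields P + 2 distinct codes

        def lbp_histogram(depth_map):
            # Normalised histogram of uniform-LBP codes for one depth map.
            codes = local_binary_pattern(depth_map, P, R, method="uniform")
            hist, _ = np.histogram(codes, bins=N_BINS, range=(0, N_BINS), density=True)
            return hist

        # Synthetic stand-in data: 40 fake 64x64 "depth maps" for 4 subjects.
        rng = np.random.default_rng(0)
        depth_maps = rng.random((40, 64, 64))
        labels = np.repeat(np.arange(4), 10)

        features = np.stack([lbp_histogram(d) for d in depth_maps])
        classifier = SVC(kernel="linear").fit(features, labels)
        print(classifier.predict(features[:5]))

    Real Kinect depth maps would replace the random arrays; the histogram-plus-SVM shape of the pipeline is the part being illustrated.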

    Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

    Designing powerful, discriminative texture features that are robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and the analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distributions of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input, with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit texture information, provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories, and the recently introduced large-scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to a standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Our final combination outperforms the state of the art without employing fine-tuning or an ensemble of RGB network architectures.
    Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensing
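    To make the late-fusion idea concrete, the sketch below wires together a small RGB branch and a single-channel texture branch (which would receive the LBP-coded image) by concatenating their feature vectors before a shared classifier. This is a hypothetical miniature, not the authors' TEX-Net: the branch architecture and the 16-dimensional feature size are invented placeholders, and only the 21-class UC-Merced setting is borrowed from the abstract.

        import torch
        import torch.nn as nn

        def branch(in_channels):
            # Tiny convolutional feature extractor standing in for a deep model.
            return nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> 16-dim feature vector
            )

        class LateFusionNet(nn.Module):
            def __init__(self, n_classes=21):            # 21 = UC-Merced scene categories
                super().__init__()
                self.rgb_branch = branch(3)              # standard RGB input
                self.tex_branch = branch(1)              # LBP-coded single-channel input
                self.classifier = nn.Linear(16 + 16, n_classes)

            def forward(self, rgb, tex):
                # Late fusion: concatenate per-branch features, classify jointly.
                fused = torch.cat([self.rgb_branch(rgb), self.tex_branch(tex)], dim=1)
                return self.classifier(fused)

        model = LateFusionNet()
        rgb = torch.randn(2, 3, 64, 64)                  # a batch of RGB patches
        tex = torch.randn(2, 1, 64, 64)                  # the matching LBP-coded patches
        print(model(rgb, tex).shape)                     # torch.Size([2, 21])

    Early fusion, the other architecture the abstract names, would instead stack the RGB and texture channels into one input tensor and use a single branch.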