
    Multi-Path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces"

    Large-scale variations still pose a challenge in unconstrained face detection. To the best of our knowledge, no current face detection algorithm can detect a face as large as 800 x 800 pixels while simultaneously detecting another one as small as 8 x 8 pixels within a single image with equally high accuracy. We propose a two-stage cascaded face detection framework, Multi-Path Region-based Convolutional Neural Network (MP-RCNN), that seamlessly combines a deep neural network with a classic learning strategy to tackle this challenge. The first stage is a Multi-Path Region Proposal Network (MP-RPN) that proposes faces at three different scales. It simultaneously utilizes three parallel outputs of the convolutional feature maps to predict multi-scale candidate face regions. The "atrous" convolution trick (convolution with up-sampled filters) and a newly proposed sampling layer for "hard" examples are embedded in MP-RPN to further boost its performance. The second stage is a Boosted Forests classifier, which utilizes deep facial features pooled from inside the candidate face regions as well as deep contextual features pooled from a larger region surrounding the candidate face regions. This step is included to further remove hard negative samples. Experiments show that this approach achieves state-of-the-art face detection performance on the WIDER FACE dataset "hard" partition, outperforming the previous best result by 9.6% in Average Precision. Comment: 11 pages, 7 figures, to be presented at CRV 201
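The "atrous" (dilated) convolution mentioned above enlarges a filter's receptive field without adding parameters, which helps a single network cover very different face scales. Below is a minimal numpy sketch of the idea (this is a generic illustration of dilated convolution with valid padding and stride 1, not MP-RPN's actual implementation):

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation=2):
    """Naive 'atrous' convolution: kernel taps are spaced `dilation`
    pixels apart, so a 3x3 kernel with dilation 2 covers a 5x5 window
    while keeping only 9 weights (valid padding, stride 1)."""
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1   # effective kernel height
    eff_w = (kw - 1) * dilation + 1   # effective kernel width
    H, W = image.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input sparsely: every `dilation`-th pixel
            patch = image[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3))
y = dilated_conv2d(img, k, dilation=2)
print(y.shape)  # (2, 2): the 3x3 kernel spans a 5x5 effective window
```

In a deep-learning framework the same effect is obtained with the convolution layer's dilation parameter, at no extra parameter cost.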

    Learning-Based Approach to Real Time Tracking and Analysis of Faces

    This paper describes a trainable system capable of tracking faces and facial features such as eyes and nostrils, and estimating basic mouth features such as degree of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches, this system is entirely trained using examples and does not rely on a priori (hand-crafted) models of facial features based on optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stages of skin segmentation, face detection, and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.
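Haar-wavelet-style rectangle features of the kind used above are cheap to evaluate because any rectangle sum costs O(1) once an integral image (summed-area table) is built. The sketch below illustrates the standard two-rectangle feature; the specific window coordinates are illustrative, not taken from the paper:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[r, c] = sum of img[:r+1, :c+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in O(1) via the table."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def haar_two_rect(ii, top, left, height, width):
    """Left-half sum minus right-half sum: responds to vertical
    intensity edges such as the boundary at the side of a face."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, top + height - 1, left + half - 1)
    right_sum = rect_sum(ii, top, left + half, top + height - 1, left + width - 1)
    return left_sum - right_sum

# Synthetic vertical edge: left half dark (0), right half bright (10)
img = np.zeros((4, 4))
img[:, 2:] = 10.0
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 4, 4))  # -80.0
```

Vectors of such feature responses are what a classifier like an SVM would consume for the detection stages described above.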

    Enhanced contextual based deep learning model for niqab face detection

    Human face detection is one of the most investigated areas in computer vision and plays a fundamental role as the first step for all face processing and facial analysis systems, such as face recognition, security monitoring, and facial emotion recognition. Despite the great impact of Deep Learning Convolutional Neural Network (DL-CNN) approaches on solving many unconstrained face detection problems in recent years, the low performance of current face detection models when detecting highly occluded faces remains a challenging problem worthy of investigation. This challenge tends to be higher when the occlusion covers most of the face, which dramatically reduces the number of learned representative features that the Feature Extraction Network (FEN) uses to discriminate face parts from the background. The lack of an occluded face dataset with sufficient images of heavily occluded faces is another challenge that degrades performance. Therefore, this research addressed the issue of low performance and developed an enhanced occluded face detection model for detecting and localizing heavily occluded faces. First, a highly occluded faces dataset was developed to provide sufficient training examples, incorporated with a contextual-based annotation technique to maximize the amount of facial salient features. Second, using the training half of the dataset, a deep learning CNN Occluded Face Detection (OFD) model with an enhanced feature extraction and detection network was proposed and trained. Common deep learning techniques, namely transfer learning and data augmentation, were used to speed up the training process. False-positive reduction based on a max-in-out strategy was adopted to reduce the high false-positive rate. The proposed model was evaluated and benchmarked against five current face detection models on the dataset.
The obtained results show that OFD achieved improved performance in terms of accuracy (average 37%) and average precision (16.6%) compared to current face detection models. The findings revealed that the proposed model outperformed current face detection models in improving the detection of highly occluded faces. Based on the findings, an improved contextual-based labeling technique has been successfully developed to address the insufficient functionality of the current labeling technique.
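The max-in-out strategy adopted above (known from face detectors such as SSH and PyramidBox) predicts several sub-scores per candidate box for each class and keeps only the maximum, which gives the network multiple "modes" for representing cluttered background and thereby suppresses false positives. The following is a minimal numpy sketch of our reading of that scoring rule, not the OFD model's actual head:

```python
import numpy as np

def max_in_out(pos_scores, neg_scores):
    """pos_scores: (N, cp) candidate 'face' sub-scores per box.
    neg_scores: (N, cn) candidate 'background' sub-scores per box.
    Take the max sub-score for each class, then softmax the pair;
    returns the probability that each box is a face."""
    face = pos_scores.max(axis=1)
    background = neg_scores.max(axis=1)
    logits = np.stack([background, face], axis=1)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    return probs[:, 1]

pos = np.array([[2.0, 0.5], [0.1, 0.2]])
neg = np.array([[0.0, 1.0, 0.3], [3.0, 2.5, 1.0]])
print(max_in_out(pos, neg))  # high for box 0, low for box 1
```

With more background sub-scores than face sub-scores (cn > cp), a box only scores as a face when it beats every learned background mode.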

    On Detecting Faces And Classifying Facial Races With Partial Occlusions And Pose Variations

    In this dissertation, we present our contributions in face detection and facial race classification. Face detection in unconstrained images is a traditional problem in the computer vision community, yet challenges still remain. In particular, the detection of partially occluded faces with pose variations has not been well addressed. In the first part of this dissertation, our contributions are three-fold. First, we introduce our four image datasets, consisting of a large-scale labeled face dataset, a noisy large-scale labeled non-face dataset, a CrowdFaces dataset, and a CrowdNonFaces dataset, intended to be used for face detection training. Second, we improve Viola-Jones (VJ) face detection results by first training a Convolutional Neural Network (CNN) model on our noisy datasets. We show our improvement over the VJ face detector on the AFW face detection benchmark dataset. However, existing partially occluded face detection methods require training several models, computing hand-crafted features, or both. Hence, third, we propose our Large-Scale Deep Learning (LSDL) method, which does not require training several CNN models or computing hand-crafted features to detect faces. Our LSDL face detector is trained on a single CNN model to detect unconstrained multi-view partially occluded and non-partially occluded faces. The model is trained with a large number of face training examples that cover most partially occluded and non-partially occluded facial appearances. The LSDL face detection method works by selecting detection windows with the highest confidence scores using a threshold. Our evaluation results show that our LSDL method achieves the best performance on the AFW dataset and comparable performance on the FDDB dataset among state-of-the-art face detection methods, without manually extending or adjusting the square detection bounding boxes. Many biometrics and security systems use facial information to obtain individual identification and recognition.
Classifying a race from a face image can provide a strong hint to search for facial identity and criminal identification. Current facial race classification methods are confined only to constrained, non-partially occluded frontal faces. Challenges remain under unconstrained environments such as partial occlusions and pose variations, low illumination, and small scales. In the second part of the dissertation, we propose a CNN model to classify facial races with partial occlusions and pose variations. The proposed model is trained using a broad and balanced racially distributed face image dataset. The model is trained on four major human races: Caucasian, Indian, Mongolian, and Negroid. Our model is evaluated against the state-of-the-art methods on a constrained face test dataset. Also, an evaluation of the proposed model and human performance is conducted and compared on our new unconstrained facial race benchmark (CIMN) dataset. Our results show that our model achieves 95.1% race classification accuracy in the constrained environment. Furthermore, the model achieves race classification accuracy comparable to human performance on the current challenges in the unconstrained environment.
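The LSDL detection step described above selects windows by confidence threshold. In practice such a step is usually paired with non-maximum suppression to collapse near-duplicate windows; the sketch below shows that generic pipeline with illustrative threshold values, not the dissertation's actual settings:

```python
import numpy as np

def select_windows(boxes, scores, score_thresh=0.8, iou_thresh=0.5):
    """Keep windows scoring above the threshold, then greedily suppress
    lower-scored windows that overlap a kept one (standard NMS).
    boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,)."""
    mask = scores >= score_thresh
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(scores)[::-1]           # highest score first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection-over-union of the top box with the remainder
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]        # drop heavy overlaps
    return boxes[kept], scores[kept]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.85, 0.95])
b, s = select_windows(boxes, scores)
print(len(b))  # 2: the two near-duplicate boxes collapse to one
```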

    Object detection for big data

    "May 2014." Dissertation supervisor: Dr. Tony X. Han. Includes vita. We have observed significant advances in object detection over the past few decades and gladly seen the related research begin to contribute to the world: vehicles can automatically stop before hitting a pedestrian; face detectors have been integrated into smartphones and tablets; video surveillance systems can locate suspects and stop crimes. All these applications demonstrate the substantial research progress on object detection. However, learning a robust object detector is still quite challenging because object detection is a very unbalanced big-data problem. In this dissertation, we aim at improving the object detector's performance from different aspects. For object detection, the state-of-the-art performance is achieved through supervised learning. The performance of object detectors of this kind is mainly determined by two factors: features and the underlying classification algorithms. We have done thorough research on both of these factors. Our contribution involves model adaptation, local learning, contextual boosting, template learning, and feature development.
Since object detection is an unbalanced problem, in which positive examples are hard to collect, we propose to adapt a general object detector to a specific scenario with a few positive examples. To handle the large intra-class variation in the object detection task, we propose a local adaptation method to learn a set of efficient and effective detectors for a single object category. To extract effective context from the huge amount of negative data in object detection, we introduce a novel contextual descriptor to iteratively improve the detector. To detect objects with a depth sensor, we design an effective depth descriptor. To distinguish object categories with similar appearance, we propose a local feature embedding and template selection algorithm, which has been successfully incorporated into a real-world fine-grained object recognition application. All the proposed algorithms and featu… Includes bibliographical references (pages 117-130)
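The unbalanced-data problem described above (few positives, a huge pool of negatives) is commonly handled with hard negative mining: retrain using only the negatives the current detector scores highest. The sketch below illustrates that generic bootstrapping step; the ratio and scores are illustrative, not taken from the dissertation:

```python
import numpy as np

def mine_hard_negatives(neg_scores, ratio, num_pos):
    """Keep only the negatives the current model scores highest
    (its worst confusions), at a fixed negative:positive ratio,
    instead of training on the full, overwhelming negative set.
    neg_scores: (N,) detector confidences on negative windows."""
    num_keep = min(len(neg_scores), ratio * num_pos)
    hardest = np.argsort(neg_scores)[::-1][:num_keep]  # descending
    return hardest

neg_scores = np.array([0.1, 0.95, 0.4, 0.8, 0.05, 0.6])
idx = mine_hard_negatives(neg_scores, ratio=3, num_pos=1)
print(sorted(idx.tolist()))  # the three highest-scoring negatives: [1, 3, 5]
```

Iterating this step (train, score negatives, re-mine, retrain) focuses each round of learning on the examples the detector currently gets wrong.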