ScratchDet: Training Single-Shot Object Detectors from Scratch
Current state-of-the-art object detectors are fine-tuned from
off-the-shelf networks pretrained on the large-scale classification dataset
ImageNet, which incurs some additional problems: 1) classification and
detection have different degrees of sensitivity to translation, resulting in
a learning objective bias; 2) the architecture is constrained by the
classification network, which makes it inconvenient to modify. To cope
with these problems, training detectors from scratch is a feasible solution.
However, detectors trained from scratch generally perform worse than
pretrained ones and can even suffer from convergence issues in training. In this
paper, we explore how to train object detectors from scratch robustly. By analysing
previous work on the optimization landscape, we find that one of the overlooked
points in current trained-from-scratch detectors is BatchNorm. Relying on
the stable and predictable gradients brought by BatchNorm, detectors can be
trained from scratch stably while keeping favourable performance
independent of the network architecture. Taking advantage of this, we are able to
explore various types of networks for object detection without suffering from
poor convergence. Through extensive experiments and analyses of the downsampling
factor, we propose the Root-ResNet backbone network, which makes full use of
the information from the original images. Our ScratchDet achieves
state-of-the-art accuracy on PASCAL VOC 2007, 2012 and MS COCO among all
train-from-scratch detectors and even performs better than several one-stage
pretrained methods. Code will be made publicly available at
https://github.com/KimSoybean/ScratchDet.
Comment: CVPR 2019 Oral Presentation. Camera Ready Version
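The abstract above credits BatchNorm with making training from scratch stable. As a rough illustration only, here is a minimal sketch of the standard batch normalization forward pass for a single feature across a mini-batch. It is not taken from the ScratchDet code; the function name `batch_norm`, the default `gamma`, `beta`, and `eps` values, and the sample activations are all assumptions made for this example.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization for one feature (hypothetical helper).

    batch: list of activations of the same feature, one per sample.
    gamma, beta: learnable scale and shift (fixed here for illustration).
    eps: small constant that avoids division by zero.
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance, as used in the training-mode forward pass.
    var = sum((x - mean) ** 2 for x in batch) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in batch]


# Activations with very different magnitudes (made-up values).
activations = [10.0, 200.0, -50.0, 40.0]
normalized = batch_norm(activations)

# After normalization the feature has roughly zero mean and unit variance.
norm_mean = sum(normalized) / len(normalized)
norm_var = sum((x - norm_mean) ** 2 for x in normalized) / len(normalized)
```

Whatever the scale of the incoming activations, the normalized values sit in a comparable range. That is the intuition usually given for why normalization layers make gradients more predictable.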
Enhanced contextual based deep learning model for niqab face detection
Human face detection is one of the most investigated areas in computer vision and plays a fundamental role as the first step of all face processing and facial analysis systems, such as face recognition, security monitoring, and facial emotion recognition. Despite the great impact of Deep Learning Convolutional Neural Network (DL-CNN) approaches on many unconstrained face detection problems in recent years, the low performance of current face detection models on highly occluded faces remains a challenging problem worthy of investigation. The challenge grows when the occlusion covers most of the face, which dramatically reduces the number of learned representative features that the Feature Extraction Network (FEN) uses to discriminate face parts from the background. The lack of an occluded-face dataset with sufficient images of heavily occluded faces is another challenge that degrades performance. Therefore, this research addressed the issue of low performance and developed an enhanced occluded face detection model for detecting and localizing heavily occluded faces. First, a dataset of highly occluded faces was developed to provide sufficient training examples, together with a contextual-based annotation technique that maximizes the amount of salient facial features. Second, using the training half of the dataset, a deep learning CNN Occluded Face Detection model (OFD) with an enhanced feature extraction and detection network was proposed and trained. Common deep learning techniques, namely transfer learning and data augmentation, were used to speed up the training process. False-positive reduction based on the max-in-out strategy was adopted to reduce the high false-positive rate. The proposed model was evaluated and benchmarked against five current face detection models on the dataset.
The obtained results show that OFD achieved improved performance in terms of accuracy (37% on average) and average precision (16.6%) compared to current face detection models. The findings revealed that the proposed model outperformed current face detection models in detecting highly occluded faces. Based on the findings, an improved contextual-based labeling technique has been successfully developed to address the shortcomings of current labeling techniques.
Faculty of Engineering - School of Computing
183
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:150777
Deep Learning Convolutional Neural Network (DL-CNN), Feature Extraction Network (FEN), Occluded Face Detection model (OFD)
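The abstract above mentions false-positive reduction based on a max-in-out strategy but gives no details. The sketch below assumes the form in which the idea is commonly described for anchor-based face detectors: each anchor predicts several face and several background scores, only the maximum of each group is kept, and a two-way softmax is applied. The function name `max_in_out`, the number of scores per group, and the sample logits are all hypothetical and are not taken from the OFD model.

```python
import math


def max_in_out(pos_logits, neg_logits):
    """Face probability for one anchor under an assumed max-in-out scheme.

    pos_logits: candidate "face" scores predicted for the anchor.
    neg_logits: candidate "background" scores predicted for the anchor.
    Only the strongest score of each group enters the two-way softmax.
    """
    pos = max(pos_logits)
    neg = max(neg_logits)
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(pos, neg)
    e_pos = math.exp(pos - m)
    e_neg = math.exp(neg - m)
    return e_pos / (e_pos + e_neg)


# One face score against three background scores (made-up values):
# a single confident background vote is enough to suppress the detection.
face_prob = max_in_out(pos_logits=[0.2], neg_logits=[1.5, -0.3, 0.9])
```

Giving the background class several chances to fire makes it harder for a background anchor to be scored as a face, which is the intended false-positive reduction in this sketch.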