Joint Multi-view Face Alignment in the Wild
The de facto algorithm for facial landmark estimation involves running a face
detector followed by deformable model fitting on the detected bounding box.
This pipeline raises two basic problems: i) the detection and deformable
fitting steps are performed independently, so the detector might not provide
the best-suited initialisation for the fitting step; ii) face appearance
varies hugely across poses, which makes deformable face fitting very
challenging, so distinct models have to be used (e.g., one for profile and
one for frontal faces). In this work, we propose the first, to the best of our
knowledge, joint multi-view convolutional network to handle large pose
variations across faces in-the-wild, and elegantly bridge face detection and
facial landmark localisation tasks. Existing joint face detection and landmark
localisation methods focus only on a very small set of landmarks. By contrast,
our method can detect and align a large number of landmarks for semi-frontal
(68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a
plethora of datasets including standard static image datasets such as IBUG,
300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile
faces. Significant improvement over state-of-the-art methods on deformable
face tracking is demonstrated on the 300VW benchmark. We also show
state-of-the-art results for face detection on the FDDB and MALF datasets.
Comment: submitted to IEEE Transactions on Image Processing
Adversarial Occlusion-aware Face Detection
Occluded face detection is a challenging detection task due to the large
appearance variations incurred by various real-world occlusions. This paper
introduces an Adversarial Occlusion-aware Face Detector (AOFD) by
simultaneously detecting occluded faces and segmenting occluded areas.
Specifically, we employ an adversarial training strategy to generate
occlusion-like face features that are difficult for a face detector to
recognize. An occlusion mask is predicted simultaneously while detecting
occluded faces, and the occluded area is used as an auxiliary cue instead of
being regarded as a hindrance. Moreover, the supervisory signals from the
segmentation branch will reversely affect the features, aiding in detecting
heavily-occluded faces. Consequently, AOFD is able to find faces with few
exposed facial landmarks with very high confidence and keeps high detection
accuracy even for masked faces. Extensive experiments demonstrate that AOFD
not only significantly outperforms state-of-the-art methods on the MAFA
occluded face detection dataset, but also achieves competitive detection
accuracy on benchmark datasets for general face detection such as FDDB.
Comment: Accepted by ACPR201
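The auxiliary use of the occlusion mask can be summarized as a multi-task
objective in which the segmentation loss is added to, rather than traded
against, the detection loss. A minimal sketch with hypothetical loss weights
(the adversarial feature-generation part of AOFD is not modeled here):

```python
import math

def binary_cross_entropy(pred, target, eps=1e-7):
    """Elementwise BCE averaged over a flat list of probabilities."""
    n = len(pred)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / n

def aofd_style_loss(det_pred, det_target, mask_pred, mask_target, seg_weight=0.5):
    """Joint objective: detection loss plus a weighted occlusion-mask loss.
    The segmentation branch contributes supervision instead of being
    discarded as a nuisance; seg_weight is a placeholder hyperparameter."""
    det_loss = binary_cross_entropy(det_pred, det_target)
    seg_loss = binary_cross_entropy(mask_pred, mask_target)
    return det_loss + seg_weight * seg_loss
```

In training, gradients from the segmentation term flow back into the shared
features, which is the mechanism the abstract credits for helping detect
heavily occluded faces.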
UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
In recent years, numerous effective multi-object tracking (MOT) methods have
been developed owing to the wide range of applications. Existing performance
evaluations of MOT methods usually separate the object tracking step from the
object detection step by using the same fixed object detection results for
comparison. In this work, we perform a comprehensive quantitative study of
the effect of object detection accuracy on overall MOT performance, using the
new large-scale University at Albany DETection and tRACking (UA-DETRAC)
benchmark dataset. The UA-DETRAC benchmark dataset consists of 100 challenging
video sequences captured from real-world traffic scenes (over 140,000 frames
with rich annotations, including occlusion, weather, vehicle category,
truncation, and vehicle bounding boxes) for object detection, object tracking
and MOT systems. We evaluate complete MOT systems constructed from combinations
of state-of-the-art object detection and object tracking methods. Our analysis
shows the complex effects of object detection accuracy on MOT system
performance. Based on these observations, we propose new evaluation tools and
metrics for MOT systems that consider both object detection and object tracking
for comprehensive analysis.
Comment: 18 pages, 11 figures, accepted by CVI
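The coupling between detection and tracking quality is already visible in the
standard CLEAR MOT accuracy score, where detection errors (false negatives
and false positives) are penalized alongside tracking errors (identity
switches) in one ratio. A minimal sketch of that baseline metric (the
benchmark's newly proposed metrics extend this idea and are not reproduced
here):

```python
def mota(num_gt, false_negatives, false_positives, id_switches):
    """CLEAR MOT accuracy (MOTA): 1 - (FN + FP + IDSW) / total ground-truth
    objects. Because FN and FP come straight from the detector, detector
    accuracy directly shifts the overall MOT score."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

For example, 1,000 ground-truth objects with 100 misses, 50 false alarms, and
10 identity switches yield a MOTA of 0.84.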
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
over the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetic
empowered by deep learning, then turning the clock back 20 years we would
witness the wisdom of the cold-weapon era. This paper extensively reviews 400+
papers on object detection in light of its technical evolution, spanning over
a quarter-century (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, and text detection, and makes an
in-depth analysis of their challenges as well as technical improvements in
recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication
A Survey on Ear Biometrics
Recognizing people by their ears has recently received significant attention in the literature. Several reasons account for this trend: first, ear recognition does not suffer from some problems associated with other non-contact biometrics, such as face recognition; second, it is the most promising candidate for combination with the face in the context of multi-pose face recognition; and third, the ear can be used for human recognition in surveillance videos where the face may be occluded completely or in part. Further, the ear appears to degrade little with age. Even though current ear detection and recognition systems have reached a certain level of maturity, their success is limited to controlled indoor conditions. In addition to variation in illumination, other open research problems include hair occlusion, earprint forensics, ear symmetry, ear classification, and ear individuality. This paper provides a detailed survey of research conducted in ear detection and recognition. It provides an up-to-date review of the existing literature, revealing the current state of the art not only for those who are working in this area but also for those who might exploit this new approach. Furthermore, it offers insights into some unsolved ear recognition problems as well as ear databases available to researchers.
Multi-Face Tracking by Extended Bag-of-Tracklets in Egocentric Videos
Wearable cameras offer a hands-free way to record egocentric images of daily
experiences, where social events are of special interest. The first step
towards detecting social events is to track the appearance of the multiple
persons involved in them. In this paper, we propose a novel method to find
correspondences of multiple faces in low temporal resolution egocentric videos
acquired through a wearable camera. This kind of photo-stream imposes
additional challenges to the multi-tracking problem with respect to
conventional videos. Due to the free motion of the camera and to its low
temporal resolution, abrupt changes in the field of view, in illumination
conditions, and in target location are highly frequent. To overcome such
difficulties, we propose a multi-face tracking method that generates a set of
tracklets by finding correspondences along the whole sequence for each
detected face and takes advantage of tracklet redundancy to deal with
unreliable ones. Similar tracklets are grouped into a so-called extended
bag-of-tracklets (eBoT), each of which is intended to correspond to a specific
person. Finally, a prototype tracklet is extracted for each eBoT, and the
occlusions that occurred are estimated by relying on a new measure of
confidence. We
validated our approach over an extensive dataset of egocentric photo-streams
and compared it to state-of-the-art methods, demonstrating its effectiveness
and robustness.
Comment: 27 pages, 18 figures, submitted to the Computer Vision and Image Understanding journal
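As a rough illustration of the tracklet-grouping step, the sketch below
greedily collects tracklets whose detections overlap on shared frames into
bags. The actual eBoT construction and its confidence-based prototype
extraction are more elaborate; the similarity rule and threshold here are
hypothetical simplifications.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def group_tracklets(tracklets, thr=0.5):
    """Greedy grouping of tracklets into 'bags': a tracklet (a dict mapping
    frame index -> box) joins a bag if its boxes overlap the bag's seed
    tracklet on all frames they share."""
    bags = []
    for t in tracklets:
        placed = False
        for bag in bags:
            seed = bag[0]
            shared = set(t) & set(seed)
            if shared and all(iou(t[f], seed[f]) >= thr for f in shared):
                bag.append(t)
                placed = True
                break
        if not placed:
            bags.append([t])
    return bags
```

Each resulting bag plays the role of an eBoT: redundant tracklets for the
same person accumulate together, so unreliable ones can be voted down.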
People Counting in Crowded and Outdoor Scenes using a Hybrid Multi-Camera Approach
This paper presents two novel approaches for people counting in crowded and
open environments that combine the information gathered by multiple views.
Multiple cameras are used to expand the field of view as well as to mitigate the
problem of occlusion that commonly affects the performance of counting methods
using single cameras. The first approach is regarded as a direct approach and
it attempts to segment and count each individual in the crowd. For such an aim,
two head detectors trained with head images are employed: one based on support
vector machines and another on an AdaBoost perceptron. The second approach,
regarded as an indirect approach, employs learning algorithms and statistical
analysis of the whole crowd to achieve counting. For such an aim, corner
points are extracted from groups of people in a foreground image and fed to a
learning algorithm that estimates the number of people in the scene. Both
approaches count the number of people in the whole scene, not only in a given
image or video frame of it. The experimental results obtained on the benchmark
PETS2009 video dataset show that the proposed indirect method surpasses other
methods, with improvements of up to 46.7%, and provides accurate counts for
crowded scenes. On the other hand, the direct method shows high error rates
because it has to solve much more complex problems, such as the segmentation
of heads.
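The indirect approach boils down to regressing a people count from a crowd
feature. A minimal sketch under simplifying assumptions: a plain
least-squares fit of count against the number of extracted corner points,
with no perspective normalization (which a real multi-camera system would
need):

```python
def fit_count_model(corner_counts, people_counts):
    """Least-squares fit of people ~ a * corners + b over training frames."""
    n = len(corner_counts)
    mx = sum(corner_counts) / n
    my = sum(people_counts) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(corner_counts, people_counts))
    var = sum((x - mx) ** 2 for x in corner_counts)
    a = cov / var
    b = my - a * mx
    return a, b

def estimate_people(corners, a, b):
    """Predict a non-negative integer count for a new frame."""
    return max(0, round(a * corners + b))
```

In practice, each corner point would also be weighted by its distance to the
camera before the regression, to compensate for perspective distortion.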
Improved Selective Refinement Network for Face Detection
As a long-standing problem in computer vision, face detection has attracted
much attention in recent decades for its practical applications. With the
availability of the WIDER FACE face detection benchmark, much progress has
been made by various algorithms in recent years. Among them,
the Selective Refinement Network (SRN) face detector introduces the two-step
classification and regression operations selectively into an anchor-based face
detector to reduce false positives and improve location accuracy
simultaneously. Moreover, it designs a receptive field enhancement block to
provide more diverse receptive fields. In this report, to further improve the
performance of SRN, we explore several existing techniques via extensive
experiments, including a new data augmentation strategy, an improved backbone
network, MS COCO pretraining, a decoupled classification module, a
segmentation branch, and a Squeeze-and-Excitation block. Some of these
techniques bring performance improvements, while a few do not adapt well to
our baseline.
As a consequence, we present an improved SRN face detector by combining these
useful techniques and obtain the best performance on the widely used WIDER
FACE face detection benchmark.
Comment: Technical report, 8 pages, 6 figures
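One of the listed additions, the Squeeze-and-Excitation block, can be
sketched in plain Python: global average pooling squeezes each channel to a
scalar, two small fully connected layers (ReLU then sigmoid) produce
per-channel weights, and the feature maps are rescaled by those weights. The
weights passed in below are placeholders, not trained values.

```python
import math

def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation over a C x H x W feature tensor stored as
    nested lists (one 2-D map per channel)."""
    # Squeeze: per-channel global average pooling
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid
    hidden = [max(0.0, sum(wij * zj for wij, zj in zip(wi, z))) for wi in w1]
    scores = [sum(wij * hj for wij, hj in zip(wi, hidden)) for wi in w2]
    s = [1.0 / (1.0 + math.exp(-v)) for v in scores]
    # Scale: reweight each channel map by its excitation weight
    return [[[v * s[c] for v in row] for row in ch]
            for c, ch in enumerate(feature_maps)]
```

The design lets the network emphasize informative channels cheaply, which is
why it can be bolted onto an existing detector backbone with little overhead.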
Recognizing Partial Biometric Patterns
Biometric recognition on partial captured targets is challenging, where only
several partial observations of objects are available for matching. In this
area, deep learning based methods are widely applied to match these partial
captured objects caused by occlusions, variations of postures or just partial
out of view in person re-identification and partial face recognition. However,
most current methods either cannot identify an individual when some parts of
the object are unobtainable, or are specialized to certain constrained
scenarios. To this end, we propose a robust general framework for arbitrary
biometric matching scenarios without constraints on alignment or input size.
In this work, we introduce a feature post-processing step to handle the
feature maps from a fully convolutional network (FCN) and a dictionary
learning based Spatial Feature Reconstruction (SFR) to match differently
sized feature maps. Moreover, the batch hard triplet loss function is applied
to
optimize the model. The applicability and effectiveness of the proposed method
are demonstrated by the results from experiments on three person
re-identification datasets (Market1501, CUHK03, DukeMTMC-reID), two partial
person datasets (Partial REID and Partial iLIDS) and two partial face datasets
(CASIA-NIR-Distance and Partial LFW), on which the proposed method achieves
state-of-the-art performance in comparison with several competing approaches.
The code is released online at
https://github.com/lingxiao-he/Partial-Person-ReID.
Comment: 13 pages, 11 figures
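The Spatial Feature Reconstruction idea can be sketched as follows: treat the
gallery's spatial features as a dictionary and score a probe by how well its
features can be reconstructed from that dictionary, so differently sized
feature maps compare naturally. The sketch substitutes ridge-regularized
least squares for the paper's sparse-coding solver, so it is an approximation
rather than the released implementation.

```python
import numpy as np

def sfr_distance(probe, gallery, lam=0.1):
    """Simplified SFR score: reconstruct each probe spatial feature (a column
    of the d x m probe matrix) from the gallery dictionary (d x n) via
    ridge-regularized least squares, and return the mean reconstruction
    residual as the matching distance (smaller = better match)."""
    G = np.asarray(gallery, dtype=float)            # d x n dictionary
    P = np.asarray(probe, dtype=float)              # d x m probe features
    A = G.T @ G + lam * np.eye(G.shape[1])          # regularized Gram matrix
    coeffs = np.linalg.solve(A, G.T @ P)            # n x m coefficients
    residual = P - G @ coeffs
    return float(np.mean(np.linalg.norm(residual, axis=0)))
```

A probe whose features lie in the span of the gallery dictionary yields a
small residual; features from a different identity reconstruct poorly and
score a larger distance.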
Can We Boost the Power of the Viola-Jones Face Detector Using Pre-processing? An Empirical Study
The Viola-Jones face detection algorithm was (and still is) quite a popular
face detector. In spite of the numerous face detection techniques that have
been recently presented, there are many research works that are still based on
the Viola-Jones algorithm because of its simplicity. In this paper, we study
the influence of a set of blind pre-processing methods on the face detection
rate using the Viola-Jones algorithm. We focus on two aspects of improvement,
specifically badly illuminated faces and blurred faces. Many illumination
normalization and deblurring methods are used in order to improve the
detection accuracy. We want to avoid blind pre-processing methods that may
obstruct the face detector. To that end, we perform two sets of experiments.
The first set is performed to screen out any blind pre-processing method that
hurts the face detector. The second set studies the effect of the selected
pre-processing methods on images captured under hard conditions. We present
two ways of applying a pre-processing method to the image before it is passed
to the Viola-Jones face detector. Four different datasets are used to draw
a coherent conclusion about the potential improvement caused by using prior
enhanced images. The results demonstrate that some of the pre-processing
methods may hurt the accuracy of the Viola-Jones face detection algorithm. However,
other pre-processing methods have an evident positive impact on the accuracy of
the face detector. Overall, we recommend three simple and fast blind
photometric normalization methods as a pre-processing step in order to improve
the accuracy of the pre-trained Viola-Jones face detector.
Comment: 14 pages, 10 figures, 8 tables
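For concreteness, one simple blind photometric normalization of the kind the
study evaluates is plain histogram equalization, sketched below on a
grayscale image stored as nested lists of 0-255 ints. The cascade detection
call itself is omitted, and this is an illustration rather than the paper's
exact method list.

```python
def equalize_histogram(gray):
    """Blind photometric normalization by histogram equalization: stretch
    the cumulative distribution of pixel intensities over the full 0-255
    range before handing the image to a pre-trained face detector."""
    h, w = len(gray), len(gray[0])
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    # Cumulative distribution of intensities
    cdf, total = [], 0
    for c in hist:
        total += c
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    denom = h * w - cdf_min
    # Lookup table mapping old intensities to the stretched range;
    # if every pixel shares one value (denom == 0), map everything to 0.
    lut = [max(0, round((c - cdf_min) / denom * 255)) if denom else 0
           for c in cdf]
    return [[lut[v] for v in row] for row in gray]
```

Applied before detection, this kind of normalization lifts the contrast of
badly illuminated faces, which is one of the two failure modes the study
targets.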