Context-aware CNNs for person head detection
Person detection is a key problem for many computer vision tasks. While face
detection has reached maturity, detecting people under a full variation of
camera view-points, human poses, lighting conditions and occlusions is still a
difficult challenge. In this work we focus on detecting human heads in natural
scenes. Starting from the recent local R-CNN object detector, we extend it with
two types of contextual cues. First, we leverage person-scene relations and
propose a Global CNN model trained to predict positions and scales of heads
directly from the full image. Second, we explicitly model pairwise relations
among objects and train a Pairwise CNN model using a structured-output
surrogate loss. The Local, Global and Pairwise models are combined into a joint
CNN framework. To train and test our full model, we introduce a large dataset
composed of 369,846 human heads annotated in 224,740 movie frames. We evaluate
our method and demonstrate improvements in person head detection over
several recent baselines on three datasets. We also show that our model
improves detection speed.
Comment: To appear in International Conference on Computer Vision (ICCV), 201
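The abstract describes combining Local, Global and Pairwise model scores into a joint framework. A minimal sketch of such a score fusion is shown below; the fusion weights, function name and per-candidate score arrays are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

# Hypothetical fusion of per-candidate head scores from the three models.
# The weights w are assumptions for illustration only.
def joint_score(local, global_, pairwise, w=(1.0, 0.5, 0.5)):
    """Weighted combination of per-candidate detection scores."""
    return w[0] * local + w[1] * global_ + w[2] * pairwise

local = np.array([0.9, 0.2, 0.6])      # Local (R-CNN-style) detector scores
global_ = np.array([0.8, 0.1, 0.7])    # Global model: full-image position/scale prior
pairwise = np.array([0.5, 0.0, 0.4])   # Pairwise model: inter-object relations

print(joint_score(local, global_, pairwise))
```

Candidates supported by all three cues end up with the highest joint score, which is the intuition behind combining local appearance with scene-level and pairwise context.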
Fast, Exact and Multi-Scale Inference for Semantic Image Segmentation with Deep Gaussian CRFs
In this work we propose a structured prediction technique that combines the
virtues of Gaussian Conditional Random Fields (G-CRF) with Deep Learning: (a)
our structured prediction task has a unique global optimum that is obtained
exactly from the solution of a linear system (b) the gradients of our model
parameters are analytically computed using closed form expressions, in contrast
to the memory-demanding contemporary deep structured prediction approaches that
rely on back-propagation-through-time, (c) our pairwise terms do not have to be
simple hand-crafted expressions, as in the line of works building on the
DenseCRF, but can rather be `discovered' from data through deep architectures,
and (d) our system can be trained in an end-to-end manner. Building on standard
tools from numerical analysis we develop very efficient algorithms for
inference and learning, as well as a customized technique adapted to the
semantic segmentation task. This efficiency allows us to explore more
sophisticated architectures for structured prediction in deep learning: we
introduce multi-resolution architectures to couple information across scales in
a joint optimization framework, yielding systematic improvements. We
demonstrate the utility of our approach on the challenging VOC PASCAL 2012
image segmentation benchmark, showing substantial improvements over strong
baselines. We make all of our code and experiments available at
https://github.com/siddharthachandra/gcrf
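The key property claimed in (a) and (b) is that the G-CRF energy has a unique global optimum obtained exactly by solving a linear system. A toy sketch of that inference step, with illustrative (not paper-derived) sizes and random data, could look like this:

```python
import numpy as np

# Toy G-CRF inference sketch; dimensions and data are illustrative
# assumptions, not taken from the paper's implementation.
rng = np.random.default_rng(0)
n_pixels, n_classes = 4, 3
d = n_pixels * n_classes

# Unary terms b (produced by a CNN in the paper) and a symmetric
# positive-definite pairwise matrix A (learned from data, per point (c)).
b = rng.normal(size=d)
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)   # SPD, so the energy has a unique minimum

# The minimizer of the quadratic energy 0.5 * x^T A x - b^T x is the
# exact solution of the linear system A x = b (no iterative unrolling).
x_star = np.linalg.solve(A, b)

# Optimality check: the gradient A x - b vanishes at the solution.
print(np.allclose(A @ x_star - b, 0))
```

Because the optimum is the solution of a linear system, gradients with respect to A and b can be written in closed form, which is what lets the method avoid back-propagation-through-time.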
Body-Part Joint Detection and Association via Extended Object Representation
The detection of the human body and its related parts (e.g., face, head or hands)
has been intensively studied and greatly improved since the breakthrough of
deep CNNs. However, most of these detectors are trained independently, making
it a challenging task to associate detected body parts with people. This paper
focuses on the problem of joint detection of human body and its corresponding
parts. Specifically, we propose a novel extended object representation that
integrates the center location offsets of the body and its parts, and construct
a dense single-stage anchor-based Body-Part Joint Detector (BPJDet). Body-part
associations in BPJDet are embedded into the unified representation which
contains both the semantic and geometric information. Therefore, BPJDet does
not suffer from error-prone association post-matching, and has a better
accuracy-speed trade-off. Furthermore, BPJDet can be seamlessly generalized to
jointly detect any body part. To verify the effectiveness and superiority of
our method, we conduct extensive experiments on the CityPersons, CrowdHuman and
BodyHands datasets. The proposed BPJDet detector achieves state-of-the-art
association performance on all three benchmarks while maintaining high
detection accuracy. Code is available at https://github.com/hnuzhy/BPJDet.
Comment: accepted by ICME202
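The core idea of the extended representation is that each body detection also regresses an offset to its part center, so association becomes a simple geometric match instead of error-prone post-matching. A hedged sketch of that decoding step follows; the array layouts and nearest-center matching rule are assumptions for illustration, not BPJDet's actual code.

```python
import numpy as np

# Illustrative offset-based body-part association in the spirit of BPJDet.
body_centers = np.array([[50., 80.], [200., 90.]])   # detected body centers
part_offsets = np.array([[0., -30.], [5., -35.]])    # regressed per-body offsets
part_centers = np.array([[205., 56.], [49., 51.]])   # independently detected heads

# Where each body predicts its part (e.g. head) center to be.
predicted = body_centers + part_offsets

# Pairwise distances between predicted and detected part centers,
# then greedy nearest-center assignment: body i -> part assignment[i].
dists = np.linalg.norm(predicted[:, None, :] - part_centers[None, :, :], axis=-1)
assignment = dists.argmin(axis=1)
print(assignment)
```

Because the offset is regressed jointly with the box, the geometric association falls out of the unified representation rather than a separate matching stage.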
Relational Learning for Joint Head and Human Detection
Head and human detection have been rapidly improved with the development of
deep convolutional neural networks. However, these two tasks are often studied
separately without considering their inherent correlation, which leads to two
issues: 1) head detection often suffers from more false positives, and 2) the
performance of the human detector frequently drops dramatically in crowded
scenes. To handle
these two issues, we present a novel joint head and human detection network,
namely JointDet, which effectively detects head and human body simultaneously.
Moreover, we design a head-body relationship discriminating module to perform
relational learning between heads and human bodies, and leverage this learned
relationship to regain the suppressed human detections and reduce head false
positives. To verify the effectiveness of the proposed method, we annotate head
bounding boxes of the CityPersons and Caltech-USA datasets, and conduct
extensive experiments on the CrowdHuman, CityPersons and Caltech-USA datasets.
As a consequence, the proposed JointDet detector achieves state-of-the-art
performance on these three benchmarks. To facilitate further studies on the
head and human detection problem, all new annotations, source code and trained
models will be made public.
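The abstract's "regain the suppressed human detections" idea can be illustrated with a small sketch: if a confident head detection lies inside a low-scoring body box, the body score is boosted. The thresholds, boost value and scoring rule below are assumptions for illustration, not JointDet's actual mechanism.

```python
# Toy illustration of rescuing suppressed body detections via a
# head-body relation; all numbers here are illustrative assumptions.
def regain_bodies(bodies, heads, head_thr=0.7, boost=0.3):
    """Boost body scores supported by a confident head inside the box."""
    out = []
    for box, bscore in bodies:
        supported = any(hscore >= head_thr and inside(hcenter, box)
                        for hcenter, hscore in heads)
        out.append((box, min(1.0, bscore + boost) if supported else bscore))
    return out

def inside(pt, box):
    x, y = pt
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

bodies = [((0, 0, 60, 160), 0.35), ((100, 0, 160, 160), 0.8)]  # (box, score)
heads = [((30, 20), 0.9)]                                      # (center, score)
print(regain_bodies(bodies, heads))
```

The same relation can run in reverse to suppress head false positives that no plausible body supports, which is the second issue the paper targets.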