Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network
Drone systems have been deployed by various law enforcement agencies to
monitor hostiles, spy on foreign drug cartels, conduct border control
operations, etc. This paper introduces a real-time drone surveillance system to
identify violent individuals in public areas. The system first uses the Feature
Pyramid Network to detect humans from aerial images. The image region with the
human is used by the proposed ScatterNet Hybrid Deep Learning (SHDL) network
for human pose estimation. The orientations between the limbs of the estimated
pose are next used to identify the violent individuals. The proposed deep
network can learn meaningful representations quickly using ScatterNet and
structural priors with relatively few labeled examples. The system detects
the violent individuals in real-time by processing the drone images in the
cloud. This research also introduces the aerial violent individual dataset used
to train the deep network, which we hope will encourage researchers interested
in using deep learning for aerial surveillance. The pose estimation
and violent individual identification performance is compared with
state-of-the-art techniques.
Comment: To appear in the Efficient Deep Learning for Computer Vision (ECV) workshop at IEEE Computer Vision and Pattern Recognition (CVPR) 2018. YouTube demo: https://www.youtube.com/watch?v=zYypJPJipY
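The limb-orientation step described above can be sketched as follows. The joint names and limb pairs below are illustrative assumptions, not the paper's exact skeleton definition:

```python
import math

# Hypothetical joint set and limb pairs -- the paper does not specify its
# exact skeleton here, so these names are illustrative assumptions.
LIMBS = [
    ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_wrist"),
    ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
]

def limb_orientations(keypoints):
    """Orientation (radians) of each limb vector from estimated 2D keypoints.

    keypoints: dict mapping joint name -> (x, y) image coordinates.
    A downstream classifier would consume these angles to decide whether
    the pose corresponds to a violent activity (e.g. punching, kicking).
    """
    angles = []
    for a, b in LIMBS:
        (ax, ay), (bx, by) = keypoints[a], keypoints[b]
        angles.append(math.atan2(by - ay, bx - ax))
    return angles
```

Angle features of this kind are translation- and (after normalization) scale-invariant, which is what makes them usable across varying drone altitudes.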
PifPaf: Composite Fields for Human Pose Estimation
We propose a new bottom-up method for multi-person 2D human pose estimation
that is particularly well suited for urban mobility such as self-driving cars
and delivery robots. The new method, PifPaf, uses a Part Intensity Field (PIF)
to localize body parts and a Part Association Field (PAF) to associate body
parts with each other to form full human poses. Our method outperforms previous
methods at low resolution and in crowded, cluttered and occluded scenes thanks
to (i) our new composite field PAF encoding fine-grained information and (ii)
the choice of Laplace loss for regressions which incorporates a notion of
uncertainty. Our architecture is based on a fully convolutional, single-shot,
box-free design. We perform on par with the existing state-of-the-art bottom-up
method on the standard COCO keypoint task and produce state-of-the-art results
on a modified COCO keypoint task for the transportation domain.
Comment: CVPR 201
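The Laplace loss mentioned above can be illustrated with a minimal scalar sketch. The actual PifPaf loss operates on composite vector fields, so this shows only the uncertainty idea, not the paper's implementation:

```python
import math

def laplace_nll(pred, target, log_b):
    """Negative log-likelihood of a Laplace distribution with scale b.

    The network predicts log_b alongside each regression target; a larger
    scale b expresses higher uncertainty and down-weights the residual,
    while the log(2b) term keeps the network from inflating b for free.
    """
    b = math.exp(log_b)
    return abs(pred - target) / b + math.log(2.0 * b)
```

With log_b fixed at 0 this reduces to an L1 loss plus a constant; letting the network predict log_b lets confident keypoints dominate training, which is the "notion of uncertainty" the abstract refers to.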
Joint Multi-Person Pose Estimation and Semantic Part Segmentation
Human pose estimation and semantic part segmentation are two complementary
tasks in computer vision. In this paper, we propose to solve the two tasks
jointly for natural multi-person images, in which the estimated pose provides
an object-level shape prior that regularizes the part segments, while the
part-level segments constrain the variation of pose locations. Specifically, we first
train two fully convolutional neural networks (FCNs), namely Pose FCN and Part
FCN, to provide initial estimation of pose joint potential and semantic part
potential. Then, to refine pose joint location, the two types of potentials are
fused with a fully-connected conditional random field (FCRF), where a novel
segment-joint smoothness term is used to encourage semantic and spatial
consistency between parts and joints. To refine part segments, the refined pose
and the original part potential are integrated through a Part FCN, where the
skeleton feature from pose serves as additional regularization cues for part
segments. Finally, to reduce the complexity of the FCRF, we introduce human
detection boxes and infer the graph inside each box, making the inference forty
times faster.
Since no existing dataset contains both part segments and pose labels, we
extend the PASCAL VOC part dataset with human pose joints and perform extensive
experiments comparing our method against several recent competing approaches. We
show that on this dataset our algorithm surpasses the competing methods by a
large margin in both tasks.
Comment: This paper has been accepted by CVPR 201
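The segment-joint consistency idea can be caricatured as a cost that penalizes a joint hypothesis lying far from its part segment. This toy version is an assumption for illustration only, not the paper's actual FCRF potential:

```python
def segment_joint_cost(joint_xy, part_pixels):
    """Toy segment-joint consistency cost: squared distance from a joint
    hypothesis to the nearest pixel of the corresponding part segment.

    joint_xy:    (x, y) location of the joint hypothesis.
    part_pixels: iterable of (x, y) pixels belonging to the segment.
    A CRF would add a term of this flavor to discourage joints that are
    spatially inconsistent with their semantic part.
    """
    jx, jy = joint_xy
    return min((px - jx) ** 2 + (py - jy) ** 2 for px, py in part_pixels)
```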
Ego-Downward and Ambient Video based Person Location Association
Using an ego-centric camera for localization and tracking is highly desirable
for urban navigation and indoor assistive systems when GPS is unavailable or
insufficiently accurate. Traditional hand-designed feature tracking and
estimation approaches fail without visible features. Recently, several works
have explored using context features for localization. However, all of them
suffer severe accuracy loss when no visual context information is available.
To provide a possible solution to this problem, this paper
proposes a camera system with both an ego-downward view and a third-person
static view to perform localization and tracking in a learning-based approach.
In addition, we propose a novel action and motion verification model for
cross-view verification and localization. We performed comparative experiments
on our collected dataset, which covers identical-clothing cases as well as
gender and background diversity. Results indicate that the proposed model
achieves improved accuracy. Finally, we tested the model in multi-person
scenarios and obtained an average accuracy
Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification
Person re-identification (ReID) aims to identify the same person across
different cameras. It is a challenging task due to large variations in
person pose, occlusion, background clutter, etc. How to extract powerful
features is a fundamental problem in ReID and remains open today.
In this paper, we design a Multi-Scale Context-Aware Network (MSCAN) to learn
powerful features over full body and body parts, which can well capture the
local context knowledge by stacking multi-scale convolutions in each layer.
Moreover, instead of using predefined rigid parts, we propose to learn and
localize deformable pedestrian parts using Spatial Transformer Networks (STN)
with novel spatial constraints. The learned body parts can alleviate some
difficulties, e.g., pose variations and background clutter, in part-based
representation. Finally, we integrate the representation learning processes of
full body and body parts into a unified framework for person ReID through
multi-class person identification tasks. Extensive evaluations on current
challenging large-scale person ReID datasets, including the image-based
Market1501, CUHK03 and sequence-based MARS datasets, show that the proposed
method achieves state-of-the-art results.
Comment: Accepted by CVPR 201
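Capturing multi-scale local context by stacking convolutions of different receptive fields within one layer, as MSCAN does, can be sketched in 1-D with dilated convolutions. The 1-D setting and kernel values here are simplifications for illustration, not the paper's architecture:

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated convolution with 'valid' padding.

    A dilation rate d samples the input every d positions, enlarging the
    receptive field without adding parameters; concatenating the outputs
    of several rates in one layer captures context at multiple scales.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(k))
        for i in range(len(signal) - span + 1)
    ]
```

With kernel length 2, dilation 1 sums adjacent samples while dilation 2 sums samples two apart, so the same weights see context at two different scales.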