PifPaf: Composite Fields for Human Pose Estimation
We propose a new bottom-up method for multi-person 2D human pose estimation
that is particularly well suited for urban mobility such as self-driving cars
and delivery robots. The new method, PifPaf, uses a Part Intensity Field (PIF)
to localize body parts and a Part Association Field (PAF) to associate body
parts with each other to form full human poses. Our method outperforms previous
methods at low resolution and in crowded, cluttered and occluded scenes thanks
to (i) our new composite field PAF encoding fine-grained information and (ii)
the choice of Laplace loss for regressions which incorporates a notion of
uncertainty. Our architecture is based on a fully convolutional, single-shot,
box-free design. We perform on par with the existing state-of-the-art bottom-up
method on the standard COCO keypoint task and produce state-of-the-art results
on a modified COCO keypoint task for the transportation domain.
Comment: CVPR 201
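The abstract above credits a Laplace loss with giving the regression a notion of uncertainty. A minimal sketch of that idea follows, assuming the generic negative log-likelihood of a Laplace distribution; the function name `laplace_nll` and its arguments are hypothetical and are not taken from the PifPaf code. The network would predict both a location and a scale, and a larger scale softens the penalty for a large error at the cost of a log term.

```python
import math


def laplace_nll(target, predicted, scale):
    """Negative log-likelihood of `target` under Laplace(predicted, scale).

    `scale` plays the role of a predicted uncertainty (hypothetical name).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # |x - mu| / b penalizes the error; log(2b) penalizes claiming high uncertainty.
    return abs(target - predicted) / scale + math.log(2.0 * scale)


# A confident prediction (small scale) is punished hard for a large error,
# while an uncertain one (large scale) is punished less.
confident = laplace_nll(10.0, 0.0, 1.0)
uncertain = laplace_nll(10.0, 0.0, 5.0)
```

This is only the scalar form; a real training loss would average it over all regressed coordinates.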
Recurrent Attention Models for Depth-Based Person Identification
We present an attention-based model that reasons on human body shape and
motion dynamics to identify individuals in the absence of RGB information,
hence in the dark. Our approach leverages unique 4D spatio-temporal signatures
to address the identification problem across days. Formulated as a
reinforcement learning task, our model is based on a combination of
convolutional and recurrent neural networks with the goal of identifying small,
discriminative regions indicative of human identity. We demonstrate that our
model produces state-of-the-art results on several published datasets given
only depth images. We further study the robustness of our model towards
viewpoint, appearance, and volumetric changes. Finally, we share insights
gleaned from interpretable 2D, 3D, and 4D visualizations of our model's
spatio-temporal attention.
Comment: Computer Vision and Pattern Recognition (CVPR) 201
Characterizing and Improving Stability in Neural Style Transfer
Recent progress in style transfer on images has focused on improving the
quality of stylized images and speed of methods. However, real-time methods are
highly unstable, resulting in visible flickering when applied to videos. In this
work we characterize the instability of these methods by examining the solution
set of the style transfer objective. We show that the trace of the Gram matrix
representing style is inversely related to the stability of the method. Then,
we present a recurrent convolutional network for real-time video style transfer
which incorporates a temporal consistency loss and overcomes the instability of
prior methods. Our networks can be applied at any resolution, do not require
optical flow at test time, and produce high quality, temporally consistent
stylized videos in real-time.
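The abstract above relates stability to the trace of the Gram matrix that represents style. As a rough illustration of the quantity involved, the sketch below computes an unnormalized Gram matrix from a small feature map and takes its trace. The helper names and the lack of normalization are assumptions, and the sketch does not reproduce the paper's stability analysis.

```python
def gram_matrix(features):
    """Unnormalized Gram matrix of a feature map.

    `features` is a list of C rows (channels), each holding the N
    flattened spatial activations of that channel.
    """
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
        for row_i in features
    ]


def trace(matrix):
    """Sum of the diagonal entries of a square matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Two channels, two spatial positions each (made-up values).
features = [[1.0, 2.0], [3.0, 4.0]]
gram = gram_matrix(features)
gram_trace = trace(gram)
```

The trace equals the sum of squared activations over all channels and positions, so it grows with the overall magnitude of the style features.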
Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition
We present a unified framework for understanding human social behaviors in
raw image sequences. Our model jointly detects multiple individuals, infers
their social actions, and estimates the collective actions with a single
feed-forward pass through a neural network. We propose a single architecture
that does not rely on external detection algorithms but rather is trained
end-to-end to generate dense proposal maps that are refined via a novel
inference scheme. The temporal consistency is handled via a person-level
matching Recurrent Neural Network. The complete model takes as input a sequence
of frames and outputs detections along with the estimates of individual actions
and collective activities. We demonstrate state-of-the-art performance of our
algorithm on multiple publicly available benchmarks.