Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation set and 51.8% on the test set
using the Multi-Object Tracking Accuracy (MOTA) metric, and achieves
state-of-the-art performance on the ICCV 2017 PoseTrack keypoint tracking
challenge.
Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint
tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack
and webpage: https://rohitgirdhar.github.io/DetectAndTrack
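The two-stage recipe above ends with a lightweight linking step that joins per-frame keypoint predictions into tracks. One common way to implement such linking (illustrative only, not necessarily the paper's exact cost function) is to match detections between consecutive frames with the Hungarian algorithm:

```python
# Illustrative frame-to-frame linking via bipartite matching. The cost here
# is the distance between detection centers; a real tracker might also use
# keypoint distances or appearance features.
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_frames(prev_centers, curr_centers):
    """Match detections between two frames by center distance (lower = better)."""
    # Pairwise Euclidean distances form the assignment cost matrix.
    cost = np.linalg.norm(
        prev_centers[:, None, :] - curr_centers[None, :, :], axis=-1
    )
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))

prev = np.array([[0.0, 0.0], [10.0, 10.0]])   # detection centers, frame t
curr = np.array([[9.5, 10.5], [0.5, -0.5]])   # detection centers, frame t+1
print(link_frames(prev, curr))  # [(0, 1), (1, 0)]
```

Running the matcher over every consecutive frame pair chains the per-frame detections into video-long tracks.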
TARGET POSE ESTIMATION VIA DEEP LEARNING FOR MILITARY SYSTEMS
Target pose estimation and aimpoint selection are crucial in directed energy
weapon systems, as they allow the system to point to a specific and strategic
area of the target. However, this is a challenging task because a dedicated
attitude sensor is normally required. Motivated by emerging deep learning
capabilities, the present work proposes a deep learning model to estimate a
target spacecraft's attitude in terms of Euler angles. Data for the deep
learning model were experimentally generated from 3D UAV models, incorporating
effects such as atmospheric backgrounds and turbulence. The target's pose was
derived from the training, validation, and prediction of 2D keypoints. A
keypoint detection model detects interest points in an image, from which the
pose, angles, and dimensions of the target in question can be estimated. Using
a weak-perspective direct linear transformation algorithm, the pose of a 3D
object with respect to a camera can be determined from 3D-to-2D
correspondences. Additionally, from this correspondence, an aimpoint mimicking
laser tracking can be determined on the target. This work evaluates these
methods and their accuracy against experimentally generated data with
simulated real-world environments.
Outstanding Thesis. Ensign, United States Navy. Approved for public release.
Distribution is unlimited.
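Weak-perspective pose recovery from 3D-to-2D correspondences, as described above, can be sketched as follows. The least-squares fit followed by SVD orthonormalization is a standard construction under the weak-perspective camera assumption, not the thesis's exact implementation:

```python
import numpy as np

def weak_perspective_pose(X3d, x2d):
    """Recover scale s, the top two rotation rows R2, and 2D translation t
    from 3D->2D correspondences, assuming x ~ s * R2 @ X + t."""
    Xc = X3d - X3d.mean(axis=0)   # center the 3D model points
    xc = x2d - x2d.mean(axis=0)   # center the 2D keypoints
    # Least-squares fit of a 2x3 linear map: xc ~ Xc @ M.T
    B, *_ = np.linalg.lstsq(Xc, xc, rcond=None)
    M = B.T                        # (2, 3)
    # Project M onto scaled orthonormal rows via SVD (Procrustes-style).
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    R2 = U @ Vt                    # orthonormal 2x3 rotation rows
    s = S.mean()                   # isotropic scale
    t = x2d.mean(axis=0) - s * (R2 @ X3d.mean(axis=0))
    return s, R2, t

# Synthetic check: project a point cloud with a known pose, then recover it.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0]])  # top two rotation rows
X = np.random.default_rng(0).normal(size=(8, 3))      # 3D keypoints
x = 2.0 * (X @ R.T) + np.array([1.0, -2.0])           # projected 2D keypoints
s, R2, t = weak_perspective_pose(X, x)
print(round(s, 6))  # 2.0
```

The recovered rotation rows can then be converted to Euler angles, and the estimated pose used to place an aimpoint on the target model.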
ATRW: A Benchmark for Amur Tiger Re-identification in the Wild
Monitoring the population and movements of endangered species is an important
task in wildlife conservation. Traditional tagging methods do not scale to
large populations, while applying computer vision methods to camera sensor data
requires re-identification (re-ID) algorithms to obtain accurate counts and
movement trajectories of wildlife. However, existing re-ID methods are largely
targeted at persons and cars, which have limited pose variations and
constrained capture environments. This paper tries to fill the gap by
introducing a novel large-scale dataset, the Amur Tiger Re-identification in
the Wild (ATRW) dataset. ATRW contains over 8,000 video clips from 92 Amur
tigers, with bounding box, pose keypoint, and tiger identity annotations. In
contrast to typical re-ID datasets, the tigers are captured in a diverse set of
unconstrained poses and lighting conditions. We demonstrate with a set of
baseline algorithms that ATRW is a challenging dataset for re-ID. Lastly, we
propose a novel method for tiger re-identification, which introduces precise
pose parts modeling in deep neural networks to handle large pose variation of
tigers, and achieves notable performance improvements over existing re-ID
methods. The dataset is publicly available at https://cvwc2019.github.io/.
Comment: ACM Multimedia (MM) 202
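At test time, re-ID baselines like those above typically rank a gallery of known identities by embedding similarity to a query image. A minimal sketch of that ranking step, with placeholder vectors standing in for learned network features:

```python
import numpy as np

def rank_gallery(query, gallery):
    """Rank gallery entries by cosine similarity to a query embedding."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                 # cosine similarity to each gallery entry
    return np.argsort(-sims)     # gallery indices, best match first

# Toy 2D embeddings; real re-ID features would be high-dimensional.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
print(rank_gallery(query, gallery))  # [0 2 1]
```

Re-ID accuracy is then usually reported as rank-k retrieval (did the correct identity appear in the top k?) or mean average precision over such rankings.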
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
To address the challenging task of instance-aware human part parsing, a new
bottom-up regime is proposed to learn category-level human semantic
segmentation as well as multi-person pose estimation in a joint and end-to-end
manner. It is a compact, efficient and powerful framework that exploits
structural information over different human granularities and eases the
difficulty of person partitioning. Specifically, a dense-to-sparse projection
field, which allows explicitly associating dense human semantics with sparse
keypoints, is learnt and progressively improved over the network feature
pyramid for robustness. Then, the difficult pixel grouping problem is cast as
an easier, multi-person joint assembling task. By formulating joint association
as maximum-weight bipartite matching, a differentiable solution is developed to
exploit projected gradient descent and Dykstra's cyclic projection algorithm.
This makes our method end-to-end trainable and allows back-propagating the
grouping error to directly supervise multi-granularity human representation
learning. This is distinguished from current bottom-up human parsers or pose
estimators which require sophisticated post-processing or heuristic greedy
algorithms. Experiments on three instance-aware human parsing datasets show
that our model outperforms other bottom-up alternatives with much more
efficient inference.
Comment: CVPR 2021 (Oral). Code: https://github.com/tfzhou/MG-HumanParsin
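The hard assignment that the paper relaxes can be illustrated (non-differentiably) with exact maximum-weight bipartite matching: given joint-to-person affinity scores, pick the assignment that maximizes total affinity. The scores below are made up for illustration; the paper's contribution is a differentiable substitute for this step:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: detected joints; columns: person instances. Higher = more likely
# that this joint belongs to that person.
affinity = np.array([
    [0.9, 0.2],
    [0.1, 0.8],
    [0.7, 0.6],
])
# linear_sum_assignment minimizes cost, so negate to maximize affinity.
# With more joints than persons, only the best min(rows, cols) pairs match.
rows, cols = linear_sum_assignment(-affinity)
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 0), (1, 1)]
```

Because the Hungarian step is a discrete argmax, no gradient flows through it; replacing it with projected gradient descent plus Dykstra's cyclic projections, as the paper does, is what makes the grouping error trainable end-to-end.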
Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation
Human-object interactions (HOI) recognition and pose estimation are two
closely related tasks. Human pose is an essential cue for recognizing actions
and localizing the interacted objects. Meanwhile, human actions and the
locations of their interacted objects provide guidance for pose estimation. In this
paper, we propose a turbo learning framework to perform HOI recognition and
pose estimation simultaneously. First, two modules are designed to enforce
message passing between the tasks, i.e., a pose-aware HOI recognition module
and an HOI-guided pose estimation module. Then, these two modules form a closed loop
to utilize the complementary information iteratively, which can be trained in
an end-to-end manner. The proposed method achieves the state-of-the-art
performance on two public benchmarks, the Verbs in COCO (V-COCO) and
HICO-DET datasets.
Comment: AAAI201
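The closed-loop control flow can be sketched with placeholder linear "modules" standing in for the two networks; the point is only the alternating, iterate-until-stable message passing, not the paper's actual models:

```python
# Toy turbo loop: two modules repeatedly exchange estimates. The scalar
# linear updates below are placeholders for the HOI and pose networks.
def hoi_module(pose_feat):
    return 0.5 * pose_feat + 1.0   # pose-aware HOI update (placeholder)

def pose_module(hoi_feat):
    return 0.5 * hoi_feat + 1.0    # HOI-guided pose update (placeholder)

pose, hoi = 0.0, 0.0
for step in range(20):
    hoi_new = hoi_module(pose)         # pass the pose estimate to HOI
    pose_new = pose_module(hoi_new)    # pass the HOI estimate back to pose
    if abs(pose_new - pose) < 1e-9:    # stop once the loop has stabilized
        break
    pose, hoi = pose_new, hoi_new
print(round(pose, 4), round(hoi, 4))  # converges to the fixed point 2.0 2.0
```

In training, the paper unrolls such a loop for a fixed number of iterations so that both modules receive gradients end-to-end.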