61,154 research outputs found
Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two-stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation and 51.8% on the test set using
the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state of the art
performance on the ICCV 2017 PoseTrack keypoint tracking challenge.Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint
tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack
and webpage: https://rohitgirdhar.github.io/DetectAndTrack
Learning Task Relatedness in Multi-Task Learning for Images in Context
Multimedia applications often require concurrent solutions to multiple tasks.
These tasks hold clues to each-others solutions, however as these relations can
be complex this remains a rarely utilized property. When task relations are
explicitly defined based on domain knowledge multi-task learning (MTL) offers
such concurrent solutions, while exploiting relatedness between multiple tasks
performed over the same dataset. In most cases however, this relatedness is not
explicitly defined and the domain expert knowledge that defines it is not
available. To address this issue, we introduce Selective Sharing, a method that
learns the inter-task relatedness from secondary latent features while the
model trains. Using this insight, we can automatically group tasks and allow
them to share knowledge in a mutually beneficial way. We support our method
with experiments on 5 datasets in classification, regression, and ranking tasks
and compare to strong baselines and state-of-the-art approaches showing a
consistent improvement in terms of accuracy and parameter counts. In addition,
we perform an activation region analysis showing how Selective Sharing affects
the learned representation.Comment: To appear in ICMR 2019 (Oral + Lightning Talk + Poster
CANU-ReID: A Conditional Adversarial Network for Unsupervised person Re-IDentification
Unsupervised person re-ID is the task of identifying people on a target data
set for which the ID labels are unavailable during training. In this paper, we
propose to unify two trends in unsupervised person re-ID: clustering &
fine-tuning and adversarial learning. On one side, clustering groups training
images into pseudo-ID labels, and uses them to fine-tune the feature extractor.
On the other side, adversarial learning is used, inspired by domain adaptation,
to match distributions from different domains. Since target data is distributed
across different camera viewpoints, we propose to model each camera as an
independent domain, and aim to learn domain-independent features.
Straightforward adversarial learning yields negative transfer, we thus
introduce a conditioning vector to mitigate this undesirable effect. In our
framework, the centroid of the cluster to which the visual sample belongs is
used as conditioning vector of our conditional adversarial network, where the
vector is permutation invariant (clusters ordering does not matter) and its
size is independent of the number of clusters. To our knowledge, we are the
first to propose the use of conditional adversarial networks for unsupervised
person re-ID. We evaluate the proposed architecture on top of two
state-of-the-art clustering-based unsupervised person re-identification (re-ID)
methods on four different experimental settings with three different data sets
and set the new state-of-the-art performance on all four of them. Our code and
model will be made publicly available at
https://team.inria.fr/perception/canu-reid/
- …