Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem with many
applications, such as aiding gaze estimation, modeling attention, fitting 3D
models to video, and performing face alignment. Traditionally, head pose is
computed by detecting keypoints on the target face and solving the 2D-to-3D
correspondence problem against a mean human head model. We argue that
this is a fragile method because it relies entirely on landmark detection
performance, the extraneous head model and an ad-hoc fitting step. We present
an elegant and robust way to determine pose by training a multi-loss
convolutional neural network on 300W-LP, a large synthetically expanded
dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from
image intensities through joint binned pose classification and regression. We
present empirical tests on common in-the-wild pose benchmark datasets which
show state-of-the-art results. Additionally, we test our method on a dataset
typically used for depth-based pose estimation and begin to close the gap with
state-of-the-art depth pose methods. We open-source our training and testing
code as well as release our pre-trained models.
Comment: Accepted to the 2018 IEEE Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW).
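The joint binned classification and regression described above can be sketched as a per-angle loss. The 3-degree bins covering roughly ±99° follow the paper's released code, but treat the exact bin layout and the `alpha` weighting as assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def binned_pose_loss(logits, gt_deg, alpha=0.5):
    """Joint binned classification + regression loss for one Euler angle.

    Assumptions of this sketch: 66 bins of 3 degrees covering [-99, 99),
    and `alpha` weighting the regression term.
    logits: (N, 66) raw bin scores; gt_deg: (N,) angles in degrees.
    """
    bin_centers = torch.arange(-99.0, 99.0, 3.0) + 1.5    # (66,) bin centers
    gt_bin = torch.clamp(((gt_deg + 99.0) / 3.0).long(), 0, 65)
    cls_loss = F.cross_entropy(logits, gt_bin)            # coarse bin classification
    probs = F.softmax(logits, dim=1)
    pred_deg = (probs * bin_centers).sum(dim=1)           # expected angle in degrees
    reg_loss = F.mse_loss(pred_deg, gt_deg)               # fine-grained regression
    return cls_loss + alpha * reg_loss
```

The network would apply one such head per intrinsic Euler angle (yaw, pitch, roll), summing the three losses.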
Unsupervised Learning of Edges
Data-driven approaches for edge detection have proven effective and achieve
top results on modern benchmarks. However, all current data-driven edge
detectors require manual supervision for training in the form of hand-labeled
region segments or object boundaries. Specifically, human annotators mark
semantically meaningful edges which are subsequently used for training. Is this
form of strong, high-level supervision actually necessary to learn to
accurately detect edges? In this work we present a simple yet effective
approach for training edge detectors without human supervision. To this end we
utilize motion, and more specifically, the only input to our method is noisy
semi-dense matches between frames. We begin with only a rudimentary knowledge
of edges (in the form of image gradients), and alternate between improving
motion estimation and edge detection in turn. Using a large corpus of video
data, we show that edge detectors trained using our unsupervised scheme
approach the performance of the same methods trained with full supervision
(within 3-5%). Finally, we show that when using a deep network for the edge
detector, our approach provides a novel pre-training scheme for object
detection.
Comment: Camera-ready version for CVPR 2016.
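The "rudimentary knowledge of edges" that seeds the alternation can be illustrated with a plain gradient-magnitude map. This is a minimal sketch of the initialization only, not the paper's detector or its motion-estimation loop:

```python
import numpy as np

def gradient_edges(img):
    """Rudimentary edge map from image gradients -- the kind of initial
    edge knowledge the unsupervised alternation starts from."""
    gy, gx = np.gradient(img.astype(float))  # finite-difference gradients
    mag = np.hypot(gx, gy)                   # gradient magnitude
    return mag / (mag.max() + 1e-8)          # normalize to [0, 1]
```

From such a map, the scheme would alternate: use current edges to improve semi-dense motion estimates, then use motion discontinuities to retrain the edge detector.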
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
We address the challenging task of Localization via Embodied Dialog (LED).
Given a dialog from two agents, an Observer navigating through an unknown
environment and a Locator who is attempting to identify the Observer's
location, the goal is to predict the Observer's final location in a map. We
develop a novel LED-Bert architecture and present an effective pretraining
strategy. We show that a graph-based scene representation is more effective
than the top-down 2D maps used in prior works. Our approach outperforms
previous baselines.
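To illustrate why a graph-based scene representation suits this task, localization can be framed as scoring a dialog embedding against per-node embeddings of the scene graph. The `localize` function and its inputs are a hypothetical sketch, not the actual LED-Bert architecture:

```python
import numpy as np

def localize(dialog_emb, node_embs):
    """Return a distribution over graph nodes given a dialog embedding.

    Hypothetical sketch: nodes are viewpoints in a graph scene
    representation; scores are dot products passed through a softmax.
    """
    scores = node_embs @ dialog_emb        # one score per graph node
    e = np.exp(scores - scores.max())      # numerically stable softmax
    return e / e.sum()
```

The predicted location is then the highest-probability node, rather than a pixel on a top-down 2D map.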
Does Continual Learning = Catastrophic Forgetting?
Continual learning is known for suffering from catastrophic forgetting, a
phenomenon in which earlier learned concepts are forgotten in favor of more
recent samples. In this work, we challenge the assumption that continual
learning is inevitably associated with catastrophic forgetting by presenting a
set of tasks that surprisingly do not suffer from catastrophic forgetting when
learned continually. We provide evidence that these reconstruction-type tasks
exhibit positive forward transfer and that single-view 3D shape reconstruction
improves the performance on learned and novel categories over time. We provide
a novel analysis of knowledge transfer by examining the output distribution
shift across sequential learning tasks. Finally, we show that the
robustness of these tasks leads to the potential of having a proxy
representation learning task for continual classification. The codebase,
dataset, and pre-trained models released with this article can be found at
https://github.com/rehg-lab/CLRec
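The output-distribution-shift analysis can be approximated by comparing a model's predictions on the same held-out inputs before and after each new task. The KL-based measure below is a generic sketch of such a comparison, not necessarily the paper's exact metric:

```python
import numpy as np

def mean_output_shift(probs_before, probs_after, eps=1e-8):
    """Mean KL divergence between a model's output distributions on the
    same inputs before and after training on a new task. A generic measure
    of output distribution shift (assumed, not the paper's exact analysis)."""
    p = np.clip(probs_before, eps, 1.0)    # (N, C) per-sample distributions
    q = np.clip(probs_after, eps, 1.0)
    return float((p * np.log(p / q)).sum(axis=1).mean())
```

A shift near zero across tasks is consistent with the claimed absence of catastrophic forgetting, while a large shift indicates the earlier task's outputs have drifted.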