Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
The goal of this paper is to take a single 2D image of a scene and recover
the 3D structure in terms of a small set of factors: a layout representing the
enclosing surfaces as well as a set of objects represented in terms of shape
and pose. We propose a convolutional neural network-based approach to predict
this representation and benchmark it on a large dataset of indoor scenes. Our
experiments evaluate a number of practical design questions, demonstrate that
we can infer this representation, and quantitatively and qualitatively
demonstrate its merits compared to alternate representations.
Comment: Project url with code: https://shubhtuls.github.io/factored3
Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation
In this work, we address the problem of spatio-temporal action detection in
temporally untrimmed videos. The task is both important and challenging, because
localizing human actions accurately in time and in space is essential for
analyzing large-scale video data. To tackle this problem, we propose a cascade
proposal and location anticipation (CPLA) model for frame-level action
detection. There are several salient points of our model: (1) a cascade region
proposal network (casRPN) is adopted for action proposal generation and shows
better localization accuracy compared with single region proposal network
(RPN); (2) action spatio-temporal consistencies are exploited via a location
anticipation network (LAN) and thus frame-level action detection is not
conducted independently. Frame-level detections are then linked by solving a
linking score maximization problem, and temporally trimmed into spatio-temporal
action tubes. We demonstrate the effectiveness of our model on the challenging
UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.
Comment: Accepted at BMVC 2017 (oral
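
The abstract does not spell out the linking step, so the following is only a minimal sketch of one common way to pose a linking score maximization problem, not the authors' formulation. It assumes that the score of linking two detections in consecutive frames is the sum of their confidences plus a weighted box overlap (IoU), that every frame has at least one detection, and that the best path is found with Viterbi-style dynamic programming. The names `iou`, `link_detections`, and `overlap_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(
    boxes: Sequence[Sequence[Box]],
    scores: Sequence[Sequence[float]],
    overlap_weight: float = 1.0,  # assumed trade-off between confidence and overlap
) -> List[int]:
    """Pick one detection index per frame so the summed linking score is maximal.

    Assumes every frame contains at least one detection.
    """
    n_frames = len(boxes)
    if n_frames == 0:
        return []

    # best[t][j]: best cumulative score of any path ending at detection j of frame t
    best = [list(scores[0])]
    # back[t - 1][j]: index in frame t - 1 that precedes detection j of frame t
    back: List[List[int]] = []

    for t in range(1, n_frames):
        cur_best, cur_back = [], []
        for j, box_j in enumerate(boxes[t]):
            candidates = [
                best[t - 1][i] + overlap_weight * iou(box_i, box_j)
                for i, box_i in enumerate(boxes[t - 1])
            ]
            i_star = max(range(len(candidates)), key=candidates.__getitem__)
            cur_best.append(candidates[i_star] + scores[t][j])
            cur_back.append(i_star)
        best.append(cur_best)
        back.append(cur_back)

    # Backtrack from the best-scoring detection in the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for cur_back in reversed(back):
        j = cur_back[j]
        path.append(j)
    path.reverse()
    return path
```

With `overlap_weight` set to zero the sketch reduces to picking the most confident detection in each frame independently; a positive weight favors spatially consistent paths, which is the kind of consistency the linking step is meant to enforce. Temporal trimming of the resulting tube is not covered here.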