Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation
In this work, we address the problem of spatio-temporal action detection in
temporally untrimmed videos. The task is challenging, and it matters because
accurately localizing human actions in both time and space is essential for
analyzing large-scale video data. To tackle this problem, we propose a cascade
proposal and location anticipation (CPLA) model for frame-level action
detection. The model has two salient features: (1) a cascade region
proposal network (casRPN) is adopted for action proposal generation and shows
better localization accuracy than a single region proposal network
(RPN); (2) action spatio-temporal consistencies are exploited via a location
anticipation network (LAN) and thus frame-level action detection is not
conducted independently. Frame-level detections are then linked by solving a
linking score maximization problem, and temporally trimmed into spatio-temporal
action tubes. We demonstrate the effectiveness of our model on the challenging
UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.
Comment: Accepted at BMVC 2017 (oral
- …
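
The abstract does not spell out the linking score, so the following is only a minimal sketch of how frame-level detections might be linked by score maximization. It assumes a formulation common in action-tube work: the score of a path is the sum of per-frame detection confidences plus the IoU overlap between boxes in consecutive frames, maximized with Viterbi-style dynamic programming. The names `iou`, `link_detections`, and `overlap_weight` are hypothetical and not taken from the paper, and temporal trimming is not covered.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Detection]],
                    overlap_weight: float = 1.0) -> List[int]:
    """Pick one detection index per frame so that the assumed linking score
    (sum of confidences + overlap_weight * IoU between consecutive boxes)
    is maximal. Solved by dynamic programming over frames."""
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one detection")

    # best[i]: best accumulated score of any path ending at detection i
    best = [score for _, score in frames[0]]
    backpointers: List[List[int]] = []

    for t in range(1, len(frames)):
        prev = frames[t - 1]
        new_best, pointers = [], []
        for box, score in frames[t]:
            # Score of extending each previous path to this detection.
            candidates = [best[j] + overlap_weight * iou(prev[j][0], box)
                          for j in range(len(prev))]
            j_star = max(range(len(candidates)), key=candidates.__getitem__)
            new_best.append(candidates[j_star] + score)
            pointers.append(j_star)
        best = new_best
        backpointers.append(pointers)

    # Trace the best path back from the last frame to the first.
    idx = max(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(backpointers):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return path
```

The returned list holds, for each frame, the index of the detection chosen for the tube. In this sketch, raising `overlap_weight` favors spatially smooth paths over individually confident detections.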