Cascaded Boundary Regression for Temporal Action Detection
Temporal action detection in long videos is an important problem.
State-of-the-art methods address this problem by applying action classifiers on
sliding windows. Although sliding windows may contain an identifiable portion
of the actions, they may not necessarily cover the entire action instance,
which would lead to inferior performance. We adapt a two-stage temporal action
detection pipeline with a Cascaded Boundary Regression (CBR) model.
Class-agnostic proposals and specific actions are detected respectively in the
first and the second stage. CBR uses temporal coordinate regression to refine
the temporal boundaries of the sliding windows. The salient aspect of the
refinement process is that, inside each stage, the temporal boundaries are
adjusted in a cascaded way by feeding the refined windows back to the system
for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and
achieve state-of-the-art performance on both datasets. The performance gain is
especially remarkable under high IoU thresholds, e.g. mAP@tIoU=0.5 on THUMOS-14
is improved from 19.0% to 31.0%.
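The cascaded refinement idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_regressor` is a hypothetical stand-in for a learned boundary regressor, the fixed `target` segment and the step count are assumptions made only so the example runs, and windows are assumed to be `(start, end)` pairs in seconds.

```python
from typing import Callable, Tuple

Window = Tuple[float, float]  # (start, end), assumed to be in seconds


def temporal_iou(a: Window, b: Window) -> float:
    """Temporal intersection-over-union (tIoU) of two windows."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def cascaded_refine(
    window: Window,
    regressor: Callable[[Window], Tuple[float, float]],
    steps: int = 3,
) -> Window:
    """Feed the refined window back into the regressor `steps` times."""
    for _ in range(steps):
        d_start, d_end = regressor(window)  # predicted boundary offsets
        window = (window[0] + d_start, window[1] + d_end)
    return window


def toy_regressor(w: Window, target: Window = (10.0, 20.0)) -> Tuple[float, float]:
    """Hypothetical regressor: moves each boundary halfway toward `target`.

    A real model would predict offsets from video features instead.
    """
    return 0.5 * (target[0] - w[0]), 0.5 * (target[1] - w[1])


initial = (12.0, 16.0)  # sliding window covering only part of the action
refined = cascaded_refine(initial, toy_regressor, steps=3)
print(temporal_iou(initial, (10.0, 20.0)), temporal_iou(refined, (10.0, 20.0)))
```

In this toy setting each pass moves the window boundaries closer to the action extent, so the tIoU with the target segment rises with every cascade step.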
CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection
Robust face detection in the wild is one of the key components needed to
support various face-related problems, e.g. unconstrained face recognition,
facial periocular recognition, facial landmarking and pose estimation, facial
expression recognition, 3D facial model construction, etc. Although the face
detection problem has been intensely studied for decades with various
commercial applications, it still meets problems in some real-world scenarios
due to numerous challenges, e.g. heavy facial occlusions, extremely low
resolutions, strong illumination, exceptional pose variations, image or video
compression artifacts, etc. In this paper, we present a face detection approach
named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN)
to robustly solve the problems mentioned above. Similar to the region-based
CNNs, our proposed network consists of the region proposal component and the
region-of-interest (RoI) detection component. However, unlike those
networks, our proposed network makes two main contributions that play a
significant role in achieving state-of-the-art performance in face detection.
Firstly, the multi-scale information is grouped both in region proposal and RoI
detection to deal with tiny face regions. Secondly, our proposed network allows
explicit body contextual reasoning, inspired by the intuition of the human
visual system. The proposed approach is benchmarked on two recent
challenging face detection databases, i.e. the WIDER FACE Dataset, which
contains a high degree of variability, and the Face Detection Dataset and
Benchmark (FDDB). The experimental results show that our proposed approach
trained on the WIDER FACE Dataset outperforms strong baselines on the WIDER FACE
Dataset by a large margin, and consistently achieves competitive results on
FDDB against the recent state-of-the-art face detection methods.
Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation
In this work, we address the problem of spatio-temporal action detection in
temporally untrimmed videos. It is an important and challenging task as finding
accurate human actions in both temporal and spatial space is important for
analyzing large-scale video data. To tackle this problem, we propose a cascade
proposal and location anticipation (CPLA) model for frame-level action
detection. There are several salient points of our model: (1) a cascade region
proposal network (casRPN) is adopted for action proposal generation and shows
better localization accuracy compared with single region proposal network
(RPN); (2) action spatio-temporal consistencies are exploited via a location
anticipation network (LAN) and thus frame-level action detection is not
conducted independently. Frame-level detections are then linked by solving a
linking score maximization problem, and temporally trimmed into spatio-temporal
action tubes. We demonstrate the effectiveness of our model on the challenging
UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.
Comment: Accepted at BMVC 2017 (oral)
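The linking step mentioned in this abstract can be sketched as a dynamic program over frames. The abstract does not define the linking score, so the sketch below assumes a commonly used form (detection confidence plus box overlap between consecutive frames); `overlap_weight`, the box format `(x1, y1, x2, y2)`, and the requirement that every frame has at least one detection are all assumptions for illustration, not details taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Det = Tuple[Box, float]                  # (box, confidence)


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Det]], overlap_weight: float = 1.0) -> List[int]:
    """Pick one detection index per frame, maximizing the summed linking score.

    Assumed score: sum of confidences plus overlap_weight * IoU between
    detections chosen in consecutive frames. Solved Viterbi-style.
    """
    best = [[conf for _, conf in frames[0]]]  # best path score ending at each det
    back: List[List[int]] = []                # back-pointers per frame transition
    for t in range(1, len(frames)):
        scores, ptrs = [], []
        for box, conf in frames[t]:
            cands = [
                best[-1][i] + overlap_weight * box_iou(prev_box, box)
                for i, (prev_box, _) in enumerate(frames[t - 1])
            ]
            i_best = max(range(len(cands)), key=cands.__getitem__)
            scores.append(cands[i_best] + conf)
            ptrs.append(i_best)
        best.append(scores)
        back.append(ptrs)
    # Backtrack from the best-scoring detection in the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        path.append(j)
    return path[::-1]


frames = [
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
    [((51, 51, 61, 61), 0.9), ((1, 1, 11, 11), 0.6)],
    [((2, 2, 12, 12), 0.7), ((52, 52, 62, 62), 0.95)],
]
path = link_detections(frames)
print(path)
```

The returned list holds one detection index per frame; the temporal trimming of the linked path into action tubes, which the abstract also mentions, is not covered by this sketch.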
S4Net: Single Stage Salient-Instance Segmentation
In this paper, we consider an interesting problem: salient instance
segmentation. Besides producing bounding boxes, our network also outputs
high-quality instance-level segments. Taking into account the
category-independent property of each target, we design a single stage salient
instance segmentation framework, with a novel segmentation branch. Our new
branch considers not only the local context inside each detection window but also
its surrounding context, enabling us to distinguish instances in the same scope
even under occlusion. Our network is end-to-end trainable and runs at a fast
speed (40 fps when processing an image with resolution 320x320). We evaluate
our approach on a publicly available benchmark and show that it outperforms
other alternative solutions. We also provide a thorough analysis of the design
choices to help readers better understand the functions of each part of our
network. The source code can be found at
https://github.com/RuochenFan/S4Net