Temporal activity detection in untrimmed videos with recurrent neural networks
This work proposes a simple pipeline to classify and temporally localize activities in untrimmed videos. Our system uses features from a 3D Convolutional Neural Network (C3D) as input to train a recurrent neural network (RNN) that learns to classify video clips of 16 frames. After clip prediction, we post-process the output of the RNN to assign a single activity label to each video and to determine the temporal boundaries of the activity within the video. We show how our system can achieve competitive results in both tasks with a simple architecture. We evaluate our method on the ActivityNet Challenge 2016, achieving a 0.5874 mAP and a 0.2237 mAP in the classification and detection tasks, respectively. Our code and models are publicly available at: https://imatge-upc.github.io/activitynet-2016-cvprw/
Peer Reviewed. Postprint (published version).
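The post-processing step described above (one label per video, plus temporal boundaries) can be illustrated with a minimal sketch. The abstract does not spell out the exact procedure, so the averaging rule, the `threshold` value, and the function names `video_label` and `temporal_segments` are assumptions made for illustration, not the authors' implementation.

```python
CLIP_LEN = 16  # frames per clip, as stated in the abstract


def video_label(clip_probs):
    """Average per-clip class probabilities and return the top class index.

    Assumption: a plain mean over clips; the paper may use a different rule.
    """
    n_classes = len(clip_probs[0])
    mean = [sum(p[c] for p in clip_probs) / len(clip_probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def temporal_segments(clip_probs, label, threshold=0.5, fps=30.0):
    """Merge consecutive clips whose probability for `label` reaches
    `threshold` into (start_seconds, end_seconds) segments."""
    segments = []
    start = None
    for i, probs in enumerate(clip_probs):
        active = probs[label] >= threshold
        if active and start is None:
            start = i  # a new segment opens at this clip
        elif not active and start is not None:
            segments.append((start * CLIP_LEN / fps, i * CLIP_LEN / fps))
            start = None
    if start is not None:  # close a segment that runs to the end of the video
        segments.append((start * CLIP_LEN / fps, len(clip_probs) * CLIP_LEN / fps))
    return segments


# Hypothetical RNN output: 5 clips, 2 classes.
clip_probs = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.7, 0.3], [0.2, 0.8]]
label = video_label(clip_probs)
segments = temporal_segments(clip_probs, label, threshold=0.5, fps=16.0)
```

With `fps=16.0` each clip spans exactly one second, which makes the resulting boundaries easy to check by hand.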
When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks
Discovering and exploiting the causality in deep neural networks (DNNs) are
crucial challenges for understanding and reasoning about causal effects (CE) in an
explainable visual model. "Intervention" has been widely used for recognizing a
causal relation ontologically. In this paper, we propose a causal inference
framework for visual reasoning via do-calculus. To study the intervention
effects on pixel-level features for causal reasoning, we introduce pixel-wise
masking and adversarial perturbation. In our framework, CE is calculated using
features in a latent space and perturbed prediction from a DNN-based model. We
further provide the first look into the characteristics of discovered CE of
adversarially perturbed images generated by gradient-based methods
(https://github.com/jjaacckkyy63/Causal-Intervention-AE-wAdvImg).
Experimental results show that CE is a competitive and robust index for
understanding DNNs when compared with conventional methods such as
class-activation mappings (CAMs) on the Chest X-Ray-14 dataset for
human-interpretable feature(s) (e.g., symptom) reasoning. Moreover, CE holds
promise for detecting adversarial examples as it possesses distinct
characteristics in the presence of adversarial perturbations.
Comment: Note that our camera-ready version has changed the title; "When Causal
Intervention Meets Adversarial Examples and Image Masking for Deep Neural
Networks" is the official v3 paper title in the IEEE proceedings. Please use it in
your formal reference. Accepted at IEEE ICIP 2019. PyTorch code has been released
at https://github.com/jjaacckkyy63/Causal-Intervention-AE-wAdvIm
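The masking intervention described in this abstract can be illustrated with a simplified sketch: apply do(patch = fill) to one image region at a time and record how much the model's output changes. The abstract says CE is computed from latent-space features and the perturbed prediction; the version below uses only the output difference, which is a deliberate simplification. The names `mask_patch`, `effect_map`, and `toy_model` are hypothetical.

```python
def mask_patch(image, top, left, size, fill=0.0):
    """Return a copy of `image` (a list of rows) with one square patch set to `fill`."""
    masked = [row[:] for row in image]
    for r in range(top, min(top + size, len(image))):
        for c in range(left, min(left + size, len(image[0]))):
            masked[r][c] = fill
    return masked


def effect_map(model, image, size):
    """Estimate the effect of each patch as the drop in the model score
    when that patch is masked (the intervention)."""
    base = model(image)
    effects = {}
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            effects[(top, left)] = base - model(mask_patch(image, top, left, size))
    return effects


def toy_model(img):
    """Stand-in for a DNN score (assumption): depends only on the top-left 2x2 region."""
    return sum(img[r][c] for r in range(2) for c in range(2))


image = [[1.0] * 4 for _ in range(4)]
effects = effect_map(toy_model, image, size=2)
```

In this toy setting only the patch at `(0, 0)` receives a nonzero effect, because it is the only region the stand-in model reads.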
RED: Reinforced Encoder-Decoder Networks for Action Anticipation
Action anticipation aims to detect an action before it happens. Many real
world applications in robotics and surveillance are related to this predictive
capability. Current methods address this problem by first anticipating visual
representations of future frames and then categorizing the anticipated
representations into actions. However, anticipation is based on a single past
frame's representation, which ignores the historical trend. Besides, it can only
anticipate at a fixed future time. We propose a Reinforced Encoder-Decoder (RED)
network for action anticipation. RED takes multiple history representations as
input and learns to anticipate a sequence of future representations. One
salient aspect of RED is that a reinforcement module is adopted to provide
sequence-level supervision; the reward function is designed to encourage the
system to make correct predictions as early as possible. We test RED on
TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation
and achieve state-of-the-art performance on all datasets.
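The idea of a sequence-level reward that favors early correct predictions can be sketched as follows. The abstract does not give the reward formula, so the geometric `decay`, the fixed `penalty`, and the function name `sequence_reward` are assumptions chosen only to show the principle, not the reward used in RED.

```python
def sequence_reward(predictions, target, decay=0.5, penalty=-1.0):
    """Sum per-step rewards over an anticipated sequence.

    A correct prediction at step t earns decay**t, so getting it right
    earlier is worth more; a wrong prediction earns `penalty`.
    """
    total = 0.0
    for t, pred in enumerate(predictions):
        total += decay ** t if pred == target else penalty
    return total


early = sequence_reward(["run", "run", "run"], target="run")   # correct from step 0
late = sequence_reward(["walk", "walk", "run"], target="run")  # correct only at step 2
```

Under this sketch a sequence that becomes correct sooner always scores higher than one that becomes correct later, which is the behavior the abstract attributes to the reward design.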
Cascaded Boundary Regression for Temporal Action Detection
Temporal action detection in long videos is an important problem.
State-of-the-art methods address this problem by applying action classifiers on
sliding windows. Although sliding windows may contain an identifiable portion
of the actions, they may not necessarily cover the entire action instance,
which would lead to inferior performance. We adapt a two-stage temporal action
detection pipeline with a Cascaded Boundary Regression (CBR) model.
Class-agnostic proposals and specific actions are detected respectively in the
first and the second stage. CBR uses temporal coordinate regression to refine
the temporal boundaries of the sliding windows. The salient aspect of the
refinement process is that, inside each stage, the temporal boundaries are
adjusted in a cascaded way by feeding the refined windows back to the system
for further boundary refinement. We test CBR on THUMOS-14 and TVSeries, and
achieve state-of-the-art performance on both datasets. The performance gain is
especially remarkable under high IoU thresholds, e.g. mAP@tIoU=0.5 on THUMOS-14
is improved from 19.0% to 31.0%.
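The cascaded refinement loop and the temporal IoU (tIoU) used in the reported metric can be sketched as below. The `regressor` callable stands in for the learned boundary regressor; the toy version here, which moves halfway toward a known ground-truth segment, is an assumption for illustration only, as are the names `cascaded_refine` and `toy_regressor`.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def cascaded_refine(window, regressor, steps=3):
    """Feed the refined window back into the regressor `steps` times."""
    for _ in range(steps):
        d_start, d_end = regressor(window)
        window = (window[0] + d_start, window[1] + d_end)
    return window


GROUND_TRUTH = (10.0, 20.0)  # hypothetical action instance, in seconds


def toy_regressor(window):
    """Stand-in for the learned model: predict offsets that move each
    boundary halfway toward the ground truth."""
    return (0.5 * (GROUND_TRUTH[0] - window[0]), 0.5 * (GROUND_TRUTH[1] - window[1]))


initial = (6.0, 16.0)
refined = cascaded_refine(initial, toy_regressor, steps=3)
```

Each pass shrinks the remaining boundary error, so the refined window overlaps the ground truth more than the initial sliding window does, which is why gains would be expected to show up most at high tIoU thresholds.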