Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
Every moment counts in action recognition. A comprehensive understanding of
human activity in video requires labeling every frame according to the actions
occurring, placing multiple labels densely over a video sequence. To study this
problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new
dataset of dense labels over unconstrained internet videos. Modeling multiple,
dense labels benefits from temporal relations within and across classes. We
define a novel variant of long short-term memory (LSTM) deep networks for
modeling these temporal relations via multiple input and output connections. We
show that this model improves action labeling accuracy and further enables
deeper understanding tasks ranging from structured retrieval to action
prediction. Comment: To appear in IJC
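The abstract above centers on dense, multi-label frame annotation: every frame can carry several simultaneous action labels. The sketch below (numpy only, not the paper's multi-connection LSTM variant, which is not reproduced here) illustrates the core output formulation: independent per-class sigmoid scores per frame, so multiple actions can be active at once, in contrast to single-label softmax classification. All weights and feature dimensions are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, C = 8, 16, 5                  # frames, feature dim, action classes
frames = rng.normal(size=(T, D))    # per-frame features (stand-in for CNN output)
W = rng.normal(size=(D, C)) * 0.1   # hypothetical per-class classifier weights
b = np.zeros(C)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Dense multi-label scoring: each frame gets an independent probability
# per class (sigmoid), so several actions can be marked active in the
# same frame -- unlike softmax, which forces exactly one label per frame.
probs = sigmoid(frames @ W + b)     # shape (T, C)
dense_labels = probs > 0.5          # binary label matrix, one row per frame
```

`dense_labels` is the kind of dense annotation matrix MultiTHUMOS provides as ground truth: a (frames × classes) grid rather than one interval per action.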
Learning Latent Super-Events to Detect Multiple Activities in Videos
In this paper, we introduce the concept of learning latent super-events from
activity videos, and present how it benefits activity detection in continuous
videos. We define a super-event as a set of multiple events occurring together
in videos with a particular temporal organization; it is the opposite concept
of sub-events. Real-world videos contain multiple activities and are rarely
segmented (e.g., surveillance videos), and learning latent super-events allows
the model to capture how the events are temporally related in videos. We design
temporal structure filters that enable the model to focus on particular
sub-intervals of the videos, and use them together with a soft attention
mechanism to learn representations of latent super-events. Super-event
representations are combined with per-frame or per-segment CNNs to provide
frame-level annotations. Our approach is designed to be fully differentiable,
enabling end-to-end learning of latent super-event representations jointly with
the activity detector using them. Our experiments with multiple public video
datasets confirm that the proposed concept of latent super-event learning
significantly benefits activity detection, advancing the state of the art. Comment: CVPR 201
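The temporal structure filters and soft attention described above can be sketched minimally in numpy. This is only an illustration under simplifying assumptions: the filter centers, widths, and attention scores below are fixed stand-ins for learned quantities, and Gaussian-shaped filters are used for simplicity (the exact filter parameterization in the paper may differ).

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, F = 20, 8, 3   # frames, feature dim, number of temporal filters

feats = rng.normal(size=(T, D))      # per-frame features (stand-in)
centers = np.array([0.2, 0.5, 0.8])  # filter centers in [0, 1] (hypothetical)
widths = np.array([0.10, 0.15, 0.10])

t = np.linspace(0.0, 1.0, T)
# Soft temporal filters: each filter weights frames near its center,
# letting the model focus on a particular sub-interval of the video.
w = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / widths[:, None]) ** 2)
w /= w.sum(axis=1, keepdims=True)    # normalize over time -> (F, T)

interval_repr = w @ feats            # (F, D): one summary per sub-interval

# Soft attention over the filters combines the sub-interval summaries
# into a single latent super-event representation.
scores = rng.normal(size=F)          # in practice, a learned function of feats
attn = np.exp(scores) / np.exp(scores).sum()
super_event = attn @ interval_repr   # (D,)
```

Because every step is a differentiable weighted sum, the filter parameters and attention scores could be trained end-to-end jointly with the detector, which is the property the abstract emphasizes.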
Recognizing and Curating Photo Albums via Event-Specific Image Importance
Automatic organization of personal photos is a problem with many real-world
applications, and can be divided into two main tasks: recognizing the event
type of the photo collection, and selecting interesting images from the
collection. In this paper, we attempt to simultaneously solve both tasks:
album-wise event recognition and image-wise importance prediction. We
collected an album dataset with both event type labels and image importance
labels, refined from an existing CUFED dataset. We propose a hybrid system
consisting of three parts: a siamese network-based event-specific image
importance predictor, a Convolutional Neural Network (CNN) that recognizes the
event type, and a Long Short-Term Memory (LSTM)-based sequence-level event
recognizer. We propose an iterative updating procedure for event type and image
importance score prediction. We experimentally verified that image importance
score prediction and event type recognition can each help the performance of
the other. Comment: Accepted as oral in BMVC 201
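The iterative updating idea above — event type and image importance each refining the other — can be sketched as a simple alternating loop. This is a minimal numpy illustration, not the paper's procedure: the classifier weights, the linear importance scorer, and the fixed iteration count are all hypothetical stand-ins for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, E = 10, 6, 4   # images in the album, feature dim, event types

imgs = rng.normal(size=(N, D))            # per-image features (stand-in)
W_event = rng.normal(size=(D, E)) * 0.1   # hypothetical event classifier
W_imp = rng.normal(size=(D, E)) * 0.1     # hypothetical event-specific scorer

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

# Iterative updating: the album's event posterior re-weights per-image
# importance, and importance-weighted pooling refines the event posterior.
event_post = np.full(E, 1.0 / E)          # start from a uniform event belief
for _ in range(5):
    imp_per_event = imgs @ W_imp              # (N, E): importance per event type
    importance = imp_per_event @ event_post   # (N,): expected importance
    weights = softmax(importance)             # attention over album images
    pooled = weights @ imgs                   # importance-weighted album feature
    event_post = softmax(pooled @ W_event)    # refined event belief
```

The loop captures the mutual dependence the abstract reports: a sharper event belief changes which images look important, and the important images in turn dominate the album representation used to recognize the event.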