Beat-Event Detection in Action Movie Franchises
While important advances were recently made towards temporally localizing and
recognizing specific human actions or activities in videos, efficient detection
and classification of long video chunks belonging to semantically defined
categories such as "pursuit" or "romance" remains challenging. We introduce a
new dataset, Action Movie Franchises, consisting of a collection of Hollywood
action movie franchises. We define 11 non-exclusive semantic categories -
called beat-categories - that are broad enough to cover most of the movie
footage. The corresponding beat-events are annotated as groups of video shots,
possibly overlapping. We propose an approach for localizing beat-events based on
classifying shots into beat-categories and learning the temporal constraints
between shots. We show that temporal constraints significantly improve the
classification performance. We set up an evaluation protocol for beat-event
localization as well as for shot classification, depending on whether movies
from the same franchise are present in the training data.
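The temporal constraints between shots described above can be sketched as a Viterbi-style decoding that combines per-shot classifier scores with pairwise consistency bonuses between consecutive shots. This is a minimal illustrative sketch, not the paper's actual model; the function name, the score/transition matrices, and the additive scoring form are all assumptions.

```python
import numpy as np

def smooth_shot_labels(shot_scores, transition):
    """Viterbi-style decoding: choose one label per shot so as to maximize
    the sum of per-shot scores plus pairwise transition bonuses between
    consecutive shots.

    shot_scores: (n_shots, n_classes) classifier scores per shot.
    transition:  (n_classes, n_classes) bonus for consecutive label pairs.
    """
    n, k = shot_scores.shape
    dp = np.zeros((n, k))            # best cumulative score ending in each label
    back = np.zeros((n, k), dtype=int)
    dp[0] = shot_scores[0]
    for t in range(1, n):
        cand = dp[t - 1][:, None] + transition   # (k, k): prev label x next label
        back[t] = np.argmax(cand, axis=0)
        dp[t] = shot_scores[t] + np.max(cand, axis=0)
    labels = [int(np.argmax(dp[-1]))]
    for t in range(n - 1, 0, -1):                # backtrack the best path
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]
```

With a transition matrix rewarding label continuity, an isolated weakly-scored shot gets relabeled to match its neighbors, which is the kind of effect temporal constraints are reported to have on classification performance.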
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based
on sparse local features and hypergraph matching. We benefit from special
properties of the temporal domain in the data to derive a sequential and fast
graph matching algorithm for GPUs.
Traditionally, graphs and hypergraphs are frequently used to recognize
complex and often non-rigid patterns in computer vision, either through graph
matching or point-set matching with graphs. Most formulations resort to the
minimization of a difficult discrete energy function mixing geometric or
structural terms with data attached terms involving appearance features.
Traditional methods solve this minimization problem approximately, for instance
with spectral techniques.
In this work, instead of solving the problem approximately, the exact
solution for the optimal assignment is calculated in parallel on GPUs. The
graphical structure is simplified and regularized, which makes it possible to
derive an efficient recursive minimization algorithm. The algorithm distributes
subproblems over the calculation units of a GPU, which solves them in parallel,
allowing the system to run faster than real time on mid-range GPUs.
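A plausible reading of "simplified and regularized structure" plus "recursive minimization" is a chain-structured dynamic program over time, whose per-step subproblems are independent and therefore GPU-friendly. The sketch below illustrates that general scheme only; the energy form, cost matrices, and function name are assumptions, and NumPy vectorization stands in for the GPU parallelism.

```python
import numpy as np

def chain_matching(unary, pairwise):
    """Exact minimization of a chain-structured assignment energy.

    unary[t, j]:    appearance cost of matching model point t to scene feature j.
    pairwise[j, k]: structural cost between assignments of consecutive points.

    The (J x J) candidate table built at each step is embarrassingly
    parallel -- the part a GPU implementation would distribute over its
    calculation units; here NumPy broadcasting plays that role.
    """
    T, J = unary.shape
    dp = unary[0].copy()
    back = np.zeros((T, J), dtype=int)
    for t in range(1, T):
        cand = dp[:, None] + pairwise        # all prev/next pairs at once
        back[t] = np.argmin(cand, axis=0)
        dp = unary[t] + np.min(cand, axis=0)
    j = int(np.argmin(dp))
    path = [j]
    for t in range(T - 1, 0, -1):            # backtrack optimal assignment
        j = int(back[t, j])
        path.append(j)
    return path[::-1], float(dp.min())
```

Because the recursion is exact, the returned assignment attains the global minimum of the chain energy, in contrast with the approximate spectral relaxations mentioned above.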
The THUMOS Challenge on Action Recognition for Videos "in the Wild"
Automatically recognizing and localizing wide ranges of human actions has
crucial importance for video understanding. Towards this goal, the THUMOS
challenge was introduced in 2013 to serve as a benchmark for action
recognition. Until then, video action recognition, including THUMOS challenge,
had focused primarily on the classification of pre-segmented (i.e., trimmed)
videos, which is an artificial task. In THUMOS 2014, we elevated action
recognition to a more practical level by introducing temporally untrimmed
videos. These also include "background videos", which share scenes and
backgrounds similar to those of action videos but are devoid of the specific
actions. The three
editions of the challenge organized in 2013--2015 have made THUMOS a common
benchmark for action classification and detection and the annual challenge is
widely attended by teams from around the world.
In this paper we describe the THUMOS benchmark in detail and give an overview
of data collection and annotation procedures. We present the evaluation
protocols used to quantify results in the two THUMOS tasks of action
classification and temporal detection. We also present results of submissions
to the THUMOS 2015 challenge and review the participating approaches.
Additionally, we include a comprehensive empirical study evaluating the
differences in action recognition between trimmed and untrimmed videos, and how
well methods trained on trimmed videos generalize to untrimmed videos. We
conclude by proposing several directions and improvements for future THUMOS
challenges.
Comment: Preprint submitted to Computer Vision and Image Understanding
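For context, temporal detection in benchmarks of this kind is typically scored by matching predicted segments to ground-truth segments at a temporal intersection-over-union (tIoU) threshold. The function below is a minimal sketch of that overlap measure, not the challenge's official evaluation code.

```python
def temporal_iou(a, b):
    """Temporal intersection-over-union between two (start, end) intervals,
    the overlap criterion commonly used when scoring temporal action
    detection against ground truth."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

A detection is then usually counted as a true positive when its tIoU with an unmatched ground-truth segment exceeds a threshold such as 0.5, and average precision is computed over the ranked detections.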
Developmentally regulated multisensory integration for prey localization in the medicinal leech
Medicinal leeches, like many aquatic animals, use water disturbances to localize their prey, so they need to be able to determine whether a wave disturbance is created by prey or by another source. Many aquatic predators perform this separation by responding only to those wave frequencies representing their prey. As leeches' prey preference changes over the course of their development, we examined their responses at three different life stages. We found that juveniles more readily localized wave sources of lower frequencies (2 Hz) than their adult counterparts (8–12 Hz), and that adolescents exhibited elements of both juvenile and adult behavior, readily localizing sources of both frequencies. Leeches are known to be able to localize the source of waves through the use of either mechanical or visual information. We separately characterized their ability to localize various frequencies of stimuli using unimodal cues. Within a single modality, the frequency–response curves of adults and juveniles were virtually indistinguishable. However, the differences between the responses for each modality (visual and mechanosensory) were striking. The optimal visual stimulus had a much lower frequency (2 Hz) than the optimal mechanical stimulus (12 Hz). These frequencies matched, respectively, the juvenile and the adult preferred frequency for multimodally sensed waves. This suggests that, in the multimodal condition, adult behavior is driven more by mechanosensory information and juvenile behavior more by visual information. Indeed, when stimuli of the two modalities were placed in conflict with one another, adult leeches, unlike juveniles, were attracted to the mechanical stimulus much more strongly than to the visual stimulus.