
    Beat-Event Detection in Action Movie Franchises

    While important advances were recently made towards temporally localizing and recognizing specific human actions or activities in videos, efficient detection and classification of long video chunks belonging to semantically defined categories such as "pursuit" or "romance" remains challenging. We introduce a new dataset, Action Movie Franchises, consisting of a collection of Hollywood action movie franchises. We define 11 non-exclusive semantic categories - called beat-categories - that are broad enough to cover most of the movie footage. The corresponding beat-events are annotated as groups of video shots, possibly overlapping. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. We show that temporal constraints significantly improve the classification performance. We set up an evaluation protocol for beat-event localization as well as for shot classification, depending on whether movies from the same franchise are present in the training data or not.
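
    One plausible instantiation of the two-stage approach sketched above (an assumption for illustration, not the paper's exact method) is to classify each shot independently and then stack each shot's beat-category scores with those of its temporal neighbours, so that a second-stage classifier can learn the temporal constraints between shots. A minimal Python/NumPy sketch with hypothetical names:

    ```python
    import numpy as np

    def add_temporal_context(shot_scores, radius=2):
        """Augment per-shot beat-category scores with neighbouring shots' scores.

        shot_scores : (T, K) array, one row of K category scores per shot.
        Returns a (T, (2 * radius + 1) * K) matrix that a second-stage
        classifier can use to learn temporal constraints between shots.
        """
        T, _ = shot_scores.shape
        # Repeat the first/last shot at the boundaries so every shot
        # has a full context window.
        padded = np.pad(shot_scores, ((radius, radius), (0, 0)), mode="edge")
        return np.hstack([padded[r:r + T] for r in range(2 * radius + 1)])
    ```

    Since the beat-categories are non-exclusive, the second stage would naturally be K independent binary classifiers rather than a single multi-class one.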

    Activity recognition from videos with parallel hypergraph matching on GPUs

    In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph matching algorithm for GPUs. Traditionally, graphs and hypergraphs are frequently used to recognize complex and often non-rigid patterns in computer vision, either through graph matching or point-set matching with graphs. Most formulations resort to the minimization of a difficult discrete energy function mixing geometric or structural terms with data-attached terms involving appearance features. Traditional methods solve this minimization problem approximately, for instance with spectral techniques. In this work, instead of solving the problem approximately, the exact solution for the optimal assignment is calculated in parallel on GPUs. The graphical structure is simplified and regularized, which allows us to derive an efficient recursive minimization algorithm. The algorithm distributes subproblems over the calculation units of a GPU, which solves them in parallel, allowing the system to run faster than real-time on medium-end GPUs.
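
    To make the idea concrete, the sketch below (an illustration under assumptions, not the paper's implementation) performs exact energy minimization on a temporally ordered chain; third-order hypergraph terms are reduced to pairwise terms for brevity. The key point is that each column minimum in the recursion is an independent subproblem, which is what maps naturally onto the calculation units of a GPU:

    ```python
    import numpy as np

    def chain_match(unary, pairwise):
        """Exactly minimize the chain-structured assignment energy
            E(a) = sum_i unary[i, a_i] + sum_i pairwise[i, a_i, a_{i+1}]
        by dynamic programming.

        unary    : (M, S) appearance cost of assigning model point i to
                   scene candidate a_i.
        pairwise : (M - 1, S, S) geometric cost of consecutive assignments.
        Returns the optimal assignment as a list of M candidate indices.
        """
        M, S = unary.shape
        dp = np.empty((M, S))
        back = np.zeros((M, S), dtype=int)
        dp[0] = unary[0]
        for i in range(1, M):
            cand = dp[i - 1][:, None] + pairwise[i - 1]  # (S_prev, S_next)
            # Each column of `cand` is minimized independently; on a GPU
            # these S subproblems run in parallel.
            back[i] = cand.argmin(axis=0)
            dp[i] = cand.min(axis=0) + unary[i]
        assign = [int(dp[-1].argmin())]
        for i in range(M - 1, 0, -1):
            assign.append(int(back[i, assign[-1]]))
        return assign[::-1]
    ```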

    The THUMOS Challenge on Action Recognition for Videos "in the Wild"

    Automatically recognizing and localizing wide ranges of human actions has crucial importance for video understanding. Towards this goal, the THUMOS challenge was introduced in 2013 to serve as a benchmark for action recognition. Until then, video action recognition, including the THUMOS challenge, had focused primarily on the classification of pre-segmented (i.e., trimmed) videos, which is an artificial task. In THUMOS 2014, we elevated action recognition to a more practical level by introducing temporally untrimmed videos. These also include 'background videos' which share similar scenes and backgrounds as action videos, but are devoid of the specific actions. The three editions of the challenge organized in 2013–2015 have made THUMOS a common benchmark for action classification and detection, and the annual challenge is widely attended by teams from around the world. In this paper we describe the THUMOS benchmark in detail and give an overview of data collection and annotation procedures. We present the evaluation protocols used to quantify results in the two THUMOS tasks of action classification and temporal detection. We also present results of submissions to the THUMOS 2015 challenge and review the participating approaches. Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos. We conclude by proposing several directions and improvements for future THUMOS challenges.
    Comment: Preprint submitted to Computer Vision and Image Understanding.
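
    For the temporal-detection task, benchmarks in this family typically score a predicted segment as correct when its temporal intersection-over-union (tIoU) with a not-yet-matched ground-truth segment exceeds a threshold, and build precision-recall curves (and hence mAP) from those matches. A minimal sketch of that matching step follows; the 0.5 threshold and tuple layout are assumptions for illustration:

    ```python
    def temporal_iou(a, b):
        """IoU of two temporal segments given as (start, end) in seconds."""
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = max(a[1], b[1]) - min(a[0], b[0])
        return inter / union if union > 0 else 0.0

    def match_detections(detections, ground_truth, thresh=0.5):
        """Greedily match score-sorted detections to ground-truth segments.

        detections   : list of (start, end, score) tuples.
        ground_truth : list of (start, end) tuples.
        Returns one True (true positive) / False (false positive) flag per
        detection, in descending-score order, for building a PR curve.
        """
        used = [False] * len(ground_truth)
        flags = []
        for start, end, _ in sorted(detections, key=lambda d: -d[2]):
            ious = [0.0 if used[i] else temporal_iou((start, end), g)
                    for i, g in enumerate(ground_truth)]
            best = max(range(len(ious)), key=ious.__getitem__) if ious else -1
            if best >= 0 and ious[best] >= thresh:
                used[best] = True
                flags.append(True)
            else:
                flags.append(False)
        return flags
    ```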

    Developmentally regulated multisensory integration for prey localization in the medicinal leech

    Medicinal leeches, like many aquatic animals, use water disturbances to localize their prey, so they need to be able to determine if a wave disturbance is created by prey or by another source. Many aquatic predators perform this separation by responding only to those wave frequencies representing their prey. As leeches' prey preference changes over the course of their development, we examined their responses at three different life stages. We found that juveniles more readily localize wave sources of lower frequencies (2 Hz) than their adult counterparts (8–12 Hz), and that adolescents exhibited elements of both juvenile and adult behavior, readily localizing sources of both frequencies. Leeches are known to be able to localize the source of waves through the use of either mechanical or visual information. We separately characterized their ability to localize various frequencies of stimuli using unimodal cues. Within a single modality, the frequency–response curves of adults and juveniles were virtually indistinguishable. However, the differences between the responses for each modality (visual and mechanosensory) were striking. The optimal visual stimulus had a much lower frequency (2 Hz) than the optimal mechanical stimulus (12 Hz). These frequencies matched, respectively, the juvenile and the adult preferred frequency for multimodally sensed waves. This suggests that, in the multimodal condition, adult behavior is driven more by mechanosensory information and juvenile behavior more by visual information. Indeed, when stimuli of the two modalities were placed in conflict with one another, adult leeches, unlike juveniles, were attracted to the mechanical stimulus much more strongly than to the visual stimulus.