    Spatio-temporal Video Re-localization by Warp LSTM

    The need to efficiently find the video content a user wants is growing with the explosion of user-generated videos on the Web. Existing keyword-based or content-based video retrieval methods usually determine what occurs in a video, but not when and where. In this paper, we answer the question of when and where by formulating a new task, spatio-temporal video re-localization. Specifically, given a query video and a reference video, spatio-temporal video re-localization aims to localize tubelets in the reference video that semantically correspond to the query. To accurately localize the desired tubelets in the reference video, we propose a novel warp LSTM network, which propagates spatio-temporal information over long periods and thereby captures long-term dependencies. Another obstacle for spatio-temporal video re-localization is the lack of properly labeled video datasets; we therefore reorganize the videos in the AVA dataset to form a new dataset for spatio-temporal video re-localization research. Extensive experimental results show that the proposed model achieves superior performance over the designed baselines on the spatio-temporal video re-localization task.
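    To make the propagation idea concrete, below is a minimal sketch of one plausible reading of a warp LSTM cell: before each ConvLSTM update, the previous hidden and cell states are spatially warped by an externally supplied flow field so that propagated features stay aligned with moving content. The class name, shapes, and the use of an optical-flow-style field are illustrative assumptions, not the authors' implementation.

    # Hypothetical warp-LSTM cell sketch (assumptions, not the paper's code):
    # past states are warped by a flow field, then a ConvLSTM update is applied.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WarpConvLSTMCell(nn.Module):
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            # One conv produces all four LSTM gates from [input, hidden].
            self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

        @staticmethod
        def warp(x, flow):
            # Warp feature map x (B,C,H,W) by flow (B,2,H,W) via grid_sample.
            b, _, h, w = x.shape
            ys, xs = torch.meshgrid(
                torch.arange(h, device=x.device, dtype=x.dtype),
                torch.arange(w, device=x.device, dtype=x.dtype),
                indexing="ij",
            )
            grid_x = xs.unsqueeze(0) + flow[:, 0]
            grid_y = ys.unsqueeze(0) + flow[:, 1]
            # Normalize sample coordinates to [-1, 1] as grid_sample expects.
            grid = torch.stack(
                (2 * grid_x / (w - 1) - 1, 2 * grid_y / (h - 1) - 1), dim=-1
            )
            return F.grid_sample(x, grid, align_corners=True)

        def forward(self, x, h, c, flow):
            # Align past state with current frame content before the update.
            h, c = self.warp(h, flow), self.warp(c, flow)
            i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c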

    Video Stream Retrieval of Unseen Queries using Semantic Memory

    Retrieval of live, user-broadcast video streams is an under-addressed and increasingly relevant challenge. The on-line nature of the problem requires temporal evaluation, and the unforeseeable scope of potential queries motivates an approach that can accommodate arbitrary search queries. To account for the breadth of possible queries, we adopt a no-example approach to query retrieval, which uses a query's semantic relatedness to pre-trained concept classifiers. To adapt to shifting video content, we propose memory pooling and memory welling methods that favor recent information over long-past content. We identify two stream retrieval tasks, instantaneous retrieval at any particular time and continuous retrieval over a prolonged duration, and propose means for evaluating them. Three large-scale video datasets are adapted to the challenge of stream retrieval. We report results for our search methods on the new stream retrieval tasks, and demonstrate their efficacy in a traditional, non-streaming video task. (Presented at BMVC 2016, the British Machine Vision Conference.)
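    The "favor recent information" idea can be illustrated with a short sketch: keep a decayed running score per concept so new frames outweigh long-past ones, then rank streams by weighting pooled concept scores with the query's semantic relatedness to each classifier. The decay form and the names MEMORY_DECAY, update_memory, and query_score are assumptions for illustration, not the paper's exact pooling or welling definitions.

    # Illustrative recency-weighted concept memory (assumed form, not the
    # paper's exact memory pooling / memory welling equations).
    import numpy as np

    MEMORY_DECAY = 0.9  # hypothetical per-frame decay factor

    def update_memory(memory: np.ndarray, frame_scores: np.ndarray) -> np.ndarray:
        """Blend new per-concept classifier scores into the decayed memory."""
        return MEMORY_DECAY * memory + (1.0 - MEMORY_DECAY) * frame_scores

    def query_score(memory: np.ndarray, query_relatedness: np.ndarray) -> float:
        """No-example ranking: weight pooled concept scores by the query's
        semantic relatedness to each pre-trained concept classifier."""
        return float(np.dot(memory, query_relatedness))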

    Indexing of fictional video content for event detection and summarisation

    Get PDF
    This paper presents an approach to movie video indexing that uses audiovisual analysis to detect important and meaningful temporal video segments, which we term events. We consider three event classes, corresponding to dialogues, action sequences, and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise, and together they account for over 90% of the content of most movies. To detect events, we leverage traditional filmmaking principles and map them to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, again inspired by filmmaking conventions, is then applied to the output of multiple FSMs to detect the required events. A movie search system named MovieBrowser, built upon this approach, is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres, and it consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described, and its results confirm the usefulness of the proposed approach.
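    To show what an FSM over feature sequences might look like, here is a toy two-state machine that flags a dialogue-like shot/reverse-shot pattern. The states, shot labels, and trigger threshold are invented for illustration and are much simpler than the feature-driven FSMs the paper describes.

    # Toy dialogue detector: a two-state FSM that counts alternating
    # speaker shots (labels and threshold are hypothetical examples).
    def detect_dialogue(shot_labels, min_alternations=3):
        state, alternations = None, 0
        for label in shot_labels:  # e.g. "face_A", "face_B", "action", ...
            if label in ("face_A", "face_B"):
                if state is not None and label != state:
                    alternations += 1
                state = label
            else:
                state, alternations = None, 0  # non-dialogue shot resets the FSM
            if alternations >= min_alternations:
                return True
        return False

    # Shot/reverse-shot pattern typical of filmed dialogue:
    print(detect_dialogue(["face_A", "face_B", "face_A", "face_B"]))  # True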
