
Deep attentive video summarization with distribution consistency learning

Author: Han Jungong
Ji Zhong
Li Xi
Pang Yanwei
Zhao Yuxiao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2021

This article studies supervised video summarization by formulating it as a sequence-to-sequence learning problem, in which the input is the sequence of original video frames and the output is the sequence of their predicted importance scores. Two critical issues are addressed in this article: short-term contextual attention insufficiency and distribution inconsistency. The former refers to the insufficient capture of short-term contextual attention information within the video sequence itself, because existing approaches focus heavily on long-term encoder-decoder attention. The latter refers to the inconsistency between the distribution of the predicted importance score sequence and that of the ground-truth sequence, which may lead to a suboptimal solution. To mitigate the first issue, we incorporate a self-attention mechanism in the encoder to highlight the important keyframes in a short-term context. This self-attention, together with the encoder-decoder attention, constitutes our deep attentive models for video summarization. For the second issue, we propose a distribution consistency learning method that employs a simple yet effective regularization loss term, which seeks a consistent distribution for the two sequences. Our final approach is dubbed Attentive and Distribution consistent video Summarization (ADSum). Extensive experiments on benchmark data sets demonstrate the superiority of the proposed ADSum approach over state-of-the-art approaches.
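The abstract does not state the exact form of the distribution consistency regularizer. As a rough illustration only, the sketch assumes that both importance-score sequences are normalised into discrete distributions over frames and compared with KL divergence. That term is added to a frame-level mean-squared-error loss. The function names and the weight `lam` are hypothetical, and this is not claimed to be ADSum's actual loss.

```python
import math
from typing import List, Sequence


def to_distribution(scores: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalise non-negative scores so they sum to 1 (assumed step, not from the paper)."""
    total = sum(scores) + eps * len(scores)
    return [(s + eps) / total for s in scores]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two strictly positive discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Frame-level mean squared error between predicted and ground-truth scores."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def summarization_loss(pred: Sequence[float], target: Sequence[float], lam: float = 0.1) -> float:
    """Hypothetical total loss: regression term plus a distribution consistency term."""
    if len(pred) != len(target) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    consistency = kl_divergence(to_distribution(target), to_distribution(pred))
    return mse(pred, target) + lam * consistency
```

KL divergence is only one plausible choice for "seeking a consistent distribution". A symmetric divergence or a histogram-matching term would fit the same description.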

Warwick Research Archives Portal Repository

The Evolution of First Person Vision Methods: A Survey

Author: Betancourt Alejandro
Morerio Pietro
Rauterberg Matthias
Regazzoni Carlo S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

The emergence of new wearable technologies such as action cameras and smart-glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting attention and investment from companies aiming to develop commercial devices with First Person Vision recording capabilities. Because of this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges and opportunities within the field. Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction

arXiv.org e-Print Archive

CiteSeerX

Pure OAI Repository

Archivio istituzionale della ricerca - Università di Genova

Video Storytelling: Textual Summaries for Events

Author: Kankanhalli Mohan S.
Li Junnan
Wong Yongkang
Zhao Qi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/07/2019

Bridging vision and natural language is a longstanding goal in computer vision and multimedia research. While earlier works focused on generating a single-sentence description for visual content, recent works have studied paragraph generation. In this work, we introduce the problem of video storytelling, which aims at generating coherent and succinct stories for long videos. Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video. We propose novel methods to address these challenges. First, we propose a context-aware framework for multimodal embedding learning, where we design a Residual Bidirectional Recurrent Neural Network to leverage contextual information from past and future. Second, we propose a Narrator model to discover the underlying storyline. The Narrator is formulated as a reinforcement learning agent, which is trained by directly optimizing the textual metric of the generated story. We evaluate our method on the Video Story dataset, a new dataset that we have collected to enable the study. We compare our method with multiple state-of-the-art baselines and show that our method achieves better performance in terms of both quantitative measures and a user study. Comment: Published in IEEE Transactions on Multimedia

arXiv.org e-Print Archive

ScholarBank@NUS

Rushes video summarization using a collaborative approach

Author: Bailer Werner
Bredin Hervé
Byrne Daragh
Dumont Emilie
Essid Slim
Haller Martin
Jones Gareth J.F.
Krutz Andreas
Merialdo Bernard
O'Connor Noel E.
Piatrik Tomas
Rehatschek Herwig
Sikora Thomas
Smeaton Alan F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

This paper describes the video summarization system developed by the partners of the K-Space European Network of Excellence for the TRECVID 2008 BBC rushes summarization evaluation. We propose an original method based on individual content segmentation and selection tools in a collaborative system. Our system is organized in several steps: first, we segment the video; second, we identify relevant and redundant segments; finally, we select a subset of segments and concatenate them to build the final summary, with video acceleration incorporated. We analyze the performance of our system through the TRECVID evaluation.
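The abstract does not specify how the K-Space system scores relevance or detects redundancy. The following is a hypothetical sketch of the final selection step only. It assumes each segment already carries a relevance score and a feature vector. Segments are chosen greedily by relevance under a total-duration budget, and any segment whose cosine similarity to an already chosen one exceeds a threshold is skipped. `Segment`, `select_segments`, and `redundancy_threshold` are illustrative names, and video acceleration is not modelled.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float           # seconds
    end: float             # seconds
    relevance: float       # hypothetical relevance score from an earlier step
    feature: List[float]   # hypothetical descriptor used for redundancy checks

    @property
    def duration(self) -> float:
        return self.end - self.start


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_segments(segments: List[Segment], budget: float,
                    redundancy_threshold: float = 0.9) -> List[Segment]:
    """Greedy, relevance-first selection that skips redundant segments (assumed heuristic)."""
    chosen: List[Segment] = []
    used = 0.0
    for seg in sorted(segments, key=lambda s: s.relevance, reverse=True):
        if used + seg.duration > budget:
            continue  # would exceed the summary length budget
        if any(cosine(seg.feature, c.feature) >= redundancy_threshold for c in chosen):
            continue  # too similar to a segment already in the summary
        chosen.append(seg)
        used += seg.duration
    # Restore temporal order so the segments can be concatenated into a summary.
    return sorted(chosen, key=lambda s: s.start)
```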

DCU Online Research Access Service

EURECOM Repository

Activity-driven content adaptation for effective video summarisation

Author: Feng Y.
Jiang J.
Ren Jinchang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

In this paper, we present a novel method for content adaptation and video summarization fully implemented in the compressed domain. First, summarization of generic videos is modeled as the process of extracting human objects under various activities/events. Accordingly, frames are classified into five categories via fuzzy decision, including shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others, using two inter-frame measurements. Second, human objects are detected using Haar-like features. With the detected human objects and the attained frame categories, an activity level is determined for each frame to adapt to the video content. Consecutive frames belonging to the same category are grouped to form one activity entry as content of interest (COI), which converts the original video into a series of activities. An overall adjustable quota is used to control the size of the generated summary for efficient streaming. Based on this quota, the frames selected for summarization are determined by evenly sampling the accumulated activity levels for content adaptation. Quantitative evaluations have demonstrated the effectiveness and efficiency of our proposed approach, which provides a more flexible and general solution for this topic because domain-specific tasks such as accurate object recognition can be avoided.
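The final step above, "evenly sampling the accumulated activity levels" under a quota, can be sketched as inverse sampling of a cumulative curve. This is a minimal sketch under several assumptions. Per-frame activity levels are taken as already computed and non-negative. The quota is interpreted as a number of frames. Each frame is picked at the midpoint of one of `quota` equal slices of total activity. The function name and the uniform fallback for zero activity are hypothetical, not taken from the paper.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence


def select_frames(activity: Sequence[float], quota: int) -> List[int]:
    """Pick up to `quota` frame indices by evenly sampling cumulative activity.

    Assumes non-negative activity levels, so the cumulative curve is non-decreasing.
    """
    if quota <= 0 or not activity:
        return []
    cumulative = list(accumulate(activity))
    total = cumulative[-1]
    if total <= 0:
        # No activity anywhere: fall back to uniform temporal sampling (assumption).
        step = len(activity) / quota
        return sorted({min(int(i * step), len(activity) - 1) for i in range(quota)})
    selected = []
    for k in range(quota):
        # Midpoint of the k-th equal slice of the total accumulated activity.
        target = (k + 0.5) * total / quota
        # First frame at which the accumulated activity reaches the target.
        selected.append(bisect_left(cumulative, target))
    return sorted(set(selected))
```

High-activity stretches therefore receive more of the quota than static ones. Because duplicates are removed, a single frame with very high activity that covers several slices is returned once, so fewer than `quota` indices may come back.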

University of Strathclyde Institutional Repository

Surrey Research Insight

Video summarisation: A conceptual framework and survey of the state of the art

Author: Arthur G. Money
Harry Agius
Publication venue: 'Elsevier BV'
Publication date: 01/02/2008

This is the post-print (final draft post-refereeing) version of the article. Copyright © 2007 Elsevier Inc. Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means for surveying the research literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analyse information sourced directly from the video stream), external (analyse information not sourced directly from the video stream) and hybrid (analyse a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly user-based information that is unobtrusively sourced, in order to overcome longstanding challenges such as the semantic gap and to provide video summaries that have greater relevance to individual users.

Crossref

Brunel University Research Archive