1,103 research outputs found

Summarization of human activity videos via low-rank approximation

Author: Mademlis Ioannis
Tefas Anastasios
Nikolaidis Nikos
Pitas Ioannis
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

Crossref

Biblioteca Digital de la Comunidad de Madrid

Explore Bristol Research

Summarization of human activity videos via low-rank approximation

Author: Mademlis Ioannis
Nikolaidis Nikos
Pitas Ioannis
Tefas Anastasios
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2017

Crossref

Explore Bristol Research

Second-order Temporal Pooling for Action Recognition

Author: Cherian Anoop
Gould Stephen
Publication date: 06/08/2018

Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated to video-level representations by computing statistics on these features. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolution of clip-level CNN features computed across the video. Such a descriptor, while being computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than its first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained datasets such as MPII Cooking activities and JHMDB, as well as the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes that, when combined with hand-crafted features (as is standard practice), achieve state-of-the-art accuracy. Comment: Accepted in the International Journal of Computer Vision (IJCV)
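The contrast between first-order and second-order aggregation described in this abstract can be illustrated with a minimal sketch. It assumes that the "similarity between temporal evolutions" is a Pearson correlation between per-feature trajectories; the abstract does not state the exact similarity measure, and the function names here are hypothetical. It is not the end-to-end learnable layer from the paper.

```python
import math


def average_pooling(clip_features):
    """First-order pooling: mean of each feature over all clips."""
    n = len(clip_features)
    return [sum(column) / n for column in zip(*clip_features)]


def temporal_correlation_pooling(clip_features):
    """Second-order pooling sketch (assumption: Pearson correlation).

    clip_features is a list of n clips, each a list of d feature values.
    Returns a d x d matrix whose (i, j) entry is the correlation between
    the trajectory of feature i and the trajectory of feature j over time.
    """
    n = len(clip_features)
    normalised = []
    # zip(*...) turns the n x d clip matrix into d trajectories of length n.
    for trajectory in zip(*clip_features):
        mean = sum(trajectory) / n
        centred = [v - mean for v in trajectory]
        norm = math.sqrt(sum(v * v for v in centred))
        # A constant trajectory has zero norm; keep it as all zeros.
        normalised.append([v / norm for v in centred] if norm > 0 else centred)
    return [
        [sum(a * b for a, b in zip(t_i, t_j)) for t_j in normalised]
        for t_i in normalised
    ]


# Three clips with three features each (made-up numbers for illustration).
clips = [[1.0, 2.0, 3.0], [2.0, 4.0, 2.0], [3.0, 6.0, 1.0]]
first_order = average_pooling(clips)
second_order = temporal_correlation_pooling(clips)
```

In this toy input the first two features rise together and the third falls, so the second-order descriptor records a co-activation pattern that the per-feature averages cannot express.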

arXiv.org e-Print Archive

The Australian National University

Generalized Rank Pooling for Activity Recognition

Author: Cherian Anoop
Fernando Basura
Gould Stephen
Harandi Mehrtash
Publication date: 22/07/2017

Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity. Usually, this pooling step discards the temporal order of the frames, which could otherwise be used for better recognition. To this end, we propose a novel pooling method, generalized rank pooling (GRP), that takes as input features from the intermediate layers of a CNN trained on tiny sub-sequences, and produces as output the parameters of a subspace which (i) provides a low-rank approximation to the features and (ii) preserves their temporal order. We propose to use these parameters as a compact representation for the video sequence, which is then used in a classification setup. We formulate an objective for computing this subspace as a Riemannian optimization problem on the Grassmann manifold, and propose an efficient conjugate gradient scheme for solving it. Experiments on several activity recognition datasets show that our scheme leads to state-of-the-art performance. Comment: Accepted at IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 201
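The two requirements on the subspace, low-rank approximation and preserved temporal order, can be made concrete with a rough sketch that only scores a candidate subspace. The hinge-style order penalty, the `margin` and `weight` parameters, and the function names are assumptions made for illustration; the abstract does not give the objective, and this sketch does not perform the Riemannian conjugate gradient optimization it mentions.

```python
def project(basis, x):
    """Coordinates of x in an orthonormal basis (a list of unit vectors)."""
    return [sum(b_k * x_k for b_k, x_k in zip(b, x)) for b in basis]


def score_subspace(basis, frames, margin=0.0, weight=1.0):
    """Lower is better. Combines two terms (both assumed forms):

    (i)  reconstruction error of each frame from its projection, which for
         an orthonormal basis equals ||x||^2 minus the projected energy;
    (ii) a hinge penalty whenever an earlier frame does not have smaller
         projected energy than a later frame, so that order is encoded.
    """
    energy = [sum(c * c for c in project(basis, x)) for x in frames]
    recon_error = sum(
        sum(v * v for v in x) - e for x, e in zip(frames, energy)
    )
    order_penalty = sum(
        max(0.0, energy[i] + margin - energy[j])
        for i in range(len(frames))
        for j in range(i + 1, len(frames))
    )
    return recon_error + weight * order_penalty


# Toy frame features that grow along the first axis over time.
frames = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
good = score_subspace([[1.0, 0.0]], frames)        # captures data and order
poor = score_subspace([[0.0, 1.0]], frames)        # captures neither
reversed_order = score_subspace([[1.0, 0.0]], frames[::-1])
```

A search over orthonormal bases that minimises such a score would favour subspaces whose projections both reconstruct the frames and grow monotonically with time, which is the intuition the abstract describes.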

arXiv.org e-Print Archive

Crossref

Automatic summarization of rushes video using bipartite graphs

Author: Liang Bai
Yanli Hu
Songyang Lao
Alan F. Smeaton
Noel E. O’Connor
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

In this paper we present a new approach for automatic summarization of rushes, or unstructured video. Our approach is composed of three major steps. First, based on shot and sub-shot segmentations, we filter out sub-shots with low information content that are not likely to be useful in a summary. Second, a method using maximal matching in a bipartite graph is adapted to measure similarity between the remaining shots and to minimize inter-shot redundancy by removing repetitive retake shots common in rushes video. Finally, the presence of faces and the motion intensity are characterised in each sub-shot. A measure of how representative the sub-shot is in the context of the overall video is then proposed. Video summaries composed of keyframe slideshows are then generated. In order to evaluate the effectiveness of this approach we re-run the evaluation carried out by TRECVid, using the same dataset and evaluation metrics used in the TRECVid video summarization task in 2007 but with our own assessors. Results show that our approach leads to a significant improvement on our own work in terms of the fraction of the TRECVid summary ground truth included and is competitive with the best of other approaches in TRECVid 2007.
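The second step, using a bipartite matching to score how similar two shots are, might look roughly like the sketch below. It builds a bipartite graph between the keyframes of two shots, links pairs whose similarity reaches a threshold, and uses the size of a maximum-cardinality matching (found with standard augmenting paths) as the similarity. The frame similarity function, the threshold, and the normalisation by the shorter shot are all assumptions for illustration, not details taken from the paper.

```python
def maximum_matching(adjacency, n_right):
    """Size of a maximum matching in a bipartite graph.

    adjacency[i] lists the right-side nodes linked to left node i.
    Uses the classic augmenting-path method.
    """
    match_of_right = [-1] * n_right

    def try_assign(left, seen):
        for right in adjacency[left]:
            if right in seen:
                continue
            seen.add(right)
            # Take a free node, or re-route the node's current partner.
            if match_of_right[right] == -1 or try_assign(match_of_right[right], seen):
                match_of_right[right] = left
                return True
        return False

    return sum(1 for left in range(len(adjacency)) if try_assign(left, set()))


def shot_similarity(frames_a, frames_b, frame_sim, threshold):
    """Fraction of the shorter shot's keyframes that can be paired one-to-one
    with sufficiently similar keyframes of the other shot (assumed measure)."""
    if not frames_a or not frames_b:
        return 0.0
    adjacency = [
        [j for j, fb in enumerate(frames_b) if frame_sim(fa, fb) >= threshold]
        for fa in frames_a
    ]
    matched = maximum_matching(adjacency, len(frames_b))
    return matched / min(len(frames_a), len(frames_b))


# Keyframes reduced to a single hypothetical scalar descriptor each.
def scalar_sim(a, b):
    return 1.0 - abs(a - b)


take_1 = [0.0, 0.5, 1.0]
take_2 = [0.05, 0.52, 0.3]
similarity = shot_similarity(take_1, take_2, scalar_sim, threshold=0.9)
```

Shots whose similarity exceeds some cut-off could then be treated as retakes of the same material, with only one of them kept for the summary.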

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Analysing user physiological responses for affective video summarisation

Author: Arthur G. Money
Harry Agius
Publication venue: Elsevier BV
Publication date: 01/04/2009

This is the post-print version of the final paper published in Displays. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2009 Elsevier B.V.

Video summarisation techniques aim to abstract the most significant content from a video stream. This is typically achieved by processing low-level image, audio and text features which are still quite disparate from the high-level semantics that end users identify with (the ‘semantic gap’). Physiological responses are potentially rich indicators of memorable or emotionally engaging video content for a given user. Consequently, we investigate whether they may serve as a suitable basis for a video summarisation technique by analysing a range of user physiological response measures, specifically electro-dermal response (EDR), respiration amplitude (RA), respiration rate (RR), blood volume pulse (BVP) and heart rate (HR), in response to a range of video content in a variety of genres including horror, comedy, drama, sci-fi and action. We present an analysis framework for processing the user responses to specific sub-segments within a video stream based on percent rank value normalisation. The application of the analysis framework reveals that users respond significantly to the most entertaining video sub-segments in a range of content domains. Specifically, horror content seems to elicit significant EDR, RA, RR and BVP responses, and comedy content elicits comparatively lower levels of EDR, but does seem to elicit significant RA, RR, BVP and HR responses. Drama content seems to elicit less significant physiological responses in general, and both sci-fi and action content seem to elicit significant EDR responses. We discuss the implications this may have for future affective video summarisation approaches.
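Percent rank normalisation, mentioned in this abstract as the basis of the analysis framework, can be sketched as follows. The definition used here (the share of the other readings that lie strictly below a given reading, as in common spreadsheet percent-rank functions) is an assumption, and the sample readings are invented; the abstract does not specify the exact formula.

```python
def percent_rank(values):
    """Map each reading to a value in [0, 1]: the fraction of the other
    readings that are strictly smaller (assumed definition)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [
        sum(1 for other in values if other < v) / (n - 1)
        for v in values
    ]


# Invented per-segment readings on two very different scales.
heart_rate = [62.0, 70.0, 95.0, 66.0, 80.0]
skin_response = [0.11, 0.15, 0.42, 0.12, 0.20]

hr_ranked = percent_rank(heart_rate)
edr_ranked = percent_rank(skin_response)
```

Because only the ordering of readings matters, measures recorded in different units and from different users end up on a shared 0 to 1 scale, which makes responses to individual sub-segments comparable across signals.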

Crossref

Brunel University Research Archive