Search CORE

465 research outputs found

Temporal Attention-Gated Model for Robust Sequence Classification

Author: Baltrušaitis Tadas
Morency Louis-Philippe
Pei Wenjie
Tax David M. J.
Publication venue
Publication date: 15/04/2017
Field of study

Typical techniques for sequence classification are designed for well-segmented sequences which have been edited to remove noisy or irrelevant parts. Therefore, such methods cannot be easily applied on noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM) which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences. Specifically, we extend the concept of attention model to measure the relevance of each observation (time step) of a sequence. We then use a novel gated recurrent network to learn the hidden representation for the final prediction. An important advantage of our approach is interpretability since the temporal attention weights provide a meaningful value for the salience of each time step in the sequence. We demonstrate the merits of our TAGM approach, both for prediction accuracy and interpretability, on three different tasks: spoken digit recognition, text-based sentiment analysis and visual event recognition.Comment: Accepted by CVPR 201

arXiv.org e-Print Archive

Crossref

An Overview of Multimodal Techniques for the Characterization of Sport Programmes

Author: Leonardi Riccardo
Nicola Adami
Pierangelo Migliorati
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2003
Field of study

The problem of content characterization of sports videos is of great interest because sports video appeals to large audiences and its efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper we analyze several techniques proposed in literature for content characterization of sports videos. We focus this analysis on the typology of the signal (audio, video, text captions, ...) from which the low-level features are extracted. First we consider the techniques based on visual information, then the methods based on audio information, and finally the algorithms based on audio-visual cues, used in a multi-modal fashion. This analysis shows that each type of signal carries some peculiar information, and the multi-modal approach can fully exploit the multimedia information associated to the sports video. Moreover, we observe that the characterization is performed either considering what happens in a specific time segment, observing therefore the features in a "static" way, or trying to capture their "dynamic" evolution in time. The effectiveness of each approach depends mainly on the kind of sports it relates to, and the type of highlights we are focusing on

Crossref

Archivio istituzionale della ricerca - Università di Brescia

Video summarisation: A conceptual framework and survey of the state of the art

Author: Arthur G. Money
Babaguchi
Boyatzis
Cernekova
Chang
Chang
Crockford
Dey
Dimitrova
Ekin
Ferman
Gianluigi
Hanjalic
Hanjalic
Harry Agius
Joffe
Kim
Lee
Lew
Li
Li
Lienhart
Ma
Moriyama
Ngo
Otsuka
Shih
Silverman
Taylor
Tjondronegoro
Tseng
Wang
Zhu
Publication venue: 'Elsevier BV'
Publication date: 01/02/2008
Field of study

This is the post-print (final draft post-refereeing) version of the article. Copyright @ 2007 Elsevier Inc.Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means for surveying the research literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analyse information sourced directly from the video stream), external (analyse information not sourced directly from the video stream) and hybrid (analyse a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly user based information that is unobtrusively sourced, in order to overcome longstanding challenges such as the semantic gap and providing video summaries that have greater relevance to individual users

Crossref

Brunel University Research Archive

Automated classification of cricket pitch frames in cricket video

Author: Bananki Jayanth Sandesh
Srinivasa Gowri
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2014
Field of study

The automated detection of the cricket pitch in a video recording of a cricket match is a fundamental step in content-based indexing and summarization of cricket videos. In this paper, we propose visualcontent based algorithms to automate the extraction of video frames with the cricket pitch in focus. As a preprocessing step, we first select a subset of frames with a view of the cricket field, of which the cricket pitch forms a part. This filtering process reduces the search space by eliminating frames that contain a view of the audience, close-up shots of specific players, advertisements, etc. The subset of frames containing the cricket field is then subject to statistical modeling of the grayscale (brightness) histogram (SMoG). Since SMoG does not utilize color or domain-specific information such as the region in the frame where the pitch is expected to be located, we propose an alternative algorithm: component quantization based region of interest extraction (CQRE) for the extraction of pitch frames. Experimental results demonstrate that, regardless of the quality of the input, successive application of the two methods outperforms either one applied exclusively. The SMoG-CQRE combination for pitch frame classification yields an average accuracy of 98:6% in the best case (a high resolution video with good contrast) and an average accuracy of 87:9% in the worst case (a low resolution video with poor contrast). Since, the extraction of pitch frames forms the first step in analyzing the important events in a match, we also present a post-processing step, viz. , an algorithm to detect players in the extracted pitch frames

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Revistes Catalanes amb Accés Obert

Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)

Diposit Digital de Documents de la UAB