Search CORE

10,087 research outputs found

Audio-based event detection for sports video

Author: A.K. Jain
D. Keislar
D. Pye
J.P. Cambell Jr.
K. Kobla
L. Rabiner
Y. Wang
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003
Field of study

In this paper, we present an audio-based event detection approach shown to be effective when applied to the Sports broadcast data. The main benefit of this approach is the ability to recognise patterns that indicate high levels of crowd response which can be correlated to key events. By applying Hidden Markov Model-based classifiers, where the predefined content classes are parameterised using Mel-Frequency Cepstral Coefficients, we were able to eliminate the need for defining a heuristic set of rules to determine event detection, thus avoiding a two-class approach shown not to be suitable for this problem. Experimentation indicated that this is an effective method for classifying crowd response in Soccer matches, thus providing a basis for automatic indexing and summarisation

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

Enlighten

Recommended from our members

Temporal hybridity: Mixing live video footage with instant replay in real time

Author: Broth M
Engström A
Juhlin O
Perry M
Publication venue: ACM Press
Publication date: 01/01/2010
Field of study

Copyright @ 2010 ACMIn this paper we explore the production of streaming media that involves live and recorded content. To examine this, we report on how the production practices and process are conducted through an empirical study of the production of live television, involving the use of live and non-live media under highly time critical conditions. In explaining how this process is managed both as an individual and collective activity, we develop the concept of temporal hybridity to explain the properties of these kinds of production system and show how temporally separated media are used, understood and coordinated. Our analysis is examined in the light of recent developments in computing technology and we present some design implications to support amateur video production.The research was partly made possible by a grant from the Swedish Governmental Agency for Innovation Systems to the Mobile Life VinnExcellence Center, in partnership with SonyEricsson, Ericsson, Microsoft Research, Nokia Research, TeliaSonera and the City of Stockholm

Brunel University Research Archive

Combining Residual Networks with LSTMs for Lipreading

Author: Stafylakis Themos
Tzimiropoulos Georgios
Publication venue
Publication date: 22/05/2017
Field of study

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database of 500-size target-words consisting of 1.28sec video excerpts from BBC TV broadcasts. The proposed network attains word accuracy equal to 83.0, yielding 6.8 absolute improvement over the current state-of-the-art, without using information about word boundaries during training or testing.Comment: Submitted to Interspeech 201

arXiv.org e-Print Archive

Nottingham eTheses

Crossref

An audio-based sports video segmentation and event detection algorithm

Author: Baillie M.
Jose J.M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004
Field of study

In this paper, we present an audio-based event detection algorithm shown to be effective when applied to Soccer video. The main benefit of this approach is the ability to recognise patterns that display high levels of crowd response correlated to key events. The soundtrack from a Soccer sequence is first parameterised using Mel-frequency Cepstral coefficients. It is then segmented into homogenous components using a windowing algorithm with a decision process based on Bayesian model selection. This decision process eliminated the need for defining a heuristic set of rules for segmentation. Each audio segment is then labelled using a series of Hidden Markov model (HMM) classifiers, each a representation of one of 6 predefined semantic content classes found in Soccer video. Exciting events are identified as those segments belonging to a crowd cheering class. Experimentation indicated that the algorithm was more effective for classifying crowd response when compared to traditional model-based segmentation and classification techniques

CiteSeerX

University of Strathclyde Institutional Repository

Enlighten

Face detection and clustering for video indexing applications

Author: Czirjék Csaba
Marlow Seán
Murphy Noel
O'Connor Noel E.
Publication venue: Acivs 2003
Publication date: 01/09/2003
Field of study

This paper describes a method for automatically detecting human faces in generic video sequences. We employ an iterative algorithm in order to give a confidence measure for the presence or absence of faces within video shots. Skin colour filtering is carried out on a selected number of frames per video shot, followed by the application of shape and size heuristics. Finally, the remaining candidate regions are normalized and projected into an eigenspace, the reconstruction error being the measure of confidence for presence/absence of face. Following this, the confidence score for the entire video shot is calculated. In order to cluster extracted faces into a set of face classes, we employ an incremental procedure using a PCA-based dissimilarity measure in con-junction with spatio-temporal correlation. Experiments were carried out on a representative broadcast news test corpus

Irish Universities

DCU Online Research Access Service

Indexing, browsing and searching of digital video

Author: Abe
Avaro
Brown
Chang
Chang
Choi
Goodrum
Hauptmann
Hirschman
Jarina
Kavanagh
Kazman
Koegel Buford
Kravtchenko
Le Gall
Lee
Lienhart
Marchionini
Maybury
McTear
Myers
Myllymaki
Poynton
Puri
Rasmussen
Rorvig
Rowley
Smyth
Sparck Jones
Stein
Wactlar
Wallace
Witbrock
Publication venue: 'Wiley'
Publication date: 01/01/2003
Field of study

Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece ” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 this chapter. In modern society, video is ver

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

A new framework for sign language recognition based on 3D handshape identification and linguistic modeling

Author: Dilsizian Mark
Metaxas Dimitris
Neidle Carol
Wang Shu
Yanovich Polina
Publication venue: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA
Publication date: 01/01/2014
Field of study

Current approaches to sign recognition by computer generally have at least some of the following limitations: they rely on laboratory conditions for sign production, are limited to a small vocabulary, rely on 2D modeling (and therefore cannot deal with occlusions and off-plane rotations), and/or achieve limited success. Here we propose a new framework that (1) provides a new tracking method less dependent than others on laboratory conditions and able to deal with variations in background and skin regions (such as the face, forearms, or other hands); (2) allows for identification of 3D hand configurations that are linguistically important in American Sign Language (ASL); and (3) incorporates statistical information reflecting linguistic constraints in sign production. For purposes of large-scale computer-based sign language recognition from video, the ability to distinguish hand configurations accurately is critical. Our current method estimates the 3D hand configuration to distinguish among 77 hand configurations linguistically relevant for ASL. Constraining the problem in this way makes recognition of 3D hand configuration more tractable and provides the information specifically needed for sign recognition. Further improvements are obtained by incorporation of statistical information about linguistic dependencies among handshapes within a sign derived from an annotated corpus of almost 10,000 sign tokens

Boston University Institutional Repository (OpenBU)

Content-based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events

Author: Jonker W.
Petkovic M.
Publication venue: IEEE
Publication date: 01/01/2001
Field of study

As amounts of publicly available video data grow the need to query this data efficiently becomes significant. Consequently content-based retrieval of video data turns out to be a challenging and important problem. We address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. Firstly, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well us advantages of their integrated us

CiteSeerX

Pure OAI Repository

University of Twente Research Information