Search CORE

74 research outputs found

Multimodal framework based on audio‐visual features for summarisation of cricket videos

Authors: Adnan Syed, Irtaza Aun, Javed Ali, Mahmood Muhammad Tariq, Malik Hafiz
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/03/2019

Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/166171/1/ipr2bf02094.pd

Deep Blue Documents at the University of Michigan

Audio-visual football video analysis, from structure detection to attention analysis

Author: Ren Reede
Publication date: 01/01/2008

Sports video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain content, and strong spatio-temporal variations, such as quick camera switches and swift local motions. Specific techniques for content-based sports video analysis are needed to exploit these characteristics. An efficient and effective sports video analysis system must answer three fundamental questions: (1) what are the key stories in sports videos; (2) what attracts viewers' interest; and (3) how can game highlights be identified. This thesis is developed around these questions. We approached them from two different perspectives, and three research contributions are presented: replay detection, attack temporal structure decomposition, and attention-based highlight identification. Replay segments convey the most important content in sports videos, so detecting them is an efficient way to collect game highlights. However, a replay is an artefact of editing whose sophistication grows with advances in video editing tools. A replay has a complex composition that includes logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in the game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we use logo transition detection as an effective substitute for replay detection. A two-pass system was developed, consisting of a five-layer AdaBoost classifier and logo template matching over the entire video. The five-layer AdaBoost classifier uses shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter logo transition candidates. Subsequently, a logo template is constructed and used to find all logo transition sequences.
The precision and recall of this system for replay detection are both 100% on a five-game evaluation collection. An attack structure is a team's attempt to score; hence, it is a conceptually fundamental unit of football videos as well as other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely play, focus, replay and break, were identified from low-level visual features. A four-state hidden Markov model was trained to simulate the transitions among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repeated substring in the label sequence of shot-class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process, so decomposing attack structures becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice, and attention is a psychological measure of "notice". A brief survey of the psychological background of attention, attention estimation from visual and auditory cues, and multimodal attention fusion is presented. We propose two attention models for sports video analysis: the role-based attention model and the multiresolution autoregressive (MAR) framework. The role-based attention model is based on the perceptual structure of video watching; it removes reflection bias among the salient signals of each modality and combines these signals through reflectors. The MAR framework treats salient signals as a group of smooth random processes that follow a similar trend but are corrupted by noise, and estimates a noise-free signal from these coarse, noisy observations through multiresolution analysis.
Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. Experiments show that these attention-based approaches can find goal events with high precision. Moreover, the results of MAR-based highlight detection on the final games of FIFA 2002 and 2006 closely match the highlights professionally labelled by the BBC and FIFA.
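The attack-structure step above looks for the longest repeated substring in the sequence of shot-class labels. The sketch below is a minimal illustration, assuming each shot has already been labelled with one letter (P = play, F = focus, R = replay, B = break; this encoding is our own). It deliberately swaps the suffix tree named in the abstract for a simpler, naively sorted suffix array with a longest-common-prefix scan, which returns the same substring but is slower on long sequences.

```python
from __future__ import annotations


def longest_repeated_substring(labels: str) -> str:
    """Return the longest substring occurring at least twice in `labels`.

    Uses a naively sorted suffix array plus a longest-common-prefix (LCP)
    scan instead of a suffix tree; the answer is the same, only slower.
    """
    n = len(labels)
    # Suffix array: start indices of all suffixes in lexicographic order.
    suffixes = sorted(range(n), key=lambda i: labels[i:])
    best_start, best_len = 0, 0
    # The longest repeated substring is the longest common prefix of some
    # pair of lexicographically adjacent suffixes.
    for a, b in zip(suffixes, suffixes[1:]):
        k = 0
        while a + k < n and b + k < n and labels[a + k] == labels[b + k]:
            k += 1
        if k > best_len:
            best_start, best_len = a, k
    return labels[best_start:best_start + best_len]


def occurrences(labels: str, pattern: str) -> list[int]:
    """All (possibly overlapping) start positions of `pattern` in `labels`."""
    if not pattern:
        return []
    return [i for i in range(len(labels) - len(pattern) + 1)
            if labels.startswith(pattern, i)]
```

For example, with the hypothetical label string `"PFRBBPFRBF"` the function returns `"PFRB"`, found at positions 0 and 5; these occurrences would play the role of the attack kernel. Sorting string slices costs roughly O(n² log n), so a real system would use a proper suffix tree or a linear-time suffix array construction.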

Glasgow Theses Service

CiteSeerX

OpenGrey Repository

Content-based video indexing for sports applications using integrated multi-modal approach

Author: Tjondronegoro Dian W.
Publication venue: Deakin University, Faculty of Science and Technology, School of Information Technology
Publication date: 01/01/2005

This thesis presents research on an integrated multi-modal approach to sports video indexing and retrieval. By combining specific features extractable from multiple (audio-visual) modalities, both generic structure and specific events can be detected and classified. During browsing and retrieval, users benefit from the integration of high-level semantics with descriptive mid-level features such as whistles and close-up views of players. The main objective is to contribute to the three major components of sports video indexing systems. The first component is a set of powerful techniques to extract audio-visual features and semantic content automatically; their main purposes are to reduce manual annotation and to summarize lengthy content into a compact, meaningful and more enjoyable presentation. The second component is an expressive and flexible indexing technique that supports gradual index construction; the indexing scheme is essential because it determines how users can access a video database. The third and last component is a query language that can generate dynamic video summaries for smart browsing and support user-oriented retrieval.

Deakin Research Online

Audiovisual framework for automatic soccer highlights generation

Author: Raventós Mayoral Arnau
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2015

Extracting low-level and mid-level descriptors from a soccer match to generate a summary of soccer highlights. Automatic generation of sports highlights from recorded audiovisual content has been the object of great interest in recent years. The problem is especially important when producing highlight videos for second- and third-division matches, where the amount of raw material is large and carries no manual annotations. In this thesis, a new approach for the automatic generation of soccer highlights is proposed. The approach is based on segmenting the video sequence into shots that are then analyzed to determine their relevance and interest. For every shot, a set of low- and mid-level audio-visual descriptors is computed and combined to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting the shots with the highest interest according to the user's specifications and the results of the relevance measures. The main novelties of this work are the temporal combination of two shot boundary detectors; the selection of keyframes using motion and color features; the generation of new soccer audio mid-level descriptors; the robust detection of soccer players; the use of a novel object detection technique to spot goal-posts; and, finally, the creation of a flexible and user-friendly highlight generation framework. The thesis is mainly devoted to the description of the global visual segmentation module, the selection of audiovisual descriptors and the general scheme for evaluating the relevance measures. Several results obtained on real soccer video sequences demonstrate the validity of the proposed framework.
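The shot-selection step can be illustrated with a small sketch. It assumes each shot already carries normalised descriptor values, combines them with a weighted sum as a stand-in for the thesis's empirical knowledge rules, and treats the user's specification as a maximum summary duration. The descriptor names, weights and greedy selection are hypothetical, not the author's method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shot:
    start: float        # seconds from the start of the match
    end: float          # seconds from the start of the match
    descriptors: dict   # hypothetical descriptor name -> value in [0, 1]

    @property
    def duration(self) -> float:
        return self.end - self.start


def relevance(shot: Shot, weights: dict[str, float]) -> float:
    """Weighted sum of the shot's descriptors (missing ones count as 0)."""
    return sum(w * shot.descriptors.get(name, 0.0) for name, w in weights.items())


def build_summary(shots: list[Shot], weights: dict[str, float],
                  max_duration: float) -> list[Shot]:
    """Greedily pick the most relevant shots that fit the time budget,
    then return them in chronological (broadcast) order."""
    ranked = sorted(shots, key=lambda s: relevance(s, weights), reverse=True)
    chosen, used = [], 0.0
    for shot in ranked:
        if used + shot.duration <= max_duration:
            chosen.append(shot)
            used += shot.duration
    return sorted(chosen, key=lambda s: s.start)
```

Shots are picked by relevance but returned in time order, so the summary still follows the flow of the match. The greedy budget fill is a simple heuristic; an exact knapsack formulation would be an alternative if the budget is tight.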

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Video analysis for replay detection in sport events

Author: Martínez Junyent David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2012

Post-producing a sports event video requires considerable resources and time to locate the best highlight moments that will be used, for instance, to create a summary of the event. This process can be optimized to improve its efficiency. During the event, the most important moments are replayed so that the audience can see the outstanding scenes several times and from different points of view. The objective of the project is to automatically find the replays in live or pre-recorded broadcasts, thereby accelerating the post-production process. The results will be part of the project CENIT-E BUSCAMEDIA CEN20091026, developed in the studios of Televisió de Catalunya (TVC), which focuses on automatic generation through content analysis. Software has been developed to detect replays in different kinds of sports events, principally soccer. It implements several operation modes, detailed in this report, ranging from a largely manual mode to a fully automatic one, and the success rates obtained by testing it on videos from the TVC database are presented. The work is divided into five major sections. The first chapter introduces the context of the project, sets out the objectives to be achieved, and discusses the data and tools used for its development. The second chapter presents the state of the art, a collection of the methods used for replay detection on which our methodology is founded. The third chapter is the longest and most complex; it covers the entire process of experimentation and the improvements planned from the inception of the project to the implemented system. The fourth chapter describes the technical side and presents the implemented algorithm as a block diagram detailing all the operation modes. Finally, the last chapter contains the results and conclusions obtained after applying the algorithm to a set of videos taken from the TVC database, as well as its application in other areas such as Formula 1 videos.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Audiovisual processing for sports-video summarisation technology

Author: Sadlier David A.
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/01/2006

In this thesis a novel audiovisual feature-based scheme is proposed for the automatic summarization of sports-video content. The scheme is designed to operate across the wide variety of sports genres that come under the description 'field-sports'. Given the assumption that score-update events constitute the most significant moments in conveying the narrative of a field-sports video, it is proposed that detecting them should yield a favourable summarisation solution. To this end, a generic methodology is proposed for the automatic identification of score-update events in field-sports-video content. The scheme is based on the development of robust extractors for a set of critical features, which are shown to reliably indicate the locations of these events. The evidence gathered by the feature extractors is combined and analysed using a Support Vector Machine (SVM), which performs the event detection. An SVM is chosen because its underlying technology represents the latest generation of machine learning algorithms, based on recent advances in statistical learning. Effectively, an SVM optimises the classification performance of a decision hypothesis inferred from a given set of training data. In a learning phase that uses a 90-hour field-sports-video training corpus, the SVM infers a score-update event model by observing patterns in the extracted feature evidence. Using a similar but distinct 90-hour evaluation corpus, the model is then tested generically across multiple field-sports-video genres, including soccer, rugby, field hockey, hurling and Gaelic football. The results suggest that, for the summarization task, both high event retrieval and high content rejection rates are achievable.
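To illustrate how an SVM can turn combined feature evidence into an event/non-event decision, here is a minimal linear SVM trained with the Pegasos stochastic sub-gradient method. This is not the thesis's implementation: the two input features per video window (for example, standardised crowd-audio energy and scoreboard-change evidence) are hypothetical, and a real system would more likely use an established library and possibly a non-linear kernel.

```python
import random


def train_linear_svm(X, y, lam=0.1, iterations=2000, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos.

    X: list of feature vectors (hypothetical per-window evidence),
    y: labels in {-1, +1} (+1 = score-update event).
    A constant 1.0 feature is appended so the last weight acts as a bias;
    Pegasos then regularises the bias too, a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, label = rng.choice(data)          # one random training example
        eta = 1.0 / (lam * t)                # Pegasos step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        # L2 regularisation step: shrink the weights towards zero ...
        w = [(1.0 - eta * lam) * wi for wi in w]
        # ... then add the hinge-loss sub-gradient if the margin is violated.
        if margin < 1.0:
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (event) or -1 (no event) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

The point of the sketch is the decision rule: once trained, the SVM reduces each window's evidence to the sign of a weighted sum, which is what makes it cheap to run across many hours of video.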

Irish Universities

DCU Online Research Access Service

Automatic thumbnail selection for soccer videos using machine learning

Authors: Halvorsen Pål, Hammou Malek, Hicks Steven, Husa Andreas, Johansen Dag, Kupka Tomas, Midoglu Cise, Riegler Michael
Publication venue: Association for Computing Machinery (ACM)
Publication date: 05/08/2022

Thumbnail selection is a very important aspect of online sport video presentation, as thumbnails capture the essence of important events, engage viewers, and make video clips attractive to watch. Traditional solutions in the soccer domain for presenting highlight clips of important events such as goals, substitutions, and cards rely on manual or static selection of thumbnails. However, such approaches can result in sub-optimal video frames being selected as snapshots, which degrades the overall quality of the video clip as perceived by viewers and consequently decreases viewership; moreover, manual processes are expensive and time-consuming. In this paper, we present an automatic thumbnail selection system for soccer videos which uses machine learning to deliver representative thumbnails with high relevance to video content and high visual quality in near real-time. Our proposed system combines a software framework, which integrates logo detection, close-up shot detection, face detection, and image quality analysis into a modular and customizable pipeline, with a subjective evaluation framework for assessing the results. We evaluate our proposed pipeline quantitatively on various soccer datasets, in terms of complexity, runtime, and adherence to a pre-defined rule-set, as well as qualitatively through a user study, in terms of how end-users perceive the output thumbnails. Our results show that an automatic end-to-end system for selecting thumbnails based on contextual relevance and visual quality can yield attractive highlight clips, and can be used in conjunction with existing soccer broadcast pipelines that require real-time operation.
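The modular pipeline described above can be sketched as a chain of rules over per-frame detector outputs. This assumes the logo, close-up, face and image-quality detectors have already been run and their results stored per frame; the `FrameInfo` fields and the rule order are hypothetical illustrations, not the authors' rule-set.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInfo:
    index: int          # frame number within the clip
    has_logo: bool      # hypothetical output of a logo detector
    is_close_up: bool   # hypothetical output of a close-up shot classifier
    face_count: int     # hypothetical output of a face detector
    quality: float      # hypothetical image-quality score in [0, 1]


def select_thumbnail(frames: list[FrameInfo]) -> Optional[int]:
    """Return the index of the chosen thumbnail frame, or None if no frame qualifies."""
    # Rule 1: discard frames showing a logo (e.g. replay transitions).
    candidates = [f for f in frames if not f.has_logo]
    if not candidates:
        return None
    # Rule 2: prefer close-ups containing at least one face, when any exist.
    preferred = [f for f in candidates if f.is_close_up and f.face_count > 0]
    pool = preferred or candidates
    # Rule 3: among the remaining frames, take the highest-quality one.
    return max(pool, key=lambda f: f.quality).index
```

Keeping each rule as a separate filtering step mirrors the modular, customizable design the abstract describes: a rule can be reordered, relaxed or dropped without touching the others.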

Munin - Open Research Archive

Video Abstracting at a Semantical Level

Author: von Wenzlawowicz Till
Publication date: 01/01/2018

One of the most common forms of video abstract is the movie trailer. Contemporary movie trailers share a common structure across genres, which allows them to be generated automatically and also reflects the composition of the corresponding movie. In this thesis, a system for the automatic generation of trailers is presented. In addition to action trailers, the system can handle further genres such as horror and comedy trailers, which were first analyzed manually to identify their basic structures. To simplify the modeling of trailers and the abstract generation itself, a new video abstracting application was developed. This application can perform all steps of abstract generation automatically and allows for previews and manual optimizations. Based on this system, new abstracting models for horror and comedy trailers were created, and the corresponding trailers were generated automatically using these models. In an evaluation, the automatic trailers were compared with the original trailers and showed a similar structure. However, the automatically generated trailers do not yet match the polish of the Hollywood originals, as they lack intentional storylines across shots.

E-LIB Dokumentserver - Staats- und Universitätsbibliothek Bremen

Robust short clip representation and fast search through large video collections

Author: YUAN JUNSONG
Publication date: 18/10/2005

Master's thesis, Master of Engineering

ScholarBank@NUS

Recommended from our members

User-centred video abstraction

Author: Darabi Kaveh
Publication date: 01/01/2015

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. The rapid growth of digital video content in recent years has created the need for technologies that can effectively produce condensed but semantically rich versions of an input video stream. Consequently, video summarisation is becoming increasingly popular in the multimedia community, and numerous video abstraction approaches have been proposed. These techniques can be divided into two major categories, automatic and semi-automatic, according to the level of human intervention required in the summarisation process. Fully automated methods mainly use low-level visual, aural and textual features together with mathematical and statistical algorithms to extract the most significant segments of the original video. However, their effectiveness is restricted by factors such as domain dependency, computational expense and the inability to understand video semantics from low-level features. The second category of techniques attempts to improve summary quality by involving humans in the abstraction process to bridge the semantic gap. Nonetheless, a single user's subjectivity and other external factors, such as distraction, can degrade the performance of these approaches. Accordingly, this thesis focuses on the development of three effective user-centred video summarisation techniques that can be applied to different video categories and generate satisfactory results. The first approach is a novel user-centred video summarisation mechanism for scenarios in which multiple participants take part in the summarisation process, in order to minimise the negative effects of relying on a single user.
In this algorithm, video frames are first scored 'on the fly' by a group of video annotators. These scores are then averaged to generate a single saliency score for each frame, and finally the highest-scoring frames, together with the corresponding audio and textual content, are extracted for inclusion in the final summary. The approach was assessed by comparing its summaries with those produced by three existing automatic summarisation tools that use different modalities for abstraction. The experimental results indicated that the proposed method delivers strong results in terms of Overall Satisfaction and Precision with an acceptable Recall rate, demonstrating the usefulness of involving user input in the video summarisation process. To provide a better user experience, we then proposed a personalised video summarisation method that customises the generated summaries according to viewers' preferences. In this method, the end-user's priority levels for different video scenes are captured and used to update the average scores previously assigned by the video annotators, after which our earlier summarisation method is used to extract the most significant audio-visual content. Experimental results indicated that this approach delivers superior outcomes compared with our previous method and the three other automatic summarisation tools. Finally, we attempted to reduce the level of audience involvement required for personalisation by proposing a new method for producing personalised video summaries, in which SIFT visual features are used to identify the semantic categories of video scenes.
By fusing this information with pre-built user profiles, personalised video abstracts can be created. Experimental results showed that this method delivers superior outcomes compared with our previous algorithm and the three other automatic summarisation techniques.
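The scoring steps of the first two methods (averaging annotator scores per frame, adjusting them by a viewer's scene priorities, and keeping the highest-scoring frames) can be sketched as follows. The multiplicative priority update and the function names are assumptions made for illustration; the thesis only states that the priority levels are used to update the averaged scores.

```python
from __future__ import annotations

from statistics import mean


def saliency_scores(annotator_scores: list[list[float]]) -> list[float]:
    """annotator_scores[a][f] is annotator a's score for frame f.
    Returns one averaged saliency score per frame."""
    return [mean(frame_scores) for frame_scores in zip(*annotator_scores)]


def personalise(scores: list[float], frame_scene: list[str],
                scene_priority: dict[str, float]) -> list[float]:
    """Hypothetical update: scale each frame's score by the viewer's
    priority for its scene; scenes without a priority keep weight 1."""
    return [s * scene_priority.get(frame_scene[i], 1.0)
            for i, s in enumerate(scores)]


def top_frames(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest-scoring frames, returned in temporal order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)
```

With two annotators scoring four frames as `[[1, 5, 3, 4], [3, 5, 1, 2]]`, the averaged scores are `[2, 5, 2, 3]` and the top two frames are 1 and 3; a viewer who strongly prefers the scenes of frames 0 and 2 can shift the selection to those frames instead.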

Brunel University Research Archive