7,684 research outputs found

    An Overview of Multimodal Techniques for the Characterization of Sport Programmes

    Get PDF
    The problem of content characterization of sports videos is of great interest because sports video appeals to large audiences and its efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper we analyze several techniques proposed in literature for content characterization of sports videos. We focus this analysis on the typology of the signal (audio, video, text captions, ...) from which the low-level features are extracted. First we consider the techniques based on visual information, then the methods based on audio information, and finally the algorithms based on audio-visual cues, used in a multi-modal fashion. This analysis shows that each type of signal carries some peculiar information, and the multi-modal approach can fully exploit the multimedia information associated to the sports video. Moreover, we observe that the characterization is performed either considering what happens in a specific time segment, observing therefore the features in a "static" way, or trying to capture their "dynamic" evolution in time. The effectiveness of each approach depends mainly on the kind of sports it relates to, and the type of highlights we are focusing on

    Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News

    Get PDF
    This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC-BY) License. Further distribution of this work is permitted, provided the original work is properly cited. T. Theodorou, I. Mpoas, A. Lazaridis, N. Fakotakis, 'Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News', International Journal on Artificial Intelligence Tools, Vol. 26 (2), April 2017, 1750005 (13 pages), DOI: 10.1142/S021821301750005. Ā© The Author(s).In this paper we describe an automatic sound recognition scheme for radio broadcast news based on principal component clustering with respect to the discrimination ability of the principal components. Specifically, streams of broadcast news transmissions, labeled based on the audio event, are decomposed using a large set of audio descriptors and project into the principal component space. A data-driven algorithm clusters the relevance of the components. The component subspaces are used by sound type classifier. This methodology showed that the k-nearest neighbor and the artificial intelligent network provide good results. Also, this methodology showed that discarding unnecessary dimension works in favor on the outcome, as it hardly deteriorates the effectiveness of the algorithms.Peer reviewe

    Audio-visual football video analysis, from structure detection to attention analysis

    Get PDF
    Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic ļ¬elds. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop speciļ¬c techniques for content-based sports video analysis to utilise these characteristics. For an efļ¬cient and effective sports video analysis system, there are three fundamental questions: (1) what are key stories for sports videos; (2) what incurs viewerā€™s interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identiļ¬cation. Replay segments convey the most important contents in sports videos. It is an efļ¬cient approach to collect game highlights by detecting replay segments. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of replay is complex, which includes logo transitions, slow motions, viewpoint switches and normal speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement of replay detection. A two-pass system was developed, including a ļ¬ve-layer adaboost classiļ¬er and a logo template matching throughout an entire video. The ļ¬ve-layer adaboost utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions, to ļ¬lter out logo transition candidates. Subsequently, a logo template is constructed and employed to ļ¬nd all transition logo sequences. The precision and recall of this system in replay detection is 100% in a ļ¬ve-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature of content-based temporal structures, such as play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break were identiļ¬ed by low level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a sufļ¬x tree is proposed to ļ¬nd the longest repetitive substring in the label sequence of shot class transitions. These occurrences of this substring are regarded as a kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of ā€œnotice ā€. A brief survey of attention psychological background, attention estimation from vision and auditory, and multiple modality attention fusion is presented. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure during watching video. This model removes reļ¬‚ection bias among modality salient signals and combines these signals by reļ¬‚ectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are ļ¬lled with noise. This framework tries to estimate a noise-less signal from these coarse noisy observations by a multiple resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real time event detection. The experiment shows that these attention-based approach can ļ¬nd goal events at a high precision. Moreover, results of MAR-based highlight detection on the ļ¬nal game of FIFA 2002 and 2006 are highly similar to professionally labelled highlights by BBC and FIFA

    Learning efficient temporal information in deep networks: From the viewpoints of applications and modeling

    Get PDF
    With the introduction of deep learning, machine learning has dominated several technology areas, giving birth to high-performance applications that can even challenge human-level accuracy. However, the complexity of deep models is also exploding as a by-product of the revolution of machine learning. Such enormous model complexity has raised the new challenge of improving the efficiency in deep models to reduce deployment expense, especially for systems with high throughput demands or devices with limited power. The dissertation aims to improve the efficiency of temporal-sensitive deep models in four different directions. First, we develop a bandwidth extension mapping to avoid deploying multiple speech recognition systems corresponding to wideband and narrowband signals. Second, we apply a multi-modality approach to compensate for the performance of an excitement scoring system, where the input video sequences are aggressively down-sampled to reduce throughput. Third, we formulate the motion feature in the feature space by directly inducing the temporal information from intermediate layers of deep networks instead of relying on an additional optical flow stream. Finally, we model a spatiotemporal sampling network inspired by the human visual perception mechanism to reduce input frames and regions adaptively

    Cricket Player Profiling: Unraveling Strengths and Weaknesses Using Text Commentary Data

    Full text link
    Devising player-specific strategies in cricket necessitates a meticulous understanding of each player's unique strengths and weaknesses. Nevertheless, the absence of a definitive computational approach to extract such insights from cricket players poses a significant challenge. This paper seeks to address this gap by establishing computational models designed to extract the rules governing player strengths and weaknesses, thereby facilitating the development of tailored strategies for individual players. The complexity of this endeavor lies in several key areas: the selection of a suitable dataset, the precise definition of strength and weakness rules, the identification of an appropriate learning algorithm, and the validation of the derived rules. To tackle these challenges, we propose the utilization of unstructured data, specifically cricket text commentary, as a valuable resource for constructing comprehensive strength and weakness rules for cricket players. We also introduce computationally feasible definitions for the construction of these rules, and present a dimensionality reduction technique for the rule-building process. In order to showcase the practicality of this approach, we conduct an in-depth analysis of cricket player strengths and weaknesses using a vast corpus of more than one million text commentaries. Furthermore, we validate the constructed rules through two distinct methodologies: intrinsic and extrinsic. The outcomes of this research are made openly accessible, including the collected data, source code, and results for over 250 cricket players, which can be accessed at https://bit.ly/2PKuzx8.Comment: The initial work was published in the ICMLA 2019 conferenc
    • ā€¦
    corecore