
    Goal event detection in soccer videos via collaborative multimodal analysis

    Detecting semantic events in sports video is crucial for video indexing and retrieval. Most existing work has relied exclusively on video content features, namely data directly available and extractable from the visual and/or aural channels. Sole reliance on such data, however, can be problematic due to the high-level semantic nature of video and the difficulty of properly aligning detected events with their exact times of occurrence. This paper proposes a framework for soccer goal event detection through collaborative analysis of multimodal features. Unlike previous approaches, the visual and aural content is not scrutinized directly. Instead, an external textual source (i.e., minute-by-minute reports from sports websites) is used to initially localize the event search space. This step is vital because it significantly reduces the event search space. It also makes further visual and aural analysis more efficient, since excessive and unnecessary non-eventful segments are discarded, culminating in accurate identification of the actual goal event segment. Experiments conducted on thirteen soccer matches are very promising, with high accuracy rates being reported.
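
    A minimal sketch of the text-driven localization step described above, assuming a simple minute-by-minute report format, a fixed kickoff offset, and a fixed window size (all illustrative assumptions, not details taken from the paper):

        # Hedged sketch: map reported goal minutes from a minute-by-minute web
        # report to coarse video-time windows, shrinking the search space that
        # later audio-visual analysis has to examine. The report format, kickoff
        # offset and window size are assumed for illustration only.
        import re

        GOAL_PATTERN = re.compile(r"^(\d+)'\s+GOAL", re.IGNORECASE)  # e.g. "23' GOAL ..."

        def goal_search_windows(report_lines, kickoff_offset_s=900, half_window_s=60):
            """Return (start, end) video-time windows in seconds for reported goals."""
            windows = []
            for line in report_lines:
                match = GOAL_PATTERN.match(line.strip())
                if match:
                    minute = int(match.group(1))
                    centre = kickoff_offset_s + minute * 60  # rough broadcast position
                    windows.append((centre - half_window_s, centre + half_window_s))
            return windows

        report = ["12' Corner for the home side", "23' GOAL - shot from the edge of the box"]
        print(goal_search_windows(report))  # [(2220, 2340)] with the assumed offsets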

    Feature based dynamic intra-video indexing

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. With the advent of digital imagery and its widespread application in all vistas of life, it has become an important component in the world of communication. Video content, ranging from broadcast news, sports, personal videos, surveillance, movies and entertainment and similar domains, is increasing exponentially in quantity, and it is becoming a challenge to retrieve content of interest from the corpora. This has led to an increased interest amongst researchers in investigating concepts of video structure analysis, feature extraction, content annotation, tagging, video indexing, querying and retrieval to fulfil these requirements. However, most previous work is confined to specific domains and constrained by quality, processing and storage capabilities. This thesis presents a novel framework agglomerating the established approaches from feature extraction to browsing in one content-based video retrieval system. The proposed framework significantly fills the identified gap while satisfying the imposed constraints of processing, storage, quality and retrieval times. The output entails a framework, methodology and prototype application that allow the user to efficiently and effectively retrieve content of interest, such as age, gender and activity, by specifying the relevant query. Experiments have shown plausible results, with an average precision and recall of 0.91 and 0.92 respectively for face detection using a Haar wavelet based approach. Precision for age estimation ranges from 0.82 to 0.91 and recall from 0.78 to 0.84. Gender recognition gives better precision for males (0.89) than for females, while recall is higher for females (0.92). The activity of the subject has been detected using the Hough transform and classified using a Hidden Markov Model. A comprehensive dataset to support similar studies has also been developed as part of the research process. A Graphical User Interface (GUI) providing a friendly and intuitive interface has been integrated into the developed system to facilitate the retrieval process. The comparison of intraclass correlation coefficient (ICC) results shows that the performance of the system closely resembles that of a human annotator. The performance has been optimised for time and error rate.
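
    As a concrete illustration of the Haar-based face detection step reported above, the following sketch uses OpenCV's stock frontal-face cascade; it is a stand-in for illustration, not the detector or parameters used in the thesis:

        # Hedged sketch: Haar-feature face detection on a single video frame,
        # using OpenCV's bundled cascade as a stand-in for the thesis's
        # Haar wavelet based detector.
        import cv2

        def detect_faces(frame_bgr):
            """Return bounding boxes (x, y, w, h) for faces found in one frame."""
            cascade = cv2.CascadeClassifier(
                cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
            gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

        # Usage on sampled frames of an indexed video:
        # capture = cv2.VideoCapture("video.mp4")
        # ok, frame = capture.read()
        # boxes = detect_faces(frame)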

    Audio-visual football video analysis, from structure detection to attention analysis

    Sport video is an important video genre, and content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain content, and strong spatio-temporal variations, such as quick camera switches and swift local motions. Specific techniques are needed for content-based sports video analysis to exploit these characteristics. For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are the key stories of sports videos; (2) what incurs viewers' interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approach them from two different perspectives, and three research contributions are presented: replay detection, attack temporal structure decomposition, and attention-based highlight identification.

    Replay segments convey the most important content in sports videos, so detecting them is an efficient way to collect game highlights. However, replays are editing artefacts whose composition evolves with advances in video editing tools; a replay typically comprises logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in the game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, consisting of a five-layer AdaBoost classifier and logo template matching over an entire video. The five-layer AdaBoost classifier uses shot duration, average game-pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates. Subsequently, a logo template is constructed and employed to find all logo transition sequences. The precision and recall of this system for replay detection are both 100% on a five-game evaluation collection.

    An attack structure is a team's competition for a score, and hence a conceptually fundamental unit of a football video, as of other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely play, focus, replay and break, were identified from low-level visual features. A four-state hidden Markov model was trained to model the transition process among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process, so the decomposition of attack structures becomes a boundary likelihood comparison between two Markov chains.

    Highlights are what attract notice, and attention is a psychological measure of “notice”. A brief survey of the psychological background of attention, attention estimation from the visual and auditory channels, and multi-modality attention fusion is presented. We propose two attention models for sports video analysis: the role-based attention model and the multiresolution autoregressive (MAR) framework. The role-based attention model is based on the structure of perception while watching video; it removes reflection bias among modality salient signals and combines these signals via reflectors. The MAR framework treats salient signals as a group of smooth random processes that follow a similar trend but are corrupted by noise, and it estimates a noise-free signal from these coarse, noisy observations through multiple-resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. Experiments show that these attention-based approaches can find goal events with high precision. Moreover, the results of MAR-based highlight detection on the final games of FIFA World Cup 2002 and 2006 are highly similar to highlights professionally labelled by the BBC and FIFA.
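
    The attack-structure step above hinges on finding the longest repetitive substring in the shot-class label sequence. The thesis uses a suffix tree; the sketch below shows the same idea with a simpler suffix-array scan over a toy label alphabet (P = play, F = focus, R = replay, B = break), as an illustration rather than the author's implementation:

        # Hedged sketch: longest repeated substring of a shot-class label
        # sequence via sorted suffixes, standing in for the suffix tree used
        # in the thesis. The toy sequence is invented for illustration.
        def longest_repeated_substring(labels: str) -> str:
            suffixes = sorted(labels[i:] for i in range(len(labels)))
            best = ""
            for a, b in zip(suffixes, suffixes[1:]):
                k = 0
                while k < min(len(a), len(b)) and a[k] == b[k]:
                    k += 1  # longest common prefix of adjacent sorted suffixes
                if k > len(best):
                    best = a[:k]
            return best

        shot_labels = "PFRBPFRBFB"  # toy play/focus/replay/break transition labels
        print(longest_repeated_substring(shot_labels))  # "PFRB" -> the repeated attack kernel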

    It's about THYME: On the design and implementation of a time-aware reactive storage system for pervasive edge computing environments

    This work was partially supported by Fundação para a Ciência e a Tecnologia (FCT-MCTES) through project DeDuCe (PTDC/CCI-COM/32166/2017), NOVA LINCS UIDB/04516/2020, and grant SFRH/BD/99486/2014, and by the European Union through project LightKone (grant agreement no. 732505). Nowadays, smart mobile devices generate huge amounts of data in all sorts of gatherings. Much of that data has localized and ephemeral interest, but can be of great use if shared among co-located devices. However, mobile devices often experience poor connectivity, leading to availability issues if application storage and logic are fully delegated to a remote cloud infrastructure. In turn, the edge computing paradigm pushes computation and storage beyond the data center, closer to the end-user devices where data is generated and consumed, enabling certain components of edge-enabled systems to execute directly and cooperatively on edge devices. In this article, we address the challenge of supporting reliable and efficient data storage and dissemination among co-located wireless mobile devices without resorting to centralized services or network infrastructures. We propose THYME, a novel time-aware reactive data storage system for pervasive edge computing environments that exploits synergies between the storage substrate and the publish/subscribe paradigm. We present the design of THYME and elaborate a three-fold evaluation, through an analytical study, simulation, and real-world experimentation, characterizing the scenarios best suited for its use. The evaluation shows that THYME allows the notification and retrieval of relevant data with low overhead, latency, and energy consumption, proving to be a practical solution in a variety of situations.
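
    A minimal sketch of the kind of time-aware reactive storage THYME describes, in which published items carry a validity period and subscribers are notified of matching items; the class and method names here are hypothetical and do not reflect THYME's actual API:

        # Hedged sketch: a toy time-aware reactive store combining storage with
        # publish/subscribe notification. Names and semantics are illustrative
        # assumptions, not THYME's interface.
        import time
        from collections import defaultdict

        class TimeAwareStore:
            def __init__(self):
                self._items = {}                # key -> (value, expiry timestamp)
                self._subs = defaultdict(list)  # tag -> [callback]

            def subscribe(self, tag, callback):
                self._subs[tag].append(callback)

            def publish(self, key, value, tags, lifetime_s):
                self._items[key] = (value, time.time() + lifetime_s)
                for tag in tags:                # reactively notify matching subscribers
                    for callback in self._subs[tag]:
                        callback(key, value)

            def get(self, key):
                value, expires_at = self._items.get(key, (None, 0.0))
                return value if time.time() < expires_at else None  # expired data is dropped

        store = TimeAwareStore()
        store.subscribe("concert", lambda key, value: print("notified:", key))
        store.publish("photo-42", b"...", tags=["concert"], lifetime_s=600)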

    Metadata enhanced content management in media companies

    Media companies are facing new opportunities and challenges. The communications, computing, and content industries are converging into a single, horizontally connected content value chain, where changes are frequent and activities are highly interdependent. However, before convergence and digital content are taken seriously, media companies must understand what is expected from them, how their operations will be affected, and why they should be involved. The production, distribution, and use of content rely heavily on computers and automation. This requires the content essence to be enhanced with explicit descriptions of semantics, or more specifically, semantic metadata. However, semantic metadata is useful only if its nature is clearly understood and its structure and usage are well defined. For this purpose, ontologies are needed to capture the essential characteristics of the content domain in a limited set of meaningful concepts. The creation and management of ontologies and semantic metadata require skills and activities that do not necessarily exist in traditional print-based publishing or broadcasting. Companies developing ontologies must understand the essential characteristics of the available content, user needs, and the planned or existing use of content. Furthermore, they must be able to express this information explicitly in an ontology and then reflect changes in the environment back into that ontology. Content production and distribution should be flexible and able to support the reuse of content. This thesis introduces two abstract models, a component model and a process model, both of which assist in the understanding and analysis of electronic publishing of content for multiple media products and on multiple media platforms. When semantic metadata, ontologies, and improved publishing processes are available, new advanced content-based products, such as personalized information feeds, become possible. The SmartPush project, in which the author was the project manager and worked as a researcher, has shown that semantic metadata is useful in creating advanced content-based products and that media companies are willing to alter their existing publishing processes. Media companies participating in the SmartPush project have acknowledged the impact of our work on their plans and operations. Their acknowledgement emphasizes the practical importance of semantic metadata, ontologies, improved electronic publishing processes, and personalization research.
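
    As a small illustration of the thesis's point that semantic metadata is only useful when its structure is defined by an ontology, the sketch below validates a metadata record against a tiny, invented ontology of news-content concepts; the concept and property names are made up for illustration and are not taken from the thesis:

        # Hedged illustration: a metadata record checked against a minimal,
        # invented ontology. Concept and property names are hypothetical.
        NEWS_ONTOLOGY = {
            "Article": {"topic", "region", "persons", "publishedFor"},  # allowed properties
        }

        article_metadata = {
            "type": "Article",
            "topic": "municipal elections",
            "region": "Helsinki",
            "persons": ["candidate A", "candidate B"],
            "publishedFor": ["print", "web", "personalized feed"],
        }

        def validate(record, ontology):
            """Check that a record uses only properties its ontology concept defines."""
            allowed = ontology[record["type"]]
            return set(record) - {"type"} <= allowed

        print(validate(article_metadata, NEWS_ONTOLOGY))  # True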

    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, such as hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is receiving renewed attention in order to overcome the remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance, and the multiple disciplines it involves, a reference book on mediasync has become necessary, and this book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space from different perspectives. MediaSync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire strong knowledge about this research area, and also to approach the challenges of ensuring the best mediated experiences by providing adequate synchronization between the media elements that constitute those experiences.