515 research outputs found

    Semantic Analysis of High-definition MPEG-2 Soccer Video Using Bayesian Network

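    In recent years, as Internet access has moved to broadband, video distribution has become widespread, and terrestrial digital broadcasting and BS/CS digital satellite broadcasting have sharply increased the number of programs available to viewers. Hard disks in PCs and recorders have also grown in capacity, making it possible to store large numbers of programs (content); on the other hand, the need for technology that quickly retrieves the scenes a viewer wants from huge volumes of video data is greater than ever. This study proposes a method for detecting replay scenes and highlight scenes near the goal in soccer video. For scene detection, we use a Bayesian network to describe the causal relationships between highlight scenes and features extracted from high-definition soccer video compressed by an MPEG-2 encoder. Using a Bayesian network makes it possible to infer probabilistically, from the extracted features, that a highlight scene has occurred. Methods for detecting highlight scenes in soccer video with Bayesian networks have already been proposed, but they rely on features obtained by applying various pixel-level image-processing operations to every frame. Computational cost therefore grows with picture size, and real-time processing requires dedicated hardware. The proposed method computes its features from the coding parameters contained in the MPEG-2 compressed data, so it requires less computation than conventional methods and can run in real time on an ordinary PC even for high-resolution video such as high-definition broadcasts. In addition, although conventional methods propose Bayesian networks for various kinds of scenes, all of them define the scene-related events in the network model on a per-frame basis. For example, the events related to a goal scene in conventional methods, such as the appearance of the goalmouth, crowd noise, and the occurrence of a replay, are all counted in frames. However, no method for clearly determining the start and end frames of each event has been described, and in some cases this must be done by hand. As a result, the time intervals assigned to events when training the Bayesian network may contain errors. Detecting the start and end times of a scene in test video is likewise difficult. The proposed method first detects cut points, where the picture changes because of a camera switch, from characteristic changes in coding parameters extracted directly from the MPEG-2 compressed data, and defines the span between two adjacent cut points as a shot. It then examines the features of each shot to classify the shots into several event classes. Finally, it detects scenes by treating a scene as the occurrence of certain characteristic events. With this method, the start and end times of each event are given unambiguously by the cut points of the shots, and they can be obtained automatically from the MPEG-2 compressed data. To evaluate the proposed method, we ran detection experiments on real video data and obtained a recall of 86.17% and a precision of 90.76% for event scenes near the goal, and a recall of 81.00% and a precision of 92.57% for replay scenes. The conventional method, although not evaluated on the same video data, achieved a recall of 71.1% and a precision of 89.8% for event scenes near the goal; the proposed method exceeds it in both recall and precision, with a particularly marked improvement in recall. These results confirm the effectiveness of the proposed method. The University of Electro-Communications, 201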
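The cut-point and shot definitions in the abstract above can be illustrated with a minimal sketch. The abstract does not say which coding parameters are used, so the per-frame feature here (`intra_ratio`, a hypothetical fraction of intra-coded macroblocks per frame), the threshold value, and both function names are assumptions made for illustration only.

```python
from typing import List, Tuple


def detect_cuts(intra_ratio: List[float], threshold: float = 0.6) -> List[int]:
    """Return frame indices where the (hypothetical) feature rises above threshold.

    A cut is reported at frame i when the value crosses the threshold upward,
    i.e. the previous frame was below it. This is an assumed rule, not the
    detection rule of the paper.
    """
    cuts = []
    for i, value in enumerate(intra_ratio):
        previous = intra_ratio[i - 1] if i > 0 else 0.0
        if value >= threshold and previous < threshold:
            cuts.append(i)
    return cuts


def shots_from_cuts(cuts: List[int], n_frames: int) -> List[Tuple[int, int]]:
    """Turn cut points into shots: inclusive (first_frame, last_frame) spans
    between adjacent cut points, with the video start and end as outer bounds."""
    boundaries = [0] + [c for c in cuts if 0 < c < n_frames] + [n_frames]
    return [(start, end - 1) for start, end in zip(boundaries, boundaries[1:]) if end > start]


if __name__ == "__main__":
    feature = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.2]  # made-up values
    cuts = detect_cuts(feature)
    print(cuts)
    print(shots_from_cuts(cuts, len(feature)))
```

Because each shot is bounded by cut points, any event assigned to a shot inherits an unambiguous start and end frame, which is the property the abstract emphasizes.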

    Bring it to the Pitch: Combining Video and Movement Data to Enhance Team Sport Analysis

    Analysts in professional team sport regularly perform analysis to gain strategic and tactical insights into player and team behavior. Goals of team sport analysis regularly include identification of weaknesses of opposing teams, or assessing performance and improvement potential of a coached team. Current analysis workflows are typically based on the analysis of team videos. Also, analysts can rely on techniques from Information Visualization, to depict e.g., player or ball trajectories. However, video analysis is typically a time-consuming process, where the analyst needs to memorize and annotate scenes. In contrast, visualization typically relies on an abstract data model, often using abstract visual mappings, and is not directly linked to the observed movement context anymore. We propose a visual analytics system that tightly integrates team sport video recordings with abstract visualization of underlying trajectory data. We apply appropriate computer vision techniques to extract trajectory data from video input. Furthermore, we apply advanced trajectory and movement analysis techniques to derive relevant team sport analytic measures for region, event and player analysis in the case of soccer analysis. Our system seamlessly integrates video and visualization modalities, enabling analysts to draw on the advantages of both analysis forms. Several expert studies conducted with team sport analysts indicate the effectiveness of our integrated approach

    Public Perception of Male Athletes Vs. Female Athletes in the Media

    In this experiment, my goal was to determine whether public perception of female athletes differs from public perception of male athletes. Female athletes are underrepresented in the media (Eastman and Billings, 2000), and because of this, public perception of male athletes in the media might differ from perception of female athletes. I hypothesized that my respondents would best remember the female athletes' appearance, would best remember the male athletes' interview content, and that the female and male respondents who took part in my experiment would evaluate each athlete differently based on their own gender and the athletes' gender. My results indicated that the respondents who watched the female student-athletes' interview were more likely to write more detailed responses about dress and appearance while also adding negative and malicious comments about them. Those who watched the male student-athletes' interview gave simpler descriptions of dress and appearance, and the male student-athlete rarely received negative comments. Additionally, female respondents were more likely to pay attention to the male student-athletes' interview than to the female student-athletes' interview. The male respondents were less diligent than the female respondents in recalling the interview content from both the male and female student-athletes, but more likely to recall the information from the male student-athletes' interview. Female respondents were also more likely than male respondents to detect emotions. I believe the results of my research boil down to female athletes being judged more critically in the media because of their underrepresentation (Eastman and Billings, 2000). To help stop the negativity that female athletes receive, as the female athlete in my experiment did, I believe that more media training that provides insights on what to wear and how to look could lead to more positive comments from viewers watching female athletes on television.
My vision is that the content of this thesis sparks further research so that female athletes can be viewed the same way as male athletes.

    Hierarchical Multimodal Attention for Deep Video Summarization

    The way people consume sports on TV has changed drastically in recent years, particularly under the combined effects of the legalization of sports betting and the huge increase in sports analytics. Several companies now send observers to stadiums to collect live data on all the events happening on the field during a match. These data contain meaningful information that describes in great detail all the actions occurring during the match, and they feed the coaches and staff, the fans, the viewers, and the gamblers. Exploiting all these data, sports broadcasters want to generate extra content such as match highlights, match summaries, and player and team analytics to appeal to subscribers. This paper explores the problem of summarizing professional soccer matches as automatically as possible using both the aforementioned event-stream data collected from the field and the content broadcast on TV. We have designed an architecture that introduces first (1) a Multiple Instance Learning method that takes into account the sequential dependency among events and then (2) a hierarchical multimodal attention layer that grasps the importance of each event in an action. We evaluate our approach on matches from two professional European soccer leagues, showing its capability to identify the best actions for automatic summarization by comparing with real summaries made by human operators.

    A Fuzzy Logic-Based System for Soccer Video Scenes Classification

    Video surveillance on a massive global scale captures data but lacks the detailed activity information needed to flag events of interest, while the human burden of monitoring video footage is untenable. Artificial intelligence (AI) can be applied to raw video footage to identify and extract the required information and summarize it in linguistic formats. Automated video summarization usually relies on text-based data such as subtitles and on text segmentation and semantics, with little attention paid to summarization that processes the video footage alone. Classification problems in recorded videos are often very complex and uncertain because of the dynamic nature of the video sequence and of lighting conditions, background, camera angle, occlusions, indistinguishable scene features, etc. Video scene classification forms the basis of linguistic video summarization, an open research problem with major commercial importance. Soccer video scenes present added challenges because specific objects and events share similar features (e.g. “people” include audiences, coaches, and players), and because they are made up of series of quickly changing, dynamic frames with small inter-frame variations. A further difficulty is the need for lightweight video classification systems that work in real time on massive volumes of data. In this thesis, we introduce a novel system based on Interval Type-2 Fuzzy Logic Classification Systems (IT2FLCS) whose parameters are optimized by the Big Bang–Big Crunch (BB-BC) algorithm, which allows automatic scene classification using optimized rules in broadcast soccer match video. Type-2 fuzzy logic systems offer a highly interpretable and transparent model that is well suited to handling the uncertainties encountered in video footage and to converting the accumulated data into linguistic formats that can be easily stored and analysed.
Meanwhile, traditional black-box techniques such as support vector machines (SVMs) and neural networks do not provide models that can be easily analysed and understood by human users. BB-BC optimization is a heuristic, population-based evolutionary approach characterized by ease of implementation, fast convergence and low computational cost. We employed BB-BC to optimize the system's fuzzy logic membership functions and fuzzy rules. Using BB-BC, we are able to balance system transparency (by generating a small rule set) against increased scene classification accuracy. The proposed fuzzy-based system thus achieves relatively high classification accuracy with a small number of rules, which increases the system's interpretability and allows real-time processing. The type-2 Fuzzy Logic Classification System (T2FLCS) obtained 87.57% prediction accuracy in scene classification on our test data, which is better than its type-1 fuzzy classification system (T1FLCS) and neural network counterparts. BB-BC optimization decreases the size of the rule bases in both the T1FLCS and the T2FLCS; the T2FLCS ultimately reached 85.716% with the reduced rule set, outperforming its T1FLCS and neural network counterparts, especially on the “out-of-range data”, which validates the T2FLCS's capability to handle the high level of uncertainty faced. We also present a novel approach that combines the scene classification system with the dynamic time warping algorithm to detect video events in real-world processing. The proposed system can run on recorded or live video clips and output a label describing the event, providing the user with a high-level summarization of the videos.
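The abstract above mentions combining scene classification with dynamic time warping (DTW) for event detection but gives no details. The following is a minimal sketch of the standard DTW recurrence applied to sequences of scene labels; the label names, the 0/1 mismatch cost, the event templates and the helper `nearest_event` are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Classic DTW distance between two label sequences.

    The local cost is 0 when two labels match and 1 otherwise (an assumed
    cost; any non-negative dissimilarity could be substituted).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = cost of the best alignment of a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def nearest_event(labels: Sequence[str], templates: Dict[str, List[str]]) -> str:
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(labels, templates[name]))


if __name__ == "__main__":
    # Hypothetical event templates expressed as sequences of scene classes.
    templates = {
        "goal": ["far_view", "close_up", "crowd"],
        "corner": ["far_view", "goal_area", "far_view"],
    }
    observed = ["far_view", "far_view", "close_up", "close_up", "crowd"]
    print(nearest_event(observed, templates))
```

DTW tolerates scenes that last longer or shorter than in the template, since repeated labels can align to a single template element at no extra cost.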

    More playful user interfaces: interfaces that invite social and physical interaction

    Audiovisual framework for automatic soccer highlights generation

    Extracting low-level and mid-level descriptors from a soccer match to generate a summary of soccer highlights. Automatic generation of sports highlights from recorded audiovisual content has been an object of great interest in recent years. The problem is especially important in the production of second and third division highlights videos, where the quantity of raw material is significant and does not contain manual annotations. In this thesis, a new approach for automatic generation of soccer highlights is proposed. The approach is based on the segmentation of the video sequence into shots that are further analyzed to determine their relevance and interest. For every video shot, a set of low- and mid-level audio-visual descriptors is computed and combined to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting the shots with the highest interest according to the specifications of the user and the results of the relevance measures. The main novelties of this work are the temporal combination of two shot boundary detectors; the selection of keyframes using motion and color features; the generation of new soccer audio mid-level descriptors; the robust detection of soccer players; the use of a novel object detection technique to spot goal-posts; and, finally, the creation of a flexible and user-friendly highlight generation framework. The thesis is mainly devoted to the description of the global visual segmentation module, the selection of audiovisual descriptors and the general scheme for evaluating the measures of relevance. Several results have been produced using real soccer video sequences that prove the validity of the proposed framework.
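The final selection step described above (keeping the shots of highest interest subject to the user's specifications) might look like the sketch below. It assumes each shot already carries a single relevance score and that the user specification is a maximum summary duration; the greedy strategy, the `Shot` fields and `select_highlights` are illustrative assumptions, not the framework's actual rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    start: float      # seconds
    end: float        # seconds
    relevance: float  # hypothetical combined relevance score

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_highlights(shots: List[Shot], max_duration: float) -> List[Shot]:
    """Greedily keep the most relevant shots that fit in max_duration,
    then return them in their original temporal order."""
    chosen: List[Shot] = []
    total = 0.0
    for shot in sorted(shots, key=lambda s: s.relevance, reverse=True):
        if total + shot.duration <= max_duration:
            chosen.append(shot)
            total += shot.duration
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    shots = [
        Shot(0, 10, 0.2),
        Shot(10, 25, 0.9),
        Shot(25, 30, 0.7),
        Shot(30, 50, 0.8),
    ]
    for shot in select_highlights(shots, max_duration=25):
        print(shot)
```

A greedy pass is simple but not optimal for a duration budget; it is used here only to keep the example short.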

    Inférence de la grammaire structurelle d’une émission TV récurrente à partir du contenu

    TV program structuring has emerged as a major theme in the last decade for the task of high-quality indexing. In this thesis, we address the problem of unsupervised TV program structuring from the point of view of grammatical inference, i.e., discovering a common structural model shared by a collection of episodes of a recurrent program. Using grammatical inference makes it possible to rely on only minimal domain knowledge. In particular, we assume no prior knowledge of the structural elements that might be present in a recurrent program and very limited knowledge of the program type, e.g., to name structural elements, apart from the recurrence. With this assumption, we propose an unsupervised framework operating in two stages. The first stage aims at determining the structural elements that are relevant to the structure of a program. We address this issue by making use of the property of element repetitiveness in recurrent programs, leveraging temporal density analysis to filter out irrelevant events and determine valid elements. Having discovered structural elements, the second stage is to infer a grammar of the program. We explore two inference techniques based either on multiple sequence alignment or on uniform resampling. A model of the structure is derived from the grammars and used to predict the structure of new episodes. Evaluations are performed on a selection of four different types of recurrent programs. Focusing on structural element determination, we analyze the effect on the number of determined structural elements, fixing the threshold applied to the density function as well as the size of the collection of episodes. For structural grammar inference, we discuss the quality of the grammars obtained and show that they accurately reflect the structure of the program.
We also demonstrate that the models obtained by grammatical inference can accurately predict the structure of unseen episodes, conducting a quantitative and comparative evaluation of the two methods by segmenting the new episodes into their structural components. Finally, considering the limitations of our work, we discuss a number of open issues in structure discovery and propose three new research directions to address in future work. In this thesis, we address the problem of unsupervised TV program structuring from the point of view of grammatical inference, focusing on discovering the structure of recurrent programs from a homogeneous collection. We aim to discover the structural elements that are relevant to the structure of the program and to infer a grammar of the program structure. Experiments show that grammatical inference makes it possible to use minimal prior domain knowledge to discover the structure of programs.

    Audiovisual processing for sports-video summarisation technology

    In this thesis a novel audiovisual feature-based scheme is proposed for the automatic summarization of sports-video content. The scope of operability of the scheme is designed to encompass the wide variety of sports genres that come under the description ‘field-sports’. Given the assumption that, in terms of conveying the narrative of a field-sports-video, score-update events constitute the most significant moments, it is proposed that their detection should thus yield a favourable summarisation solution. To this end, a generic methodology is proposed for the automatic identification of score-update events in field-sports-video content. The scheme is based on the development of robust extractors for a set of critical features, which are shown to reliably indicate their locations. The evidence gathered by the feature extractors is combined and analysed using a Support Vector Machine (SVM), which performs the event detection process. An SVM is chosen on the basis that its underlying technology represents an implementation of the latest generation of machine learning algorithms, based on the recent advances in statistical learning. Effectively, an SVM offers a solution to optimising the classification performance of a decision hypothesis, inferred from a given set of training data. Via a learning phase that utilizes a 90-hour field-sports-video training corpus, the SVM infers a score-update event model by observing patterns in the extracted feature evidence. Using a similar but distinct 90-hour evaluation corpus, the effectiveness of this model is then tested generically across multiple genres of field-sports-video including soccer, rugby, field hockey, hurling, and Gaelic football. The results suggest that in terms of the summarization task, both high event retrieval and content rejection statistics are achievable.