    Semantic Analysis of High-definition MPEG-2 Soccer Video Using Bayesian Network

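    In recent years, as Internet access has moved to broadband, video distribution has become widespread, and terrestrial digital broadcasting together with satellite services such as BS and CS digital broadcasting has sharply increased the number of programs available to viewers. Hard disk capacity in PCs and recorders has also grown, making it possible to store large numbers of programs (content); on the other hand, the need for technology that quickly retrieves the scenes a viewer wants from a huge volume of video data is greater than ever. This study proposes a method for detecting replay scenes and highlight scenes near the goal in soccer video. For scene detection, we use a Bayesian network to describe the causal relationships between highlight scenes and features extracted from high-definition soccer video compressed by an MPEG-2 encoder. Using a Bayesian network makes it possible to infer probabilistically the occurrence of a highlight scene from the extracted features. Methods for detecting highlight scenes in soccer video with Bayesian networks have already been proposed, but they rely on features obtained by applying various pixel-level image processing operations to every frame. Consequently, the computational cost grows with picture size, and real-time processing requires dedicated hardware. The proposed method computes its features from the coding parameters contained in the MPEG-2 compressed data, so it requires less computation than conventional methods and can run in real time on an ordinary PC even for high-resolution video such as high-definition. In addition, although conventional methods propose Bayesian networks for various kinds of scenes, in all of them the scene-related events in the network model are defined on a per-frame basis. For example, the events related to a goal scene in conventional methods, such as the appearance of the goalmouth, crowd noise, and the occurrence of a replay, are all counted frame by frame. However, no method for clearly determining the start and end frames of each event has been presented, and in some cases this must be done by hand. As a result, errors may be introduced in the time intervals assigned to events when the Bayesian network is trained. Furthermore, when detecting scenes in a test video, it is also difficult to detect the start and end times of a scene. The proposed method first detects cut points, where the picture changes because of a camera switch, from characteristic changes in coding parameters extracted directly from the MPEG-2 compressed data, and defines the interval between two adjacent cut points as a shot. It then examines the features of each shot to classify the shots into several event classes. Finally, it detects scenes by treating a scene as the occurrence of certain characteristic events. In this method, the start and end times of each event are given unambiguously by the cut points of the shots and can be obtained automatically from the MPEG-2 compressed data. To evaluate the proposed method, we ran detection experiments on real video data and obtained a recall of 86.17% and a precision of 90.76% for event scenes occurring near the goal, and a recall of 81.00% and a precision of 92.57% for replay scenes. By comparison, the conventional method, evaluated on different video data, reports a recall of 71.1% and a precision of 89.8% for event scenes near the goal; the proposed method exceeds it in both recall and precision, with a particularly marked improvement in recall. These results confirm the effectiveness of the proposed method. The University of Electro-Communications, 201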
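
The shot definition in this abstract (the interval between two adjacent cut points) can be illustrated with a minimal sketch. It assumes the cut points have already been detected and are given as frame indices; how the cuts are found from the MPEG-2 coding parameters is not shown. The function name `shots_from_cuts` and the choice to treat the first and last frame of the video as implicit boundaries are assumptions made for illustration, not details taken from the abstract.

```python
from typing import List, Tuple


def shots_from_cuts(cut_frames: List[int], n_frames: int) -> List[Tuple[int, int]]:
    """Split a video of n_frames frames into shots at the given cut points.

    Each shot is returned as (start, end) with end exclusive.
    Hypothetical helper: the video start and end are treated as
    implicit boundaries, which is an assumption of this sketch.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # Keep only cuts strictly inside the video, drop duplicates, and sort.
    inner = {c for c in cut_frames if 0 < c < n_frames}
    bounds = sorted(inner | {0, n_frames})
    # Adjacent boundaries delimit one shot each.
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    print(shots_from_cuts([120, 45], 200))
```

Because every event is tied to a shot, its start and end times come directly from the boundaries returned here, with no per-frame labelling.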

    Semantic Based Sport Video Browsing

    ISBIS 2016: Meeting on Statistics in Business and Industry

    This book includes the abstracts of the talks presented at the 2016 International Symposium on Business and Industrial Statistics, held in Barcelona on June 8-10, 2016, and hosted by the Department of Statistics and Operations Research at the Universitat Politècnica de Catalunya - Barcelona TECH. The meeting took place in the ETSEIB Building (Escola Tecnica Superior d'Enginyeria Industrial) at Avda Diagonal 647. The meeting organizers celebrated the continued success of ISBIS and ENBIS society, and the meeting drew together the international community of statisticians, both academics and industry professionals, who share the goal of making statistics the foundation for decision making in business and related applications. The Scientific Program Committee consisted of: David Banks, Duke University; Amílcar Oliveira, DCeT - Universidade Aberta and CEAUL; Teresa A. Oliveira, DCeT - Universidade Aberta and CEAUL; Nalini Ravishankar, University of Connecticut; Xavier Tort Martorell, Universitat Politécnica de Catalunya, Barcelona TECH; Martina Vandebroek, KU Leuven; Vincenzo Esposito Vinzi, ESSEC Business Schoo

    Audio-visual football video analysis, from structure detection to attention analysis

    Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. Specific techniques are needed for content-based sports video analysis to exploit these characteristics. For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are the key stories in sports videos; (2) what attracts viewers’ interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identification. Replay segments convey the most important contents in sports videos, so detecting them is an efficient way to collect game highlights. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of a replay is complex and includes logo transitions, slow motions, viewpoint switches and normal speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, including a five-layer AdaBoost classifier and logo template matching throughout an entire video. The five-layer AdaBoost classifier utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates.
Subsequently, a logo template is constructed and employed to find all transition logo sequences. The precision and recall of this system in replay detection are both 100% on a five-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break, were identified by low-level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of “notice”. A brief survey of the psychological background of attention, attention estimation from visual and auditory signals, and multiple modality attention fusion is presented. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure during video watching. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are filled with noise.
This framework tries to estimate a noise-free signal from these coarse noisy observations by a multiple resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. The experiments show that these attention-based approaches can find goal events with high precision. Moreover, results of MAR-based highlight detection on the final games of FIFA 2002 and 2006 are highly similar to highlights professionally labelled by the BBC and FIFA.
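
The search for the longest repetitive substring in a shot-class label sequence can be illustrated with a short sketch. The abstract proposes a suffix tree; the code below swaps in a simpler sorted-suffix comparison, which gives the same answer on short sequences but is less efficient. The one-letter labels (P = play, F = focus, R = replay, B = break) and the function name are assumptions made for illustration.

```python
def longest_repeated_substring(labels: str) -> str:
    """Return the longest substring that occurs at least twice in labels.

    Occurrences may overlap. Uses sorted suffixes rather than the suffix
    tree named in the abstract (a simpler stand-in for illustration).
    """
    suffixes = sorted(labels[i:] for i in range(len(labels)))
    best = ""
    # Any repeated substring is a common prefix of two suffixes, and the
    # longest such prefix always appears between neighbours in sorted order.
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        if n > len(best):
            best = a[:n]
    return best


if __name__ == "__main__":
    # Hypothetical label sequence: P=play, F=focus, R=replay, B=break.
    print(longest_repeated_substring("PFRBPBPFRB"))
```

On the hypothetical sequence above, the repeated pattern `PFRB` is the candidate kernel; a suffix tree would find the same substring in linear time on long label sequences.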

    Synchronization of passes in event and spatiotemporal soccer data

    Most soccer analysis studies investigate specific scenarios using computational techniques that examine either spatiotemporal position data (movement of players and the ball on the pitch) or event data (relating to significant situations during a match). Yet, only a few applications perform a joint analysis of both data sources, despite the advantages of such an approach. One possible reason for this is a non-systematic error in the event data, causing a temporal misalignment of the two data sources. To address this problem, we propose a solution that combines the SwiftEvent online algorithm (Gensler and Sick in Pattern Anal Appl 21:543–562, 2018) with a subsequent refinement step that corrects pass timestamps by exploiting the statistical properties of passes in the position data. We evaluate our proposed algorithm on ground-truth pass labels of four top-flight soccer matches from the 2014/15 season. Results show that the percentage of passes within half a second of ground truth increases from 14 to 70%, while our algorithm also detects localization errors (noise) in the position data. A comparison with other models shows that our algorithm is superior to baseline models and comparable to a deep learning pass detection method (while requiring significantly less data). Hence, our proposed lightweight framework offers a viable solution that enables groups facing limited access to (recent) data sources to effectively synchronize passes in the event and position data.
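
The headline metric here, the share of passes whose timestamp lies within half a second of ground truth, can be computed as in the sketch below. It assumes the passes are already matched one-to-one, so that the i-th event timestamp and the i-th ground-truth timestamp refer to the same pass; the function name and the sample timestamps are hypothetical.

```python
from typing import Sequence


def share_within_tolerance(
    event_times: Sequence[float],
    truth_times: Sequence[float],
    tol: float = 0.5,
) -> float:
    """Fraction of passes whose timestamp is within tol seconds of ground truth.

    Assumes event_times[i] and truth_times[i] describe the same pass
    (one-to-one matching is an assumption of this sketch).
    """
    if len(event_times) != len(truth_times):
        raise ValueError("timestamp lists must have the same length")
    if not event_times:
        return 0.0
    hits = sum(abs(e - t) <= tol for e, t in zip(event_times, truth_times))
    return hits / len(event_times)


if __name__ == "__main__":
    # Hypothetical timestamps in seconds; the second pass is 0.9 s off.
    raw = [10.0, 20.9, 31.2, 40.4]
    truth = [10.3, 20.0, 31.0, 40.0]
    print(share_within_tolerance(raw, truth))
```

Evaluating this once on the raw event timestamps and once on the corrected ones gives the kind of before-and-after comparison reported in the abstract.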

    Data-driven action-value functions for evaluating players in professional team sports

    As more and larger event stream datasets for professional sports become available, there is growing interest in modeling the complex play dynamics to evaluate player performance. Among these models, a common player evaluation method is assigning values to player actions. Traditional action-value metrics, however, consider very limited game context and player information. Furthermore, they provide values only for actions directly related to goals (e.g., shots), not for all actions. Recent work has shown that reinforcement learning provides powerful methods for quantifying the value of player actions in sports. This dissertation develops deep reinforcement learning (DRL) methods for estimating action values in sports. We make several contributions to DRL for sports. First, we develop neural network architectures that learn an action-value Q-function from sports event logs to estimate each team's expected success given the current match context. Specifically, our architecture models the game history with a recurrent network and predicts the probability that a team scores the next goal. From the learned Q-values, we derive a Goal Impact Metric (GIM) for evaluating a player's performance over a game season. We show that the resulting player rankings are consistent with standard player metrics and temporally consistent within and across seasons. Second, we address the interpretability of the learned Q-values. While neural networks provide accurate estimates, their black-box structure prevents understanding the influence of different game features on the action values. To interpret the Q-function and understand the influence of game features on action values, we design an interpretable mimic learning framework for DRL. The framework is based on a Linear Model U-Tree (LMUT) as a transparent mimic model, which facilitates extracting the function rules and computing the feature importance for action values.
Third, we incorporate information about specific players into the action values by introducing a deep player representation framework. In this framework, each player is assigned a latent feature vector called an embedding, with the property that statistically similar players are mapped to nearby embeddings. To compute embeddings that summarize the statistical information about players, we implement a Variational Recurrent Ladder Agent Encoder (VaRLAE) to learn a contextualized representation for when and how players are likely to act. We learn and evaluate deep Q-functions from event data for both ice hockey and soccer. These are challenging continuous-flow games where game context and medium-term consequences are crucial for properly assessing the impact of a player's actions.
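
The abstract does not say how the Goal Impact Metric is derived from the learned Q-values. One plausible reading, shown here purely as an assumption, is to credit each action with the change in Q-value it produces and to sum those changes per player. The sketch below simplifies to a single team's Q-value sequence; the function name, the input layout, and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def impact_totals(events: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum per-player changes in Q-value over a chronological event list.

    events holds (player, q_value) pairs, where q_value stands for the
    estimated probability that the player's team scores the next goal
    after the action. Crediting each action with the difference to the
    previous Q-value is an assumption, not a definition from the abstract.
    """
    totals: Dict[str, float] = defaultdict(float)
    prev_q = None
    for player, q in events:
        if prev_q is not None:
            # The first event has no predecessor, so it earns no credit.
            totals[player] += q - prev_q
        prev_q = q
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical Q-values for three consecutive actions.
    print(impact_totals([("A", 0.50), ("B", 0.75), ("A", 0.25)]))
```

Under this reading, a player accumulates positive impact by moving the team into states with a higher scoring probability, whether or not the action is a shot.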