1,646 research outputs found

    Automatic annotation of tennis games: An integration of audio, vision, and learning

    Get PDF
    Fully automatic annotation of tennis game using broadcast video is a task with a great potential but with enormous challenges. In this paper we describe our approach to this task, which integrates computer vision, machine listening, and machine learning. At the low level processing, we improve upon our previously proposed state-of-the-art tennis ball tracking algorithm and employ audio signal processing techniques to detect key events and construct features for classifying the events. At high level analysis, we model event classification as a sequence labelling problem, and investigate four machine learning techniques using simulated event sequences. Finally, we evaluate our proposed approach on three real world tennis games, and discuss the interplay between audio, vision and learning. To the best of our knowledge, our system is the only one that can annotate tennis game at such a detailed level

    Towards Structured Analysis of Broadcast Badminton Videos

    Full text link
    Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large scale data mining and analytics. It can only be viewed sequentially or manually tagged with higher-level labels which is time consuming and prone to errors. In this work, we propose an end-to-end framework for automatic attributes tagging and analysis of sport videos. We use commonly available broadcast videos of matches and, unlike previous approaches, does not rely on special camera setups or additional sensors. Our focus is on Badminton as the sport of interest. We propose a method to analyze a large corpus of badminton broadcast videos by segmenting the points played, tracking and recognizing the players in each point and annotating their respective badminton strokes. We evaluate the performance on 10 Olympic matches with 20 players and achieved 95.44% point segmentation accuracy, 97.38% player detection score ([email protected]), 97.98% player identification accuracy, and stroke segmentation edit scores of 80.48%. We further show that the automatically annotated videos alone could enable the gameplay analysis and inference by computing understandable metrics such as player's reaction time, speed, and footwork around the court, etc.Comment: 9 page

    Automatic camera selection for activity monitoring in a multi-camera system for tennis

    Get PDF
    In professional tennis training matches, the coach needs to be able to view play from the most appropriate angle in order to monitor players' activities. In this paper, we describe and evaluate a system for automatic camera selection from a network of synchronised cameras within a tennis sporting arena. This work combines synchronised video streams from multiple cameras into a single summary video suitable for critical review by both tennis players and coaches. Using an overhead camera view, our system automatically determines the 2D tennis-court calibration resulting in a mapping that relates a player's position in the overhead camera to their position and size in another camera view in the network. This allows the system to determine the appearance of a player in each of the other cameras and thereby choose the best view for each player via a novel technique. The video summaries are evaluated in end-user studies and shown to provide an efficient means of multi-stream visualisation for tennis player activity monitoring

    TennisSense: a platform for extracting semantic information from multi-camera tennis data

    Get PDF
    In this paper, we introduce TennisSense, a technology platform for the digital capture, analysis and retrieval of tennis training and matches. Our algorithms for extracting useful metadata from the overhead court camera are described and evaluated. We track the tennis ball using motion images for ball candidate detection and then link ball candidates into locally linear tracks. From these tracks we can infer when serves and rallies take place. Using background subtraction and hysteresis-type blob tracking, we track the tennis players positions. The performance of both modules is evaluated using ground-truthed data. The extracted metadata provides valuable information for indexing and efficient browsing of hours of multi-camera tennis footage and we briefly illustrative how this data is used by our tennis-coach playback interface

    Event detection in field sports video using audio-visual features and a support vector machine

    Get PDF
    In this paper, we propose a novel audio-visual feature-based framework for event detection in broadcast video of multiple different field sports. Features indicating significant events are selected and robust detectors built. These features are rooted in characteristics common to all genres of field sports. The evidence gathered by the feature detectors is combined by means of a support vector machine, which infers the occurrence of an event based on a model generated during a training phase. The system is tested generically across multiple genres of field sports including soccer, rugby, hockey, and Gaelic football and the results suggest that high event retrieval and content rejection statistics are achievable

    Toward next generation coaching tools for court based racquet sports

    Get PDF
    Even with today’s advances in automatic indexing of multimedia content, existing coaching tools for court sports lack the ability to automatically index a competitive match into key events. This paper proposes an automatic event indexing and event retrieval system for tennis, which can be used to coach from beginners upwards. Event indexing is possible using either visual or inertial sensing, with the latter potentially providing system portability. To achieve maximum performance in event indexing, multi-sensor data integration is implemented, where data from both sensors is merged to automatically index key tennis events. A complete event retrieval system is also presented to allow coaches to build advanced queries which existing sports coaching solutions cannot facilitate without an inordinate amount of manual indexing

    A novel Markov logic rule induction strategy for characterizing sports video footage

    Get PDF
    The grounding of high-level semantic concepts is a key requirement of video annotation systems. Rule induction can thus constitute an invaluable intermediate step in characterizing protocol-governed domains, such as broadcast sports footage. We here set out a novel “clause grammar template” approach to the problem of rule-induction in video footage of court games that employs a second-order meta grammar for Markov Logic Network construction. The aim is to build an adaptive system for sports video annotation capable, in principle, both of learning ab initio and also adaptively transferring learning between distinct rule domains. The method is tested with respect to both a simulated game predicate generator and also real data derived from tennis footage via computer-vision based approaches including HOG3D based player-action classification, Hough-transform based court detection, and graph-theoretic ball-tracking. Experiments demonstrate that the method exhibits both error resilience and learning transfer in the court domain context. Moreover the clause template approach naturally generalizes to any suitably-constrained, protocol-governed video domain characterized by feature noise or detector error

    Combining inertial and visual sensing for human action recognition in tennis

    Get PDF
    In this paper, we present a framework for both the automatic extraction of the temporal location of tennis strokes within a match and the subsequent classification of these as being either a serve, forehand or backhand. We employ the use of low-cost visual sensing and low-cost inertial sensing to achieve these aims, whereby a single modality can be used or a fusion of both classification strategies can be adopted if both modalities are available within a given capture scenario. This flexibility allows the framework to be applicable to a variety of user scenarios and hardware infrastructures. Our proposed approach is quantitatively evaluated using data captured from elite tennis players. Results point to the extremely accurate performance of the proposed approach irrespective of input modality configuration
    corecore