439 research outputs found

    Multi-sensor human action recognition with particular application to tennis event-based indexing

    Get PDF
    The ability to automatically classify human actions and activities using vi- sual sensors or by analysing body worn sensor data has been an active re- search area for many years. Only recently with advancements in both fields and the ubiquitous nature of low cost sensors in our everyday lives has auto- matic human action recognition become a reality. While traditional sports coaching systems rely on manual indexing of events from a single modality, such as visual or inertial sensors, this thesis investigates the possibility of cap- turing and automatically indexing events from multimodal sensor streams. In this work, we detail a novel approach to infer human actions by fusing multimodal sensors to improve recognition accuracy. State of the art visual action recognition approaches are also investigated. Firstly we apply these action recognition detectors to basic human actions in a non-sporting con- text. We then perform action recognition to infer tennis events in a tennis court instrumented with cameras and inertial sensing infrastructure. The system proposed in this thesis can use either visual or inertial sensors to au- tomatically recognise the main tennis events during play. A complete event retrieval system is also presented to allow coaches to build advanced queries, which existing sports coaching solutions cannot facilitate, without an inordi- nate amount of manual indexing. The event retrieval interface is evaluated against a leading commercial sports coaching tool in terms of both usability and efficiency

    Feature based dynamic intra-video indexing

    Get PDF
    A thesis submitted in partial fulfillment for the degree of Doctor of PhilosophyWith the advent of digital imagery and its wide spread application in all vistas of life, it has become an important component in the world of communication. Video content ranging from broadcast news, sports, personal videos, surveillance, movies and entertainment and similar domains is increasing exponentially in quantity and it is becoming a challenge to retrieve content of interest from the corpora. This has led to an increased interest amongst the researchers to investigate concepts of video structure analysis, feature extraction, content annotation, tagging, video indexing, querying and retrieval to fulfil the requirements. However, most of the previous work is confined within specific domain and constrained by the quality, processing and storage capabilities. This thesis presents a novel framework agglomerating the established approaches from feature extraction to browsing in one system of content based video retrieval. The proposed framework significantly fills the gap identified while satisfying the imposed constraints of processing, storage, quality and retrieval times. The output entails a framework, methodology and prototype application to allow the user to efficiently and effectively retrieved content of interest such as age, gender and activity by specifying the relevant query. Experiments have shown plausible results with an average precision and recall of 0.91 and 0.92 respectively for face detection using Haar wavelets based approach. Precision of age ranges from 0.82 to 0.91 and recall from 0.78 to 0.84. The recognition of gender gives better precision with males (0.89) compared to females while recall gives a higher value with females (0.92). Activity of the subject has been detected using Hough transform and classified using Hiddell Markov Model. A comprehensive dataset to support similar studies has also been developed as part of the research process. A Graphical User Interface (GUI) providing a friendly and intuitive interface has been integrated into the developed system to facilitate the retrieval process. The comparison results of the intraclass correlation coefficient (ICC) shows that the performance of the system closely resembles with that of the human annotator. The performance has been optimised for time and error rate

    Algorithms for the Analysis of Spatio-Temporal Data from Team Sports

    Get PDF
    Modern object tracking systems are able to simultaneously record trajectories—sequences of time-stamped location points—for large numbers of objects with high frequency and accuracy. The availability of trajectory datasets has resulted in a consequent demand for algorithms and tools to extract information from these data. In this thesis, we present several contributions intended to do this, and in particular, to extract information from trajectories tracking football (soccer) players during matches. Football player trajectories have particular properties that both facilitate and present challenges for the algorithmic approaches to information extraction. The key property that we look to exploit is that the movement of the players reveals information about their objectives through cooperative and adversarial coordinated behaviour, and this, in turn, reveals the tactics and strategies employed to achieve the objectives. While the approaches presented here naturally deal with the application-specific properties of football player trajectories, they also apply to other domains where objects are tracked, for example behavioural ecology, traffic and urban planning

    Automated Detection of Complex Tactical Patterns in Football—Using Machine Learning Techniques to Identify Tactical Behavior

    Get PDF
    Football tactics is a topic of public interest, where decisions are predominantly made based on gut instincts from domain-experts. Sport science literature often highlights the need for evidence-based research on football tactics, however the limited capabilities in modeling the dynamics of football has prevented researchers from gaining usable insights. Recent technological advances have made high quality football data more available and affordable. Particularly, positional data providing player and ball coordinates at every instance of a match can be combined with event data containing spatio-temporal information on any event taking place on the pitch (e.g. passes, shots, fouls). On the other hand, the application of machine learning methods to domain-specific problems yields a paradigm shift in many industries including sports. The need for more informed decisions as well as automating time consuming processes—accelerated by the availability of data—has motivated many scientific investigations in football analytics. This thesis is part of a research program combining methodologies from sports and data science to address the following problems: the synchronization of positional and event data, objectively quantifying offensive actions, as well as the detection of tactical patterns. Although various basic insights from the overall research program are integrated, this thesis focuses primarily on the latter one. Specifically, positional and event data are used to apply machine learning techniques to identify eight established tactical patterns in football: namely high-/mid-/low-block defending, build-up/attacking play in the offense, counterpressing and counterattacks during transitions, and patterns when defending corner-kicks, e.g. player-/zonal- or post-marking. For each pattern, we consolidate definitions with football experts and label large amounts of data manually using video recordings. The inter-labeler reliability is used to ensure that each pattern is well-defined. Unsupervised techniques are used for the purpose of exploration, and supervised machine learning methods based on expert-labeled data for the final detection. As an outlook, semi-supervised methods were used to reduce the labeling effort. This thesis proves that the detection of tactical patterns can optimize everyday processes in professional clubs, and leverage the domain of tactical analysis in sport science by gaining unseen insights. Additionally, we add value to the machine learning domain by evaluating recent methods in supervised and semi supervised machine learning on challenging, real-world problems

    Fast human behavior analysis for scene understanding

    Get PDF
    Human behavior analysis has become an active topic of great interest and relevance for a number of applications and areas of research. The research in recent years has been considerably driven by the growing level of criminal behavior in large urban areas and increase of terroristic actions. Also, accurate behavior studies have been applied to sports analysis systems and are emerging in healthcare. When compared to conventional action recognition used in security applications, human behavior analysis techniques designed for embedded applications should satisfy the following technical requirements: (1) Behavior analysis should provide scalable and robust results; (2) High-processing efficiency to achieve (near) real-time operation with low-cost hardware; (3) Extensibility for multiple-camera setup including 3-D modeling to facilitate human behavior understanding and description in various events. The key to our problem statement is that we intend to improve behavior analysis performance while preserving the efficiency of the designed techniques, to allow implementation in embedded environments. More specifically, we look into (1) fast multi-level algorithms incorporating specific domain knowledge, and (2) 3-D configuration techniques for overall enhanced performance. If possible, we explore the performance of the current behavior-analysis techniques for improving accuracy and scalability. To fulfill the above technical requirements and tackle the research problems, we propose a flexible behavior-analysis framework consisting of three processing-layers: (1) pixel-based processing (background modeling with pixel labeling), (2) object-based modeling (human detection, tracking and posture analysis), and (3) event-based analysis (semantic event understanding). In Chapter 3, we specifically contribute to the analysis of individual human behavior. A novel body representation is proposed for posture classification based on a silhouette feature. Only pure binary-shape information is used for posture classification without texture/color or any explicit body models. To this end, we have studied an efficient HV-PCA shape-based descriptor with temporal modeling, which achieves a posture-recognition accuracy rate of about 86% and outperforms other existing proposals. As our human motion scheme is efficient and achieves a fast performance (6-8 frames/second), it enables a fast surveillance system or further analysis of human behavior. In addition, a body-part detection approach is presented. The color and body ratio are combined to provide clues for human body detection and classification. The conventional assumption of up-right body posture is not required. Afterwards, we design and construct a specific framework for fast algorithms and apply them in two applications: tennis sports analysis and surveillance. Chapter 4 deals with tennis sports analysis and presents an automatic real-time system for multi-level analysis of tennis video sequences. First, we employ a 3-D camera model to bridge the pixel-level, object-level and scene-level of tennis sports analysis. Second, a weighted linear model combining the visual cues in the real-world domain is proposed to identify various events. The experimentally found event extraction rate of the system is about 90%. Also, audio signals are combined to enhance the scene analysis performance. The complete proposed application is efficient enough to obtain a real-time or near real-time performance (2-3 frames/second for 720×576 resolution, and 5-7 frames/second for 320×240 resolution, with a P-IV PC running at 3GHz). Chapter 5 addresses surveillance and presents a full real-time behavior-analysis framework, featuring layers at pixel, object, event and visualization level. More specifically, this framework captures the human motion, classifies its posture, infers the semantic event exploiting interaction modeling, and performs the 3-D scene reconstruction. We have introduced our system design based on a specific software architecture, by employing the well-known "4+1" view model. In addition, human behavior analysis algorithms are directly designed for real-time operation and embedded in an experimental runtime AV content-analysis architecture. This executable system is designed to be generic for multiple streaming applications with component-based architectures. To evaluate the performance, we have applied this networked system in a single-camera setup. The experimental platform operates with two Pentium Quadcore engines (2.33 GHz) and 4-GB memory. Performance evaluations have shown that this networked framework is efficient and achieves a fast performance (13-15 frames/second) for monocular video sequences. Moreover, a dual-camera setup is tested within the behavior-analysis framework. After automatic camera calibration is conducted, the 3-D reconstruction and communication among different cameras are achieved. The extra view in the multi-camera setup improves the human tracking and event detection in case of occlusion. This extension of multiple-view fusion improves the event-based semantic analysis by 8.3-16.7% in accuracy rate. The detailed studies of two experimental intelligent applications, i.e., tennis sports analysis and surveillance, have proven their value in several extensive tests in the framework of the European Candela and Cantata ITEA research programs, where our proposed system has demonstrated competitive performance with respect to accuracy and efficiency

    Mimicking human player strategies in fighting games using game artificial intelligence techniques

    Get PDF
    Fighting videogames (also known as fighting games) are ever growing in popularity and accessibility. The isolated console experiences of 20th century gaming has been replaced by online gaming services that allow gamers to play from almost anywhere in the world with one another. This gives rise to competitive gaming on a global scale enabling them to experience fresh play styles and challenges by playing someone new. Fighting games can typically be played either as a single player experience, or against another human player, whether it is via a network or a traditional multiplayer experience. However, there are two issues with these approaches. First, the single player offering in many fighting games is regarded as being simplistic in design, making the moves by the computer predictable. Secondly, while playing against other human players can be more varied and challenging, this may not always be achievable due to the logistics involved in setting up such a bout. Game Artificial Intelligence could provide a solution to both of these issues, allowing a human player s strategy to be learned and then mimicked by the AI fighter. In this thesis, game AI techniques have been researched to provide a means of mimicking human player strategies in strategic fighting games with multiple parameters. Various techniques and their current usages are surveyed, informing the design of two separate solutions to this problem. The first solution relies solely on leveraging k nearest neighbour classification to identify which move should be executed based on the in-game parameters, resulting in decisions being made at the operational level and being fed from the bottom-up to the strategic level. The second solution utilises a number of existing Artificial Intelligence techniques, including data driven finite state machines, hierarchical clustering and k nearest neighbour classification, in an architecture that makes decisions at the strategic level and feeds them from the top-down to the operational level, resulting in the execution of moves. This design is underpinned by a novel algorithm to aid the mimicking process, which is used to identify patterns and strategies within data collated during bouts between two human players. Both solutions are evaluated quantitatively and qualitatively. A conclusion summarising the findings, as well as future work, is provided. The conclusions highlight the fact that both solutions are proficient in mimicking human strategies, but each has its own strengths depending on the type of strategy played out by the human. More structured, methodical strategies are better mimicked by the data driven finite state machine hybrid architecture, whereas the k nearest neighbour approach is better suited to tactical approaches, or even random button bashing that does not always conform to a pre-defined strategy

    Multi-Sensory Deep Learning Architectures for Slam Dunk Scene Classification

    Get PDF
    Basketball teams at all levels of the game invest a considerable amount of time and effort into collecting, segmenting, and analysing footage from their upcoming opponents previous games. This analysis helps teams identify and exploit the potential weaknesses of their opponents and is commonly cited as one of the key elements required to achieve success in the modern game. The growing importance of this type of analysis has prompted research into the application of computer vision and audio classification techniques to help teams classify scoring sequences and key events using game footage. However, this research tends to focus on classifying scenes based on information from a single sensory source (visual or audio), and fails to analyse the wealth of multi-sensory information available within the footage. This dissertation aims to demonstrate that by analysing the full range of audio and visual features contained in broadcast game footage through a multi-sensory deep learning architecture one can create a more effective key scene classification system when compared to a single sense model. Additionally, this dissertation explores the performance impact of training the audio component of a multi-sensory architecture using different representations of the audio features
    corecore