Learning macromanagement in starcraft from replays using deep learning
The real-time strategy game StarCraft has proven to be a challenging
environment for artificial intelligence techniques, and as a result, current
state-of-the-art solutions consist of numerous hand-crafted modules. In this
paper, we show how macromanagement decisions in StarCraft can be learned
directly from game replays using deep learning. Neural networks are trained on
789,571 state-action pairs extracted from 2,005 replays of highly skilled
players, achieving top-1 and top-3 error rates of 54.6% and 22.9% in predicting
the next build action. By integrating the trained network into UAlbertaBot, an
open source StarCraft bot, the system can significantly outperform the game's
built-in Terran bot, and play competitively against UAlbertaBot with a fixed
rush strategy. To our knowledge, this is the first time macromanagement tasks
are learned directly from replays in StarCraft. While the best hand-crafted
strategies are still the state-of-the-art, the deep network approach is able to
express a wide range of different strategies and thus improving the network's
performance further with deep reinforcement learning is an immediately
promising avenue for future research. Ultimately this approach could lead to
strong StarCraft bots that are less reliant on hard-coded strategies. Comment: 8 pages, to appear in the proceedings of the IEEE Conference on
Computational Intelligence and Games (CIG 2017)
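The top-1 and top-3 error rates quoted in this abstract can be illustrated with a minimal sketch. The function name, the toy scores and the four-class setup are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]],
                labels: Sequence[int],
                k: int) -> float:
    """Fraction of examples whose true label is NOT among the k best-scored classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest predicted score.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Hypothetical network outputs over four possible build actions.
toy_scores = [
    [0.70, 0.20, 0.10, 0.00],
    [0.10, 0.50, 0.30, 0.05],
    [0.40, 0.30, 0.20, 0.10],
    [0.05, 0.15, 0.20, 0.60],
]
toy_labels = [0, 2, 3, 3]

top1 = top_k_error(toy_scores, toy_labels, k=1)
top3 = top_k_error(toy_scores, toy_labels, k=3)
```

A top-3 error is never larger than the top-1 error on the same data, which matches the ordering of the two figures reported above.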
Audio-visual football video analysis, from structure detection to attention analysis
Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic fields. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop specific techniques for content-based sports video analysis to utilise these characteristics.
For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are key stories for sports videos; (2) what incurs viewer’s interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identification.
Replay segments convey the most important contents in sports videos, so detecting replay segments is an efficient way to collect game highlights. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of a replay is complex: it includes logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, comprising a five-layer AdaBoost classifier and logo template matching throughout an entire video. The five-layer AdaBoost classifier utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates. Subsequently, a logo template is constructed and employed to find all transition logo sequences. The precision and recall of this system in replay detection are both 100% in a five-game evaluation collection.
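As a reminder of what the reported precision and recall measure, here is a minimal sketch that scores a set of detected replay segments against a ground-truth set. The segment identifiers and the function name are hypothetical, and matching by identifier is a simplifying assumption; the thesis may match segments differently.

```python
def precision_recall(detected: set, truth: set) -> tuple:
    """Return (precision, recall) for detected items against ground truth."""
    true_positives = len(detected & truth)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical replay segment identifiers for one game.
ground_truth = {1, 2, 3, 4}

perfect = precision_recall({1, 2, 3, 4}, ground_truth)
partial = precision_recall({1, 2, 5}, ground_truth)
```

In the `partial` case two of three detections are correct and two of four true replays are found, so precision is 2/3 and recall is 1/2.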
An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature of content-based temporal structures, such as play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break were identified by low level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. These occurrences of this substring are regarded as a kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains.
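The search for the longest repetitive substring in a shot-class label sequence can be sketched as follows. This is not the suffix tree proposed in the thesis: for brevity it sorts all suffixes and compares neighbours, which gives the same answer on short sequences but scales worse. The single-letter encoding (P = play, F = focus, R = replay, B = break) and the example sequence are assumptions for illustration.

```python
def longest_repeated_substring(labels: str) -> str:
    """Longest substring occurring at least twice (occurrences may overlap)."""
    # Any repeated substring is a common prefix of two suffixes, and the
    # longest common prefixes appear between neighbours in sorted order.
    suffixes = sorted(labels[i:] for i in range(len(labels)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        if n > len(best):
            best = a[:n]
    return best


# Hypothetical shot-class sequence: play, focus, replay, break, play, ...
shot_labels = "PFRBPPFRBB"
kernel = longest_repeated_substring(shot_labels)
```

Here `kernel` is the play-focus-replay-break pattern, which appears twice in the toy sequence.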
Highlights are what attract notice. Attention is a psychological measurement of “notice”. A brief survey is presented of the psychological background of attention, attention estimation from visual and auditory signals, and multi-modality attention fusion. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure while watching video. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are filled with noise. This framework tries to estimate a noise-free signal from these coarse noisy observations by a multiple resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. The experiments show that these attention-based approaches can find goal events at a high precision. Moreover, results of MAR-based highlight detection on the final games of FIFA 2002 and 2006 are highly similar to highlights professionally labelled by the BBC and FIFA.
Integrated analysis of audiovisual signals and external information sources for event detection in team sports video
Ph.D., Doctor of Philosophy
Event detection in soccer video based on audio/visual keywords
Master's, Master of Science
Audiovisual processing for sports-video summarisation technology
In this thesis a novel audiovisual feature-based scheme is proposed for the automatic summarisation of sports-video content. The scope of operability of the scheme is designed to encompass the wide variety of sports genres that come under the description ‘field-sports’. Given the assumption that, in terms of conveying the narrative of a field-sports-video, score-update events constitute the most significant moments, it is proposed that their detection should thus yield a favourable summarisation solution. To this end, a generic methodology is proposed for the automatic identification of score-update events in field-sports-video content. The scheme is based on the development of robust extractors for a set of critical features, which are shown to reliably indicate their locations. The evidence gathered by the feature extractors is combined and analysed using a Support Vector Machine (SVM), which performs the event detection process. An SVM is chosen on the basis that its underlying technology represents an implementation of the latest generation of machine learning algorithms, based on the recent advances in statistical learning. Effectively, an SVM offers a solution to optimising the classification performance of a decision hypothesis, inferred from a given set of training data. Via a learning phase that utilises a 90-hour field-sports-video training corpus, the SVM infers a score-update event model by observing patterns in the extracted feature evidence. Using a similar but distinct 90-hour evaluation corpus, the effectiveness of this model is then tested generically across multiple genres of field-sports-video including soccer, rugby, field hockey, hurling, and Gaelic football. The results suggest that in terms of the summarisation task, both high event retrieval and content rejection statistics are achievable.
Content-based video indexing for sports applications using integrated multi-modal approach
This thesis presents research based on an integrated multi-modal approach for sports video indexing and retrieval. By combining specific features extractable from multiple (audio-visual) modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users will benefit from the integration of high-level semantics and some descriptive mid-level features such as a whistle and a close-up view of player(s). The main objective is to contribute to the three major components of sports video indexing systems. The first component is a set of powerful techniques to extract audio-visual features and semantic contents automatically. The main purposes are to reduce manual annotations and to summarise the lengthy contents into a compact, meaningful and more enjoyable presentation. The second component is an expressive and flexible indexing technique that supports gradual index construction. An indexing scheme is essential to determine the methods by which users can access a video database. The third and last component is a query language that can generate dynamic video summaries for smart browsing and support user-oriented retrievals.
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems
Much research in artificial intelligence is concerned with the development of
autonomous agents that can interact effectively with other agents. An important
aspect of such agents is the ability to reason about the behaviours of other
agents, by constructing models which make predictions about various properties
of interest (such as actions, goals, beliefs) of the modelled agents. A variety
of modelling approaches now exist which vary widely in their methodology and
underlying assumptions, catering to the needs of the different sub-communities
within which they were developed and reflecting the different practical uses
for which they are intended. The purpose of the present article is to provide a
comprehensive survey of the salient modelling methods which can be found in the
literature. The article concludes with a discussion of open problems which may
form the basis for fruitful future research. Comment: Final manuscript (46 pages), published in Artificial Intelligence
Journal. The arXiv version also contains a table of contents after the
abstract, but is otherwise identical to the AIJ version. Keywords: autonomous
agents, multiagent systems, modelling other agents, opponent modelling
Towards a crowdsourced solution for the authoring bottleneck in interactive narratives
Interactive Storytelling research has produced a wealth of technologies that can be
employed to create personalised narrative experiences, in which the audience takes
a participating rather than observing role. But so far this technology has not led
to the production of large scale playable interactive story experiences that realise
the ambitions of the field. One main reason for this state of affairs is the difficulty
of authoring interactive stories, a task that requires describing a huge amount of
story building blocks in a machine friendly fashion. This is not only technically
and conceptually more challenging than traditional narrative authoring but also a
scalability problem.
This thesis examines the authoring bottleneck through a case study and a literature
survey and advocates a solution based on crowdsourcing. Prior work has already
shown that combining a large number of example stories collected from crowd workers
with a system that merges these contributions into a single interactive story can be
an effective way to reduce the authorial burden. As a refinement of such an approach,
this thesis introduces the novel concept of Crowd Task Adaptation. It argues that in
order to maximise the usefulness of the collected stories, a system should dynamically
and intelligently analyse the corpus of collected stories and based on this analysis
modify the tasks handed out to crowd workers.
Two authoring systems, ENIGMA and CROSCAT, which show two radically different
approaches of using the Crowd Task Adaptation paradigm have been implemented and
are described in this thesis. While ENIGMA adapts tasks through a realtime dialog
between crowd workers and the system that is based on what has been learned from
previously collected stories, CROSCAT modifies the backstory given to crowd workers
in order to optimise the distribution of branching points in the tree structure that
combines all collected stories. Two experimental studies of crowdsourced authoring
are also presented. They lead to guidelines on how to employ crowdsourced authoring
effectively, but more importantly the results of one of the studies demonstrate the
effectiveness of the Crowd Task Adaptation approach.
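The idea of a tree structure that combines all collected stories, with branching points spread across its depth, can be sketched with a simple prefix tree. This is an illustrative assumption about the data structure, not the CROSCAT implementation; the function names and the toy stories are hypothetical.

```python
from collections import Counter


def build_story_tree(stories):
    """Merge event sequences into a nested-dict prefix tree."""
    root = {}
    for story in stories:
        node = root
        for event in story:
            # Stories sharing a prefix share the same path in the tree.
            node = node.setdefault(event, {})
    return root


def branching_points_by_depth(tree, depth=0, counts=None):
    """Count nodes with more than one continuation, grouped by depth."""
    if counts is None:
        counts = Counter()
    if len(tree) > 1:
        counts[depth] += 1
    for child in tree.values():
        branching_points_by_depth(child, depth + 1, counts)
    return counts


# Hypothetical crowd-contributed stories, each a list of events.
toy_stories = [
    ["arrive", "knock", "enter"],
    ["arrive", "knock", "leave"],
    ["arrive", "wait", "enter"],
]
tree = build_story_tree(toy_stories)
distribution = branching_points_by_depth(tree)
```

A system in the spirit described above could inspect such a distribution to decide where more branching is wanted before handing out the next task.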