46 research outputs found

Neuroevolution in Games: State of the Art and Open Challenges

Authors: Risi Sebastian; Togelius Julian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
This paper surveys research on applying neuroevolution (NE) to games. In neuroevolution, artificial neural networks are trained through evolutionary algorithms, taking inspiration from the way biological brains evolved. We analyse the application of NE in games along five different axes: the role NE is chosen to play in a game, the different types of neural networks used, the way these networks are evolved, how the fitness is determined, and what type of input the network receives. The article also highlights important open research challenges in the field.

arXiv.org e-Print Archive

CiteSeerX

Crossref

The IT University of Copenhagen's Repository

A Survey of Monte Carlo Tree Search Methods

Authors: Browne Cameron B; Colton Simon; Cowling Peter I; Lucas Simon M; Perez Diego; Powley Edward; Rohlfshagen Philipp; Samothrakis Spyridon; Tavener Stephen; Whitehouse Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
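
The abstract does not spell out the algorithm itself. As a rough illustration of how tree search and random sampling can be combined, the sketch below shows the UCB1 selection rule that UCT-style MCTS commonly uses to choose which child node to descend into. The function names, the `(total_reward, visits)` tuple layout, and the default exploration constant are assumptions made for this example, not details taken from the survey.

```python
import math


def ucb1(mean_reward, child_visits, parent_visits, exploration=math.sqrt(2)):
    """UCB1 score: average reward plus a bonus for rarely visited children."""
    if child_visits == 0:
        # Unvisited children are tried first.
        return math.inf
    bonus = exploration * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_reward + bonus


def select_child(children, parent_visits, exploration=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) tuples (hypothetical layout).
    """
    def score(index):
        total_reward, visits = children[index]
        mean = total_reward / visits if visits else 0.0
        return ucb1(mean, visits, parent_visits, exploration)

    return max(range(len(children)), key=score)
```

In a full MCTS loop the rewards accumulated here would come from random playouts run from the selected node; a larger `exploration` value shifts the balance from exploiting good averages towards sampling less-visited moves.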

University of Essex Research Repository

CiteSeerX

Maastricht University Research Portal

Crossref

Strategies for Evolving Diverse and Effective Behaviours in Pursuit Domains

Author: Cowan Tyler James
Publication venue: 'Brock University Library'
Publication date: 25/11/2021
Evolutionary algorithms have a tendency to overuse and exploit particular behaviours in their search for optimality, even across separate runs. The resulting set of monotonous solutions caused by this tendency is a problem in many applications. This research explores different strategies designed to encourage an interesting set of diverse behaviours while still maintaining an appreciable level of efficacy. Embodied agents are situated within an open plane and play against each other in various pursuit game scenarios. The pursuit games consist of a single predator agent and twenty prey agents, with the goal always requiring the predator to catch as many prey as possible before the time limit is reached. The predator's controller is evolved through genetic programming while the preys' controllers are hand-crafted. The fitness of a solution is first calculated in a traditional manner. Inspired by Lehman and Stanley's novelty search strategy, the fitness is then combined with the diversity of the solution to produce the final fitness score. The original fitness score is determined by the number of captured prey, and the diversity score is determined through the combination of four behaviour measurements. Among many promising results, a particular diversity-based evaluation strategy and weighting combination was found to provide solutions that exhibit an excellent balance between diversity and efficacy. The results were analyzed quantitatively and qualitatively, showing the emergence of diverse and effective behaviours.
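
One way to picture the fitness and diversity combination described above is the minimal sketch below. The abstract gives neither the four behaviour measurements nor the weighting scheme, so everything beyond the prey count is an assumption: behaviours are taken to be four numbers in [0, 1], diversity is measured novelty-search style as the mean distance to the `k` nearest other behaviours, and the two scores are blended with a single hypothetical `weight`.

```python
import math

TOTAL_PREY = 20  # from the abstract: twenty prey agents per game


def novelty(behaviour, others, k=3):
    """Mean Euclidean distance to the k nearest behaviours (assumed measure)."""
    if not others:
        return 0.0
    distances = sorted(math.dist(behaviour, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def combined_fitness(prey_captured, behaviour, others, weight=0.5, k=3):
    """Blend efficacy and diversity into one score in [0, 1].

    `weight` (hypothetical) is the share given to diversity; 0.0 recovers
    the traditional fitness based only on captured prey.
    """
    efficacy = prey_captured / TOTAL_PREY
    # With every measurement in [0, 1], the largest possible distance is
    # sqrt(number of measurements), so this rescales diversity to [0, 1].
    diversity = novelty(behaviour, others, k) / math.sqrt(len(behaviour))
    return (1.0 - weight) * efficacy + weight * diversity
```

For example, `combined_fitness(10, [0, 0, 0, 0], [[1, 1, 1, 1]])` scores a predator that caught half the prey while behaving as differently as possible from the one other solution it is compared with.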

Brock University Digital Repository

Curriculum learning in reinforcement learning

Author: Narvekar Sanmit Santosh
Publication date: 21/07/2021
In recent years, reinforcement learning (RL) has been increasingly successful at solving complex tasks. Despite these successes, one of the fundamental challenges is that many RL methods require large amounts of experience, and thus can be slow to train in practice. Transfer learning is a recent area of research that has been shown to speed up learning on a complex task by transferring knowledge from one or more easier source tasks. Most existing transfer learning methods treat this transfer of knowledge as a one-step process, where knowledge from all the sources is directly transferred to the target. However, for complex tasks, it may be more beneficial (and even necessary) to gradually acquire skills over multiple tasks in sequence, where each subsequent task requires and builds upon knowledge gained in a previous task. This idea is pervasive throughout human learning, where people learn complex skills gradually by training via a curriculum. The goal of this thesis is to explore whether autonomous reinforcement learning agents can also benefit by training via a curriculum, and whether such curricula can be designed fully autonomously. In order to answer these questions, this thesis first formalizes the concept of a curriculum, and the methodology of curriculum learning in reinforcement learning. Curriculum learning consists of 3 main elements: 1) task generation, which creates a suitable set of source tasks; 2) sequencing, which focuses on how to order these tasks into a curriculum; and 3) transfer learning, which considers how to transfer knowledge between tasks in the curriculum. This thesis introduces several methods to both create suitable source tasks and automatically sequence them into a curriculum. We show that these methods produce curricula that are tailored to the individual sensing and action capabilities of different agents, and show how the curricula learned can be adapted for new, but related target tasks.
Together, these methods form the components of an autonomous curriculum design agent that can suggest a training curriculum customized to both the unique abilities of each agent and the task in question. We expect this research on the curriculum learning approach will increase the applicability and scalability of RL methods by providing a faster way of training reinforcement learning agents, compared to learning tabula rasa.

Computer Science

Texas ScholarWorks

Artificial intelligence in co-operative games with partial observability

Author: Williams Piers
Publication date: 06/02/2019
This thesis investigates Artificial Intelligence in co-operative games that feature Partial Observability. Most video games feature both co-operation and Partial Observability. Co-operative games are games in which a team of at least two agents must achieve a shared goal of some kind. Partial Observability is a restriction on how much of an environment an agent can observe. The research performed in this thesis examines the challenge of creating Artificial Intelligence for co-operative games that feature Partial Observability. The main contributions are: the finding that Monte-Carlo Tree Search outperforms Genetic Algorithm based agents in solving co-operative problems without communication; the creation of a co-operative Partial Observability competition promoting Artificial Intelligence research; an investigation of the effect of varying Partial Observability on Artificial Intelligence; and the creation of a high-performing Monte-Carlo Tree Search agent for the game Hanabi that uses agent modelling to reason about other players.

University of Essex Research Repository

MAPiS 2019 - First MAP-i Seminar: proceedings

Authors: Duarte Fernando; Muhammad Shamsuddeen; Rua Rui; Silva Vanessa
Publication venue: UA Editora
Publication date: 01/01/2019
This book contains a selection of Informatics papers accepted for presentation and discussion at “MAPiS 2019 - First MAP-i Seminar”, held in Aveiro, Portugal, January 31, 2019. MAPiS is the first conference organized by the MAP-i first-year students, in the context of the Seminar course. The MAP-i Doctoral Programme in Computer Science is a joint Doctoral Programme in Computer Science of the University of Minho, the University of Aveiro and the University of Porto. This programme aims to train highly qualified professionals, fostering their capacity and knowledge in the research area. The conference was organized by the first-year students attending the Seminar course. The aim of the course was to introduce concepts that are complementary to scientific and technological education, but fundamental both to completing a PhD successfully and to pursuing a career in scientific research. The students experienced the typical procedures and difficulties of organizing and participating in such a complex event. They were in charge of the organization and management of all aspects of the event, such as the accommodation of participants and the review of the papers. The works presented at the conference and the papers submitted were also developed by these students, fostering their enthusiasm for research in the Informatics area. (...)

Repositório Institucional da Universidade de Aveiro

A Systematic Survey of Control Techniques and Applications: From Autonomous Vehicles to Connected and Automated Vehicles

Authors: Deng Zhiyun; Gao Letian; Hu Chuan; Hua Min; Huang Yanjun; Liu Changsheng; Liu Wei; Song Shunhui; Xia Xin; Xiong Lu
Publication date: 09/03/2023
Vehicle control is one of the most critical challenges in autonomous vehicles (AVs) and connected and automated vehicles (CAVs), and it is paramount to vehicle safety, passenger comfort, transportation efficiency, and energy saving. This survey attempts to provide a comprehensive and thorough overview of the current state of vehicle control technology, focusing on the evolution from vehicle state estimation and trajectory tracking control in AVs at the microscopic level to collaborative control in CAVs at the macroscopic level. First, this review starts with vehicle key state estimation, specifically vehicle sideslip angle, which is the most pivotal state for vehicle trajectory control, to discuss representative approaches. Then, we present symbolic vehicle trajectory tracking control approaches for AVs. On top of that, we further review the collaborative control frameworks for CAVs and corresponding applications. Finally, this survey concludes with a discussion of future research directions and the challenges. This survey aims to provide a contextualized and in-depth look at the state of the art in vehicle control for AVs and CAVs, identifying critical areas of focus and pointing out the potential areas for further exploration.

arXiv.org e-Print Archive

Stabilising liberal societies in a world of radical innovation: committed actors, adaptive rules, and the origins of social order

Author: Finighan Reuben
Publication date: 01/07/2023
Long-standing questions about social order, and about liberal democratic capitalist orders in particular, remain unsettled. They are of renewed importance in our age of crisis and democratic backsliding. Adam Smith addressed two such questions at the founding of political economy: First, what are the forces that sustain all societies, and liberal societies in particular? Second, what combination of market and state makes such societies prosperous and powerful? A third question, addressed by Hayek, Polanyi, and Keynes in their own period of crisis and backsliding, pertains to interactions between the two: how does the combination of market and state affect the stability of liberal democracy? If we are to answer these questions, I argue we need a realistic theory of innovation. Real-world innovation is Schumpeterian: it is uncertain and often radical, so the future may unexpectedly break with the past. Real-world innovation is Baumolian: it is socially ambiguous, and may be productive or extractive. Consequently, the innovations of political and economic entrepreneurs bring the rise, but also the fall, of societies. Given the last two decades, we may be more open to the idea that Fukuyama’s “End of History” never arrives. Our task is to stabilise and optimise cooperation in both politics and the market. “Cooperation” is defined as the alignment of private returns with social returns; it is exemplified by Smith’s “invisible hand”, and is the precondition for growth. The usual formal methods for identifying cooperative equilibria fail in a world of Schumpeterian and Baumolian innovation. Beyond the short-run, there are no lasting Nash equilibria. Game forms are destroyed and remade. The institutional forces that we hope will restore cooperative equilibria are themselves subject to innovative attack. How, in this unstable world, is it possible to sustain cooperation over long periods of time? And how can we model and predict cooperation? 
This thesis adopts an analytic strategy that makes this problem tractable. I borrow concepts and formal models from evolutionary sociobiology, a field that deals with cooperation under radical and ambiguous innovation. As in Acemoglu and Robinson’s Narrow Corridor, the core concept is the adversarial innovation race (the “Red Queen’s race”). Most important in this thesis is the race between innovating cooperators and defectors. Social order becomes the probabilistic outcome of a dynamic process—of whether cooperator or defector innovations are superior in a given period. Under the right circumstances, outcomes are predictable. All complex social orders, anthropic and biological, combine “commitment” and “rules” (which, in the definitions of this thesis, include institutions) into a self-sustaining system. Commitments are essential. They are motives that are exogenous to the innovation race; while all else changes, they continue to draw the system towards a cooperative equilibrium. They come in two forms: one is an intrinsic interest in others’ payoffs, and one is an extrinsic dependence on others’ payoffs. However, commitments are impotent, and indeed are destroyed, if there are no rules or institutions that can control defectors—or if committed actors fail to invest sufficiently in adapting rules so that they keep up in the race against defectors. In short, social order depends on (A) commitments (i.e. motives to run the race that are innovation-proof) that (B) are channelled into the adaptation of rules, to run the race against defectors. Accordingly, the outcomes of innovation races are predictable under two circumstances: when (A) there is no source of commitment to group payoffs, or (B) when committed actors perversely disinvest from running the race, so play the “sleeping Hare” of Aesop’s fable. In either case, loss of the race and collapse of cooperation is guaranteed.
On the first question raised by Smith, I present an impossibility theorem for any society built from rules—from institutions, incentives, and so on—alone. Both liberal and authoritarian orders rest on commitment. Smith’s Theory of Moral Sentiments is supported: the “very existence” of liberal orders rests on other-regarding preferences (which, I show, is a product of trust). It is the only innovation-proof force available to them. Authoritarian orders can be explained via the ruler’s extrinsic commitments alone, though other-regarding preferences sometimes play an important role. On the second question, every regime of economic regulation is within the innovation race and vulnerable to unanticipated counter-innovations. I show that every regulatory regime can be described as a particular “division of regulatory labour” between institutional actors and market actors. Institutional actors and market actors are essential complements, with distinct comparative advantages. A principal task for the institutional regulator is to structurally simplify complex markets; otherwise, those defectors that have advantages in the innovation race (of which there are many) will predictably exploit both regulator and market actor. Central planners and Hayekian liberals (and libertarians) endorse extreme divisions of labour between regulator and market actor. They are mirror images and fail in predictable ways. Central planners refuse to use market actors, so allocate hyper-complex (and impossible) regulatory tasks to the state. This produces broad inefficiencies and blocks productive innovation. Hayekian liberals refuse to adapt institutions, so allocate hyper-complex (and impossible) tasks to market actors. This produces crises specifically in complex markets—finance, healthcare, insurance, education, and so on—and soaring rents. Its end point is anarchy. Hayekian liberals suppose advance knowledge of the consequences of basic market institutions.
But the unforeseeability of innovation, and distributed nature of knowledge, are double-edged swords: markets produce both productive and extractive innovations that the theorist cannot foresee. To block institutional adaptation is to play the sleeping Hare, and guarantees loss of the innovation race. On the third question, central planning and Hayek’s classical liberalism ultimately lead to authoritarianism. In the case of central planning, Hayek’s argument is supported: to attempt the impossible tasks allocated to it, the state must concentrate power, and voters cannot win the political innovation race to control such a state. In the case of Hayekian liberalism, the state cannot run the market innovation race. Market anarchy and crisis erode the commitments on which liberal orders depend, fuelling distrust and parochiality. As Smith observes, “faction” and “fanaticism” are the greatest threats to the liberal order. To use Hayek’s terms, central planning and his own classical liberalism are “fatal conceits”: they suppose access to distributed and future knowledge that no one possesses. They are both “roads to serfdom”: one via excessive control, the other via anarchy. I describe the “middle of the road”, where commitments are channelled into the adaptive, mixed economic strategy advocated by Keynes. As after the Great Depression, this in turn can create economic outcomes that sustain other-regarding commitments. There, the liberal order can make its home.

LSE Theses Online