Advancing the Applicability of Reinforcement Learning to Autonomous Control
With data-efficient reinforcement learning (RL) methods, impressive results could be achieved, e.g., in the context of gas turbine control. However, in practice the application of RL still requires much human intervention, which hinders the application of RL to autonomous control. This thesis addresses some of the remaining problems, particularly regarding the reliability of the policy generation process.
The thesis first discusses RL problems with discrete state and action spaces. In that context, often an MDP is estimated from observations. It is described how to incorporate the estimators' uncertainties into the policy generation process. This information can then be used to reduce the risk of obtaining a poor policy due to flawed MDP estimates. Moreover, it is discussed how to use the knowledge of uncertainty for efficient exploration and the assessment of policy quality without requiring the policy's execution.
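The abstract does not spell out the thesis's estimator, but the general idea of quantifying the uncertainty of an MDP estimate can be sketched with a Dirichlet posterior over one transition distribution; all counts and the prior below are made-up illustrative values, not the thesis's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transition counts for one (state, action) pair over 3 successor states.
counts = np.array([8.0, 1.0, 1.0])

# Posterior over the transition distribution: Dirichlet(counts + prior).
prior = np.ones_like(counts)          # uniform prior (an assumption)
posterior = counts + prior

# Point estimate (posterior mean) and uncertainty via posterior samples.
p_mean = posterior / posterior.sum()
samples = rng.dirichlet(posterior, size=1000)
p_std = samples.std(axis=0)           # larger std = less certain estimate

print(p_mean.round(3), p_std.round(3))
```

A policy generator could then, for example, prefer actions whose value estimates remain good across sampled MDPs rather than only under the point estimate.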
The thesis then moves on to continuous state problems and focuses on methods based on fitted Q-iteration (FQI), particularly neural fitted Q-iteration (NFQ). Although NFQ has proven to be very data-efficient, it is not as reliable as required for autonomous control. The thesis proposes to use ensembles to increase reliability. Several ways of using ensembles in an NFQ context are discussed and evaluated on a number of benchmark domains. It is shown that, in all considered domains, ensembles allow good policies to be produced more reliably.
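As a rough illustration of the ensemble idea (not the thesis's exact scheme), one can average the Q-estimates of several independently trained approximators and act greedily on the mean. Here the members are random linear models standing in for trained NFQ networks; all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in: an "ensemble" of K Q-functions, here simple random
# linear models over a 4-dim state; in NFQ these would be neural networks,
# each trained by fitted Q-iteration (e.g., on its own resample of the data).
K, STATE_DIM, N_ACTIONS = 5, 4, 3
weights = [rng.normal(size=(STATE_DIM, N_ACTIONS)) for _ in range(K)]

def ensemble_q(state):
    """Average the Q-estimates of all ensemble members."""
    return np.mean([state @ w for w in weights], axis=0)

def greedy_action(state):
    return int(np.argmax(ensemble_q(state)))

state = rng.normal(size=STATE_DIM)
print(greedy_action(state))  # action chosen by the averaged ensemble
```

Averaging smooths out individual training runs that went wrong, which is one plausible route to the increased reliability reported above.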
Next, policy assessment in continuous domains is discussed. The thesis proposes to use fitted policy evaluation (FPE), an adaptation of FQI to policy evaluation, combined with a different function approximator and/or a different dataset, to obtain a measure for policy quality. Results of experiments show that extra-tree FPE, applied to policies generated by NFQ, produces value functions that can well be used to reason about the true policy quality.
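The structure of fitted policy evaluation can be sketched in tabular form: the backup uses the fixed policy's action at the successor state instead of a max. The thesis replaces the table with a regression model such as extra-trees; the toy chain below is purely illustrative:

```python
import numpy as np

# Minimal fitted policy evaluation (FPE) sketch on a toy 3-state chain.
GAMMA = 0.9
transitions = [  # (state, action, reward, next_state) observations
    (0, 0, 0.0, 1), (1, 0, 0.0, 2), (2, 0, 1.0, 2),
]
policy = {0: 0, 1: 0, 2: 0}  # fixed policy to evaluate

q = np.zeros((3, 1))
for _ in range(100):  # FQI-style iteration, but with the policy's action
    new_q = q.copy()
    for s, a, r, s2 in transitions:
        new_q[s, a] = r + GAMMA * q[s2, policy[s2]]
    q = new_q

print(q.ravel().round(3))  # approximate Q^pi for each state
```

With a function approximator, the inner assignment becomes a regression fit of the targets, which is where the choice of regressor (e.g., extra-trees vs. the neural network used for learning) matters.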
Finally, the thesis combines ensembles and policy assessment to derive methods that can deal with changing environments. The major contribution is the evolving ensemble. The policy of the evolving ensemble changes slowly as new policies are added and old policies removed. It turns out that the evolving ensemble approaches work considerably better than simpler approaches like single policies learned with recent observations or simple ensembles.
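The evolving-ensemble idea can be sketched schematically; the capacity, the assessment function, and the toy "policies" below are all hypothetical stand-ins, not the thesis's implementation:

```python
# Schematic sketch: keep a fixed-size pool of policies, score each one with a
# policy-assessment measure, drop the worst, and admit a newly trained policy.
MAX_SIZE = 4

def evolve(ensemble, new_policy, assess):
    """Add new_policy, then trim the worst-scoring member if over capacity."""
    ensemble = ensemble + [new_policy]
    if len(ensemble) > MAX_SIZE:
        ensemble = sorted(ensemble, key=assess, reverse=True)[:MAX_SIZE]
    return ensemble

# Toy usage: "policies" are numbers, assessment is the value itself.
pool = [3, 1, 4, 1]
pool = evolve(pool, 5, assess=lambda p: p)
print(sorted(pool))  # one of the worst policies has been removed
```

Because only one member changes at a time, the ensemble's aggregate policy drifts slowly, which matches the slow policy change described above.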
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration informed by prior knowledge is a promising research direction and suggest topics of potential impact.
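The count-based category mentioned above can be sketched in a few lines; the rounding "hash" below is a trivial stand-in for a learned or similarity-preserving code such as SimHash, and the bonus scale is arbitrary:

```python
from collections import defaultdict

# Sketch of count-based exploration via state hashing.
BETA = 0.1
counts = defaultdict(int)

def hash_state(state):
    # Discretize a continuous state so that similar states share a code.
    return tuple(round(x, 1) for x in state)

def exploration_bonus(state):
    code = hash_state(state)
    counts[code] += 1
    return BETA / counts[code] ** 0.5   # bonus shrinks with visit count

b1 = exploration_bonus([0.12, 0.98])
b2 = exploration_bonus([0.11, 1.01])   # hashes to the same code -> smaller bonus
print(b1, b2)
```

The bonus is added to the environment reward, steering the agent toward rarely visited hash cells.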
Intelligent flight control systems
The capabilities of flight control systems can be enhanced by designing them to emulate functions of natural intelligence. Intelligent control functions fall in three categories. Declarative actions involve decision-making, providing models for system monitoring, goal planning, and system/scenario identification. Procedural actions concern skilled behavior and have parallels in guidance, navigation, and adaptation. Reflexive actions are spontaneous, inner-loop responses for control and estimation. Intelligent flight control systems learn knowledge of the aircraft and its mission and adapt to changes in the flight environment. Cognitive models form an efficient basis for integrating 'outer-loop/inner-loop' control functions and for developing robust parallel-processing algorithms.
Recent Advances in General Game Playing
The goal of General Game Playing (GGP) has been to develop computer programs that can perform well across various game types. It is natural for human game players to transfer knowledge from games they already know how to play to other similar games. GGP research attempts to design systems that work well across different game types, including unknown new games. In this review, we present a survey of recent advances (2011 to 2014) in GGP for both traditional games and video games. It is notable that research on GGP has been expanding into modern video games. Monte-Carlo Tree Search and its enhancements have been the most influential techniques in GGP for both research domains. Additionally, international competitions have become important events that promote and increase GGP research. Recently, a video GGP competition was launched. In this survey, we review recent progress in the most challenging research areas of Artificial Intelligence (AI) related to universal game playing.
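The Monte-Carlo Tree Search methods the survey highlights share one core ingredient, UCT/UCB1 child selection, sketched below; a full MCTS also needs expansion, simulation, and backpropagation phases, and the child statistics here are made up:

```python
import math

# UCB1 selection: balance a child's average value against its visit count.
C = math.sqrt(2)

def ucb1(child_value, child_visits, parent_visits):
    if child_visits == 0:
        return float("inf")            # always try unvisited children first
    exploit = child_value / child_visits
    explore = C * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

# children as (total_value, visits); parent visited 30 times
children = [(10.0, 20), (4.0, 5), (0.0, 0)]
best = max(range(3), key=lambda i: ucb1(children[i][0], children[i][1], 30))
print(best)  # the unvisited child wins
```

Because the rule needs no game-specific knowledge, it transfers across games, which is one reason it has dominated GGP.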
Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi
Hanabi is a cooperative game that brings the problem of modeling other
players to the forefront. In this game, coordinated groups of players can
leverage pre-established conventions to great effect, but playing in an ad-hoc
setting requires an agent to adapt to its partner's strategies with no previous
coordination. Evaluating an agent in this setting requires a diverse population
of potential partners, but so far, the behavioral diversity of agents has not
been considered in a systematic way. This paper proposes Quality Diversity
algorithms as a promising class of algorithms to generate diverse populations
for this purpose, and generates a population of diverse Hanabi agents using
MAP-Elites. We also postulate that agents can benefit from a diverse population
during training and implement a simple "meta-strategy" for adapting to an
agent's perceived behavioral niche. We show this meta-strategy can work better
than generalist strategies even outside the population it was trained with if
its partner's behavioral niche can be correctly inferred, but in practice a
partner's behavior depends on and interferes with the meta-agent's own behavior,
suggesting an avenue for future research in characterizing another agent's
behavior during gameplay.
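The MAP-Elites loop used to generate the diverse population can be sketched generically; the fitness function, behavior descriptor, and niche count below are toy stand-ins, not the paper's Hanabi setup:

```python
import random

random.seed(0)

# Minimal MAP-Elites sketch: keep, per behavioral niche, only the highest-
# fitness solution found so far.
def fitness(x):
    return -(x - 0.3) ** 2             # toy objective, best at x = 0.3

def niche(x):
    return int(x * 5)                  # 5 behavioral niches over [0, 1)

archive = {}                           # niche -> (fitness, solution)
for _ in range(500):
    x = random.random()                # a real QD loop would mutate an elite
    n, f = niche(x), fitness(x)
    if n not in archive or f > archive[n][0]:
        archive[n] = (f, x)

print(sorted(archive))                 # niches filled with their elites
```

The result is an archive that is diverse by construction (one elite per behavioral niche), exactly the property needed for a population of ad-hoc evaluation partners.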
Neuro-Dynamic Programming and Reinforcement Learning for Optimal Energy Management of a Series Hydraulic Hybrid Vehicle Considering Engine Transient Emissions
Sequential decision problems under uncertainty are encountered in various fields such as optimal control and operations research. In this dissertation, Neuro-Dynamic Programming (NDP) and Reinforcement Learning (RL) are applied to address policy optimization problems with multiple objectives and a large design state space. Dynamic Programming (DP) is well suited for determining an optimal solution for constrained nonlinear model-based systems. However, DP suffers from the curse of dimensionality, i.e., computational effort grows exponentially with the state space. The new algorithms address this problem and enable practical application of DP to a much broader range of problems. The other contribution is the design of fast and computationally efficient transient emission models.
The power management problem for a hybrid vehicle can be formulated as an infinite-time-horizon stochastic sequential decision-making problem. In the past, policy optimization has been applied successfully to design optimal supervisory controllers for best fuel economy. Static emissions have been considered too, but engine research has shown that transient operation can have a significant impact on real-world emissions. Modeling transient emissions results in the addition of more states. Therefore, the problem with multiple objectives, i.e., minimizing fuel consumption and transient particulate and NOX emissions, becomes computationally intractable by DP. This research captures that insight with models and brings it into the supervisory controller design.
A self-learning supervisory controller is designed based on the principles of NDP and RL. The controller starts 'naĂŻve', i.e. with no knowledge of how to control the onboard power, but learns to do so in an optimal manner after interacting with the system. The controller tries to minimize multiple objectives and continues to evolve until a global solution is achieved.
Virtual sensors for predicting real-time transient particulate and NOX emissions are developed using a neuro-fuzzy modeling technique, which utilizes a divide-and-conquer strategy. The highly nonlinear engine operating space is partitioned into smaller subspaces and a separate local model is trained for each subspace.
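The divide-and-conquer structure can be sketched with hard partitions and local linear least-squares fits; the neuro-fuzzy models in the dissertation would instead blend fuzzy local models, and the piecewise data below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Partition a 1-D "operating space" at x = 1.0 and fit one local model per
# subspace (toy stand-in for partitioning an engine operating space).
x = rng.uniform(0, 2, size=200)
y = np.where(x < 1.0, 2.0 * x, 4.0 - 2.0 * x)   # synthetic piecewise target

def fit_local(mask):
    A = np.column_stack([x[mask], np.ones(mask.sum())])
    coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
    return coef

low, high = fit_local(x < 1.0), fit_local(x >= 1.0)

def predict(xq):
    a, b = low if xq < 1.0 else high
    return a * xq + b

print(round(predict(0.5), 2), round(predict(1.5), 2))
```

Each local model only has to capture a nearly linear patch, which is what makes the combined predictor fast enough for real-time virtual sensing.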
Finally, the supervisory controller along with the virtual emission sensors is implemented and evaluated using the Engine-In-the-Loop (EIL) setup. EIL is a unique facility to systematically evaluate control methodologies through concurrent running of a real engine and a virtual hybrid powertrain.
Ph.D., Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89829/1/rajit_1.pd
Towards a Unified Approach to Learning and Adaptation
The aim of this thesis is to develop a system that enables autonomous and situated agents to learn and adapt to the environment in which they live and operate. In doing so, the system exploits both adaptation through learning and evolution. A unified approach to learning and adaptation, which combines the principles of neural networks, reinforcement learning and evolutionary methods, is used as a basis for the development of the system. In this regard, a novel method of evolving the structures and weights of neural networks, called Evolutionary Acquisition of Neural Topologies (EANT), is developed. The method introduces an efficient and compact genetic encoding of a neural network onto a linear genome that encodes the topology of the neural network implicitly in the ordering of the elements of the linear genome. Moreover, it enables one to evaluate the neural network without decoding it. The presented genetic encoding is complete in that it can represent any type of neural network. In addition to this, it is closed under both structural mutation and a specially designed crossover operator, which exploits the fact that structures originating from some initial structure have some common parts. For evolving the structure and weights of neural networks, the method uses a biologically inspired meta-level evolutionary process where new structures are explored at a larger timescale and existing structures are exploited at a smaller timescale. The evolutionary process starts with networks of minimal structure whose initial complexity is specified by the domain expert. The introduction of neural structures by structural mutation results in a gradual increase in the complexity of the neural networks along the evolution. The evolutionary process stops searching for the solution when a solution with the necessary minimum complexity is found. This enables EANT to find optimal neural structures for solving a given learning task.
The efficiency of EANT is tested on a couple of learning tasks, and its performance is found to be very good in comparison to other systems tested on the same tasks.