2,225 research outputs found

    Advancing the Applicability of Reinforcement Learning to Autonomous Control

    Data-efficient reinforcement learning (RL) methods have achieved impressive results, e.g., in the context of gas turbine control. In practice, however, applying RL still requires much human intervention, which hinders its use for autonomous control. This thesis addresses some of the remaining problems, particularly regarding the reliability of the policy generation process. The thesis first discusses RL problems with discrete state and action spaces. In that context, an MDP is often estimated from observations, and a policy is then derived from that estimate. The thesis describes how to incorporate the estimators' uncertainties into the policy generation process. This information can then be used to reduce the risk of obtaining a poor policy due to flawed MDP estimates. Moreover, it discusses how to use the knowledge of uncertainty for efficient exploration and for assessing policy quality without executing the policy. The thesis then moves on to continuous-state problems and focuses on methods based on fitted Q-iteration (FQI), particularly neural fitted Q-iteration (NFQ). Although NFQ has proven to be very data-efficient, it is not as reliable as autonomous control requires. The thesis proposes using ensembles to increase reliability. Several ways of using ensembles in an NFQ context are discussed and evaluated on a number of benchmark domains. In all domains considered, ensembles produce good policies more reliably. Next, policy assessment in continuous domains is discussed. The thesis proposes using fitted policy evaluation (FPE), an adaptation of FQI to policy evaluation, combined with a different function approximator and/or a different dataset to obtain a measure of policy quality. Experimental results show that extra-tree FPE, applied to policies generated by NFQ, produces value functions that support sound reasoning about the true policy quality. Finally, the thesis combines ensembles and policy assessment to derive methods that can deal with changing environments. The major contribution is the evolving ensemble, whose policy changes slowly as new policies are added and old, unsuitable policies are removed. The evolving ensemble approaches work considerably better than simpler approaches such as single policies learned from recent observations or simple ensembles.
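
The ensemble idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's method: it swaps the neural networks of NFQ for a least-squares fit on one-hot state-aggregation features, and the toy task (a scalar state in [0, 1], two actions that move left or right, reward equal to the next state) and all names are assumptions made for illustration only. Each ensemble member runs fitted Q-iteration on a bootstrap resample of the same batch, and the ensemble acts greedily on the averaged Q-values.

```python
import numpy as np

N_BINS, N_ACTIONS, GAMMA = 10, 2, 0.9  # hypothetical toy settings

def features(s):
    """One-hot state-aggregation features for states in [0, 1]."""
    idx = np.minimum((np.asarray(s) * N_BINS).astype(int), N_BINS - 1)
    return np.eye(N_BINS)[idx]

def fitted_q_iteration(S, A, R, S2, n_iter=30):
    """Batch FQI: repeatedly regress Q(s, a) onto r + gamma * max_a' Q(s', a')."""
    W = np.zeros((N_ACTIONS, N_BINS))      # one weight vector per action
    X, X2 = features(S), features(S2)
    for _ in range(n_iter):
        targets = R + GAMMA * (X2 @ W.T).max(axis=1)
        W_new = np.zeros_like(W)
        for a in range(N_ACTIONS):
            mask = A == a                  # transitions that used action a
            W_new[a] = np.linalg.lstsq(X[mask], targets[mask], rcond=None)[0]
        W = W_new
    return W

def ensemble_action(members, s):
    """Greedy action with respect to the ensemble-averaged Q-values."""
    q = np.mean([features(s) @ W.T for W in members], axis=0)
    return int(np.argmax(q))

# Toy batch of transitions: action 1 moves right by 0.1, action 0 moves left.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=2000)
A = rng.integers(0, N_ACTIONS, size=2000)
S2 = np.clip(S + np.where(A == 1, 0.1, -0.1), 0.0, 1.0)
R = S2                                     # reward: larger states are better

members = []
for _ in range(5):
    idx = rng.integers(0, len(S), size=len(S))   # bootstrap resample
    members.append(fitted_q_iteration(S[idx], A[idx], R[idx], S2[idx]))

print([ensemble_action(members, s) for s in (0.05, 0.5, 0.95)])
```

State aggregation is used here because averaging within bins keeps the iteration stable; with a neural network, as in NFQ, individual runs can differ noticeably, which is the variability that averaging over ensemble members is meant to reduce.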

    Intelligent flight control systems

    The capabilities of flight control systems can be enhanced by designing them to emulate functions of natural intelligence. Intelligent control functions fall into three categories. Declarative actions involve decision-making, providing models for system monitoring, goal planning, and system/scenario identification. Procedural actions concern skilled behavior and have parallels in guidance, navigation, and adaptation. Reflexive actions are spontaneous, inner-loop responses for control and estimation. Intelligent flight control systems learn knowledge of the aircraft and its mission and adapt to changes in the flight environment. Cognitive models form an efficient basis for integrating 'outer-loop/inner-loop' control functions and for developing robust parallel-processing algorithms.

    Recent Advances in General Game Playing

    The goal of General Game Playing (GGP) has been to develop computer programs that can perform well across various game types. It is natural for human game players to transfer knowledge from games they already know how to play to other similar games. GGP research attempts to design systems that work well across different game types, including unknown new games. In this review, we present a survey of recent advances (2011 to 2014) in GGP for both traditional games and video games. Notably, research on GGP has been expanding into modern video games. Monte-Carlo Tree Search and its enhancements have been the most influential techniques in GGP for both research domains. Additionally, international competitions have become important events that promote and increase GGP research. Recently, a video GGP competition was launched. In this survey, we review recent progress in the most challenging research areas of Artificial Intelligence (AI) related to universal game playing.

    Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

    Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies with no previous coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far, the behavioral diversity of agents has not been considered in a systematic way. This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose, and generates a population of diverse Hanabi agents using MAP-Elites. We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche. We show this meta-strategy can work better than generalist strategies even outside the population it was trained with if its partner's behavioral niche can be correctly inferred, but in practice a partner's behavior depends on and interferes with the meta-agent's own behavior, suggesting an avenue for future research in characterizing another agent's behavior during gameplay. Comment: arXiv admin note: text overlap with arXiv:1907.0384
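
For readers unfamiliar with MAP-Elites, the following is a minimal, generic sketch of the algorithm, not the paper's Hanabi setup. The genome (a list of numbers in [0, 1]), the `fitness` function, and the two-dimensional `behavior` descriptor are hypothetical stand-ins; in the paper the individuals are Hanabi agents. The archive keeps the best individual found so far in each behavioral niche, which is what yields a population that is both diverse and of good quality.

```python
import random

def fitness(genome):
    # Hypothetical quality measure: higher is better.
    return -sum((g - 0.5) ** 2 for g in genome)

def behavior(genome, bins=5):
    # Hypothetical behavioral descriptor: the first two genes, discretized.
    return tuple(min(int(g * bins), bins - 1) for g in genome[:2])

def mutate(genome, rng, sigma=0.1):
    # Gaussian perturbation, clipped back into [0, 1].
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genome]

def map_elites(iterations=2000, n_random=100, genome_len=4, seed=0):
    rng = random.Random(seed)
    archive = {}  # niche -> (fitness, genome) of the best individual seen there
    for i in range(iterations):
        if i < n_random:
            # Bootstrap the archive with random individuals.
            child = [rng.random() for _ in range(genome_len)]
        else:
            # Pick a random elite and perturb it.
            _, parent = rng.choice(list(archive.values()))
            child = mutate(parent, rng)
        niche, f = behavior(child), fitness(child)
        # Keep the child only if its niche is empty or it beats the current elite.
        if niche not in archive or f > archive[niche][0]:
            archive[niche] = (f, child)
    return archive

archive = map_elites()
print(len(archive), "of 25 niches filled")
```

Each entry of `archive` could then serve as one potential partner with a distinct behavioral profile.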

    Neuro-Dynamic Programming and Reinforcement Learning for Optimal Energy Management of a Series Hydraulic Hybrid Vehicle Considering Engine Transient Emissions.

    Sequential decision problems under uncertainty are encountered in various fields such as optimal control and operations research. In this dissertation, Neuro-Dynamic Programming (NDP) and Reinforcement Learning (RL) are applied to policy optimization problems with multiple objectives and a large design state space. Dynamic Programming (DP) is well suited for determining an optimal solution for constrained, nonlinear, model-based systems. However, DP suffers from the curse of dimensionality, i.e., computational effort grows exponentially with the number of state variables. The new algorithms address this problem and enable practical application of DP to a much broader range of problems. The other contribution is the design of fast and computationally efficient transient emission models. The power management problem for a hybrid vehicle can be formulated as an infinite-horizon stochastic sequential decision-making problem. In the past, policy optimization has been applied successfully to design optimal supervisory controllers for best fuel economy. Static emissions have been considered too, but engine research has shown that transient operation can have a significant impact on real-world emissions. Modeling transient emissions adds more states. Therefore, the problem with multiple objectives, i.e., minimizing fuel consumption and transient particulate and NOX emissions, becomes computationally intractable for DP. This research captures the insight with models and brings it into the supervisory controller design. A self-learning supervisory controller is designed based on the principles of NDP and RL. The controller starts “naïve”, i.e., with no knowledge of how to control the onboard power, but learns to do so in an optimal manner after interacting with the system. The controller tries to minimize multiple objectives and continues to evolve until a global solution is achieved. Virtual sensors for predicting real-time transient particulate and NOX emissions are developed using a neuro-fuzzy modeling technique, which utilizes a divide-and-conquer strategy. The highly nonlinear engine operating space is partitioned into smaller subspaces and a separate local model is trained for each subspace. Finally, the supervisory controller along with the virtual emission sensors is implemented and evaluated using the Engine-In-the-Loop (EIL) setup. EIL is a unique facility for systematically evaluating control methodologies by running a real engine concurrently with a virtual hybrid powertrain.
    Ph.D., Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89829/1/rajit_1.pd
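
The curse of dimensionality mentioned in this abstract can be made concrete with a short calculation. The grid resolution of 50 points per state variable is an arbitrary assumption for illustration and is not taken from the dissertation: a DP solver that tabulates the value function on a regular grid must store and sweep one entry per grid point, so every added state (such as those introduced by transient emission models) multiplies the table size.

```python
def grid_size(points_per_dim, n_state_dims):
    """Number of grid points a tabular DP solver must cover."""
    return points_per_dim ** n_state_dims

# Hypothetical resolution: 50 points per state variable.
for n_dims in (2, 4, 6):
    print(n_dims, "states ->", grid_size(50, n_dims), "grid points")
```

Going from two to six state variables at this resolution takes the table from 2,500 entries to more than 15 billion, which is why approximate methods such as NDP and RL become attractive.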

    Towards a Unified Approach to Learning and Adaptation

    The aim of this thesis is to develop a system that enables autonomous and situated agents to learn and adapt to the environment in which they live and operate. In doing so, the system exploits both adaptation through learning and evolution. A unified approach to learning and adaptation, which combines the principles of neural networks, reinforcement learning and evolutionary methods, is used as a basis for the development of the system. In this regard, a novel method for evolving the structures and weights of neural networks, called Evolutionary Acquisition of Neural Topologies (EANT), is developed. The method introduces an efficient and compact genetic encoding of a neural network onto a linear genome that encodes the topology of the neural network implicitly in the ordering of the elements of the linear genome. Moreover, it enables one to evaluate the neural network without decoding it. The presented genetic encoding is complete in that it can represent any type of neural network. In addition, it is closed under both structural mutation and a specially designed crossover operator, which exploits the fact that structures originating from the same initial structure have some parts in common. For evolving the structure and weights of neural networks, the method uses a biologically inspired meta-level evolutionary process in which new structures are explored at a larger timescale and existing structures are exploited at a smaller timescale. The evolutionary process starts with networks of minimal structure whose initial complexity is specified by the domain expert. The introduction of neural structures by structural mutation results in a gradual increase in the complexity of the neural networks over the course of evolution. The evolutionary process stops searching when a solution with the necessary minimum complexity is found. This enables EANT to find optimal neural structures for solving a given learning task. The efficiency of EANT is tested on a couple of learning tasks, and its performance is found to be very good in comparison to other systems tested on the same tasks.