3,021 research outputs found
Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning
In the context of modern environmental and societal concerns, there is an
increasing demand for methods able to identify management strategies for civil
engineering systems, minimizing structural failure risks while optimally
planning inspection and maintenance (I&M) processes. Most available methods
simplify the I&M decision problem to the component level due to the
computational complexity associated with global optimization methodologies
under joint system-level state descriptions. In this paper, we propose an
efficient algorithmic framework for inference and decision-making under
uncertainty for engineering systems exposed to deteriorating environments,
providing optimal management strategies directly at the system level. In our
approach, the decision problem is formulated as a factored partially observable
Markov decision process, whose dynamics are encoded in Bayesian network
conditional structures. The methodology can handle environments under equal or
general, unequal deterioration correlations among components, through Gaussian
hierarchical structures and dynamic Bayesian networks. In terms of policy
optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC)
reinforcement learning approach, in which the policies are approximated by
actor neural networks guided by a critic network. By including deterioration
dependence in the simulated environment, and by formulating the cost model at
the system level, DDMAC policies intrinsically consider the underlying
system-effects. This is demonstrated through numerical experiments conducted
for both a 9-out-of-10 system and a steel frame under fatigue deterioration.
Results demonstrate that DDMAC policies offer substantial benefits when
compared to state-of-the-art heuristic approaches. The inherent consideration
of system-effects by DDMAC strategies is also interpreted based on the learned
policies
Partially Detected Intelligent Traffic Signal Control: Environmental Adaptation
Partially Detected Intelligent Traffic Signal Control (PD-ITSC) systems that
can optimize traffic signals based on limited detected information could be a
cost-efficient solution for mitigating traffic congestion in the future. In
this paper, we focus on a particular problem in PD-ITSC - adaptation to
changing environments. To this end, we investigate different reinforcement
learning algorithms, including Q-learning, Proximal Policy Optimization (PPO),
Advantage Actor-Critic (A2C), and Actor-Critic with Kronecker-Factored Trust
Region (ACKTR). Our findings suggest that RL algorithms can find optimal
strategies under partial vehicle detection; however, policy-based algorithms
can adapt to changing environments more efficiently than value-based
algorithms. We use these findings to draw conclusions about the value of
different models for PD-ITSC systems.Comment: Accepted by ICMLA 201
Distributed reinforcement learning for self-reconfiguring modular robots
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 101-106).In this thesis, we study distributed reinforcement learning in the context of automating the design of decentralized control for groups of cooperating, coupled robots. Specifically, we develop a framework and algorithms for automatically generating distributed controllers for self-reconfiguring modular robots using reinforcement learning. The promise of self-reconfiguring modular robots is that of robustness, adaptability and versatility. Yet most state-of-the-art distributed controllers are laboriously handcrafted and task-specific, due to the inherent complexities of distributed, local-only control. In this thesis, we propose and develop a framework for using reinforcement learning for automatic generation of such controllers. The approach is profitable because reinforcement learning methods search for good behaviors during the lifetime of the learning agent, and are therefore applicable to online adaptation as well as automatic controller design. However, we must overcome the challenges due to the fundamental partial observability inherent in a distributed system such as a self reconfiguring modular robot. We use a family of policy search methods that we adapt to our distributed problem. The outcome of a local search is always influenced by the search space dimensionality, its starting point, and the amount and quality of available exploration through experience.(cont) We undertake a systematic study of the effects that certain robot and task parameters, such as the number of modules, presence of exploration constraints, availability of nearest-neighbor communications, and partial behavioral knowledge from previous experience, have on the speed and reliability of learning through policy search in self-reconfiguring modular robots. In the process, we develop novel algorithmic variations and compact search space representations for learning in our domain, which we test experimentally on a number of tasks. This thesis is an empirical study of reinforcement learning in a simulated lattice based self-reconfiguring modular robot domain. However, our results contribute to the broader understanding of automatic generation of group control and design of distributed reinforcement learning algorithms.by Paulina Varshavskaya.Ph.D
Relational Approach to Knowledge Engineering for POMDP-based Assistance Systems as a Translation of a Psychological Model
Assistive systems for persons with cognitive disabilities (e.g. dementia) are
difficult to build due to the wide range of different approaches people can
take to accomplishing the same task, and the significant uncertainties that
arise from both the unpredictability of client's behaviours and from noise in
sensor readings. Partially observable Markov decision process (POMDP) models
have been used successfully as the reasoning engine behind such assistive
systems for small multi-step tasks such as hand washing. POMDP models are a
powerful, yet flexible framework for modelling assistance that can deal with
uncertainty and utility. Unfortunately, POMDPs usually require a very labour
intensive, manual procedure for their definition and construction. Our previous
work has described a knowledge driven method for automatically generating POMDP
activity recognition and context sensitive prompting systems for complex tasks.
We call the resulting POMDP a SNAP (SyNdetic Assistance Process). The
spreadsheet-like result of the analysis does not correspond to the POMDP model
directly and the translation to a formal POMDP representation is required. To
date, this translation had to be performed manually by a trained POMDP expert.
In this paper, we formalise and automate this translation process using a
probabilistic relational model (PRM) encoded in a relational database. We
demonstrate the method by eliciting three assistance tasks from non-experts. We
validate the resulting POMDP models using case-based simulations to show that
they are reasonable for the domains. We also show a complete case study of a
designer specifying one database, including an evaluation in a real-life
experiment with a human actor
Strategies for Scaleable Communication and Coordination in Multi-Agent (UAV) Systems
A system is considered in which agents (UAVs) must cooperatively discover interest-points (i.e., burning trees, geographical features) evolving over a grid. The objective is to locate as many interest-points as possible in the shortest possible time frame. There are two main problems: a control problem, where agents must collectively determine the optimal action, and a communication problem, where agents must share their local states and infer a common global state. Both problems become intractable when the number of agents is large. This survey/concept paper curates a broad selection of work in the literature pointing to a possible solution; a unified control/communication architecture within the framework of reinforcement learning. Two components of this architecture are locally interactive structure in the state-space, and hierarchical multi-level clustering for system-wide communication. The former mitigates the complexity of the control problem and the latter adapts to fundamental throughput constraints in wireless networks. The challenges of applying reinforcement learning to multi-agent systems are discussed. The role of clustering is explored in multi-agent communication. Research directions are suggested to unify these components
Planning under risk and uncertainty
This thesis concentrates on the optimization of large-scale management policies under conditions of risk and uncertainty. In paper I, we address the problem of solving large-scale spatial and temporal natural resource management problems. To model these types of problems, the framework of graph-based Markov decision processes (GMDPs) can be used. Two algorithms for computation of high-quality management policies are presented: the first is based on approximate linear programming (ALP) and the second is based on mean-field approximation and approximate policy iteration (MF-API). The applicability and efficiency of the algorithms were demonstrated by their ability to compute near-optimal management policies for two large-scale management problems. It was concluded that the two algorithms compute policies of similar quality. However, the MF-API algorithm should be used when both the policy and the expected value of the computed policy are required, while the ALP algorithm may be preferred when only the policy is required. In paper II, a number of reinforcement learning algorithms are presented that can be used to compute management policies for GMDPs when the transition function can only be simulated because its explicit formulation is unknown. Studies of the efficiency of the algorithms for three management problems led us to conclude that some of these algorithms were able to compute near-optimal management policies. In paper III, we used the GMDP framework to optimize long-term forestry management policies under stochastic wind-damage events. The model was demonstrated by a case study of an estate consisting of 1,200 ha of forest land, divided into 623 stands. We concluded that managing the estate according to the risk of wind damage increased the expected net present value (NPV) of the whole estate only slightly, less than 2%, under different wind-risk assumptions. Most of the stands were managed in the same manner as when the risk of wind damage was not considered. However, the analysis rests on properties of the model that need to be refined before definite conclusions can be drawn
Multi-Objective Constraint Satisfaction for Mobile Robot Area Defense
In developing multi-robot cooperative systems, there are often competing objectives that need to be met. For example in automating area defense systems, multiple robots must work together to explore the entire area, and maintain consistent communications to alert the other agents and ensure trust in the system. This research presents an algorithm that tasks robots to meet the two specific goals of exploration and communication maintenance in an uncoordinated environment reducing the need for a user to pre-balance the objectives. This multi-objective problem is defined as a constraint satisfaction problem solved using the Non-dominated Sorting Genetic Algorithm II (NSGA-II). Both goals of exploration and communication maintenance are described as fitness functions in the algorithm that would satisfy their corresponding constraints. The exploration fitness was described in three ways to diversify the way exploration was measured, whereas the communication maintenance fitness was calculated as the number of independent clusters of agents. Applying the algorithm to the area defense problem, results show exploration and communication without coordination are two diametrically opposed goals, in which one may be favored, but only at the expense of the other. This work also presents suggestions for anyone looking to take further steps in developing a physically grounded solution to this area defense problem
- …