28 research outputs found

    Dynamic multi-objective optimisation using deep reinforcement learning::benchmark, algorithm and an application to identify vulnerable zones based on water quality

    Get PDF
    Dynamic multi-objective optimisation problem (DMOP) has brought a great challenge to the reinforcement learning (RL) research area due to its dynamic nature such as objective functions, constraints and problem parameters that may change over time. This study aims to identify the lacking in the existing benchmarks for multi-objective optimisation for the dynamic environment in the RL settings. Hence, a dynamic multi-objective testbed has been created which is a modified version of the conventional deep-sea treasure (DST) hunt testbed. This modified testbed fulfils the changing aspects of the dynamic environment in terms of the characteristics where the changes occur based on time. To the authors’ knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition to that, a generic algorithm is proposed to solve the multi-objective optimisation problem in a dynamic constrained environment that maintains equilibrium by mapping different objectives simultaneously to provide the most compromised solution that closed to the true Pareto front (PF). As a proof of concept, the developed algorithm has been implemented to build an expert system for a real-world scenario using Markov decision process to identify the vulnerable zones based on water quality resilience in São Paulo, Brazil. The outcome of the implementation reveals that the proposed parity-Q deep Q network (PQDQN) algorithm is an efficient way to optimise the decision in a dynamic environment. Moreover, the result shows PQDQN algorithm performs better compared to the other state-of-the-art solutions both in the simulated and the real-world scenario

    Policy Explanation and Model Refinement in Decision-Theoretic Planning

    Get PDF
    Decision-theoretic systems, such as Markov Decision Processes (MDPs), are used for sequential decision-making under uncertainty. MDPs provide a generic framework that can be applied in various domains to compute optimal policies. This thesis presents techniques that offer explanations of optimal policies for MDPs and then refine decision theoretic models (Bayesian networks and MDPs) based on feedback from experts. Explaining policies for sequential decision-making problems is difficult due to the presence of stochastic effects, multiple possibly competing objectives and long-range effects of actions. However, explanations are needed to assist experts in validating that the policy is correct and to help users in developing trust in the choices recommended by the policy. A set of domain-independent templates to justify a policy recommendation is presented along with a process to identify the minimum possible number of templates that need to be populated to completely justify the policy. The rejection of an explanation by a domain expert indicates a deficiency in the model which led to the generation of the rejected policy. Techniques to refine the model parameters such that the optimal policy calculated using the refined parameters would conform with the expert feedback are presented in this thesis. The expert feedback is translated into constraints on the model parameters that are used during refinement. These constraints are non-convex for both Bayesian networks and MDPs. For Bayesian networks, the refinement approach is based on Gibbs sampling and stochastic hill climbing, and it learns a model that obeys expert constraints. For MDPs, the parameter space is partitioned such that alternating linear optimization can be applied to learn model parameters that lead to a policy in accordance with expert feedback. In practice, the state space of MDPs can often be very large, which can be an issue for real-world problems. Factored MDPs are often used to deal with this issue. In Factored MDPs, state variables represent the state space and dynamic Bayesian networks model the transition functions. This helps to avoid the exponential growth in the state space associated with large and complex problems. The approaches for explanation and refinement presented in this thesis are also extended for the factored case to demonstrate their use in real-world applications. The domains of course advising to undergraduate students, assisted hand-washing for people with dementia and diagnostics for manufacturing are used to present empirical evaluations

    The Dynamic Multi-objective Multi-vehicle Covering Tour Problem

    Get PDF
    This work introduces a new routing problem called the Dynamic Multi-Objective Multi-vehicle Covering Tour Problem (DMOMCTP). The DMOMCTPs is a combinatorial optimization problem that represents the problem of routing multiple vehicles to survey an area in which unpredictable target nodes may appear during execution. The formulation includes multiple objectives that include minimizing the cost of the combined tour cost, minimizing the longest tour cost, minimizing the distance to nodes to be covered and maximizing the distance to hazardous nodes. This study adapts several existing algorithms to the problem with several operator and solution encoding variations. The efficacy of this set of solvers is measured against six problem instances created from existing Traveling Salesman Problem instances which represent several real countries. The results indicate that repair operators, variable length solution encodings and variable-length operators obtain a better approximation of the true Pareto front

    A Theory of Model Selection in Reinforcement Learning

    Full text link
    Reinforcement Learning (RL) is a machine learning paradigm where an agent learns to accomplish sequential decision-making tasks from experience. Applications of RL are found in robotics and control, dialog systems, medical treatment, etc. Despite the generality of the framework, most empirical successes of RL to-date are restricted to simulated environments, where hyperparameters are tuned by trial and error using large amounts of data. In contrast, collecting data with active intervention in the real world can be costly, time-consuming, and sometimes unsafe. Choosing the hyperparameters and understanding their effects in face of these data limitations, i.e., model selection, is an important yet open direction that we need to study to enable such applications of RL, which is the main theme of this thesis. More concretely, this thesis presents theoretical results that improve our understanding of 3 hyperparameters in RL: planning horizon, state representation (abstraction), and reward function. The 1st part of the thesis focuses on the interplay between planning horizon and limited amount of data, and establishes a formal explanation for how a long planning horizon can cause overfitting. The 2nd part considers the problem of choosing the right state abstraction using limited batch data; I show that cross-validation type methods require importance sampling and suffer from exponential variance, and a novel regularization-based algorithm enjoys an oracle-like property. The 3rd part investigates reward misspecification and tries to resolve it by leveraging expert demonstrations, which is inspired by AI safety concerns and bears close connections to inverse reinforcement learning. A recurring theme of the thesis is the deployment of formulations and techniques from other machine learning theory (mostly statistical learning theory): the planning horizon work explains the overfitting phenomenon by making a formal analogy to empirical risk minimization and by proving planning loss bounds that are similar to generalization error bounds; the main result in the abstraction selection work takes the form of an oracle inequality, which is a concept from structural risk minimization for model selection in supervised learning; the inverse RL work provides a mistake-bound type analysis under arbitrarily chosen environments, which can be viewed as a form of no-regret learning. Overall, by borrowing ideas from mature theories of machine learning, we can develop analogies for RL that allow us to better understand the impact of hyperparameters, and develop algorithms that automatically set them in an effective manner.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/138518/1/nanjiang_1.pd

    Runtime Quantitative Verification of Self-Adaptive Systems

    Get PDF
    Software systems used in mission- and business-critical applications in domains including defence, healthcare, and finance must comply with strict dependability, performance, and other Quality-of-Service (QoS) requirements. Self-adaptive systems achieve this compliance under changing environmental conditions, evolving requirements and system failures by using closed-loop control to modify their behaviour and structure in response to these events. Runtime quantitative verification (RQV) is a mathematically-based approach that implements the closed-loop control of self-adaptive systems. Using runtime observations of a system and its environment, RQV updates stochastic models whose formal analysis underpins the adaptation decisions made within the control loop. The approach can identify and, under certain conditions, predict violation of QoS requirements, and can drive self-adaptation in ways guaranteed to restore or maintain compliance with these requirements. Despite its merits, RQV has significant computation and memory overheads, which restrict its applicability to small systems and to adaptations affecting only the configuration parameters of the system. In this thesis, we introduce RQV variants that improve the efficiency and scalability of the approach and extend its applicability to larger and more complex self-adaptive software systems, and to adaptations that modify the structure of a system. First, we integrate RQV with established efficiency improvement techniques from other software engineering areas. We use caching of recent analysis results, limited lookahead to precompute suitable adaptations for potential future changes, and nearly-optimal reconfiguration to eliminate the need for an exhaustive analysis of the entire reconfiguration space. Second, we introduce an RQV variant that incorporates evolutionary algorithms into the RQV process facilitating the efficient search through large reconfiguration spaces and enabling adaptations that include structural changes. Third, we propose an RQV-driven approach that decentralises the control loops in distributed self-adaptive systems. Finally, we devise an RQV-based methodology for the engineering of trustworthy self-adaptive systems. We evaluate the proposed RQV variants using prototype self-adaptive systems from several application domains, including an embedded system for unmanned underwater vehicles and a foreign exchange service-based system. Our results, subject to the adaptation scenarios used in the evaluation, demonstrate the effectiveness and generality of the new RQV variants

    Humanoid Robots

    Get PDF
    For many years, the human being has been trying, in all ways, to recreate the complex mechanisms that form the human body. Such task is extremely complicated and the results are not totally satisfactory. However, with increasing technological advances based on theoretical and experimental researches, man gets, in a way, to copy or to imitate some systems of the human body. These researches not only intended to create humanoid robots, great part of them constituting autonomous systems, but also, in some way, to offer a higher knowledge of the systems that form the human body, objectifying possible applications in the technology of rehabilitation of human beings, gathering in a whole studies related not only to Robotics, but also to Biomechanics, Biomimmetics, Cybernetics, among other areas. This book presents a series of researches inspired by this ideal, carried through by various researchers worldwide, looking for to analyze and to discuss diverse subjects related to humanoid robots. The presented contributions explore aspects about robotic hands, learning, language, vision and locomotion

    Artificial Intelligence for Small Satellites Mission Autonomy

    Get PDF
    Space mission engineering has always been recognized as a very challenging and innovative branch of engineering: since the beginning of the space race, numerous milestones, key successes and failures, improvements, and connections with other engineering domains have been reached. Despite its relative young age, space engineering discipline has not gone through homogeneous times: alternation of leading nations, shifts in public and private interests, allocations of resources to different domains and goals are all examples of an intrinsic dynamism that characterized this discipline. The dynamism is even more striking in the last two decades, in which several factors contributed to the fervour of this period. Two of the most important ones were certainly the increased presence and push of the commercial and private sector and the overall intent of reducing the size of the spacecraft while maintaining comparable level of performances. A key example of the second driver is the introduction, in 1999, of a new category of space systems called CubeSats. Envisioned and designed to ease the access to space for universities, by standardizing the development of the spacecraft and by ensuring high probabilities of acceptance as piggyback customers in launches, the standard was quickly adopted not only by universities, but also by agencies and private companies. CubeSats turned out to be a disruptive innovation, and the space mission ecosystem was deeply changed by this. New mission concepts and architectures are being developed: CubeSats are now considered as secondary payloads of bigger missions, constellations are being deployed in Low Earth Orbit to perform observation missions to a performance level considered to be only achievable by traditional, fully-sized spacecraft. CubeSats, and more in general the small satellites technology, had to overcome important challenges in the last few years that were constraining and reducing the diffusion and adoption potential of smaller spacecraft for scientific and technology demonstration missions. Among these challenges were: the miniaturization of propulsion technologies, to enable concepts such as Rendezvous and Docking, or interplanetary missions; the improvement of telecommunication state of the art for small satellites, to enable the downlink to Earth of all the data acquired during the mission; and the miniaturization of scientific instruments, to be able to exploit CubeSats in more meaningful, scientific, ways. With the size reduction and with the consolidation of the technology, many aspects of a space mission are reduced in consequence: among these, costs, development and launch times can be cited. An important aspect that has not been demonstrated to scale accordingly is operations: even for small satellite missions, human operators and performant ground control centres are needed. In addition, with the possibility of having constellations or interplanetary distributed missions, a redesign of how operations are management is required, to cope with the innovation in space mission architectures. The present work has been carried out to address the issue of operations for small satellite missions. The thesis presents a research, carried out in several institutions (Politecnico di Torino, MIT, NASA JPL), aimed at improving the autonomy level of space missions, and in particular of small satellites. The key technology exploited in the research is Artificial Intelligence, a computer science branch that has gained extreme interest in research disciplines such as medicine, security, image recognition and language processing, and is currently making its way in space engineering as well. The thesis focuses on three topics, and three related applications have been developed and are here presented: autonomous operations by means of event detection algorithms, intelligent failure detection on small satellite actuator systems, and decision-making support thanks to intelligent tradespace exploration during the preliminary design of space missions. The Artificial Intelligent technologies explored are: Machine Learning, and in particular Neural Networks; Knowledge-based Systems, and in particular Fuzzy Logics; Evolutionary Algorithms, and in particular Genetic Algorithms. The thesis covers the domain (small satellites), the technology (Artificial Intelligence), the focus (mission autonomy) and presents three case studies, that demonstrate the feasibility of employing Artificial Intelligence to enhance how missions are currently operated and designed
    corecore