
    Verification for Machine Learning, Autonomy, and Neural Networks Survey

    This survey presents an overview of verification techniques for autonomous systems, with a focus on safety-critical autonomous cyber-physical systems (CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances in artificial intelligence (AI) and machine learning (ML) through approaches such as deep neural networks (DNNs), embedded in so-called learning-enabled components (LECs) that accomplish tasks from classification to control. Recently, the formal methods and formal verification community has developed methods to characterize behaviors in these LECs, with the eventual goal of formally verifying specifications for LECs, and this article presents a survey of many of these recent approaches.
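
    As a hedged illustration of one family of techniques in this space, the sketch below propagates interval bounds through a tiny ReLU network to check a simple output property. It is not a method taken from the survey, and the network weights, input box, and property are invented for the example.

```python
# A minimal sketch of interval bound propagation (IBP), one common approach to
# bounding a ReLU network's outputs over a box of inputs. Toy weights only.
import numpy as np

def ibp_layer(lo, hi, W, b):
    """Propagate elementwise input bounds [lo, hi] through x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b   # worst case for the lower bound
    new_hi = W_pos @ hi + W_neg @ lo + b   # worst case for the upper bound
    return new_lo, new_hi

def ibp_relu_net(lo, hi, layers):
    """layers: list of (W, b); ReLU after every layer except the last."""
    for i, (W, b) in enumerate(layers):
        lo, hi = ibp_layer(lo, hi, W, b)
        if i < len(layers) - 1:            # ReLU is monotone, so bounds pass through
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

# Toy 2-2-1 network: check that the output stays below 2.0 on the box [-0.1, 0.1]^2.
layers = [(np.array([[1.0, -0.5], [0.3, 0.8]]), np.zeros(2)),
          (np.array([[0.7, -1.2]]), np.array([0.1]))]
lo, hi = ibp_relu_net(np.array([-0.1, -0.1]), np.array([0.1, 0.1]), layers)
print("output bounds:", lo, hi, "| property holds:", bool(hi[0] < 2.0))
```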

    Probabilistic Exploration in Planning while Learning

    Sequential decision tasks with incomplete information are characterized by the exploration problem, namely the trade-off between further exploration, to learn more about the environment, and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been an increasing interest in studying planning-while-learning algorithms for these decision tasks. In this paper we focus on the exploration problem in reinforcement learning, and Q-learning in particular. The existing exploration strategies for Q-learning are of a heuristic nature and exhibit limited scalability in tasks with large (or infinite) state and action spaces. Efficient experimentation is needed for resolving uncertainties when possible plans are compared (i.e., exploration). The experimentation should be sufficient for selecting, with statistical significance, a locally optimal plan (i.e., exploitation). For this purpose, we develop a probabilistic hill-climbing algorithm that uses a statistical selection procedure to decide how much exploration is needed for selecting a plan which is, with arbitrarily high probability, arbitrarily close to a locally optimal one. Due to its generality, the algorithm can be employed as the exploration strategy of robust Q-learning. An experiment on a relatively complex control task shows that the proposed exploration strategy performs better than a typical exploration strategy. Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995).
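
    For orientation, the sketch below shows tabular Q-learning with an epsilon-greedy rule, the kind of heuristic exploration strategy the abstract contrasts against, on an invented chain MDP. It is not the paper's probabilistic hill-climbing procedure; all parameters are illustrative.

```python
# Tabular Q-learning with epsilon-greedy exploration on a toy chain MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2          # chain MDP: action 1 moves right, action 0 resets
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate

def step(s, a):
    if a == 1:                       # move right; reward only at the end of the chain
        s_next = min(s + 1, n_states - 1)
        return s_next, 1.0 if s_next == n_states - 1 else 0.0
    return 0, 0.05                   # reset to the start with a small immediate reward

s = 0
for t in range(5000):
    # exploration vs. exploitation: random action with probability eps, else greedy
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.round(Q, 2))                # the greedy policy should learn to walk right
```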

    Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems

    This paper presents a novel method of global adaptive dynamic programming (ADP) for the adaptive optimal control of nonlinear polynomial systems. The strategy consists of relaxing the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is solved via a new policy iteration method. The proposed method differs from previously known nonlinear ADP methods in that the neural network approximation is avoided, giving rise to significant computational improvement. Moreover, instead of being semiglobally or locally stabilizing, the resultant control policy is globally stabilizing for a general class of nonlinear polynomial systems. Furthermore, in the absence of a priori knowledge of the system dynamics, an online learning method is devised to implement the proposed policy iteration technique by generalizing the current ADP theory. Finally, three numerical examples are provided to validate the effectiveness of the proposed method. Comment: This is an updated version of the publication "Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems," in IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2917-2929, Nov. 2015. A few typos have been fixed in this version.
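
    To illustrate the policy-iteration idea that ADP builds on, the sketch below runs Kleinman-style policy iteration on the linear-quadratic special case, where policy evaluation reduces to a Lyapunov equation. The paper's method targets general polynomial systems and is not reproduced here; the system matrices are illustrative.

```python
# Policy iteration for the continuous-time LQR special case (Kleinman's algorithm).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # illustrative system matrices (A is Hurwitz)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))                        # any stabilizing initial gain works here
for _ in range(20):
    Ak = A - B @ K
    # policy evaluation: solve the Lyapunov equation Ak' P + P Ak = -(Q + K' R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # policy improvement
    K = np.linalg.solve(R, B.T @ P)

print("converged gain K =", np.round(K, 4))
```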

    Reinforcement Learning for Batch Bioprocess Optimization

    Bioprocesses have received a lot of attention as a means to produce clean and sustainable alternatives to fossil-based materials. However, they are generally difficult to optimize due to their unsteady-state operation modes and stochastic behaviours. Furthermore, biological systems are highly complex, so plant-model mismatch is often present. To address the aforementioned challenges we propose a reinforcement-learning-based optimization strategy for batch processes. In this work, we apply the policy gradient method from batch to batch to update a control policy parametrized by a recurrent neural network. We assume that a preliminary process model is available, which is exploited to obtain a preliminary optimal control policy. Subsequently, this policy is updated based on measurements from the true plant. The capabilities of our proposed approach were tested on three case studies (one of which is nonsmooth) using a more complex process model for the true system, embedded with adequate process disturbances. Lastly, we discuss the advantages and disadvantages of this strategy compared against current existing approaches such as nonlinear model predictive control.
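
    A minimal sketch of batch-to-batch policy-gradient (REINFORCE) updates on an invented one-dimensional batch process is given below. The paper's recurrent-network policy and detailed process model are not reproduced; the dynamics, reward, and parameters are assumptions made for illustration.

```python
# Batch-to-batch REINFORCE with a Gaussian policy on a toy scalar "batch reactor".
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.0, 0.0])          # policy mean is linear in the state: u = theta @ [x, 1]
sigma, lr, T = 0.2, 0.01, 10          # exploration noise, learning rate, steps per batch

def run_batch(theta):
    """Simulate one batch; return the visited (features, action) pairs and the return."""
    x, traj, ret = 1.0, [], 0.0
    for _ in range(T):
        feat = np.array([x, 1.0])
        u = theta @ feat + sigma * rng.normal()        # stochastic Gaussian policy
        x = x + 0.1 * (-x + u) + 0.01 * rng.normal()   # toy dynamics with a disturbance
        ret += -(x - 0.5) ** 2 - 0.01 * u ** 2         # track setpoint 0.5, penalise effort
        traj.append((feat, u))
    return traj, ret

_, baseline = run_batch(theta)                         # initialise the baseline once
for batch in range(1000):
    traj, ret = run_batch(theta)
    grad = np.zeros_like(theta)
    for feat, u in traj:
        # REINFORCE: d/dtheta log N(u; theta@feat, sigma^2) = (u - theta@feat) / sigma^2 * feat
        grad += (u - theta @ feat) / sigma ** 2 * feat
    theta = theta + lr * (ret - baseline) * grad       # batch-to-batch policy update
    baseline = 0.9 * baseline + 0.1 * ret              # running-average baseline

print("learned policy parameters:", np.round(theta, 3))
```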

    A Tour of Reinforcement Learning: The View from Continuous Control

    This manuscript surveys reinforcement learning from the perspective of optimization and control, with a focus on continuous control applications. It surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best-studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. This survey concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and control might be combined to approach these challenges. Comment: minor revision with a few clarifying passages and corrected typos.
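
    As a rough sketch of the model-based route such a case study examines, the code below estimates (A, B) by least squares from randomly excited data and then applies certainty-equivalent LQR to the estimate. The matrices, horizon, and noise levels are invented for the example.

```python
# Least-squares system identification followed by certainty-equivalent LQR.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A_true = np.array([[1.01, 0.1], [0.0, 0.99]])     # unknown to the "learner"
B_true = np.array([[0.0], [0.5]])
Q, R = np.eye(2), np.eye(1)

# Collect a rollout with random excitation.
X, U, Xn = [], [], []
x = np.zeros(2)
for t in range(200):
    u = rng.normal(size=1)
    x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next

# Least-squares estimate of [A B] from x_next ≈ [A B] [x; u].
Z = np.hstack([np.array(X), np.array(U)])          # shape (T, 3)
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

# Certainty-equivalent LQR gain from the discrete-time Riccati equation.
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
print("estimated A:\n", np.round(A_hat, 3), "\nLQR gain K:", np.round(K, 3))
```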

    A model for system uncertainty in reinforcement learning

    This work provides a rigorous framework for studying continuous-time control problems in uncertain environments. The framework models uncertainty in the state dynamics as a measure on a space of functions, which changes over time as the agent learns about its environment. This model can be seen as a variant of either Bayesian reinforcement learning or adaptive control. We study necessary conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton-Jacobi equations. This model provides one possible framework for studying the trade-off between exploration and exploitation in reinforcement learning.
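
    For orientation, the classical Hamilton-Jacobi-Bellman equation that such a framework generalizes is recalled below in standard notation (value function V, dynamics f, running cost \ell, terminal cost g). This is textbook material rather than the paper's uncertainty-augmented equation, which additionally involves the evolving measure over the dynamics.

```latex
% Classical finite-horizon HJB equation in standard notation (not taken from the paper):
\partial_t V(t,x) + \min_{u \in U}\Big\{ \nabla_x V(t,x)\cdot f(x,u) + \ell(x,u) \Big\} = 0,
\qquad V(T,x) = g(x).
```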

    Controlled hierarchical filtering: Model of neocortical sensory processing

    A model of sensory information processing is presented. The model assumes that learning of internal (hidden) generative models, which can predict the future and evaluate the precision of those predictions, is of central importance for information extraction. Furthermore, the model makes a bridge to goal-oriented systems and builds upon the structural similarity between the architecture of a robust controller and that of the hippocampal-entorhinal loop. This generative control architecture is mapped to the neocortex and to the hippocampal-entorhinal loop. Implicit memory phenomena (priming and prototype learning) are emerging features of the model. Mathematical theorems ensure stability and attractive learning properties of the architecture. Connections to reinforcement learning are also established: both the control network and the network with a hidden model converge to a (near-)optimal policy under suitable conditions. Falsifying predictions are made, including on the role of the feedback connections between neocortical areas. Comment: Technical Report, 38 pages, 10 figures.

    A Game Theoretic Perspective on Self-organizing Optimization for Cognitive Small Cells

    In this article, we investigate self-organizing optimization for cognitive small cells (CSCs), which have the ability to sense the environment, learn from historical information, make intelligent decisions, and adjust their operational parameters. By exploring these inherent features, some fundamental challenges for self-organizing optimization in CSCs are presented and discussed. Specifically, the dense and random deployment of CSCs brings about new challenges in terms of scalability and adaptation; furthermore, the uncertain, dynamic and incomplete information constraints also impose new challenges in terms of convergence and robustness. To provide better service to users and improve resource utilization, four requirements for self-organizing optimization in CSCs are presented and discussed. Motivated by the fact that decisions in game-theoretic models are, like those in self-organizing optimization, distributed and autonomous, we establish a framework of game-theoretic solutions for self-organizing optimization in CSCs and propose some featured game models. Specifically, their basic models are presented, some examples are discussed, and future research directions are given. Comment: 8 pages, 8 figures, to appear in IEEE Communications Magazine.
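
    As a generic illustration of distributed, autonomous decision making of the kind such game-theoretic self-organization relies on, the sketch below runs best-response dynamics in a toy channel-selection (congestion-style) game until a pure Nash equilibrium is reached. It is not one of the article's proposed game models, and the problem sizes and cost function are invented.

```python
# Best-response dynamics in a toy channel-selection game: each cell repeatedly
# switches to the channel with the fewest other cells on it.
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_channels = 12, 3
choice = rng.integers(n_channels, size=n_cells)      # random initial channel per cell

def cost(cell, ch, choice):
    """Interference proxy: number of other cells currently on channel ch."""
    return np.sum(choice == ch) - (choice[cell] == ch)

for it in range(50):
    changed = False
    for cell in rng.permutation(n_cells):             # asynchronous, autonomous updates
        best = min(range(n_channels), key=lambda ch: cost(cell, ch, choice))
        if cost(cell, best, choice) < cost(cell, choice[cell], choice):
            choice[cell] = best
            changed = True
    if not changed:                                    # a pure Nash equilibrium is reached
        break

print("channel loads at equilibrium:", np.bincount(choice, minlength=n_channels))
```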

    Cautious Model Predictive Control using Gaussian Process Regression

    Gaussian process (GP) regression has been widely used in supervised machine learning due to its flexibility and inherent ability to describe uncertainty in function estimation. In the context of control, it is seeing increasing use for modeling of nonlinear dynamical systems from data, as it allows the direct assessment of residual model uncertainty. We present a model predictive control (MPC) approach that integrates a nominal system with an additive nonlinear part of the dynamics modeled as a GP. Approximation techniques for propagating the state distribution are reviewed and we describe a principled way of formulating the chance-constrained MPC problem, which takes into account residual uncertainties provided by the GP model to enable cautious control. Using additional approximations for efficient computation, we finally demonstrate the approach in a simulation example, as well as in a hardware implementation for autonomous racing of remote-controlled race cars, highlighting improvements with regard to both performance and safety over a nominal controller. Comment: Published in IEEE Transactions on Control Systems Technology.
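
    The modelling idea can be sketched as follows: fit a GP to the residual between a nominal model and observed transitions, then use the predictive standard deviation to tighten a state constraint. The scalar dynamics and numbers below are invented for illustration, and the paper's full chance-constrained MPC formulation is not reproduced.

```python
# GP regression on residual dynamics, with an uncertainty-tightened one-step constraint.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def f_true(x, u):       # "true" scalar dynamics (unknown to the controller)
    return 0.9 * x + 0.5 * u + 0.2 * np.sin(2.0 * x)

def f_nom(x, u):        # nominal linear model used by the controller
    return 0.9 * x + 0.5 * u

# Collect training data and fit a GP to the residual d = f_true - f_nom.
X = rng.uniform(-2, 2, size=(60, 2))                       # columns: state x, input u
d = f_true(X[:, 0], X[:, 1]) - f_nom(X[:, 0], X[:, 1]) + 0.01 * rng.normal(size=60)
gp = GaussianProcessRegressor(RBF(length_scale=1.0) + WhiteKernel(1e-4), normalize_y=True)
gp.fit(X, d)

# One-step prediction with uncertainty, and a tightened upper bound on the state.
x, u, x_max = 1.0, -0.3, 1.5
mu, std = gp.predict(np.array([[x, u]]), return_std=True)
x_pred = f_nom(x, u) + mu[0]
print(f"predicted next state {x_pred:.3f} ± {2 * std[0]:.3f}")
print("cautious constraint satisfied:", x_pred + 2 * std[0] <= x_max)
```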

    Computer Algebra Methods in Control Systems

    As dynamic and control systems become more complex, relying purely on numerical computations for systems analysis and design might become extremely expensive or totally infeasible. Computer algebra can act as an enabler for the analysis and design of such complex systems. It also provides means for characterizing all solutions and studying them before realizing a particular one. This note provides a brief survey of some applications of symbolic computation in control systems analysis and design. Comment: 10 pages.
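
    As a small, hedged example of the kind of symbolic computation meant here, the sketch below uses SymPy to form the closed-loop characteristic polynomial of an invented plant under proportional feedback and solves symbolically for the stabilizing gains.

```python
# Symbolic derivation of a closed-loop characteristic polynomial and stabilizing gains.
import sympy as sp

s, k = sp.symbols('s k', real=True)

num, den = sp.Integer(1), s**2 + s - 2          # open-loop plant G(s) = num/den (unstable)
char_poly = sp.expand(den + k * num)            # closed loop: 1 + k*G(s) = 0  ->  den + k*num

coeffs = sp.Poly(char_poly, s).all_coeffs()     # [1, 1, k - 2]
print("closed-loop characteristic polynomial:", char_poly)

# For a monic second-order polynomial s^2 + a1*s + a0, stability <=> a1 > 0 and a0 > 0.
# Here a1 = 1 > 0 already, so only the constant term constrains k.
stable_range = sp.reduce_inequalities(coeffs[2] > 0)
print("stabilizing gains:", stable_range)       # expect k > 2
```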