Verification for Machine Learning, Autonomy, and Neural Networks Survey
This survey presents an overview of verification techniques for autonomous
systems, with a focus on safety-critical autonomous cyber-physical systems
(CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances
in artificial intelligence (AI) and machine learning (ML) through approaches
such as deep neural networks (DNNs), embedded in so-called learning enabled
components (LECs) that accomplish tasks from classification to control.
Recently, the formal methods and formal verification community has developed
methods to characterize behaviors in these LECs, with the eventual goal of
formally verifying specifications for them; this article surveys many of these
recent approaches.
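Abstraction-based techniques give a flavor of how such verification works. Below is a minimal sketch of interval bound propagation, one common way to soundly bound a ReLU network's outputs over a box of inputs; the tiny 2-2-1 network and its weights are invented for illustration.

```python
# Interval bound propagation (IBP): push an input box through each layer,
# tracking sound (possibly loose) lower/upper bounds on every neuron.

def interval_affine(lo, hi, W, b):
    """Propagate the interval [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bi in zip(W, b):
        l = bi + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
        h = bi + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_interval(lo, hi):
    # ReLU is monotone, so it maps interval endpoints to interval endpoints.
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

# Hypothetical 2-2-1 ReLU network (weights made up for illustration).
W1, b1 = [[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]
W2, b2 = [[1.0, 1.0]], [0.0]

lo, hi = interval_affine([-1.0, -1.0], [1.0, 1.0], W1, b1)  # input box [-1,1]^2
lo, hi = relu_interval(lo, hi)
lo, hi = interval_affine(lo, hi, W2, b2)
print(lo, hi)  # sound bounds on the network output over the whole input box
```

If a safety specification requires the output to stay below some threshold, the verified upper bound certifies it for every input in the box at once.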
Probabilistic Exploration in Planning while Learning
Sequential decision tasks with incomplete information are characterized by
the exploration problem: the trade-off between further exploration for
learning more about the environment and immediate exploitation of the accrued
information for decision-making. Within artificial intelligence, there has been
an increasing interest in studying planning-while-learning algorithms for these
decision tasks. In this paper we focus on the exploration problem in
reinforcement learning and Q-learning in particular. The existing exploration
strategies for Q-learning are of a heuristic nature and they exhibit limited
scalability in tasks with large (or infinite) state and action spaces.
Efficient experimentation is needed for resolving uncertainties when possible
plans are compared (i.e. exploration). The experimentation should be sufficient
for selecting with statistical significance a locally optimal plan (i.e.
exploitation). For this purpose, we develop a probabilistic hill-climbing
algorithm that uses a statistical selection procedure to decide how much
exploration is needed for selecting a plan which is, with arbitrarily high
probability, arbitrarily close to a locally optimal one. Due to its generality
the algorithm can be employed for the exploration strategy of robust
Q-learning. An experiment on a relatively complex control task shows that the
proposed exploration strategy performs better than a typical exploration
strategy.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995).
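For contrast, here is a minimal sketch of the kind of heuristic exploration strategy the abstract refers to: epsilon-greedy Q-learning, with probability eps trying a random action and otherwise exploiting the current estimates. The two-state task and all constants are invented for illustration.

```python
import random

random.seed(0)
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

def step(s, a):
    # Deterministic toy dynamics: action 1 in state 0 earns the only reward.
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

s = 0
for _ in range(500):
    if random.random() < eps:
        a = random.randrange(n_actions)                    # explore
    else:
        a = max(range(n_actions), key=lambda i: Q[s][i])   # exploit
    s2, r = step(s, a)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # Q-learning update
    s = s2
```

The fixed eps is exactly the heuristic knob the paper's statistical selection procedure replaces: here nothing decides how much experimentation suffices before committing to a plan.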
Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems
This paper presents a novel method of global adaptive dynamic programming
(ADP) for the adaptive optimal control of nonlinear polynomial systems. The
strategy consists of relaxing the problem of solving the
Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is
solved via a new policy iteration method. The proposed method differs from
previously known nonlinear ADP methods in that the neural network
approximation is avoided, yielding a significant computational improvement.
Moreover, rather than being semiglobally or locally stabilizing, the resultant
control policy is globally stabilizing for a general class of nonlinear
polynomial systems.
Furthermore, in the absence of the a priori knowledge of the system dynamics,
an online learning method is devised to implement the proposed policy iteration
technique by generalizing the current ADP theory. Finally, three numerical
examples are provided to validate the effectiveness of the proposed method.
Comment: This is an updated version of the publication "Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems," IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2917-2929, Nov. 2015. A few typos have been fixed in this version.
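The linear special case of such a policy iteration is the classical Kleinman algorithm, which alternates policy evaluation (a Lyapunov equation) with policy improvement. A scalar continuous-time sketch, with all values invented for illustration:

```python
# Scalar system dx/dt = a*x + b*u, cost = integral of q*x^2 + r*u^2 dt.
a, b, q, r = 1.0, 1.0, 1.0, 1.0   # illustrative values
k = 2.0                           # initial stabilizing gain: a - b*k < 0

for _ in range(20):
    a_cl = a - b * k                   # closed-loop dynamics
    P = -(q + r * k * k) / (2 * a_cl)  # policy evaluation (scalar Lyapunov eq.)
    k = (b / r) * P                    # policy improvement

print(k)  # converges to the LQR gain, the root of 2*a*P + q - (b*P)**2/r = 0
```

Each iterate remains stabilizing and the sequence converges quadratically; the paper's contribution is generalizing this loop to polynomial systems without a neural-network approximation of the value function.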
Reinforcement Learning for Batch Bioprocess Optimization
Bioprocesses have received great attention as a route to clean and
sustainable alternatives to fossil-based materials. However, they are generally
difficult to optimize due to their unsteady-state operation modes and
stochastic behaviour. Furthermore, biological systems are highly complex, so
plant-model mismatch is often present. To address these challenges, we propose
a reinforcement-learning-based optimization strategy for batch processes.
In this work, we applied the Policy Gradient method from batch-to-batch to
update a control policy parametrized by a recurrent neural network. We assume
that a preliminary process model is available, which is exploited to obtain a
preliminary optimal control policy. Subsequently, this policy is updated based
on measurements from the true plant. The capabilities of our proposed approach
were tested on three case studies (one of which is nonsmooth) using a more
complex process model for the true system, embedded with adequate process
disturbance. Lastly, we discuss the advantages and disadvantages of this
strategy compared with existing approaches such as nonlinear model
predictive control.
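The policy-gradient machinery can be sketched in miniature with a REINFORCE-style update of a scalar Gaussian policy; the "plant" below is an invented smooth objective standing in for the batch process, and the single scalar parameter replaces the paper's recurrent network to keep the mechanics visible.

```python
import random

random.seed(1)
theta, sigma, lr = 0.0, 0.5, 0.02  # policy mean, fixed std, step size
baseline = 0.0

def run_batch(u):
    # Hypothetical plant response: the best possible batch uses input u = 2.
    return -(u - 2.0) ** 2

for _ in range(5000):
    u = random.gauss(theta, sigma)          # sample a control from the policy
    r = run_batch(u)                        # run one "batch", observe reward
    # REINFORCE: d log pi(u) / d theta = (u - theta) / sigma^2 for a Gaussian
    theta += lr * (r - baseline) * (u - theta) / sigma ** 2
    baseline += 0.05 * (r - baseline)       # running-average baseline cuts variance

print(theta)  # drifts toward the optimal input u* = 2
```

The batch-to-batch structure is visible here: each iteration runs one complete episode on the plant and then updates the policy parameters from that episode's return.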
A Tour of Reinforcement Learning: The View from Continuous Control
This manuscript surveys reinforcement learning from the perspective of
optimization and control with a focus on continuous control applications. It
surveys the general formulation, terminology, and typical experimental
implementations of reinforcement learning and reviews competing solution
paradigms. In order to compare the relative merits of various techniques, this
survey presents a case study of the Linear Quadratic Regulator (LQR) with
unknown dynamics, perhaps the simplest and best-studied problem in optimal
control. The manuscript describes how merging techniques from learning theory
and control can provide non-asymptotic characterizations of LQR performance and
shows that these characterizations tend to match experimental behavior. In
turn, when revisiting more complex applications, many of the observed phenomena
in LQR persist. In particular, theory and experiment demonstrate the role and
importance of models and the cost of generality in reinforcement learning
algorithms. This survey concludes with a discussion of some of the challenges
in designing learning systems that safely and reliably interact with complex
and uncertain environments and how tools from reinforcement learning and
control might be combined to approach these challenges.
Comment: minor revision with a few clarifying passages and corrected typos.
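With known dynamics, the LQR case study reduces to a Riccati fixed-point computation followed by reading off the state-feedback gain. A scalar sketch with illustrative values:

```python
# Scalar system x+ = a*x + b*u, stage cost q*x^2 + r*u^2 (values illustrative).
a, b, q, r = 1.2, 1.0, 1.0, 1.0   # open-loop unstable since |a| > 1
P = q
K = 0.0
for _ in range(200):
    K = (b * P * a) / (r + b * P * b)   # optimal gain for the current value P
    P = q + a * P * a - a * P * b * K   # discrete-time Riccati update
print(K, a - b * K)  # the closed loop satisfies |a - b*K| < 1: stabilized
```

The learning-theoretic results the survey describes concern exactly this pipeline when a and b must instead be estimated from trajectory data, asking how many samples suffice for a near-optimal, stabilizing gain.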
A model for system uncertainty in reinforcement learning
This work provides a rigorous framework for studying continuous time control
problems in uncertain environments. The framework considered models uncertainty
in state dynamics as a measure on the space of functions. This measure is
taken to change over time as agents learn their environment. This model
can be seen as a variant of either Bayesian reinforcement learning or adaptive
control. We study necessary conditions for locally optimal trajectories within
this model, in particular deriving an appropriate dynamic programming principle
and Hamilton-Jacobi equations. This model provides one possible framework for
studying the tradeoff between exploration and exploitation in reinforcement
learning.
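The Hamilton-Jacobi equations derived there generalize the classical HJB equation of deterministic optimal control, which in standard form (dynamics $f$, running cost $\ell$, terminal cost $g$, value function $V$) reads:

```latex
-\frac{\partial V}{\partial t}(x,t)
  = \min_{u}\Big\{ \ell(x,u) + \nabla_x V(x,t)\cdot f(x,u) \Big\},
\qquad V(x,T) = g(x).
```

In the paper's framework the dynamics are uncertain, so a time-varying measure over candidate dynamics enters this minimization; the display above is only the familiar baseline it extends.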
Controlled hierarchical filtering: Model of neocortical sensory processing
A model of sensory information processing is presented. The model assumes
that learning of internal (hidden) generative models, which can predict the
future and evaluate the precision of that prediction, is of central importance
for information extraction. Furthermore, the model makes a bridge to
goal-oriented systems and builds upon the structural similarity between the
architecture of a robust controller and that of the hippocampal entorhinal
loop. This generative control architecture is mapped to the neocortex and to
the hippocampal entorhinal loop. Implicit memory phenomena, such as priming and
prototype learning, are emergent features of the model. Mathematical theorems
ensure stability and attractive learning properties of the architecture.
Connections to reinforcement learning are also established: both the control
network and the network with a hidden model converge to a (near) optimal policy
under suitable conditions. Falsifiable predictions, including ones on the role
of the feedback connections between neocortical areas, are made.
Comment: Technical Report, 38 pages, 10 figures.
A Game Theoretic Perspective on Self-organizing Optimization for Cognitive Small Cells
In this article, we investigate self-organizing optimization for cognitive
small cells (CSCs), which have the ability to sense the environment, learn from
historical information, make intelligent decisions, and adjust their
operational parameters. By examining their inherent features, some fundamental
challenges for self-organizing optimization in CSCs are presented and
discussed. Specifically, the dense and random deployment of CSCs brings about
some new challenges in terms of scalability and adaptation; furthermore, the
uncertain, dynamic and incomplete information constraints also impose some new
challenges in terms of convergence and robustness. To provide better service
to users and improve resource utilization, four requirements for
self-organizing optimization in CSCs are presented and discussed. Motivated by
the fact that decisions in game-theoretic models, like those in self-organizing
optimization, are distributed and autonomous, we establish a framework of
game-theoretic solutions for self-organizing optimization in CSCs and propose
some featured game models.
Specifically, their basic models are presented, some examples are discussed and
future research directions are given.
Comment: 8 pages, 8 figures, to appear in IEEE Communications Magazine.
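One concrete instance of the distributed, autonomous decision-making such game models capture is best-response dynamics. The toy channel-selection game below (cell count, topology, and costs all hypothetical) terminates because it is a congestion game, hence a potential game.

```python
import random

random.seed(0)
n_cells, n_channels = 6, 3
# Hypothetical interference graph: every cell interferes with every other.
neighbors = {i: [j for j in range(n_cells) if j != i] for i in range(n_cells)}
choice = [random.randrange(n_channels) for _ in range(n_cells)]

def cost(i, c):
    # A cell's cost: how many neighbors currently occupy channel c.
    return sum(1 for j in neighbors[i] if choice[j] == c)

changed = True
while changed:  # best-response dynamics; the potential function guarantees exit
    changed = False
    for i in range(n_cells):
        best = min(range(n_channels), key=lambda c: cost(i, c))
        if cost(i, best) < cost(i, choice[i]):  # strictly improving move only
            choice[i] = best
            changed = True
# On exit no cell can lower its cost unilaterally: a pure Nash equilibrium.
```

Each cell updates using only locally observable information, which is the scalability and autonomy property the article argues for.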
Cautious Model Predictive Control using Gaussian Process Regression
Gaussian process (GP) regression has been widely used in supervised machine
learning due to its flexibility and inherent ability to describe uncertainty in
function estimation. In the context of control, it is seeing increasing use for
modeling of nonlinear dynamical systems from data, as it allows the direct
assessment of residual model uncertainty. We present a model predictive control
(MPC) approach that integrates a nominal system with an additive nonlinear part
of the dynamics modeled as a GP. Approximation techniques for propagating the
state distribution are reviewed and we describe a principled way of formulating
the chance constrained MPC problem, which takes into account residual
uncertainties provided by the GP model to enable cautious control. Using
additional approximations for efficient computation, we finally demonstrate the
approach in a simulation example, as well as in a hardware implementation for
autonomous racing of remote controlled race cars, highlighting improvements
with regard to both performance and safety over a nominal controller.
Comment: Published in IEEE Transactions on Control Systems Technology.
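The core GP computation such a controller relies on, a posterior mean and a residual variance at a query point, can be sketched without libraries; the training data below are hand-made for illustration.

```python
import math

def k(x, y, ls=1.0):
    # Squared-exponential (RBF) kernel.
    return math.exp(-0.5 * (x - y) ** 2 / ls ** 2)

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

X = [0.0, 1.0, 2.0]            # hand-made training inputs
Y = [math.sin(x) for x in X]   # observations of an unknown function
noise = 1e-6                   # jitter for numerical stability
K = [[k(xi, xj) + (noise if i == j else 0.0) for j, xj in enumerate(X)]
     for i, xi in enumerate(X)]
alpha = solve(K, Y)            # K^{-1} y, reused for every prediction

def predict(xs):
    ks = [k(xs, x) for x in X]
    mean = sum(ai * ki for ai, ki in zip(alpha, ks))
    var = k(xs, xs) - sum(ki * vi for ki, vi in zip(ks, solve(K, ks)))
    return mean, var

m, v = predict(1.0)  # at a training input: mean near sin(1), variance near 0
```

The variance is what makes the MPC "cautious": chance constraints are tightened wherever the model admits large residual uncertainty.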
Computer Algebra Methods in Control Systems
As dynamic and control systems become more complex, relying purely on
numerical computations for systems analysis and design might become extremely
expensive or totally infeasible. Computer algebra can act as an enabler for
analysis and design of such complex systems. It also provides means for
characterization of all solutions and studying them before realizing a
particular solution. This note provides a brief survey on some of the
applications of symbolic computations in control systems analysis and design.
Comment: 10 pages.
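As a small taste of such symbolic methods: the Routh-Hurwitz stability test, evaluated here in exact rational arithmetic (a CAS would keep the coefficients fully symbolic and characterize all stabilizing parameter values at once). The example polynomial is made up.

```python
from fractions import Fraction

def routh_first_column(coeffs):
    """First column of the Routh array for a polynomial given by its
    coefficients in descending powers of s (assumes no zero pivots)."""
    r0 = coeffs[0::2]
    r1 = coeffs[1::2] + [Fraction(0)] * (len(coeffs[0::2]) - len(coeffs[1::2]))
    col = [r0[0], r1[0]]
    for _ in range(len(coeffs) - 2):
        nxt = [(r1[0] * r0[i + 1] - r0[0] * r1[i + 1]) / r1[0]
               for i in range(len(r0) - 1)] + [Fraction(0)]
        col.append(nxt[0])
        r0, r1 = r1, nxt
    return col

# Characteristic polynomial s^3 + 2 s^2 + 3 s + 1: all roots lie in the open
# left half-plane iff the first column has no sign changes.
coeffs = [Fraction(c) for c in (1, 2, 3, 1)]
print(routh_first_column(coeffs))
```

Exact arithmetic matters here: a purely floating-point Routh array can report a spurious sign change near a marginally stable system, which is one of the failure modes symbolic computation avoids.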