Noise-Robust End-to-End Quantum Control using Deep Autoregressive Policy Networks
Variational quantum eigensolvers have recently received increased attention,
as they enable the use of quantum computing devices to find solutions to
complex problems, such as the ground-state energy and ground state of
strongly-correlated quantum many-body systems. In many applications, it is the
optimization of both continuous and discrete parameters that poses a formidable
challenge. Using reinforcement learning (RL), we present a hybrid policy
gradient algorithm capable of simultaneously optimizing continuous and discrete
degrees of freedom in an uncertainty-resilient way. The hybrid policy is
modeled by a deep autoregressive neural network to capture causality. We employ
the algorithm to prepare the ground state of the nonintegrable quantum Ising
model in a unitary process, parametrized by a generalized quantum approximate
optimization ansatz: the RL agent solves the discrete combinatorial problem of
constructing the optimal sequences of unitaries out of a predefined set and, at
the same time, it optimizes the continuous durations for which these unitaries
are applied. We demonstrate the noise-robust features of the agent by
considering three sources of uncertainty: classical and quantum measurement
noise, and errors in the control unitary durations. Our work exhibits the
beneficial synergy between reinforcement learning and quantum control.
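The core idea of the hybrid policy can be illustrated with a minimal sketch: at each step the agent first samples a discrete unitary from a predefined set, then samples a continuous duration conditioned on that choice, which is the autoregressive factorization of the joint action. This is not the paper's deep network; the gate names, distributions, and parameters below are illustrative assumptions.

```python
import math
import random

random.seed(0)

UNITARIES = ["X-field", "Z-field"]  # hypothetical predefined gate set

def sample_action(logits, mean_durations, sigma=0.1):
    """Sample (gate, duration) autoregressively: the duration
    distribution is conditioned on the sampled discrete gate."""
    # Softmax over the discrete gate choices
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cum, gate = random.random(), 0.0, 0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            gate = i
            break
    # Continuous duration drawn from a Gaussian whose mean depends on
    # the gate just sampled -- the autoregressive dependence
    duration = random.gauss(mean_durations[gate], sigma)
    return UNITARIES[gate], max(duration, 0.0)

# Build a short control protocol: a sequence of (unitary, duration) pairs
protocol = [sample_action([0.5, -0.5], [0.8, 0.3]) for _ in range(4)]
for gate, dt in protocol:
    print(f"apply {gate} for t = {dt:.3f}")
```

In a policy-gradient setting, the log-probability of each sampled pair would enter the REINFORCE-style update, with both the softmax logits and the duration means produced by the network.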
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in
unknown stochastic Markov environments or games. Our aim is to estimate agent
preferences in order to construct improved policies for the same task that the
agents are trying to solve. To do so, we extend previous probabilistic
approaches for inverse reinforcement learning in known MDPs to the case of
unknown dynamics or opponents. We do this by deriving two simplified
probabilistic models of the demonstrator's policy and utility. For
tractability, we use maximum a posteriori estimation rather than full Bayesian
inference. Under a flat prior, this results in a convex optimisation problem.
We find that the resulting algorithms are highly competitive against a variety
of other methods for inverse reinforcement learning that do have knowledge of
the dynamics.

Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013).
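The flat-prior case described above can be sketched concretely: if the demonstrator is modeled as softmax in a utility that is linear in a parameter, then MAP estimation under a flat prior reduces to maximizing a concave log-likelihood of the demonstrated actions. The toy state/action features and demonstrations below are assumptions for illustration, not the paper's experiments.

```python
import math

# Toy setup: 2 states x 3 actions; the utility of an action is w * feature.
# The demonstrator is assumed to act softmax-in-utility.
features = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
demos = [(0, 2), (0, 2), (1, 0), (1, 0), (1, 1)]  # (state, chosen action)

def log_likelihood(w):
    ll = 0.0
    for s, a in demos:
        z = sum(math.exp(w * f) for f in features[s])
        ll += w * features[s][a] - math.log(z)
    return ll

def grad(w):
    # Gradient of the log-likelihood: chosen feature minus its
    # expectation under the current softmax policy
    g = 0.0
    for s, a in demos:
        exps = [math.exp(w * f) for f in features[s]]
        z = sum(exps)
        expected = sum(f * e for f, e in zip(features[s], exps)) / z
        g += features[s][a] - expected
    return g

# Flat prior: MAP estimate = maximizer of the concave log-likelihood,
# found here by plain gradient ascent
w = 0.0
for _ in range(200):
    w += 0.1 * grad(w)

print(f"MAP weight estimate: {w:.3f}")
```

Because the demonstrations favor high-feature actions, the recovered weight is positive; in the full problem the same maximization is carried out jointly with the simplified model of the unknown dynamics.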
Constructing Parsimonious Analytic Models for Dynamic Systems via Symbolic Regression
Developing mathematical models of dynamic systems is central to many
disciplines of engineering and science. Models facilitate simulations, analysis
of the system's behavior, decision making and design of automatic control
algorithms. Even inherently model-free control techniques such as reinforcement
learning (RL) have been shown to benefit from the use of models, typically
learned online. Any model construction method must address the tradeoff between
the accuracy of the model and its complexity, which is difficult to strike. In
this paper, we propose to employ symbolic regression (SR) to construct
parsimonious process models described by analytic equations. We have equipped
our method with two different state-of-the-art SR algorithms which
automatically search for equations that fit the measured data: Single Node
Genetic Programming (SNGP) and Multi-Gene Genetic Programming (MGGP). In
addition to the standard problem formulation in the state-space domain, we show
how the method can also be applied to input-output models of the NARX
(nonlinear autoregressive with exogenous input) type. We present the approach
on three simulated examples with up to 14-dimensional state space: an inverted
pendulum, a mobile robot, and a bipedal walking robot. A comparison with deep
neural networks and local linear regression shows that SR in most cases
outperforms these commonly used alternative methods. We demonstrate on a real
pendulum system that the analytic model found enables a RL controller to
successfully perform the swing-up task, based on a model constructed from only
100 data samples.
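The essence of the symbolic-regression step can be sketched without genetic programming: search a small library of candidate analytic terms for the one that best fits measured data. This stands in for SNGP/MGGP, which evolve much richer expression trees; the pendulum-like data and candidate set below are illustrative assumptions.

```python
import math

# Synthetic "measurements" from a pendulum-like relation,
# e.g. angular acceleration proportional to sin(theta)
xs = [i * 0.1 for i in range(50)]
ys = [math.sin(x) for x in xs]

# A tiny library of candidate analytic models (a stand-in for the
# expression trees an SR algorithm would evolve)
candidates = {
    "theta":      lambda x: x,
    "theta^2":    lambda x: x * x,
    "sin(theta)": lambda x: math.sin(x),
    "cos(theta)": lambda x: math.cos(x),
}

def mse(f):
    """Mean squared error of a candidate model on the measurements."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best = min(candidates, key=lambda name: mse(candidates[name]))
print(f"best analytic model: {best}, MSE = {mse(candidates[best]):.2e}")
```

A real SR run would also penalize expression size, which is how the accuracy-versus-complexity tradeoff mentioned above is made explicit in the search objective.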