Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting
We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and deploy a recently proposed closed-loop system identification method, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt and prove the regret upper bound of O(√T) for adaptive control of linear quadratic Gaussian (LQG) systems, where T is the time horizon of the problem.
Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Learning from prior tasks and transferring that experience to improve future
performance is critical for building lifelong learning agents. Although results
in supervised and reinforcement learning show that transfer may significantly
improve the learning performance, most of the literature on transfer is focused
on batch learning tasks. In this paper we study the problem of
sequential transfer in online learning, notably in the multi-armed
bandit framework, where the objective is to minimize the cumulative regret over
a sequence of tasks by incrementally transferring knowledge from prior tasks.
We introduce a novel bandit algorithm based on a method-of-moments approach for
the estimation of the possible tasks and derive regret bounds for it.
Experimental results: Reinforcement Learning of POMDPs using Spectral Methods
We propose a new reinforcement learning algorithm for partially observable
Markov decision processes (POMDP) based on spectral decomposition methods.
While spectral methods have been previously employed for consistent learning of
(passive) latent variable models such as hidden Markov models, POMDPs are more
challenging since the learner interacts with the environment and possibly
changes the future observations in the process. We devise a learning algorithm
running through epochs; in each epoch we employ spectral techniques to learn
the POMDP parameters from a trajectory generated by a fixed policy. At the end
of the epoch, an optimization oracle returns the optimal memoryless planning
policy which maximizes the expected reward based on the estimated POMDP model.
We prove an order-optimal regret bound with respect to the optimal memoryless
policy and efficient scaling with respect to the dimensionality of observation
and action spaces.
Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016),
Barcelona, Spain
Safe Local Exploration for Replanning in Cluttered Unknown Environments for Micro-Aerial Vehicles
In order to enable Micro-Aerial Vehicles (MAVs) to assist in complex,
unknown, unstructured environments, they must be able to navigate with
guaranteed safety, even when faced with a cluttered environment they have no
prior knowledge of. While trajectory optimization-based local planners have
been shown to perform well in these cases, prior work either does not address
how to deal with local minima in the optimization problem, or solves it by
using an optimistic global planner.
We present a conservative trajectory optimization-based local planner,
coupled with a local exploration strategy that selects intermediate goals. We
perform extensive simulations to show that this system performs better than the
standard approach of using an optimistic global planner, and also outperforms
doing a single exploration step when the local planner is stuck. The method is
validated through experiments in a variety of highly cluttered environments
including a dense forest. These experiments show the complete system running in
real time fully onboard an MAV, mapping and replanning at 4 Hz.
Comment: Accepted to ICRA 2018 and RA-L 201
On structure, family and parameter estimation of hierarchical Archimedean copulas
Research on structure determination and parameter estimation of hierarchical
Archimedean copulas (HACs) has so far mostly focused on the case in which all
appearing Archimedean copulas belong to the same Archimedean family. The
present work addresses this issue and proposes a new approach for estimating
HACs that involve different Archimedean families. It is based on incorporating
goodness-of-fit test statistics directly into HAC estimation. The approach is
summarized in a simple algorithm, its theoretical justification is given and
its applicability is illustrated by several experiments, which include
estimation of HACs involving up to five different Archimedean families.
Comment: 63 pages, one attachment in attachment.pd
Reconstructing large-scale structure with neutral hydrogen surveys
Upcoming 21-cm intensity surveys will use the hyperfine transition in emission to map out neutral hydrogen in large volumes of the universe. Unfortunately, large spatial scales are completely contaminated with spectrally smooth astrophysical foregrounds which are orders of magnitude brighter than the signal. This contamination also leaks into smaller radial and angular modes to form a foreground wedge, further limiting the usefulness of 21-cm observations for different science cases, especially cross-correlations with tracers that have wide kernels in the radial direction. In this paper, we investigate reconstructing these modes within a forward modeling framework. Starting with an initial density field, a suitable bias parameterization and non-linear dynamics to model the observed 21-cm field, our reconstruction proceeds by combining the likelihood of a forward simulation to match the observations (under given modeling error and a data noise model) with the Gaussian prior on initial conditions and maximizing the obtained posterior. For redshifts z=2 and 4, we are able to reconstruct the 21-cm field with cross-correlation r_c > 0.8 on all scales for both our optimistic and pessimistic assumptions about foreground contamination and for different levels of thermal noise. The performance deteriorates slightly at z=6. The large-scale line-of-sight modes are reconstructed almost perfectly. We demonstrate how our method also provides a technique for density field reconstruction for baryon acoustic oscillations, outperforming standard methods on all scales. We also describe how our reconstructed field can provide superb clustering redshift estimation at high redshifts, where it is otherwise extremely difficult to obtain dense spectroscopic samples, as well as open up a wealth of cross-correlation opportunities with projected fields (e.g. lensing) which are restricted to modes transverse to the line of sight.
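The reconstruction step described in this abstract amounts to a maximum a posteriori estimate of the initial conditions. The following is a rough sketch only. The notation is assumed here for illustration and does not come from the abstract: s for the initial density modes, d for the observed 21-cm data, F for the forward model (bias plus non-linear dynamics), P_s(k) for the prior power spectrum, and σ²(k) for the combined noise and modeling-error variance. The Gaussian form of the likelihood is likewise an assumption; the abstract states only that the prior is Gaussian.

```latex
% MAP estimate: maximize log-likelihood plus log-prior over the initial modes s
\hat{s} \;=\; \arg\max_{s}\,
  \Big[ \ln \mathcal{L}\big(d \mid \mathcal{F}(s)\big) + \ln \mathcal{P}(s) \Big]

% Gaussian prior on the initial conditions (stated in the abstract)
\ln \mathcal{P}(s) \;=\; -\frac{1}{2} \sum_{\mathbf{k}}
  \frac{|s_{\mathbf{k}}|^{2}}{P_s(k)} \;+\; \text{const}

% Gaussian likelihood (assumed form), with foreground-contaminated
% modes down-weighted through a large variance \sigma^{2}(\mathbf{k})
\ln \mathcal{L} \;=\; -\frac{1}{2} \sum_{\mathbf{k}}
  \frac{\big|d_{\mathbf{k}} - \mathcal{F}(s)_{\mathbf{k}}\big|^{2}}{\sigma^{2}(\mathbf{k})}
  \;+\; \text{const}
```

Under these assumptions, modes lost to foregrounds contribute almost nothing to the likelihood term, so their reconstructed values are driven by the prior and by the non-linear coupling to the modes that are observed.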