    Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting

    We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and deploy a recently proposed closed-loop system identification method, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt and prove a regret upper bound of O(√T) for adaptive control of linear quadratic Gaussian (LQG) systems, where T is the time horizon of the problem.
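
For readers unfamiliar with the quantity being bounded, the following is a sketch of how regret is commonly defined in average-cost LQG control. The symbols $y_t$ (observation), $u_t$ (control input), $Q$, $R$ (cost matrices), $c_t$ and $J_*$ are notational assumptions for illustration and do not appear in the abstract; only the $O(\sqrt{T})$ rate is taken from it.

```latex
% Per-step quadratic cost (assumed notation):
c_t = y_t^\top Q\, y_t + u_t^\top R\, u_t
% Regret after T steps, measured against the optimal average expected cost J_* of the true system:
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( c_t - J_* \right)
% The guarantee stated in the abstract then reads:
\mathrm{Regret}(T) = O\!\left(\sqrt{T}\right)
```

Under this definition, sublinear regret means the average cost incurred by the learning controller approaches that of the optimal controller as $T$ grows.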

    Sequential Transfer in Multi-armed Bandit with Finite Set of Models

    Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
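
To make "cumulative regret" in a single bandit task concrete, here is a minimal sketch using the standard UCB1 index policy on Bernoulli arms. This is not the method-of-moments transfer algorithm described in the abstract; the function name `ucb1`, the arm means, and the horizon are hypothetical choices for illustration only.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given (assumed) means.

    Returns the cumulative pseudo-regret and the pull count of each arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)     # mean of the best arm, used only to measure regret
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the pulled arm's mean.
        regret += best - means[arm]

    return regret, counts


if __name__ == "__main__":
    total_regret, pulls = ucb1([0.2, 0.8], horizon=2000)
    print("cumulative regret:", total_regret)
    print("pulls per arm:", pulls)
```

In a sequential-transfer setting, this per-task regret would be summed over the sequence of tasks, and the goal is to reduce it by reusing knowledge from earlier tasks.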

    Experimental results: Reinforcement Learning of POMDPs using Spectral Methods

    We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDPs) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm that runs in epochs; in each epoch we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the epoch, an optimization oracle returns the optimal memoryless planning policy, which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of the observation and action spaces. Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain

    Safe Local Exploration for Replanning in Cluttered Unknown Environments for Micro-Aerial Vehicles

    In order to enable Micro-Aerial Vehicles (MAVs) to assist in complex, unknown, unstructured environments, they must be able to navigate with guaranteed safety, even when faced with a cluttered environment they have no prior knowledge of. While trajectory optimization-based local planners have been shown to perform well in these cases, prior work either does not address how to deal with local minima in the optimization problem, or solves it by using an optimistic global planner. We present a conservative trajectory optimization-based local planner, coupled with a local exploration strategy that selects intermediate goals. We perform extensive simulations to show that this system performs better than the standard approach of using an optimistic global planner, and also outperforms doing a single exploration step when the local planner is stuck. The method is validated through experiments in a variety of highly cluttered environments including a dense forest. These experiments show the complete system running in real time fully onboard an MAV, mapping and replanning at 4 Hz. Comment: Accepted to ICRA 2018 and RA-L 201

    On structure, family and parameter estimation of hierarchical Archimedean copulas

    Research on structure determination and parameter estimation of hierarchical Archimedean copulas (HACs) has so far mostly focused on the case in which all appearing Archimedean copulas belong to the same Archimedean family. The present work addresses this limitation and proposes a new approach for estimating HACs that involve different Archimedean families. It is based on incorporating goodness-of-fit test statistics directly into HAC estimation. The approach is summarized in a simple algorithm, its theoretical justification is given, and its applicability is illustrated by several experiments, which include estimation of HACs involving up to five different Archimedean families. Comment: 63 pages, one attachment in attachment.pd

    Reconstructing large-scale structure with neutral hydrogen surveys

    Upcoming 21-cm intensity surveys will use the hyperfine transition in emission to map out neutral hydrogen in large volumes of the universe. Unfortunately, large spatial scales are completely contaminated with spectrally smooth astrophysical foregrounds which are orders of magnitude brighter than the signal. This contamination also leaks into smaller radial and angular modes to form a foreground wedge, further limiting the usefulness of 21-cm observations for different science cases, especially cross-correlations with tracers that have wide kernels in the radial direction. In this paper, we investigate reconstructing these modes within a forward modeling framework. Starting with an initial density field, a suitable bias parameterization and non-linear dynamics to model the observed 21-cm field, our reconstruction proceeds by combining the likelihood of a forward simulation to match the observations (under given modeling error and a data noise model) with the Gaussian prior on initial conditions and maximizing the obtained posterior. For redshifts z=2 and 4, we are able to reconstruct the 21-cm field with a cross-correlation coefficient r_c > 0.8 on all scales for both our optimistic and pessimistic assumptions about foreground contamination and for different levels of thermal noise. The performance deteriorates slightly at z=6. The large-scale line-of-sight modes are reconstructed almost perfectly. We demonstrate how our method also provides a technique for density field reconstruction for baryon acoustic oscillations, outperforming standard methods on all scales. We also describe how our reconstructed field can provide superb clustering redshift estimation at high redshifts, where it is otherwise extremely difficult to obtain dense spectroscopic samples, as well as open up a wealth of cross-correlation opportunities with projected fields (e.g. lensing) which are restricted to modes transverse to the line of sight.
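
The idea of combining a data likelihood with a Gaussian prior and maximizing the posterior can be illustrated with a toy problem. The sketch below assumes a linear forward model `d = A s + noise` with white Gaussian noise and an isotropic Gaussian prior, for which the maximum a posteriori (MAP) estimate has a closed form. The paper's forward model is non-linear, so this is only a simplified stand-in; the names `map_estimate`, `neg_log_posterior_grad`, `noise_var`, and `prior_var` are hypothetical.

```python
import numpy as np


def map_estimate(A, d, noise_var, prior_var):
    """MAP estimate of s for d = A s + n, n ~ N(0, noise_var I), s ~ N(0, prior_var I).

    Minimizes ||A s - d||^2 / (2 noise_var) + ||s||^2 / (2 prior_var).
    """
    n_params = A.shape[1]
    lhs = A.T @ A / noise_var + np.eye(n_params) / prior_var
    rhs = A.T @ d / noise_var
    return np.linalg.solve(lhs, rhs)


def neg_log_posterior_grad(A, d, s, noise_var, prior_var):
    """Gradient of the negative log-posterior; it vanishes at the MAP estimate."""
    return A.T @ (A @ s - d) / noise_var + s / prior_var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 5))              # toy linear forward model
    s_true = rng.normal(size=5)               # toy "initial conditions"
    d = A @ s_true + 0.1 * rng.normal(size=50)

    s_map = map_estimate(A, d, noise_var=0.01, prior_var=1.0)
    grad = neg_log_posterior_grad(A, d, s_map, noise_var=0.01, prior_var=1.0)
    print("max |gradient| at MAP:", np.abs(grad).max())
    print("correlation with truth:", np.corrcoef(s_map, s_true)[0, 1])
```

With a non-linear forward model there is no closed form, and the posterior would instead be maximized numerically, for example with a gradient-based optimizer.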