Neural Network Memory Architectures for Autonomous Robot Navigation
This paper highlights the significance of including memory structures in
neural networks when the latter are used to learn perception-action loops for
autonomous robot navigation. Traditional navigation approaches rely on global
maps of the environment to overcome cul-de-sacs and plan feasible motions. Yet,
maintaining an accurate global map may be challenging in real-world settings. A
possible way to mitigate this limitation is to use learning techniques that
forgo hand-engineered map representations and infer appropriate control
responses directly from sensed information. An important but unexplored aspect
of such approaches is the effect of memory on their performance. This work is a
first thorough study of memory structures for deep-neural-network-based robot
navigation, and offers novel tools to train such networks from supervision and
quantify their ability to generalize to unseen scenarios. We analyze the
separation and generalization abilities of feedforward, long short-term memory,
and differentiable neural computer networks. We introduce a new method to
evaluate the generalization ability by estimating the VC-dimension of networks
with a final linear readout layer. We validate that the VC estimates are good
predictors of actual test performance. The reported method can be applied to
deep learning problems beyond robotics.
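For a network whose final layer is a linear readout over d penultimate features, the VC dimension of the readout hypothesis class is at most d + 1, which suggests one plausible form such an estimate could take. The Python sketch below is an illustrative assumption, not the paper's exact procedure: it replaces d with the numerical rank of the feature matrix, since rank-deficient features shrink the effective hypothesis class. The helper name and the feature-extraction hook are hypothetical.

import numpy as np

def estimate_vc_linear_readout(features, tol=1e-6):
    # features: (n_samples, d) penultimate-layer activations of the trained net.
    # Numerical rank = number of singular values above a relative tolerance.
    s = np.linalg.svd(features, compute_uv=False)
    effective_dim = int(np.sum(s > tol * s[0]))
    return effective_dim + 1  # +1 for the bias term of the linear readout

# Hypothetical usage: net_penultimate(x_batch) stands in for whatever hook
# extracts the pre-readout activations from the trained navigation network.
# vc_estimate = estimate_vc_linear_readout(net_penultimate(x_batch))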
An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention.
Increasing technological sophistication and the widespread use of smartphones and wearable devices provide opportunities for innovative health interventions. An Adaptive Intervention (AI) personalizes the type, mode, and dose of intervention based on users' ongoing performance and changing needs. A Just-In-Time Adaptive Intervention (JITAI) employs the real-time data collection and communication capabilities of modern mobile devices to adapt and deliver interventions in real time. Despite its increasing popularity, the lack of methodological guidance for constructing high-quality, data-driven JITAIs remains a hurdle in advancing JITAI research.

In the first part of the dissertation, we make a first attempt to bridge this methodological gap by formulating the task of tailoring interventions in real time as a contextual bandit problem. Under a linear reward assumption, we choose the reward-function (the "critic") parameterization separately from a lower-dimensional parameterization of stochastic JITAIs (the "actor"). We provide an online actor-critic algorithm that guides the construction and refinement of a JITAI. Asymptotic properties of the actor-critic algorithm, including consistency, the asymptotic distribution, and a regret bound for the optimal JITAI parameters, are developed and tested by numerical experiments. We also present numerical experiments that test the performance of the algorithm when the contextual-bandit assumptions are violated.

In the second part of the dissertation, we propose a statistical decision procedure that identifies whether a patient characteristic is useful for adaptive intervention. We define a discrete-valued characteristic as useful in adaptive intervention if, for some values of the characteristic, there is sufficient evidence to recommend a particular intervention, while for other values there is either sufficient evidence to recommend a different intervention or insufficient evidence to recommend any particular intervention.

PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133223/1/ehlei_1.pd
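To make the actor-critic construction concrete, here is a minimal Python sketch of an online actor-critic for a contextual bandit with a linear reward critic and a low-dimensional stochastic policy actor. The feature map, step sizes, and toy reward model are illustrative assumptions, not the dissertation's algorithm.

import numpy as np

rng = np.random.default_rng(0)
d = 4                      # context dimension
theta = np.zeros(2 * d)    # critic weights over (context x action) features
beta = np.zeros(d)         # actor weights: P(a=1|s) = sigmoid(beta . s)

def phi(s, a):             # critic features: context block selected by action
    f = np.zeros(2 * d)
    f[a * d:(a + 1) * d] = s
    return f

for t in range(1, 5001):
    s = rng.normal(size=d)                       # observe context
    p1 = 1.0 / (1.0 + np.exp(-beta @ s))         # stochastic JITAI policy
    a = int(rng.random() < p1)                   # deliver intervention a
    reward = (s[0] if a == 1 else -s[0]) + rng.normal(scale=0.1)  # toy model
    # Critic: stochastic-gradient step on the linear reward model's squared error.
    x = phi(s, a)
    theta += (1.0 / t) * (reward - theta @ x) * x
    # Actor: policy-gradient step using the critic's estimate as the signal.
    q = theta @ x
    grad_logpi = (a - p1) * s                    # d/dbeta of log pi(a|s)
    beta += (0.5 / np.sqrt(t)) * q * grad_logpi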
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
We study policy gradient (PG) for reinforcement learning in continuous time
and space under the regularized exploratory formulation developed by Wang et
al. (2020). We represent the gradient of the value function with respect to a
given parameterized stochastic policy as the expected integration of an
auxiliary running reward function that can be evaluated using samples and the
current value function. This effectively turns PG into a policy evaluation (PE)
problem, enabling us to apply the martingale approach recently developed by Jia
and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we
propose two types of actor-critic algorithms for RL, in which we learn and
update value functions and policies simultaneously and alternately. The first
type is based directly on the aforementioned representation which involves
future trajectories and hence is offline. The second type, designed for online
learning, employs the first-order condition of the policy gradient and turns it
into martingale orthogonality conditions. These conditions are then
incorporated using stochastic approximation when updating policies. Finally, we
demonstrate the algorithms with simulations of two concrete examples.
Comment: 52 pages, 1 figure
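As a rough illustration of the online scheme, the following Python sketch enforces a martingale orthogonality condition by stochastic approximation for policy evaluation on a toy one-dimensional linear-quadratic problem. The test functions (here the value features themselves), the Euler discretization, and the dynamics are assumptions for illustration, not the authors' exact algorithm.

import numpy as np

rng = np.random.default_rng(1)
dt = 0.01
w = np.zeros(2)            # value-function weights: V(x) = w . psi(x)

def psi(x):                # value-function features
    return np.array([x, x * x])

for episode in range(2000):
    x = rng.normal()
    for k in range(100):
        a = -x + 0.1 * rng.normal()                  # fixed exploratory policy
        r = -(x * x + a * a) * dt                    # running reward over dt
        x_next = x + a * dt + np.sqrt(dt) * rng.normal()
        # Along the true value function, dV + r dt is a martingale increment,
        # so it should be orthogonal to the test functions psi(x).
        dM = w @ psi(x_next) - w @ psi(x) + r
        w += 0.01 * dM * psi(x)                      # stochastic-approximation step
        x = x_next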
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
Effective offline RL methods require properly handling out-of-distribution
actions. Implicit Q-learning (IQL) addresses this by training a Q-function
using only dataset actions through a modified Bellman backup. However, it is
unclear which policy actually attains the values represented by this implicitly
trained Q-function. In this paper, we reinterpret IQL as an actor-critic method
by generalizing the critic objective and connecting it to a
behavior-regularized implicit actor. This generalization shows how the induced
actor balances reward maximization and divergence from the behavior policy,
with the specific loss choice determining the nature of this tradeoff. Notably,
this actor can exhibit complex and multimodal characteristics, suggesting
issues with the conditional Gaussian actor fit with advantage weighted
regression (AWR) used in prior methods. Instead, we propose sampling from a
diffusion-parameterized behavior policy and reweighting those samples with
importance weights computed from the critic to obtain our intended policy. We
introduce Implicit Diffusion
Q-learning (IDQL), combining our general IQL critic with the policy extraction
method. IDQL maintains the ease of implementation of IQL while outperforming
prior offline RL methods and demonstrating robustness to hyperparameters. Code
is available at https://github.com/philippe-eecs/IDQL.
Comment: 11 pages, 6 figures, 3 tables
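A minimal Python sketch of the IDQL-style extraction step: draw candidate actions from a learned diffusion behavior model, weight them with the critic, and resample. The diffusion_sample callable and the softmax-over-advantages weight rule are stand-in assumptions; the paper derives its specific weights from the generalized critic objective.

import numpy as np

rng = np.random.default_rng(2)

def extract_action(state, diffusion_sample, q_fn, v_fn, n=32, temp=1.0):
    # Candidate actions from the learned behavior policy.
    actions = np.stack([diffusion_sample(state) for _ in range(n)])
    adv = np.array([q_fn(state, a) - v_fn(state) for a in actions])
    # Critic-derived importance weights; a softmax over advantages is one
    # simple choice (harder, argmax-style weights are another).
    w = np.exp((adv - adv.max()) / temp)
    w /= w.sum()
    return actions[rng.choice(n, p=w)]

# Hypothetical usage with stubs standing in for trained networks:
# s = np.zeros(3)
# a = extract_action(s,
#                    diffusion_sample=lambda s: rng.normal(size=2),
#                    q_fn=lambda s, a: -float(np.sum(a ** 2)),
#                    v_fn=lambda s: 0.0)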