Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category exploits hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
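The second category above can be sketched concretely. Below is a minimal, illustrative implementation of count-based exploration via hashing: states are mapped to binary codes by a fixed random projection (a SimHash-style scheme), visits per code are counted, and rarely visited codes earn a larger bonus. The class name, bit width, and bonus coefficient are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np
from collections import defaultdict

class HashingCountBonus:
    """Count-based exploration bonus via state hashing (illustrative sketch).

    A fixed random projection maps a continuous state to a binary code;
    the bonus beta / sqrt(n(code)) decays as a code is visited more often.
    """

    def __init__(self, state_dim, n_bits=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((n_bits, state_dim))  # random projection
        self.beta = beta
        self.counts = defaultdict(int)  # visit count per hash code

    def bonus(self, state):
        # Sign pattern of the projection is the hash code.
        code = tuple((self.A @ np.asarray(state) > 0).astype(int))
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])
```

In use, the bonus would simply be added to the environment reward at each step, so the agent is paid extra for reaching states whose hash codes it has rarely seen.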
Domain Adaptation in Unmanned Aerial Vehicles Landing using Reinforcement Learning
Landing an unmanned aerial vehicle (UAV) on a moving platform is a challenging task that often requires exact models of the UAV dynamics, platform characteristics, and environmental conditions. In this thesis, we present and investigate three machine learning approaches with varying levels of domain knowledge: dynamics randomization, a universal policy with system identification, and reinforcement learning with no parameter variation. We first train the policies in simulation, then perform experiments both in simulation, varying the system dynamics through wind and the friction coefficient, and on a real robot system with wind variation. We initially expected that providing more information about environmental characteristics through system identification would improve the outcomes; however, we found that transferring a policy learned in simulation with domain randomization achieves the best results both on the real robot and in simulation, although in simulation the universal policy with system identification is faster in some cases. We compare the results of multiple deep reinforcement learning approaches trained in simulation and transferred to robot experiments in the presence of external disturbances. We were able to create a policy to control a UAV that was trained entirely in simulation and transferred to a real system subject to external disturbances. In doing so, we evaluate the performance of dynamics randomization and of the universal policy with system identification.
Adviser: Carrick Detweiler
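The dynamics randomization idea in this abstract reduces to a small training-loop pattern: resample the simulator's physical parameters every episode so the learned policy cannot overfit one dynamics model. The sketch below is a minimal illustration; the parameter names and ranges are hypothetical, not the thesis's actual values.

```python
import random

def sample_dynamics(rng=random):
    """Draw per-episode simulator parameters (ranges are illustrative)."""
    return {
        "wind_speed": rng.uniform(0.0, 5.0),      # m/s, hypothetical range
        "friction_coeff": rng.uniform(0.2, 1.0),  # hypothetical range
    }

def train_with_randomization(run_episode, reset_env, n_episodes=1000):
    """Dynamics randomization loop: new random dynamics for every episode."""
    for _ in range(n_episodes):
        params = sample_dynamics()
        reset_env(**params)  # reconfigure the simulator with sampled dynamics
        run_episode()        # one RL episode under these dynamics
```

A policy trained this way is encouraged to be robust across the sampled range, which is the property the thesis found transfers best to the real UAV.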
Machine Learning for Fluid Mechanics
The field of fluid mechanics is rapidly advancing, driven by unprecedented
volumes of data from field measurements, experiments and large-scale
simulations at multiple spatiotemporal scales. Machine learning offers a wealth
of techniques to extract information from data that could be translated into
knowledge about the underlying fluid mechanics. Moreover, machine learning
algorithms can augment domain knowledge and automate tasks related to flow
control and optimization. This article presents an overview of past history,
current developments, and emerging opportunities of machine learning for fluid
mechanics. It outlines fundamental machine learning methodologies and discusses
their uses for understanding, modeling, optimizing, and controlling fluid
flows. The strengths and limitations of these methods are addressed from the
perspective of scientific inquiry that considers data as an inherent part of
modeling, experimentation, and simulation. Machine learning provides a powerful
information processing framework that can enrich, and possibly even transform,
current lines of fluid mechanics research and industrial applications.
Comment: To appear in the Annual Review of Fluid Mechanics, 2020
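One of the workhorse data-driven tools surveyed in reviews like this is modal decomposition of flow snapshots, e.g. proper orthogonal decomposition (POD) via the singular value decomposition. The sketch below builds a synthetic low-rank snapshot matrix as a stand-in for measured velocity fields and extracts the dominant modes and their energy fractions; all sizes and the toy data are illustrative assumptions.

```python
import numpy as np

# POD of flow snapshots via SVD (illustrative sketch on synthetic data).
rng = np.random.default_rng(0)
n_points, n_snapshots = 200, 50

# Synthetic "flow": two spatial modes with time-varying amplitudes plus noise,
# standing in for a matrix whose columns are measured velocity fields.
spatial_modes = rng.standard_normal((n_points, 2))
amplitudes = rng.standard_normal((2, n_snapshots))
X = spatial_modes @ amplitudes + 0.01 * rng.standard_normal((n_points, n_snapshots))

# Subtract the mean flow, then decompose the fluctuations.
X_mean = X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X - X_mean, full_matrices=False)

# Columns of U are the POD modes; s**2 gives the energy captured per mode.
energy = s**2 / np.sum(s**2)
```

For genuinely low-rank data like this, the first two modes capture nearly all of the fluctuation energy, which is exactly the property that makes POD useful for reduced-order modeling of flows.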
Online Modeling and Monitoring of Dependent Processes under Resource Constraints
Adaptive monitoring of a large population of dynamic processes is critical
for the timely detection of abnormal events under limited resources in many
healthcare and engineering systems. Examples include the risk-based disease
screening and condition-based process monitoring. However, existing adaptive
monitoring models either ignore the dependency among processes or overlook the
uncertainty in process modeling. To design an optimal monitoring strategy that
accurately monitors the processes with poor health conditions and actively
collects information for uncertainty reduction, a novel online collaborative
learning method is proposed in this study. The proposed method designs a
collaborative learning-based upper confidence bound (CL-UCB) algorithm to
optimally balance the exploitation and exploration of dependent processes under
limited resources. Efficiency of the proposed method is demonstrated through
theoretical analysis, simulation studies and an empirical study of adaptive
cognitive monitoring in Alzheimer's disease.
On entropy regularized Path Integral Control for trajectory optimization
In this article, we present a generalized view of Path Integral Control (PIC) methods. PIC refers to a particular class of policy search methods that are closely tied to the setting of Linearly Solvable Optimal Control (LSOC), a restricted subclass of nonlinear Stochastic Optimal Control (SOC) problems. This class is unique in the sense that it can be solved explicitly, yielding a formal optimal state trajectory distribution. In this contribution, we first review PIC theory and discuss related algorithms tailored to policy search in general. We identify a generic design strategy that relies on the existence of an optimal state trajectory distribution and finds a parametric policy by minimizing the cross-entropy between the optimal distribution and a state trajectory distribution induced by a parametric stochastic policy. Inspired by this observation, we then formulate an SOC problem that shares traits with the LSOC setting yet covers a less restrictive class of problem formulations. We refer to this SOC problem as Entropy Regularized Trajectory Optimization. It is closely related to the Entropy Regularized Stochastic Optimal Control setting that has lately been addressed by the Reinforcement Learning (RL) community. We analyze the theoretical convergence behavior of the resulting state trajectory distribution sequence and draw connections with stochastic search methods tailored to classic optimization problems. Finally, we derive explicit updates and compare the implied Entropy Regularized PIC with earlier work in the context of both PIC and RL for derivative-free trajectory optimization.
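The "stochastic search" connection drawn above has a well-known concrete instance: the cross-entropy method (CEM) for derivative-free trajectory optimization, which iteratively fits a Gaussian over action sequences to its lowest-cost samples — i.e. it minimizes a cross-entropy between an (implicit) elite distribution and a parametric sampling distribution. The sketch below is a generic CEM, not the paper's Entropy Regularized PIC updates; the toy cost and all hyperparameters are illustrative.

```python
import numpy as np

def cem_trajopt(cost_fn, horizon, n_samples=64, n_elite=8, iters=50, seed=0):
    """Cross-entropy method over open-loop action sequences (sketch).

    Each iteration samples action sequences from a diagonal Gaussian,
    keeps the n_elite lowest-cost samples, and refits the Gaussian to them.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        samples = mu + sigma * rng.standard_normal((n_samples, horizon))
        costs = np.array([cost_fn(u) for u in samples])
        elite = samples[np.argsort(costs)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Toy problem: drive a 1-D integrator from 0 to 1 with small control effort.
def cost(u):
    x = np.cumsum(u)  # integrator state trajectory
    return (x[-1] - 1.0) ** 2 + 0.01 * np.sum(u ** 2)
```

Entropy-regularized variants differ mainly in how the refit step is tempered (e.g. weighting samples exponentially by cost and mixing with the previous distribution) rather than in this overall sample-evaluate-refit loop.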
Reinforcement Learning-based Visual Navigation with Information-Theoretic Regularization
To enhance the cross-target and cross-scene generalization of target-driven
visual navigation based on deep reinforcement learning (RL), we introduce an
information-theoretic regularization term into the RL objective. The
regularization maximizes the mutual information between navigation actions and
visual observation transforms of an agent, thus promoting more informed
navigation decisions. This way, the agent models the action-observation
dynamics by learning a variational generative model. Based on the model, the
agent generates (imagines) the next observation from its current observation
and navigation target. This way, the agent learns to understand the causality
between navigation actions and the changes in its observations, which allows
the agent to predict the next action for navigation by comparing the current
and the imagined next observations. Cross-target and cross-scene evaluations on
the AI2-THOR framework show that our method improves the average success rate
over several state-of-the-art models. We
further evaluate our model in two real-world settings: navigation in unseen
indoor scenes from a discrete Active Vision Dataset (AVD) and continuous
real-world environments with a TurtleBot. We demonstrate that our navigation
model is able to successfully achieve navigation tasks in these scenarios.
Videos and models can be found in the supplementary material.
Comment: 11 pages; corresponding authors: Kai Xu ([email protected]) and Jun Wang ([email protected])
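The prediction scheme in this abstract — imagine the next observation for each candidate action and compare it against where the agent wants to be — can be illustrated with a toy action-selection rule. Here `imagine` stands in for the paper's learned variational generative model; the function name, the distance metric, and the toy dynamics in the usage below are all illustrative assumptions.

```python
import numpy as np

def select_action(current_obs, target_obs, imagine, actions):
    """Pick the action whose imagined next observation is closest to the target.

    `imagine(obs, action)` plays the role of the learned generative model
    that predicts the next observation; here we score candidates by
    Euclidean distance to the target observation.
    """
    dists = [np.linalg.norm(imagine(current_obs, a) - target_obs) for a in actions]
    return actions[int(np.argmin(dists))]
```

With toy additive dynamics, e.g. `imagine = lambda o, a: o + a`, the rule picks the action that steps directly toward the target; the paper's contribution is learning a model rich enough for this comparison to work on real visual observations.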