8 research outputs found
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Learning high-quality Q-value functions plays a key role in the success of
many modern off-policy deep reinforcement learning (RL) algorithms. Previous
works focus on addressing the value overestimation issue, an outcome of
adopting function approximators and off-policy learning. Deviating from the
common viewpoint, we observe that Q-values are in fact underestimated in the
latter stage of RL training, primarily because Bellman updates bootstrap from
inferior actions drawn from the current policy rather than from the better
action samples already stored in the replay buffer. We hypothesize that this
long-neglected phenomenon potentially hinders policy learning and reduces
sample efficiency. Our insight to address this issue is to incorporate
sufficient exploitation of past successes while maintaining exploration
optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a
simple yet effective approach that updates the Q-value using both historical
best-performing actions and the current policy. Instantiations of our method
in both model-free and model-based settings outperform state-of-the-art
methods on various continuous control tasks and achieve strong performance in
failure-prone scenarios and real-world robot tasks.
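The abstract does not spell out the exact update rule, so the following is only a minimal sketch of the blended idea: a TD target that mixes an exploitation term (bootstrapping from the best action stored in the replay buffer for the next state) with the usual exploration term (bootstrapping from the current policy's action). The blending weight `lam` and the `best_buffer_action` lookup are assumptions introduced for illustration, not the paper's interface.

```python
def blended_td_target(q, policy, best_buffer_action,
                      reward, next_state, done,
                      gamma=0.99, lam=0.5):
    """Mix an exploitation TD target (best action seen in the replay buffer
    for next_state) with an exploration TD target (action from the current
    policy). `lam` trades off the two terms."""
    # Exploitation: bootstrap from a historically best-performing action.
    a_exploit = best_buffer_action(next_state)
    target_exploit = reward + gamma * (1.0 - done) * q(next_state, a_exploit)

    # Exploration: bootstrap from the current policy's action (standard target).
    a_explore = policy(next_state)
    target_explore = reward + gamma * (1.0 - done) * q(next_state, a_explore)

    return lam * target_exploit + (1.0 - lam) * target_explore


# Toy usage with stand-in functions on a 1-D state/action space.
if __name__ == "__main__":
    q = lambda s, a: -((s - a) ** 2)          # stand-in Q-function
    policy = lambda s: s + 0.3                # stand-in current policy
    best_buffer_action = lambda s: s + 0.05   # stand-in buffer lookup
    print(blended_td_target(q, policy, best_buffer_action,
                            reward=1.0, next_state=0.2, done=0.0))
```

In a real agent the exploitation target would come from buffer statistics rather than a simple lookup; the point of the sketch is only the weighted combination of two bootstrapped targets.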
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
Solving real-world complex tasks using reinforcement learning (RL) without
high-fidelity simulation environments or large amounts of offline data can be
quite challenging. Online RL agents trained in imperfect simulation
environments can suffer from severe sim-to-real issues. Offline RL approaches,
although they bypass the need for simulators, often impose demanding
requirements on the size and quality of offline datasets. The recently emerged
hybrid offline-and-online RL paradigm provides an attractive framework that
enables the joint use of limited offline data and an imperfect simulator for
transferable policy learning. In this paper, we develop a new algorithm, called
H2O+, which offers great flexibility to bridge various choices of offline and
online learning methods while also accounting for the dynamics gap between the
real and simulated environments. Through extensive simulation and real-world
robotics experiments, we demonstrate superior performance and flexibility over
advanced cross-domain online and offline RL algorithms.
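The abstract does not describe how the dynamics gap is handled. Purely as an illustration of one generic option, the sketch below down-weights simulated transitions in a TD loss by an estimated probability that each transition is consistent with real dynamics; the `real_dynamics_prob` score and the weighting scheme are assumptions made for this example and are not claimed to be the H2O+ mechanism.

```python
import numpy as np

def gap_weighted_td_loss(td_errors, is_sim, real_dynamics_prob):
    """Mean squared TD error in which simulated transitions are down-weighted
    by an estimated probability that they match real-world dynamics.

    td_errors          : TD error per transition
    is_sim             : 1.0 for simulator transitions, 0.0 for real ones
    real_dynamics_prob : score in [0, 1] per transition (e.g. a classifier)
    """
    td_errors = np.asarray(td_errors, dtype=float)
    is_sim = np.asarray(is_sim, dtype=float)
    w = np.asarray(real_dynamics_prob, dtype=float)

    # Real transitions keep weight 1; simulated ones are scaled by their score.
    weights = (1.0 - is_sim) + is_sim * w
    return float(np.mean(weights * td_errors ** 2))


# Toy usage: two real transitions and two simulated ones with different gaps.
print(gap_weighted_td_loss(td_errors=[0.5, -0.2, 0.8, 1.0],
                           is_sim=[0.0, 0.0, 1.0, 1.0],
                           real_dynamics_prob=[1.0, 1.0, 0.9, 0.2]))
```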
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
Visual reinforcement learning (RL) has shown promise in continuous control
tasks. Despite this progress, current algorithms remain unsatisfactory in
virtually every aspect of performance, including sample efficiency, asymptotic
performance, and robustness to the choice of random seeds. In this paper,
we identify a major shortcoming of existing visual RL methods: agents often
exhibit sustained inactivity during early training, which limits their
ability to explore effectively. Expanding upon this crucial
observation, we additionally unveil a significant correlation between the
agents' inclination towards motorically inactive exploration and the absence of
neuronal activity within their policy networks. To quantify it, we adopt the
dormant ratio as a metric of inactivity in the RL agent's network (a minimal
computation sketch follows this abstract). Empirically, we also observe that
the dormant ratio can act as a
standalone indicator of an agent's activity level, regardless of the received
reward signals. Leveraging the aforementioned insights, we introduce DrM, a
method that uses three core mechanisms to guide agents'
exploration-exploitation trade-offs by actively minimizing the dormant ratio.
Experiments demonstrate that DrM achieves significant improvements in sample
efficiency and asymptotic performance with no broken seeds (76 seeds in total)
across three continuous control benchmark environments, including DeepMind
Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first
model-free algorithm that consistently solves tasks in both the Dog and
Manipulator domains from the DeepMind Control Suite as well as three dexterous
hand manipulation tasks without demonstrations in Adroit, all based on pixel
observations.
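For concreteness, here is a minimal sketch of how a dormant ratio can be computed for one layer, assuming the common definition in which a unit counts as dormant when its average absolute activation over a batch falls below a fraction `tau` of the layer-wide mean; the threshold value and single-layer scope are assumptions for this example, not necessarily the exact definition used by DrM.

```python
import numpy as np

def dormant_ratio(activations, tau=0.025):
    """Fraction of units whose average absolute activation over a batch is
    below `tau` times the layer-wide mean activation.

    activations : array of shape (batch_size, num_units) for one layer
    """
    acts = np.abs(np.asarray(activations, dtype=float))
    per_unit = acts.mean(axis=0)             # mean |activation| per unit
    layer_mean = per_unit.mean() + 1e-8      # avoid division by zero
    dormant = per_unit / layer_mean < tau
    return float(dormant.mean())


# Toy usage: a layer where two units barely fire on a random batch.
rng = np.random.default_rng(0)
acts = rng.normal(size=(64, 8))
acts[:, :2] *= 1e-4                          # force two near-dormant units
print(dormant_ratio(acts))                   # ~0.25
```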
Border Control - A Membrane-Linked Interactome of Arabidopsis
Cellular membranes act as signaling platforms and control solute transport. Membrane receptors, transporters, and enzymes communicate with intracellular processes through protein-protein interactions. Using a split-ubiquitin yeast two-hybrid screen that covers a test-space of 6.4 × 10^6 pairs, we identified 12,102 membrane/signaling protein interactions from Arabidopsis. Besides confirmation of expected interactions such as heterotrimeric G protein subunit interactions and aquaporin oligomerization, >99% of the interactions were previously unknown. Interactions were confirmed at a rate of 32% in orthogonal in planta split-green fluorescent protein interaction assays, which was statistically indistinguishable from the confirmation rate for known interactions collected from the literature (38%). Regulatory associations in membrane protein trafficking, turnover, and phosphorylation include regulation of potassium channel activity through abscisic acid signaling, transporter activity by a WNK kinase, and a brassinolide receptor kinase by trafficking-related proteins. These examples underscore the utility of the membrane/signaling protein interaction network for gene discovery and hypothesis generation in plants and other organisms.