8 research outputs found

    Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

    Full text link
    Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works focus on addressing the value overestimation issue, an outcome of adopting function approximators and off-policy learning. Deviating from this common viewpoint, we observe that Q-values are in fact underestimated in the later stage of the RL training process, primarily because Bellman updates bootstrap from inferior actions of the current policy rather than from the more optimal action samples available in the replay buffer. We hypothesize that this long-neglected phenomenon hinders policy learning and reduces sample efficiency. Our insight for addressing this issue is to incorporate sufficient exploitation of past successes while maintaining exploration optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a simple yet effective approach that updates Q-values using both historical best-performing actions and the current policy. Instantiations of our method in both model-free and model-based settings outperform state-of-the-art methods on various continuous control tasks and achieve strong performance in failure-prone scenarios and real-world robot tasks.
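    The central mechanism described in this abstract is a Bellman backup that mixes bootstrapping from the best actions stored in the replay buffer with bootstrapping from the current policy. The sketch below is a minimal illustration of such a blended target, not the authors' BEE implementation; the names bee_target, q_target_net, policy, and the blending weight lam are hypothetical, and it assumes a high-return buffer action for each next state has already been retrieved.

        # Illustrative sketch only, under the assumptions stated above.
        import torch

        def bee_target(reward, next_obs, done, replay_best_action,
                       q_target_net, policy, gamma=0.99, lam=0.5):
            """Blend an exploitation backup (best action seen in the buffer for next_obs)
            with the usual exploration backup (action sampled from the current policy)."""
            with torch.no_grad():
                # Exploitation backup: bootstrap from a high-return action taken from the replay buffer.
                q_exploit = q_target_net(next_obs, replay_best_action)
                # Exploration backup: bootstrap from the current policy's action (standard off-policy target).
                q_explore = q_target_net(next_obs, policy(next_obs))
                # Blend the two backups; lam trades off exploitation of past successes against optimism.
                q_next = lam * q_exploit + (1.0 - lam) * q_explore
                return reward + gamma * (1.0 - done) * q_next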

    H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps

    Full text link
    Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Offline RL approaches, although they bypass the need for simulators, often pose demanding requirements on the size and quality of offline datasets. The recently emerged hybrid offline-and-online RL paradigm provides an attractive framework that enables joint use of limited offline data and an imperfect simulator for transferable policy learning. In this paper, we develop a new algorithm, called H2O+, which offers great flexibility in bridging various choices of offline and online learning methods while also accounting for the dynamics gap between the real and simulated environments. Through extensive simulation and real-world robotics experiments, we demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.
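    As one way to picture how a hybrid offline-and-online method can account for a dynamics gap, the sketch below computes a TD loss on real offline transitions at full weight while down-weighting TD errors on simulator rollouts by an estimated gap. This is an assumed construction for illustration only, not the H2O+ algorithm; hybrid_td_loss, gap_estimator, and the exponential weighting are hypothetical.

        # Hedged sketch, not the authors' implementation.
        import torch

        def hybrid_td_loss(q_net, q_target_net, real_batch, sim_batch, gap_estimator, gamma=0.99):
            """Hypothetical hybrid loss: standard TD loss on real offline transitions plus a
            dynamics-gap-weighted TD loss on transitions from an imperfect simulator."""
            def squared_td_error(obs, act, rew, next_obs, next_act, done):
                with torch.no_grad():
                    target = rew + gamma * (1.0 - done) * q_target_net(next_obs, next_act)
                return (q_net(obs, act) - target) ** 2

            # Real offline data is trusted at full weight.
            loss_real = squared_td_error(*real_batch).mean()

            # Simulated data is down-weighted where the estimated dynamics gap is large.
            obs_s, act_s = sim_batch[0], sim_batch[1]
            with torch.no_grad():
                weight = torch.exp(-gap_estimator(obs_s, act_s))  # in (0, 1]; larger gap -> smaller weight
            loss_sim = (weight * squared_td_error(*sim_batch)).mean()

            return loss_real + loss_sim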

    DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

    Full text link
    Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite this progress, current algorithms remain unsatisfactory in virtually every aspect of performance, including sample efficiency, asymptotic performance, and robustness to the choice of random seeds. In this paper, we identify a major shortcoming of existing visual RL methods: agents often exhibit sustained inactivity during early training, which limits their ability to explore effectively. Expanding on this crucial observation, we further reveal a significant correlation between the agents' inclination toward motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt the dormant ratio as a metric of inactivity in the RL agent's network. Empirically, we also find that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. Leveraging these insights, we introduce DrM, a method that uses three core mechanisms to guide the agent's exploration-exploitation trade-off by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmarks: DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains of the DeepMind Control Suite, as well as three dexterous hand manipulation tasks in Adroit without demonstrations, all from pixel observations.
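    In the dormant-neuron literature, the dormant ratio referenced above is the fraction of units whose batch-averaged activation, normalized by the mean activation of their layer, falls below a small threshold. The sketch below computes that quantity under stated assumptions; it presumes the policy network exposes its nonlinearities as nn.ReLU modules, and the threshold tau and normalization details may differ from DrM's actual code.

        # Hedged sketch of the dormant-ratio metric, not DrM's implementation.
        import torch
        import torch.nn as nn

        @torch.no_grad()
        def dormant_ratio(model: nn.Module, obs_batch: torch.Tensor, tau: float = 0.025) -> float:
            activations = {}

            def make_hook(name):
                def _hook(_module, _inp, out):
                    activations[name] = out
                return _hook

            # Assumes the network's nonlinearities are nn.ReLU modules we can hook.
            handles = [m.register_forward_hook(make_hook(n))
                       for n, m in model.named_modules() if isinstance(m, nn.ReLU)]
            model(obs_batch)              # one forward pass populates the activation cache
            for h in handles:
                h.remove()

            dormant, total = 0, 0
            for out in activations.values():
                act = out.abs().mean(dim=0)          # mean activation magnitude per unit over the batch
                score = act / (act.mean() + 1e-8)    # normalize by the layer's mean activation
                dormant += (score <= tau).sum().item()
                total += score.numel()
            return dormant / max(total, 1)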

    Border Control-A Membrane-Linked Interactome of Arabidopsis

    No full text
    Cellular membranes act as signaling platforms and control solute transport. Membrane receptors, transporters, and enzymes communicate with intracellular processes through protein-protein interactions. Using a split-ubiquitin yeast two-hybrid screen that covers a test space of 6.4 × 10⁶ pairs, we identified 12,102 membrane/signaling protein interactions from Arabidopsis. Besides confirming expected interactions, such as heterotrimeric G protein subunit interactions and aquaporin oligomerization, >99% of the interactions were previously unknown. Interactions were confirmed at a rate of 32% in orthogonal in planta split-green fluorescent protein interaction assays, which was statistically indistinguishable from the confirmation rate for known interactions collected from the literature (38%). Regulatory associations in membrane protein trafficking, turnover, and phosphorylation include the regulation of potassium channel activity through abscisic acid signaling, of transporter activity by a WNK kinase, and of a brassinolide receptor kinase by trafficking-related proteins. These examples underscore the utility of the membrane/signaling protein interaction network for gene discovery and hypothesis generation in plants and other organisms.