9 research outputs found

    EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

    Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We have extended the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of improvements we propose, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin in diverse tasks under the limited data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across diverse benchmarks, such as Atari 100k, Proprio Control, and Vision Control. Comment: 21 pages, 10 figures.

    Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance

    Recently, people have shown that large-scale pre-training from internet-scale data is the key to building generalist models, as witnessed in NLP. To build embodied generalist agents, we and many other researchers hypothesize that such a foundation prior is also an indispensable component. However, it is unclear what the proper concrete form is for representing those embodied foundation priors and how they should be used in the downstream task. In this paper, we propose an intuitive and effective set of embodied priors that consist of foundation policy, value, and success reward. The proposed priors are based on the goal-conditioned MDP. To verify their effectiveness, we instantiate an actor-critic method assisted by the priors, called Foundation Actor-Critic (FAC). We name our framework Foundation Reinforcement Learning (FRL), since it relies completely on embodied foundation priors to explore, learn and reinforce. The benefits of FRL are threefold. (1) Sample efficient. With foundation priors, FAC learns significantly faster than traditional RL. Our evaluation on Meta-World shows that FAC can achieve 100% success rates for 7/8 tasks in fewer than 200k frames, which outperforms the baseline method with carefully hand-designed rewards under 1M frames. (2) Robust to noisy priors. Our method tolerates the unavoidable noise in embodied foundation models. We show that FAC works well even under heavy noise or quantization errors. (3) Minimal human intervention. FAC learns completely from the foundation priors, without the need for human-specified dense rewards or teleoperated demos. Thus, FAC can be easily scaled up. We believe our FRL framework could enable future robots to autonomously explore and learn without human intervention in the physical world. In summary, our proposed FRL is a novel and powerful learning paradigm, towards achieving embodied generalist agents.

    Real-time scheduling of renewable power systems through planning-based reinforcement learning

    The growth of renewable energy sources has posed significant challenges to traditional power scheduling. It is difficult for operators to obtain accurate day-ahead forecasts of renewable generation, thereby requiring the future scheduling system to make real-time scheduling decisions aligning with ultra-short-term forecasts. Restricted by computation speed, traditional optimization-based methods cannot solve this problem. Recent developments in reinforcement learning (RL) have demonstrated the potential to solve this challenge. However, the existing RL methods are inadequate in terms of constraint complexity, algorithm performance, and environment fidelity. We are the first to propose a systematic solution based on the state-of-the-art reinforcement learning algorithm and the real power grid environment. The proposed approach enables planning and finer time resolution adjustments of power generators, including unit commitment and economic dispatch, thus increasing the grid's ability to admit more renewable energy. The well-trained scheduling agent significantly reduces renewable curtailment and load shedding, which are issues arising from traditional scheduling's reliance on inaccurate day-ahead forecasts. High-frequency control decisions exploit the existing units' flexibility, reducing the power grid's dependence on hardware transformations and saving investment and operating costs, as demonstrated in experimental results. This research exhibits the potential of reinforcement learning in promoting low-carbon and intelligent power systems and represents a solid step toward sustainable electricity generation. Comment: 12 pages, 7 figures.

    Transferable Attention for Domain Adaptation

    Recent work in domain adaptation bridges different domains by adversarially learning a domain-invariant representation that cannot be distinguished by a domain discriminator. Existing methods of adversarial domain adaptation mainly align the global images across the source and target domains. However, not all regions of an image are transferable, and forcefully aligning the untransferable regions may lead to negative transfer. Furthermore, some of the images are significantly dissimilar across domains, resulting in weak image-level transferability. To this end, we present Transferable Attention for Domain Adaptation (TADA), focusing our adaptation model on transferable regions or images. We implement two types of complementary transferable attention: transferable local attention generated by multiple region-level domain discriminators to highlight transferable regions, and transferable global attention generated by a single image-level domain discriminator to highlight transferable images. Extensive experiments validate that our proposed models exceed state-of-the-art results on standard domain adaptation datasets.

    Simultaneous Learning of Pivots and Representations for Cross-Domain Sentiment Classification

    Cross-domain sentiment classification aims to leverage useful knowledge from a source domain to mitigate the supervision sparsity in a target domain. A series of approaches depend on pivot features that behave similarly for polarity prediction in both domains. However, the engineering of such pivot features remains cumbersome and prevents us from learning disentangled and transferable representations from rich semantic and syntactic information. Towards learning the pivots and representations simultaneously, we propose a new Transferable Pivot Transformer (TPT). Our model consists of two networks: a Pivot Selector that learns to detect transferable n-gram pivots from contexts, and a Transferable Transformer that learns to generate domain-invariant representations by modeling the correlation between pivot and non-pivot words. The Pivot Selector and Transferable Transformer are jointly optimized through end-to-end back-propagation. We experiment with real tasks of cross-domain sentiment classification over 20 domain pairs, where our model outperforms prior art.

    Fast Iterative Shrinkage-Thresholding Algorithm with Continuation for Brain Injury Monitoring Imaging Based on Electrical Impedance Tomography

    Electrical impedance tomography (EIT) is low-cost and noninvasive and has the potential for real-time imaging and bedside monitoring of brain injury. However, brain injury monitoring by EIT imaging suffers from image noise (IN) and resolution problems, causing blurred reconstructions. To address these problems, a least absolute shrinkage and selection operator model is built, and a fast iterative shrinkage-thresholding algorithm with continuation (FISTA-C) is proposed. Results indicate that FISTA-C reduces IN by 63.2%, 47.2%, and 29.9% in numerical simulations and by 54.4%, 44.7%, and 22.7% in head phantom experiments when compared with the damped least-squares, split Bregman, and FISTA algorithms, respectively. When the signal-to-noise ratio of the measurements is 80–50 dB, FISTA-C can reduce IN by 83.3%, 72.3%, and 68.7% on average when compared with the three algorithms, respectively. Both simulation and phantom experiments suggest that FISTA-C produces the best image resolution and can identify the two closest targets. Moreover, FISTA-C is more practical for clinical application because it does not require excessive parameter adjustments. This technology provides better reconstruction performance, significantly outperforms the traditional algorithms in terms of IN and resolution, and is expected to offer a general algorithm for brain injury monitoring imaging via EIT.
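
    The abstract above does not spell out FISTA-C, so the following is only a minimal sketch of the general idea, assuming a standard LASSO objective 0.5*||Ax - b||^2 + lam*||x||_1 and a simple geometric continuation schedule on lam with warm starts. The function names, the schedule, and the Frobenius-norm step-size bound are illustrative assumptions, not the authors' implementation.

```python
import math

def soft_threshold(v, t):
    """Element-wise proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(A, x):
    """Compute A x for a list-of-rows matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r for a list-of-rows matrix A."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def fista(A, b, lam, x0, n_iter=200):
    """Plain FISTA for 0.5*||Ax - b||^2 + lam*||x||_1, started from x0."""
    # Assumption: use the squared Frobenius norm as a safe (loose) upper
    # bound on the Lipschitz constant of the gradient, lambda_max(A^T A).
    L = sum(a * a for row in A for a in row)
    x, y, t = list(x0), list(x0), 1.0
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)
        # Gradient step on the smooth term, then shrinkage for the l1 term.
        x_new = soft_threshold([yi - g / L for yi, g in zip(y, grad)], lam / L)
        # Nesterov momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

def fista_continuation(A, b, lam_target, n_stages=5, shrink=0.5, n_iter=200):
    """Hypothetical continuation wrapper: solve a sequence of LASSO problems
    with decreasing lam, warm-starting each stage from the previous one."""
    lam = lam_target / (shrink ** (n_stages - 1))
    x = [0.0] * len(A[0])
    for _ in range(n_stages):
        x = fista(A, b, lam, x, n_iter)
        lam *= shrink  # the final stage runs with lam == lam_target
    return x

if __name__ == "__main__":
    # Toy problem: with A = I the LASSO solution is soft_threshold(b, lam).
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    b = [3.0, -0.5, 0.2]
    print(fista_continuation(A, b, lam_target=1.0))
```

    With an identity system matrix the result can be checked by hand, since the minimizer is the soft-thresholded measurement vector. A real EIT reconstruction would use the sensitivity (Jacobian) matrix for A and a numerical library instead of pure-Python loops.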

    Early detection of acute ischemic stroke using Contrast-enhanced electrical impedance tomography perfusion

    A cerebral contrast-enhanced electrical impedance tomography perfusion method is developed for acute ischemic stroke during intravenous thrombolytic therapy. Several clinical contrast agents with stable impedance characteristics and high-conductivity contrast were screened experimentally as electrical impedance contrast agent candidates. The electrical impedance tomography perfusion method was tested on rabbits with focal cerebral infarction, and its capability for early detection was verified based on perfusion images. The experimental results showed that ioversol 350 performed significantly better as an electrical impedance contrast agent than other contrast agents (p < 0.01). Additionally, perfusion images of focal cerebral infarction in rabbits confirmed that the electrical impedance tomography perfusion method could accurately detect the location and area of different cerebral infarction lesions (p < 0.001). Therefore, the cerebral contrast-enhanced electrical impedance tomography perfusion method proposed herein combines traditional, dynamic continuous imaging with rapid detection and could be applied as an early, rapid-detection, auxiliary, bedside imaging method for patients after a suspected ischemic stroke in both prehospital and in-hospital settings.

    Systematic genome editing of the genes on zebrafish Chromosome 1 by CRISPR/Cas9

    Genome editing by the well-established CRISPR/Cas9 technology has greatly facilitated our understanding of many biological processes. However, a complete whole-genome knockout for any species or model organism has rarely been achieved. Here, we performed a systematic knockout of all the genes (1333) on Chromosome 1 in zebrafish, successfully mutated 1029 genes, and generated 1039 germline-transmissible alleles corresponding to 636 genes. By high-throughput bioinformatics analysis, we found that sequence features play pivotal roles in effective gRNA targeting at specific genes of interest, and that the success rate of gene targeting positively correlates with the GC content of the target sites. Moreover, we found that nearly one-fourth of all mutants are related to human diseases, and several representative CRISPR/Cas9-generated mutants are described here. Furthermore, we tried to identify the underlying mechanisms leading to distinct phenotypes between genetic mutants and antisense morpholino-mediated knockdown embryos. Altogether, this work has generated the first chromosome-wide collection of zebrafish genetic mutants by the CRISPR/Cas9 technology, which will serve as a valuable resource for the community, and our bioinformatics analysis also provides some useful guidance for designing gene-specific gRNAs for successful gene editing.