
    Policy Optimization with Model-based Explorations

    Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from bias in the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the trade-off between exploration and exploitation, which regards the difference between the model-free and model-based estimations as a measure of exploration value. We apply this technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions' target values: a model-free one estimated by Monte-Carlo sampling and a model-based one that learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as an additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore states with larger target errors, which are hard to estimate. We compare POME with PPO on Atari 2600 games, and the results show that POME outperforms PPO on 33 of 49 games.
    Comment: Accepted at AAAI-1
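    The target-error bonus described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: how the two targets are blended and the coefficient `alpha` are assumptions for the sketch.

    ```python
    def pome_target(mc_return, model_target, alpha=0.1):
        """Illustrative POME-style target for one state-action pair.

        mc_return    -- model-free target from Monte-Carlo sampling
        model_target -- model-based target from a learned transition model
        alpha        -- weight on the exploration bonus (assumed, not from the paper)
        """
        base = 0.5 * (mc_return + model_target)        # simple blend of the two estimates
        bonus = alpha * abs(mc_return - model_target)  # large disagreement => hard-to-estimate pair
        return base + bonus
    ```

    When the two estimators agree, the bonus vanishes; when they disagree, the target is inflated, steering the policy toward state-action pairs whose value is hard to estimate.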

    Standard metabolic rate predicts growth trajectory of juvenile Chinese crucian carp (Carassius auratus) under changing food availability

    Phenotypic traits vary greatly within populations and can have a significant influence on aspects of performance. The present study aimed to investigate the effects of individual variation in standard metabolic rate (SMR) on growth rate and tolerance to food deprivation in juvenile crucian carp (Carassius auratus) under varying levels of food availability. To address this issue, 19 high-SMR and 16 low-SMR individuals were randomly assigned to a satiation diet for 3 weeks, whereas another 20 high-SMR and 16 low-SMR individuals were assigned to a restricted diet (approximately 50% of satiation) for the same period. Then, all fish were completely food-deprived for another 3 weeks. High-SMR individuals showed a higher growth rate when fed to satiation, but this advantage disappeared in food-restricted fish. This result was related to improved feeding efficiency with decreased food intake in low-SMR individuals, owing to their low food-processing capacity and maintenance costs. High-SMR individuals experienced more mass loss during food deprivation than low-SMR individuals. Our results illustrate context-dependent costs and benefits of intraspecific variation in SMR, whereby high-SMR individuals show increased growth performance under high food availability but incur a cost under stressful conditions (i.e., food shortage).

    Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning

    Applying reinforcement learning in physical-world tasks is extremely challenging. It is commonly infeasible to sample the large number of trials required by current reinforcement learning methods in a physical environment. This paper reports our project on using reinforcement learning for better commodity search in Taobao, one of the largest online retail platforms and, at the same time, a physical environment with a high sampling cost. Instead of training reinforcement learning in Taobao directly, we present the following approach: first we build Virtual Taobao, a simulator learned from historical customer behavior data through the proposed GAN-SD (GAN for Simulating Distributions) and MAIL (multi-agent adversarial imitation learning), and then we train policies in Virtual Taobao at no physical cost, where the proposed ANC (Action Norm Constraint) strategy reduces over-fitting. In experiments, Virtual Taobao is trained from hundreds of millions of customers' records, and its properties are compared with those of the real environment. The results show that Virtual Taobao faithfully recovers important properties of the real environment. We also show that policies trained in Virtual Taobao achieve significantly better online performance than traditional supervised approaches. We hope our work sheds some light on reinforcement learning applications in complex physical environments.
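    One plausible reading of an "Action Norm Constraint" is a penalty term added to the policy loss that discourages large action magnitudes, so the policy cannot exploit extreme actions that the simulator models poorly. The following sketch rests entirely on that assumption; the actual form of ANC and the coefficient `lam` are not specified in the abstract.

    ```python
    import numpy as np

    def anc_penalized_loss(policy_loss, actions, lam=0.01):
        """Add an assumed action-norm penalty to a policy loss.

        policy_loss -- scalar loss from the underlying RL objective
        actions     -- batch of continuous actions, shape (batch, action_dim)
        lam         -- penalty coefficient (assumed for illustration)
        """
        # Mean squared L2 norm of the actions in the batch.
        return policy_loss + lam * np.mean(np.sum(actions ** 2, axis=-1))
    ```

    With zero actions the penalty vanishes and the loss is unchanged; larger actions are penalized quadratically.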

    Artesunate potentiates antibiotics by inactivating heme-harbouring bacterial nitric oxide synthase and catalase

    Background: A current challenge in coping with bacterial infection is that bacterial pathogens are becoming less susceptible to, or more tolerant of, commonly used antibiotics. It is urgent to work out a practical solution to combat multidrug-resistant bacterial pathogens.
    Findings: Oxidative stress-acclimatized bacteria thrive in rifampicin by generating antibiotic-detoxifying nitric oxide (NO), which can be repressed by artesunate or an inhibitor of nitric oxide synthase (NOS). Suppressed bacterial proliferation correlates with mitigated NO production upon combined treatment of bacteria with artesunate and antibiotics. Detection of the heme-artesunate conjugate, together with the correspondingly reduced activities of heme-harbouring bacterial NOS and catalase, indicates that artesunate renders bacteria susceptible to antibiotics by alkylating the prosthetic heme group of hemo-enzymes.
    Conclusions: By compromising NO-mediated protection from antibiotics and triggering a harmful hydrogen peroxide burst, artesunate may serve as a promising antibiotic synergist for killing multidrug-resistant pathogenic bacteria.

    Prevalence of sexual harassment of nurses and nursing students in China: A meta-analysis of observational studies

    Sexual harassment experienced by nurses and nursing students is common and significantly associated with negative consequences. This study is a meta-analysis of the pooled prevalence of sexual harassment of nurses and nursing students in China. Electronic databases (PubMed, EMBASE, PsycINFO, Web of Science, Ovid, China National Knowledge Infrastructure, WanFang, SinoMed and Chinese VIP Information) were independently and systematically searched by two reviewers from inception to 12 March 2018. Forty-one studies that reported the prevalence of sexual harassment were analyzed using the random-effects model. The pooled prevalence of sexual harassment was 7.5% (95% CI: 5.5%-10.1%), with 7.5% (5.5%-10.2%) in nurses and 7.2% (3.0%-16.2%) in nursing students. Subgroup analyses showed that the year of survey and sample size were significantly associated with the prevalence of sexual harassment, but not the seniority of nursing staff, department, hospital, economic region, timeframe, age, working experience or subtype of harassment. In China, sexual harassment was found to be common among nurses and nursing students. Considering its significant negative impact, effective preventive and workplace measures should be developed.
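    Random-effects pooling of study-level estimates, as used above, is commonly done with the DerSimonian-Laird estimator. A minimal sketch follows; the choice of this particular estimator is an assumption (the abstract only says "random-effects model"), and the inputs are assumed to be study estimates and their variances on a common scale.

    ```python
    import math

    def dersimonian_laird(estimates, variances):
        """Pool study-level estimates with DerSimonian-Laird random-effects weights.

        estimates -- per-study effect sizes (e.g. transformed prevalences)
        variances -- their within-study variances
        Returns (pooled estimate, standard error of the pooled estimate).
        """
        w = [1.0 / v for v in variances]                   # fixed-effect (inverse-variance) weights
        fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
        q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))  # Cochran's Q
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(estimates) - 1)) / c)    # between-study variance, floored at 0
        w_re = [1.0 / (v + tau2) for v in variances]       # random-effects weights
        pooled = sum(wi * e for wi, e in zip(w_re, estimates)) / sum(w_re)
        se = math.sqrt(1.0 / sum(w_re))
        return pooled, se
    ```

    When the studies agree exactly, `tau2` is zero and the result reduces to the fixed-effect pooled estimate; heterogeneity across studies widens the standard error, as reflected in the wide confidence interval for nursing students.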