Policy Optimization with Model-based Explorations
Model-free reinforcement learning methods such as the Proximal Policy
Optimization algorithm (PPO) have been successfully applied to complex
decision-making problems such as Atari games. However, these methods suffer
from high variance and high sample complexity. On the other hand, model-based
reinforcement learning methods that learn the transition dynamics are more
sample efficient, but they often suffer from the bias of the transition
estimation. How to make use of both model-based and model-free learning is a
central problem in reinforcement learning. In this paper, we present a new
technique to address the trade-off between exploration and exploitation, which
regards the difference between model-free and model-based estimations as a
measure of exploration value. We apply this new technique to the PPO algorithm
and arrive at a new policy optimization method, named Policy Optimization with
Model-based Explorations (POME). POME uses two components to predict the
actions' target values: a model-free one estimated by Monte-Carlo sampling and
a model-based one which learns a transition model and predicts the value of the
next state. POME adds the discrepancy between these two target estimates as an
additional exploration value for each state-action pair, i.e., it encourages the
algorithm to explore states with larger target errors, which are hard to
estimate. We compare POME with PPO on Atari 2600 games and find that POME
outperforms PPO on 33 of the 49 games. Comment: Accepted at AAAI-1
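The exploration value described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of an absolute difference, and the `predicted_next_values` input (standing in for the value of the next state predicted by a learned transition model) are all assumptions made for illustration.

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Model-free target: discounted return G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def model_based_targets(rewards, predicted_next_values, gamma=0.99):
    """Model-based target: r_t + gamma * V(predicted next state).

    predicted_next_values is a hypothetical input; in practice it would come
    from a value function evaluated on a learned transition model's output.
    """
    return [r + gamma * v for r, v in zip(rewards, predicted_next_values)]


def exploration_bonus(rewards, predicted_next_values, gamma=0.99):
    """Per-step discrepancy between the two targets (assumed: absolute error)."""
    model_free = monte_carlo_returns(rewards, gamma)
    model_based = model_based_targets(rewards, predicted_next_values, gamma)
    return [abs(f - b) for f, b in zip(model_free, model_based)]


# Illustrative episode: the model over-estimates the value after step 0 only.
bonus = exploration_bonus([1.0, 1.0, 1.0], [2.0, 1.0, 0.0], gamma=0.5)
```

Steps where the two targets disagree receive a larger bonus, which is the sense in which hard-to-estimate states are favored for exploration. How the bonus enters the PPO objective is not specified in the abstract and is left out here.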
Standard metabolic rate predicts growth trajectory of juvenile Chinese crucian carp (Carassius auratus) under changing food availability
Phenotypic traits vary greatly within populations and can have a significant influence on aspects of performance. The present study aimed to investigate the effects of individual variation in standard metabolic rate (SMR) on growth rate and tolerance to food deprivation in juvenile crucian carp (Carassius auratus) under varying levels of food availability. To address this issue, 19 high and 16 low SMR individuals were randomly assigned to a satiation diet for 3 weeks, whereas another 20 high and 16 low SMR individuals were assigned to a restricted diet (approximately 50% of satiation) for the same period. Then, all fish were completely food-deprived for another 3 weeks. High SMR individuals showed a higher growth rate when fed to satiation, but this advantage of SMR did not exist in food-restricted fish. This result was related to improved feeding efficiency with decreased food intake in low SMR individuals, due to their low food processing capacity and maintenance costs. High SMR individuals experienced more mass loss during food deprivation than low SMR individuals. Our results illustrate context-dependent costs and benefits of intraspecific variation in SMR, whereby high SMR individuals show increased growth performance under high food availability but incur a cost under stressful environments (i.e., food shortage).
Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning
Applying reinforcement learning in physical-world tasks is extremely
challenging. It is commonly infeasible to sample a large number of trials, as
required by current reinforcement learning methods, in a physical environment.
This paper reports our project on using reinforcement learning for better
commodity search in Taobao, one of the largest online retail platforms and
meanwhile a physical environment with a high sampling cost. Instead of training
reinforcement learning in Taobao directly, we present our approach: first we
build Virtual Taobao, a simulator learned from historical customer behavior
data through the proposed GAN-SD (GAN for Simulating Distributions) and MAIL
(multi-agent adversarial imitation learning), and then we train policies in
Virtual Taobao at no physical cost, where an ANC (Action Norm Constraint)
strategy is proposed to reduce over-fitting. In experiments, Virtual Taobao is
trained from hundreds of millions of customers' records, and its properties are
compared with the real environment. The results disclose that Virtual Taobao
faithfully recovers important properties of the real environment. We also show
that the policies trained in Virtual Taobao can have significantly superior
online performance to the traditional supervised approaches. We hope our work
could shed some light on reinforcement learning applications in complex
physical environments.
Artesunate potentiates antibiotics by inactivating heme-harbouring bacterial nitric oxide synthase and catalase
Background: A current challenge of coping with bacterial infection is that bacterial pathogens are becoming less susceptible to or more tolerant of commonly used antibiotics. It is urgent to work out a practical solution to combat the multidrug resistant bacterial pathogens. Findings: Oxidative stress-acclimatized bacteria thrive in rifampicin by generating antibiotic-detoxifying nitric oxide (NO), which can be repressed by artesunate or an inhibitor of nitric oxide synthase (NOS). Suppressed bacterial proliferation correlates with mitigated NO production upon the combined treatment of bacteria by artesunate with antibiotics. Detection of the heme-artesunate conjugate and accordingly declined activities of heme-harbouring bacterial NOS and catalase indicates that artesunate renders bacteria susceptible to antibiotics by alkylating the prosthetic heme group of hemo-enzymes. Conclusions: By compromising NO-mediated protection from antibiotics and triggering harmful hydrogen peroxide burst, artesunate may serve as a promising antibiotic synergist for killing the multidrug resistant pathogenic bacteria.
Prevalence of sexual harassment of nurses and nursing students in China: A meta-analysis of observational studies
Sexual harassment experienced by nurses and nursing students is common and significantly associated with negative consequences. This study is a meta-analysis of the pooled prevalence of sexual harassment of nurses and nursing students in China. Electronic databases (PubMed, EMBASE, PsycINFO, Web of Science and Ovid, China National Knowledge Internet, WanFang, SinoMed and Chinese VIP Information) were independently and systematically searched by two reviewers from their commencement date to 12 March 2018. Forty-one studies that reported the prevalence of sexual harassment were analyzed using the random-effects model. The pooled prevalence of sexual harassment was 7.5% (95% CI: 5.5%-10.1%), with 7.5% (5.5%-10.2%) in nurses and 7.2% (3.0%-16.2%) in nursing students. Subgroup analyses showed that the year of survey and sample size were significantly associated with the prevalence of sexual harassment, whereas the seniority of nursing staff, department, hospital, economic region, timeframe, age, working experience and subtypes of harassment were not. In China, sexual harassment was found to be common in nurses and nursing students. Considering the significant negative impact of sexual harassment, effective preventive and workplace measures should be developed.
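As a rough illustration of how a pooled prevalence with a 95% CI can be obtained under a random-effects model, the sketch below uses the DerSimonian-Laird estimator on logit-transformed proportions. The abstract names only "the random-effects model", so the estimator, the logit transform, the function name, and the example counts are all assumptions; the counts are made up and are not data from the study.

```python
import math


def pooled_prevalence(events, totals):
    """Random-effects pooled proportion (DerSimonian-Laird, logit scale).

    Assumes 0 < events[i] < totals[i] for every study; zero or full counts
    would need a continuity correction, which this sketch omits.
    """
    # Logit-transformed prevalence and its approximate variance per study.
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]

    # Fixed-effect weights and Cochran's Q heterogeneity statistic.
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit, and its standard error.
    w_star = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical counts for three studies (illustrative only).
estimate, ci_low, ci_high = pooled_prevalence([12, 30, 8], [150, 400, 90])
```

The interval is computed on the logit scale and transformed back, which keeps the bounds between 0 and 1 even when the prevalence is low.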