Guarded Policy Optimization with Imperfect Online Demonstrations
The Teacher-Student Framework (TSF) is a reinforcement learning setting where
a teacher agent guards the training of a student agent by intervening and
providing online demonstrations. Assumed to be optimal, the teacher policy has
perfect timing and the capability to intervene in the learning process of the
student agent, providing safety guarantees and exploration guidance.
Nevertheless, in many real-world settings it is expensive or even impossible to
obtain a well-performing teacher policy. In this work, we relax the assumption
of a well-performing teacher and develop a new method that can incorporate
arbitrary teacher policies with modest or inferior performance. We instantiate
an Off-Policy Reinforcement Learning algorithm, termed Teacher-Student Shared
Control (TS2C), which incorporates teacher intervention based on
trajectory-based value estimation. Theoretical analysis validates that the
proposed TS2C algorithm attains efficient exploration and substantial safety
guarantee without being affected by the teacher's own performance. Experiments
on various continuous control tasks show that our method can exploit teacher
policies at different performance levels while maintaining a low training cost.
Moreover, the student policy surpasses the imperfect teacher policy in terms of
higher accumulated reward in held-out testing environments. Code is available
at https://metadriverse.github.io/TS2C.
Comment: Accepted at ICLR 2023 (top 25%)
State Regularized Policy Optimization on Data with Dynamics Shift
In many real-world scenarios, Reinforcement Learning (RL) algorithms are
trained on data with dynamics shift, i.e., with different underlying
environment dynamics. A majority of current methods address this issue by
training context encoders to identify environment parameters. Data with
dynamics shift are separated according to their environment parameters to train
the corresponding policy. However, these methods can be sample inefficient as
data are used ad hoc, and policies trained for one dynamics cannot
benefit from data collected in all other environments with different dynamics.
In this paper, we find that in many environments with similar structures and
different dynamics, optimal policies have similar stationary state
distributions. We exploit this property and learn the stationary state
distribution from data with dynamics shift for efficient data reuse. This
distribution is used to regularize the policy trained in a new environment,
leading to the SRPO (State Regularized Policy Optimization) algorithm. To
conduct theoretical analyses, the
intuition of similar environment structures is characterized by the notion of
homomorphous MDPs. We then demonstrate a lower-bound performance guarantee on
policies regularized by the stationary state distribution. In practice, SRPO
can be an add-on module to context-based algorithms in both online and offline
RL settings. Experimental results show that SRPO can make several context-based
algorithms far more data efficient and significantly improve their overall
performance.
Comment: Preprint. Under Review
PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement
Current advances in recommender systems have been remarkably successful in
optimizing immediate engagement. However, long-term user engagement, a more
desirable performance metric, remains difficult to improve. Meanwhile, recent
reinforcement learning (RL) algorithms have shown their effectiveness in a
variety of long-term goal optimization tasks. For this reason, RL is widely
considered a promising framework for optimizing long-term user engagement in
recommendation. Though promising, the application of RL heavily relies on
well-designed rewards, but designing rewards related to long-term user
engagement is quite difficult. To mitigate the problem, we propose a novel
paradigm, recommender systems with human preferences (or Preference-based
Recommender systems), which allows RL recommender systems to learn from
preferences about users' historical behaviors rather than explicitly defined
rewards. Such preferences are easily accessible through techniques such as
crowdsourcing, as they do not require any expert knowledge. With PrefRec, we
can fully exploit the advantages of RL in optimizing long-term goals, while
avoiding complex reward engineering. PrefRec uses the preferences to
automatically train a reward function in an end-to-end manner. The reward
function is then used to generate learning signals to train the recommendation
policy. Furthermore, we design an effective optimization method for PrefRec,
which uses an additional value function, expectile regression and reward model
pre-training to improve the performance. We conduct experiments on a variety of
long-term user engagement optimization tasks. The results show that PrefRec
significantly outperforms previous state-of-the-art methods in all the tasks.
Could a Kilonova Kill: a Threat Assessment
Binary neutron star mergers (BNS) produce high-energy emissions from several
physically different sources, including a gamma-ray burst (GRB) and its
afterglow, a kilonova, and, at late times, a remnant many parsecs in size.
Ionizing radiation from these sources can be dangerous for life on Earth-like
planets when located too close. Work to date has explored the substantial
danger posed by the GRB to on-axis observers: here we focus instead on the
potential threats posed to nearby off-axis observers. Our analysis is based
largely on observations of the GW 170817/GRB 170817A multi-messenger event, as
well as theoretical predictions. For baseline kilonova parameters, we find that
the X-ray emission from the afterglow may be lethal out to pc and the
off-axis gamma-ray emission may threaten a range out to pc, whereas
the greatest threat comes years after the explosion, from the cosmic rays
accelerated by the kilonova blast, which can be lethal out to distances up to
pc. The distances quoted here are typical, but the values have
significant uncertainties and depend on the viewing angle, ejected mass, and
explosion energy in ways we quantify. Assessing the overall threat to
Earth-like planets, kilonovae have a similar kill distance to supernovae, but are far
less common. However, our results rely on the scant available kilonova data,
and multi-messenger observations will clarify the danger posed by such events.
Comment: 21 pages, 5 figures. Comments welcome
Thallium-208: a beacon of in situ neutron capture nucleosynthesis
We demonstrate that the well-known 2.6 MeV gamma-ray emission line from
thallium-208 could serve as a real-time indicator of astrophysical heavy
element production, with both rapid (r) and intermediate (i) neutron capture
processes capable of its synthesis. We consider the r process in a Galactic
neutron star merger and show Tl-208 to be detectable from ~12 hours to ~10
days, and again ~1-20 years post-event. Detection of Tl-208 represents the only
identified prospect for a direct signal of lead production (implying gold
synthesis), arguing for the importance of future MeV telescope missions which
aim to detect Galactic events but may also be able to reach some nearby
galaxies in the Local Group.
Comment: accepted to PR
Proposed Lunar Measurements of r-Process Radioisotopes to Distinguish Origin of Deep-sea 244Pu
244Pu has recently been discovered in deep-sea deposits spanning the past 10
Myr, a period that includes two 60Fe pulses from nearby supernovae. 244Pu is
among the heaviest r-process products, and we consider whether it was created
in the supernovae, which is disfavored by nucleosynthesis simulations, or in an
earlier kilonova event that seeded 244Pu in the nearby interstellar medium that
was subsequently swept up by the supernova debris. We discuss how these
possibilities can be probed by measuring 244Pu and other r-process
radioisotopes such as 129I and 182Hf, both in lunar regolith samples returned
to Earth by missions such as Chang'e and Artemis, and in deep-sea deposits.
Comment: Extensive rewrite of v1 with added emphasis on lunar sample return
missions, including Artemis and Chang'e. 11 pages, 4 figures, 2 tables
Transcriptome analysis reveals salt-stress-regulated biological processes and key pathways in roots of cotton (Gossypium hirsutum L.)
High salinity is one of the main factors limiting cotton growth and productivity. The genes that regulate salt stress in TM-1 upland cotton were monitored using microarray and real-time PCR (RT-PCR) with samples taken from roots. Microarray analysis showed that 1503 probe sets were up-regulated and 1490 probe sets were down-regulated in plants exposed for 3 h to 100 mM NaCl, and RT-PCR analysis validated 42 relevant/related genes. The distribution of enriched gene ontology terms showed that important processes such as the response to water stress and pathways of hormone metabolism and signal transduction were induced by the NaCl treatment. Some key regulatory gene families involved in abiotic and biotic sources of stress, such as WRKY, ERF, and JAZ, were differentially expressed. Our transcriptome analysis might provide some useful insights into salt-mediated signal transduction pathways in cotton and offer a number of candidate genes as potential markers of tolerance to salt stress.
Gene Expression Profiles Deciphering Rice Phenotypic Variation between Nipponbare (Japonica) and 93-11 (Indica) during Oxidative Stress
Rice is a very important food staple that feeds more than half the world's population. Two major Asian cultivated rice (Oryza sativa L.) subspecies, japonica and indica, show significant phenotypic variation in their stress responses. However, the molecular mechanisms underlying this phenotypic variation are still largely unknown. A common link among different stresses is that they produce an oxidative burst and result in an increase of reactive oxygen species (ROS). In this study, methyl viologen (MV) as a ROS agent was applied to investigate the rice oxidative stress response. We observed that 93-11 (indica) seedlings exhibited leaf senescence with severe lesions under MV treatment compared to Nipponbare (japonica). Whole-genome microarray experiments were conducted, and 1,062 probe sets were identified with gene expression level polymorphisms between the two rice cultivars in addition to differential expression under MV treatment, which were assigned as Core Intersectional Probesets (CIPs). These CIPs were analyzed by gene ontology (GO) and highlighted with enriched GO terms related to toxin and oxidative stress responses as well as other responses. These GO term-enriched genes of the CIPs include glutathione S-transferases (GSTs), P450, plant defense genes, and secondary metabolism related genes such as chalcone synthase (CHS). Further insertion/deletion (InDel) and regulatory element analyses for these identified CIPs suggested that there may be some eQTL hotspots related to oxidative stress in the rice genome, such as GST genes encoded on chromosome 10. In addition, we identified a group of marker genes distinguishing the japonica and indica subspecies. In summary, we developed a new strategy combining biological experiments and data mining to study the possible molecular mechanism of phenotypic variation during oxidative stress between Nipponbare and 93-11. This study will aid in the analysis of the molecular basis of quantitative traits.
A Proposed High-Power UV Industrial Demonstration Laser at CEBAF
The Laser Processing Consortium, a collaboration of industries, universities, and the Continuous Electron Beam Accelerator Facility (CEBAF) in Newport News, Virginia, has proposed building a demonstration industrial processing laser for surface treatment and micro-machining. The laser is a free-electron laser (FEL) with average power output exceeding 1 kW in the ultraviolet (UV). The design calls for a novel driver accelerator that recovers most of the energy of the exhaust electron beam to produce laser light with good wall-plug efficiency. The laser and accelerator designs use technologies that are scalable to much higher power. The authors will describe the critical design issues in the laser, such as the stability, power handling, and losses of the optical resonator, and the quality, power, and reliability of the electron beam. They will also describe the calculated laser performance. Finally, progress to date on accelerator development and resonator modeling will be reported.