87 research outputs found

Guarded Policy Optimization with Imperfect Online Demonstrations

Author: Li Quanyi
Liu Zhihan
Peng Zhenghao
Xue Zhenghai
Zhou Bolei
Publication date: 23/04/2023

The Teacher-Student Framework (TSF) is a reinforcement learning setting where a teacher agent guards the training of a student agent by intervening and providing online demonstrations. When assumed optimal, the teacher policy has the perfect timing and capability to intervene in the learning process of the student agent, providing a safety guarantee and exploration guidance. Nevertheless, in many real-world settings it is expensive or even impossible to obtain a well-performing teacher policy. In this work, we relax the assumption of a well-performing teacher and develop a new method that can incorporate arbitrary teacher policies with modest or inferior performance. We instantiate an off-policy reinforcement learning algorithm, termed Teacher-Student Shared Control (TS2C), which incorporates teacher intervention based on trajectory-based value estimation. Theoretical analysis validates that the proposed TS2C algorithm attains efficient exploration and a substantial safety guarantee without being affected by the teacher's own performance. Experiments on various continuous control tasks show that our method can exploit teacher policies at different performance levels while maintaining a low training cost. Moreover, the student policy surpasses the imperfect teacher policy in terms of higher accumulated reward in held-out testing environments. Code is available at https://metadriverse.github.io/TS2C.

Comment: Accepted at ICLR 2023 (top 25%)

arXiv.org e-Print Archive

State Regularized Policy Optimization on Data with Dynamics Shift

Author: An Bo
Cai Qingpeng
Gai Kun
Jiang Peng
Liu Shuchang
Xue Zhenghai
Zheng Dong
Publication date: 06/06/2023

In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address this issue by training context encoders to identify environment parameters. Data with dynamics shift are separated according to their environment parameters to train the corresponding policy. However, these methods can be sample inefficient as data are used ad hoc, and policies trained for one dynamics cannot benefit from data collected in all other environments with different dynamics. In this paper, we find that in many environments with similar structures and different dynamics, optimal policies have similar stationary state distributions. We exploit this property and learn the stationary state distribution from data with dynamics shift for efficient data reuse. This distribution is used to regularize the policy trained in a new environment, leading to the SRPO (State Regularized Policy Optimization) algorithm. To conduct theoretical analyses, the intuition of similar environment structures is characterized by the notion of homomorphous MDPs. We then demonstrate a lower-bound performance guarantee on policies regularized by the stationary state distribution. In practice, SRPO can be an add-on module to context-based algorithms in both online and offline RL settings. Experimental results show that SRPO can make several context-based algorithms far more data efficient and significantly improve their overall performance.

Comment: Preprint. Under Review

arXiv.org e-Print Archive

PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement

Author: An Bo
Cai Qingpeng
Gai Kun
Jiang Peng
Liu Shuchang
Sun Shuo
Xue Wanqi
Xue Zhenghai
Zheng Dong
Publication date: 02/06/2023

Current advances in recommender systems have been remarkably successful in optimizing immediate engagement. However, long-term user engagement, a more desirable performance metric, remains difficult to improve. Meanwhile, recent reinforcement learning (RL) algorithms have shown their effectiveness in a variety of long-term goal optimization tasks. For this reason, RL is widely considered a promising framework for optimizing long-term user engagement in recommendation. Though promising, the application of RL relies heavily on well-designed rewards, and designing rewards related to long-term user engagement is quite difficult. To mitigate the problem, we propose a novel paradigm, recommender systems with human preferences (or Preference-based Recommender systems), which allows RL recommender systems to learn from preferences about users' historical behaviors rather than explicitly defined rewards. Such preferences are easily accessible through techniques such as crowdsourcing, as they do not require any expert knowledge. With PrefRec, we can fully exploit the advantages of RL in optimizing long-term goals, while avoiding complex reward engineering. PrefRec uses the preferences to automatically train a reward function in an end-to-end manner. The reward function is then used to generate learning signals to train the recommendation policy. Furthermore, we design an effective optimization method for PrefRec, which uses an additional value function, expectile regression and reward model pre-training to improve the performance. We conduct experiments on a variety of long-term user engagement optimization tasks. The results show that PrefRec significantly outperforms previous state-of-the-art methods in all the tasks.

arXiv.org e-Print Archive

Could a Kilonova Kill: a Threat Assessment

Author: Ellis John
Fields Brian D.
Hartmann Dieter H.
Liu Zhenghai
McLaughlin Gail C.
Perkins Haille M. L.
Surman Rebecca
Wang Xilu
Publication date: 17/10/2023

Binary neutron star mergers (BNS) produce high-energy emissions from several physically different sources, including a gamma-ray burst (GRB) and its afterglow, a kilonova, and, at late times, a remnant many parsecs in size. Ionizing radiation from these sources can be dangerous for life on Earth-like planets when located too close. Work to date has explored the substantial danger posed by the GRB to on-axis observers: here we focus instead on the potential threats posed to nearby off-axis observers. Our analysis is based largely on observations of the GW 170817/GRB 170817A multi-messenger event, as well as theoretical predictions. For baseline kilonova parameters, we find that the X-ray emission from the afterglow may be lethal out to ~5 pc and the off-axis gamma-ray emission may threaten a range out to ~4 pc, whereas the greatest threat comes years after the explosion, from the cosmic rays accelerated by the kilonova blast, which can be lethal out to distances up to ~11 pc. The distances quoted here are typical, but the values have significant uncertainties and depend on the viewing angle, ejected mass, and explosion energy in ways we quantify. Assessing the overall threat to Earth-like planets, kilonovae have a similar kill distance to supernovae, but are far less common. However, our results rely on the scant available kilonova data, and multi-messenger observations will clarify the danger posed by such events.

Comment: 21 pages, 5 figures. Comments welcome

arXiv.org e-Print Archive

Thallium-208: a beacon of in situ neutron capture nucleosynthesis

Author: Denissenkov Pavel
Herwig Falk
Lariviere Maude
Liu Zhenghai
McLaughlin Gail C.
Mumpower Matthew R.
Sprouse Trevor
Surman Rebecca
Vassh Nicole
Wang Xilu
Publication date: 17/11/2023

We demonstrate that the well-known 2.6 MeV gamma-ray emission line from thallium-208 could serve as a real-time indicator of astrophysical heavy element production, with both rapid (r) and intermediate (i) neutron capture processes capable of its synthesis. We consider the r process in a Galactic neutron star merger and show Tl-208 to be detectable from ~12 hours to ~10 days, and again ~1-20 years post-event. Detection of Tl-208 represents the only identified prospect for a direct signal of lead production (implying gold synthesis), arguing for the importance of future MeV telescope missions which aim to detect Galactic events but may also be able to reach some nearby galaxies in the Local Group.

Comment: accepted to PR

arXiv.org e-Print Archive

Proposed Lunar Measurements of r-Process Radioisotopes to Distinguish Origin of Deep-sea 244Pu

Author: Clark Adam M.
Ellis John
Ertel Adrienne F.
Fields Brian D.
Fry Brian J.
Liu Zhenghai
Miller Jesse A.
Surman Rebecca
Wang Xilu
Publication date: 30/09/2022

244Pu has recently been discovered in deep-sea deposits spanning the past 10 Myr, a period that includes two 60Fe pulses from nearby supernovae. 244Pu is among the heaviest r-process products, and we consider whether it was created in the supernovae, which is disfavored by nucleosynthesis simulations, or in an earlier kilonova event that seeded 244Pu in the nearby interstellar medium that was subsequently swept up by the supernova debris. We discuss how these possibilities can be probed by measuring 244Pu and other r-process radioisotopes such as 129I and 182Hf, both in lunar regolith samples returned to Earth by missions such as Chang'e and Artemis, and in deep-sea deposits.

Comment: Extensive rewrite of v1 with added emphasis of lunar sample return missions, including Artemis and Chang'e. 11 pages, 4 figures, 2 tables

arXiv.org e-Print Archive

Directory of Open Access Journals

Transcriptome analysis reveals salt-stress-regulated biological processes and key pathways in roots of cotton (Gossypium hirsutum L.)

Author: Li Fuguang
Liu Chuanliang
Su Zhen
Wang Chunchao
Wang Qianhua
Wei Qiang
Yan Hong
Yao Dongxia
Zhang Chaojun
Zhang Xueyan
Zhang Zhenghai
Zhao Xinhua
Publication venue: Elsevier Inc.
Publication date: 31/07/2011

High salinity is one of the main factors limiting cotton growth and productivity. The genes that regulate salt stress in TM-1 upland cotton were monitored using microarray and real-time PCR (RT-PCR) with samples taken from roots. Microarray analysis showed that 1503 probe sets were up-regulated and 1490 probe sets were down-regulated in plants exposed for 3 h to 100 mM NaCl, and RT-PCR analysis validated 42 relevant/related genes. The distribution of enriched gene ontology terms showed that important processes such as the response to water stress and pathways of hormone metabolism and signal transduction were induced by the NaCl treatment. Some key regulatory gene families involved in abiotic and biotic sources of stress, such as WRKY, ERF, and JAZ, were differentially expressed. Our transcriptome analysis might provide some useful insights into salt-mediated signal transduction pathways in cotton and offer a number of candidate genes as potential markers of tolerance to salt stress.

Elsevier - Publisher Connector

Gene Expression Profiles Deciphering Rice Phenotypic Variation between Nipponbare (Japonica) and 93-11 (Indica) during Oxidative Stress

Author: Di Chao
Ling Yi
Liu Fengxia
Su Zhen
Sun Chuanqing
Tan Lubin
Tan Yuanjun
Wang Chunchao
Wei Qiang
Xing Zhuo
Xu Wenying
Xue Yongbiao
Yan Hong
Yao Dongxia
Zhang Zhenghai
Publication venue: Public Library of Science
Publication date: 08/01/2010

Rice is a very important food staple that feeds more than half the world's population. Two major Asian cultivated rice (Oryza sativa L.) subspecies, japonica and indica, show significant phenotypic variation in their stress responses. However, the molecular mechanisms underlying this phenotypic variation are still largely unknown. A common link among different stresses is that they produce an oxidative burst and result in an increase of reactive oxygen species (ROS). In this study, methyl viologen (MV) as a ROS agent was applied to investigate the rice oxidative stress response. We observed that 93-11 (indica) seedlings exhibited leaf senescence with severe lesions under MV treatment compared to Nipponbare (japonica). Whole-genome microarray experiments were conducted, and 1,062 probe sets were identified with gene expression level polymorphisms between the two rice cultivars in addition to differential expression under MV treatment, which were assigned as Core Intersectional Probesets (CIPs). These CIPs were analyzed by gene ontology (GO) and highlighted with enriched GO terms related to toxin and oxidative stress responses as well as other responses. These GO term-enriched genes of the CIPs include glutathione S-transferases (GSTs), P450, plant defense genes, and secondary metabolism related genes such as chalcone synthase (CHS). Further insertion/deletion (InDel) and regulatory element analyses for these identified CIPs suggested that there may be some eQTL hotspots related to oxidative stress in the rice genome, such as GST genes encoded on chromosome 10. In addition, we identified a group of marker genes individuating the japonica and indica subspecies. In summary, we developed a new strategy combining biological experiments and data mining to study the possible molecular mechanism of phenotypic variation during oxidative stress between Nipponbare and 93-11. This study will aid in the analysis of the molecular basis of quantitative traits.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

A Proposed High-Power UV Industrial Demonstration Laser at CEBAF

Author: Benson Stephen V.
Bisognano Joseph J.
Bohn Courtlandt L.
Cardman Larry
Colson W. B.
Davidson Paul
Douglas David
Dylla H. F.
Engwall David
Fugitt Jock
Goldstein John
Jordan Kevin
Kehne David
Li Zhenghai
Liu Hongxiu
Merminga Lia
Neil George R.
Nueffer David
Shinn Michelle
Wiseman Mark
Wong Rober
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 01/08/1996

The Laser Processing Consortium, a collaboration of industries, universities, and the Continuous Electron Beam Accelerator Facility (CEBAF) in Newport News, Virginia, has proposed building a demonstration industrial processing laser for surface treatment and micro-machining. The laser is a free-electron laser (FEL) with average power output exceeding 1 kW in the ultraviolet (UV). The design calls for a novel driver accelerator that recovers most of the energy of the exhaust electron beam to produce laser light with good wall-plug efficiency. The laser and accelerator designs use technologies that are scalable to much higher power. The authors will describe the critical design issues in the laser, such as the stability, power handling, and losses of the optical resonator, and the quality, power, and reliability of the electron beam. They will also describe the calculated laser performance. Finally, progress to date on accelerator development and resonator modeling will be reported.

UNT Digital Library