Strong Optical and UV Intermediate-Width Emission Lines in the Quasar SDSS J232444.80-094600.3: Dust-Free and Intermediate-Density Gas at the Skin of Dusty Torus?
Emission lines from the broad emission line region (BELR) and the narrow
emission line region (NELR) of active galactic nuclei (AGNs) are extensively
studied. However, emission lines from the region between these two are rarely
detected. We present a detailed analysis of the quasar SDSS
J232444.80-094600.3 (SDSS J2324-0946), which is remarkable for its strong
intermediate-width emission lines (IELs) with FWHM 1800 \kmps. The IEL
component is present in different emission lines, including the permitted
lines \lya\ 1216 and \civ\ 1549, the semiforbidden line \ciii\ 1909, and the
forbidden lines \oiii\ 4959, 5007. With the aid of photo-ionization models, we
find that the IELs are produced by gas with a hydrogen density of
, a distance to the central ionizing source of pc, a covering factor of
CF 6\%, and a dust-to-gas ratio of times that of the SMC. We suggest that
the strong IELs of this quasar are produced by nearly dust-free,
intermediate-density gas located at the skin of the dusty torus. Such strong
IELs, serving as a useful diagnostic, provide an avenue to study the
properties of the gas between the BELR and the NELR.
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
A crucial task in decision-making problems is reward engineering. It is
common in practice that no obvious choice of reward function exists. Thus, a
popular approach is to introduce human feedback during training and leverage
such feedback to learn a reward function. Among all policy learning methods
that use human feedback, preference-based methods have demonstrated substantial
success in recent empirical applications such as InstructGPT. In this work, we
develop a theory that provably shows the benefits of preference-based methods
in offline contextual bandits. In particular, we improve the modeling and
suboptimality analysis for policy learning methods run directly on
human-scored samples. We then compare this with the suboptimality guarantees
of preference-based methods and show that preference-based methods enjoy lower
suboptimality.
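The core contrast in the abstract above, learning a reward from pairwise preferences rather than from raw scores, can be illustrated with a toy Bradley-Terry model. This is a generic sketch of the preference-based setup, not the paper's construction; the linear reward, feature map, and all numbers here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear-reward contextual bandit: r(x, a) = phi(x, a) . theta*
d, n_pairs = 4, 500
theta_star = rng.normal(size=d)

def phi(x, a):
    # Toy feature map for context x and binary action a in {0, 1}.
    return x * (1.0 if a == 1 else -1.0)

# Collect preference feedback: for each context, the two actions are compared
# and the label follows a Bradley-Terry model on the reward difference.
X = rng.normal(size=(n_pairs, d))
diffs = np.array([phi(x, 1) - phi(x, 0) for x in X])
p_prefer_1 = 1.0 / (1.0 + np.exp(-diffs @ theta_star))
y = (rng.uniform(size=n_pairs) < p_prefer_1).astype(float)

# Fit theta by maximum likelihood of the Bradley-Terry model,
# i.e. logistic regression on feature differences, via gradient ascent.
theta = np.zeros(d)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-diffs @ theta))
    theta += 0.1 * diffs.T @ (y - p) / n_pairs

# Greedy policy with respect to the learned reward on a fresh context.
x_new = rng.normal(size=d)
a_hat = int(phi(x_new, 1) @ theta > phi(x_new, 0) @ theta)
```

With enough comparisons, the learned parameter aligns with the true one, so the greedy policy on the learned reward approaches the optimal policy; the paper's contribution is to quantify this suboptimality and compare it against learning from raw human scores.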
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
Policy-based algorithms equipped with deep neural networks have achieved
great success in solving high-dimensional policy optimization problems in
reinforcement learning. However, current analyses cannot explain why they are
resistant to the curse of dimensionality. In this work, we study the sample
complexity of the neural policy mirror descent (NPMD) algorithm with
convolutional neural networks (CNNs) as function approximators. Motivated by
the empirical observation that many high-dimensional environments, such as
those taking images as states, have state spaces possessing low-dimensional
structure, we consider the state space to be a -dimensional manifold embedded
in the -dimensional Euclidean space with intrinsic dimension . We show that
in each iteration of NPMD, both the value function and the policy can be well
approximated by CNNs. The approximation errors are controlled by the size of
the networks, and the smoothness of the previous networks can be inherited. As
a result, by properly choosing the network size and hyperparameters, NPMD can
find an -optimal policy with samples in expectation, where indicates the
smoothness of the environment. Compared to previous work, our result shows
that NPMD can leverage the low-dimensional structure of the state space to
escape the curse of dimensionality, providing an explanation for the efficacy
of deep policy-based algorithms.
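The mirror descent update underlying NPMD, stripped of the neural approximation, is the classic multiplicative policy update pi_{k+1}(a|s) proportional to pi_k(a|s) exp(eta Q_k(s,a)). A tabular sketch on a made-up two-state MDP (illustration only; the paper's setting instead uses CNN approximators on manifold-structured state spaces):

```python
import numpy as np

# Tiny 2-state, 2-action MDP with hypothetical numbers, just for illustration.
# P[s, a, s'] = transition probability, R[s, a] = reward, gamma = discount.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9
nS, nA = R.shape

def q_values(pi):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi for V,
    # then Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s').
    P_pi = np.einsum("sa,sab->sb", pi, P)
    R_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)
    return R + gamma * P @ V

# Policy mirror descent with a KL regularizer reduces to the
# multiplicative update pi_{k+1}(a|s) ~ pi_k(a|s) * exp(eta * Q_k(s, a)).
pi = np.full((nS, nA), 0.5)  # start from the uniform policy
eta = 1.0
for _ in range(50):
    Q = q_values(pi)
    pi = pi * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))  # stable exp
    pi /= pi.sum(axis=1, keepdims=True)

V_final = np.einsum("sa,sa->s", pi, q_values(pi))
```

Each iteration of this update monotonically improves the value function; the abstract's contribution is to show how many samples suffice when Q and pi must instead be approximated by CNNs over a low-dimensional manifold of states.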