3,091 research outputs found
Strong Optical and UV Intermediate-Width Emission Lines in the Quasar SDSS J232444.80-094600.3: Dust-Free and Intermediate-Density Gas at the Skin of the Dusty Torus?
Emission lines from the broad emission line region (BELR) and the narrow
emission line region (NELR) of active galactic nuclei (AGNs) are extensively
studied. However, emission lines originating between these two regions are rarely detected.
We present a detailed analysis of the quasar SDSS J232444.80-094600.3 (hereafter SDSS J2324-0946), which is remarkable for its strong intermediate-width emission lines (IELs) with FWHM ≈ 1800 km s⁻¹. The IEL component is present in several emission lines, including the permitted lines Lyα λ1216 and C IV λ1549, the semi-forbidden line C III] λ1909, and the forbidden lines [O III] λλ4959, 5007. With the aid of photo-ionization models, we found that the IELs are produced by gas with a hydrogen density of …, a distance to the central ionizing source of … pc, a covering factor of CF ≈ 6%, and a dust-to-gas ratio of … times that of the SMC. We suggest that the strong IELs of this quasar are produced by nearly dust-free, intermediate-density gas located at the skin of the dusty torus. Such strong IELs, serving as a useful diagnostic, provide an avenue to study the properties of gas between the BELR and the NELR.
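The numerical density, distance, and dust-to-gas values above did not survive extraction, so the sketch below only illustrates the standard relation a photo-ionization model uses to place the emitting gas: the ionization parameter U = Q(H) / (4π r² n_H c) links the ionizing photon rate Q(H), the hydrogen density n_H, and the distance r. Every number in it is an assumed placeholder, not a value from the paper.

```python
# Minimal sketch: the ionization-parameter relation used by
# photo-ionization models,
#     U = Q(H) / (4 * pi * r^2 * n_H * c)  =>  r = sqrt(Q / (4 pi U n_H c)),
# solved for the distance r of the gas from the central source.
# All numerical inputs are illustrative assumptions, NOT the paper's values.
import math

C_CM_S = 2.99792458e10   # speed of light [cm/s]
PC_CM = 3.0857e18        # one parsec in cm

def distance_pc(q_ion: float, u: float, n_h: float) -> float:
    """Distance [pc] of gas with density n_h [cm^-3] and ionization
    parameter u from a source emitting q_ion ionizing photons per second."""
    r_cm = math.sqrt(q_ion / (4.0 * math.pi * u * n_h * C_CM_S))
    return r_cm / PC_CM

# Hypothetical inputs for a luminous quasar (placeholders):
Q = 1e56       # ionizing photon rate [s^-1]
U = 10**-1.5   # ionization parameter
N_H = 1e6      # hydrogen density [cm^-3], "intermediate" between BELR and NELR
print(f"r ~ {distance_pc(Q, U, N_H):.1f} pc")
```

With these torus-scale assumptions the relation yields r of order tens of parsecs, the kind of scale at which gas at the skin of a dusty torus would sit; the paper's actual constraints would fix Q, U, and n_H from the observed line ratios.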
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
A crucial task in decision-making problems is reward engineering. It is
common in practice that no obvious choice of reward function exists. Thus, a
popular approach is to introduce human feedback during training and leverage
such feedback to learn a reward function. Among all policy learning methods
that use human feedback, preference-based methods have demonstrated substantial
success in recent empirical applications such as InstructGPT. In this work, we
develop a theory that provably shows the benefits of preference-based methods
in offline contextual bandits. In particular, we improve the modeling of, and the suboptimality analysis for, policy learning methods that run directly on human-scored samples. We then compare this guarantee with the suboptimality guarantees of preference-based methods and show that preference-based methods enjoy lower suboptimality.
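The abstract does not spell out the estimator, so the following is a minimal sketch of one common instantiation of preference-based policy learning: fit a reward model by Bradley-Terry maximum likelihood on pairwise human comparisons, then act greedily under it. The feature map, the linear reward model, and all names here are illustrative assumptions, not the paper's construction.

```python
# Illustrative sketch of preference-based reward learning in an offline
# contextual bandit, assuming a Bradley-Terry preference model (a common
# choice; the abstract does not specify the paper's exact setup).
import numpy as np

rng = np.random.default_rng(0)
d = 5                                  # feature dimension
theta_true = rng.normal(size=d)        # unknown "true" reward parameter

def features(context, action):
    # Hypothetical context-action feature map.
    return context * (action + 1)

# Offline data: for each context, a human compares two candidate actions.
X_pref, y_pref = [], []
for _ in range(2000):
    ctx = rng.normal(size=d)
    phi_a, phi_b = features(ctx, 0), features(ctx, 1)
    # Bradley-Terry: P(a preferred over b) = sigmoid(r(a) - r(b)).
    p = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ theta_true))
    X_pref.append(phi_a - phi_b)
    y_pref.append(rng.random() < p)
X_pref, y_pref = np.array(X_pref), np.array(y_pref, dtype=float)

# Fit theta by logistic-regression MLE on the preference labels
# (plain gradient ascent on the log-likelihood).
theta = np.zeros(d)
for _ in range(500):
    p_hat = 1.0 / (1.0 + np.exp(-X_pref @ theta))
    theta += 0.5 * X_pref.T @ (y_pref - p_hat) / len(y_pref)

# Greedy policy under the learned reward: pick the higher-scoring action.
ctx = rng.normal(size=d)
best = max((0, 1), key=lambda a: features(ctx, a) @ theta)
cos = theta @ theta_true / (np.linalg.norm(theta) * np.linalg.norm(theta_true))
print("chosen action:", best, "| cosine(theta, theta_true):", float(cos))
```

The contrast the abstract draws is with regressing directly on human-provided scalar scores; in this sketch that would amount to replacing the logistic preference loss with a least-squares fit on noisy per-sample scores.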
- …