    Strong Optical and UV Intermediate-Width Emission Lines in the Quasar SDSS J232444.80-094600.3: Dust-Free and Intermediate-Density Gas at the Skin of the Dusty Torus?

    Emission lines from the broad emission line region (BELR) and the narrow emission line region (NELR) of active galactic nuclei (AGNs) are extensively studied. However, emission lines from the region between these two are rarely detected. We present a detailed analysis of the quasar SDSS J232444.80-094600.3 (SDSS J2324−0946), which is remarkable for its strong intermediate-width emission lines (IELs) with FWHM ≈ 1800 km s⁻¹. The IEL component is present in several emission lines, including the permitted lines Lyα λ1216 and C IV λ1549, the semiforbidden line C III] λ1909, and the forbidden lines [O III] λλ4959, 5007. With the aid of photoionization models, we find that the IELs are produced by gas with a hydrogen density of n_H ∼ 10^{6.2}–10^{6.3} cm⁻³, a distance to the central ionizing source of R ∼ 35–50 pc, a covering factor of CF ∼ 6%, and a dust-to-gas ratio of ≤ 4% of the SMC value. We suggest that the strong IELs of this quasar are produced by nearly dust-free, intermediate-density gas located at the skin of the dusty torus. Such strong IELs serve as a useful diagnostic, providing an avenue to study the properties of the gas between the BELR and the NELR.
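    The density and distance quoted above are tied together through the standard photoionization (ionization parameter) relation U = Q(H) / (4πR²n_H c). The sketch below illustrates that relation only; the photon rate Q(H) and ionization parameter U used here are illustrative assumptions, not values reported in the paper.

```python
import math

# Standard photoionization relation: U = Q(H) / (4 * pi * R^2 * n_H * c).
# Given an ionizing photon rate Q(H), an ionization parameter U, and a gas
# density n_H, the distance R of the emitting gas follows directly.
# Q(H) and U below are illustrative placeholders, not values from the paper.

C_CM_S = 2.998e10   # speed of light [cm/s]
PC_CM = 3.086e18    # one parsec in cm

def distance_from_ionization(q_h, u, n_h):
    """Distance R [pc] of photoionized gas for photon rate q_h [s^-1],
    ionization parameter u (dimensionless), and density n_h [cm^-3]."""
    r_cm = math.sqrt(q_h / (4.0 * math.pi * u * n_h * C_CM_S))
    return r_cm / PC_CM

# Example: a luminous quasar with Q(H) ~ 1e56 s^-1 and U ~ 1e-2, evaluated at
# the intermediate densities quoted in the abstract.
for log_nh in (6.2, 6.3):
    r = distance_from_ionization(1e56, 1e-2, 10 ** log_nh)
    print(f"n_H = 10^{log_nh} cm^-3  ->  R ~ {r:.0f} pc")
```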

    Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems

    A crucial task in decision-making problems is reward engineering. It is common in practice that no obvious choice of reward function exists. Thus, a popular approach is to introduce human feedback during training and leverage such feedback to learn a reward function. Among policy learning methods that use human feedback, preference-based methods have demonstrated substantial success in recent empirical applications such as InstructGPT. In this work, we develop a theory that provably shows the benefits of preference-based methods in offline contextual bandits. In particular, we improve the modeling and suboptimality analysis for policy learning methods that run directly on human-scored samples. We then compare these guarantees with the suboptimality bounds of preference-based methods and show that preference-based methods enjoy lower suboptimality.
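    For concreteness, here is a minimal sketch of the preference-based pipeline the abstract refers to: fit a reward model from pairwise human preferences via a Bradley-Terry (logistic) likelihood, then act greedily on the estimated reward in each context. This is not the paper's algorithm or analysis; the linear feature model, toy data, and step size are assumptions for illustration.

```python
import numpy as np

# Sketch of preference-based policy learning in a contextual bandit:
# 1) learn a linear reward r(x, a) = <theta, phi(x, a)> from pairwise
#    preferences with a Bradley-Terry likelihood,
# 2) in each context, pick the action with the highest estimated reward.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_bradley_terry(phi_pref, phi_rej, lr=0.1, iters=500):
    """Estimate theta from pairs where phi_pref[i] was preferred over phi_rej[i]."""
    theta = np.zeros(phi_pref.shape[1])
    for _ in range(iters):
        diff = phi_pref @ theta - phi_rej @ theta                 # estimated reward gaps
        grad = ((sigmoid(diff) - 1.0)[:, None] * (phi_pref - phi_rej)).mean(0)
        theta -= lr * grad                                        # descend the negative log-likelihood
    return theta

def greedy_policy(theta, phi_actions):
    """Index of the action maximizing estimated reward, given per-action features."""
    return int(np.argmax(phi_actions @ theta))

# Toy usage: 5-dimensional features, 200 preference pairs generated from a
# hidden reward vector, then a greedy choice among 4 candidate actions.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=5)
phi_a, phi_b = rng.normal(size=(200, 5)), rng.normal(size=(200, 5))
prefer_a = sigmoid((phi_a - phi_b) @ true_theta) > rng.random(200)
phi_pref = np.where(prefer_a[:, None], phi_a, phi_b)
phi_rej = np.where(prefer_a[:, None], phi_b, phi_a)
theta_hat = fit_bradley_terry(phi_pref, phi_rej)
print(greedy_policy(theta_hat, rng.normal(size=(4, 5))))  # chosen action index
```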