SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
This paper introduces SCOPE-RL, a comprehensive open-source Python library
designed for offline reinforcement learning (offline RL), off-policy evaluation
(OPE), and off-policy selection (OPS). Unlike most existing libraries that focus solely on
either policy learning or evaluation, SCOPE-RL seamlessly integrates these two
key aspects, facilitating flexible and complete implementations of both offline
RL and OPE processes. SCOPE-RL puts particular emphasis on its OPE modules,
offering a range of OPE estimators and robust evaluation-of-OPE protocols. This
approach enables more in-depth and reliable OPE compared to other packages. For
instance, SCOPE-RL enhances OPE by estimating the entire reward distribution
under a policy rather than its mere point-wise expected value. Additionally,
SCOPE-RL provides a more thorough evaluation-of-OPE by presenting the
risk-return tradeoff in OPE results, extending beyond mere accuracy evaluations
in existing OPE literature. SCOPE-RL is designed with user accessibility in
mind. Its user-friendly APIs, comprehensive documentation, and a variety of
easy-to-follow examples assist researchers and practitioners in efficiently
implementing and experimenting with various offline RL methods and OPE
estimators, tailored to their specific problem contexts. The documentation of
SCOPE-RL is available at https://scope-rl.readthedocs.io/en/latest/.
Comment: preprint, open-source software:
https://github.com/hakuhodo-technologies/scope-r
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
Off-Policy Evaluation (OPE) aims to assess the effectiveness of
counterfactual policies using only offline logged data and is often used to
identify the top-k promising policies for deployment in online A/B tests.
Existing evaluation metrics for OPE estimators primarily focus on the
"accuracy" of OPE or that of downstream policy selection, neglecting the
risk-return tradeoff in the subsequent online policy deployment. To address
this issue, we draw inspiration from portfolio evaluation in finance and
develop a new metric, called SharpeRatio@k, which measures the risk-return
tradeoff of policy portfolios formed by an OPE estimator under varying online
evaluation budgets (k). We validate our metric in two example scenarios,
demonstrating its ability to effectively distinguish between low-risk and
high-risk estimators and to accurately identify the most efficient one.
Efficiency of an estimator is characterized by its capability to form the most
advantageous policy portfolios, maximizing returns while minimizing risks
during online deployment, a nuance that existing metrics typically overlook. To
facilitate a quick, accurate, and consistent evaluation of OPE via
SharpeRatio@k, we have also integrated this metric into the open-source
software SCOPE-RL (https://github.com/hakuhodo-technologies/scope-rl).
Employing SharpeRatio@k and SCOPE-RL, we conduct comprehensive benchmarking
experiments on various estimators and RL tasks, focusing on their risk-return
tradeoff. These experiments offer several interesting directions and
suggestions for future OPE research.
Comment: ICLR202
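The abstract above describes SharpeRatio@k only in words. As a rough illustration, the sketch below assumes the metric takes the form (best@k - J(π_b)) / std@k, where best@k and std@k are the maximum and the population standard deviation of the true values of the k policies ranked highest by an OPE estimator, and J(π_b) is the value of the behavior policy. That formula, the function name, and all argument names are assumptions made for illustration; they are not stated in the abstract and are not the SCOPE-RL API.

```python
import statistics


def sharpe_ratio_at_k(estimated_values, true_values, behavior_value, k):
    """Hypothetical sketch of a SharpeRatio@k-style metric.

    Assumed form (not given in the abstract): (best@k - J(pi_b)) / std@k,
    computed on the true values of the top-k policies as ranked by the
    estimator's estimated values.
    """
    if len(estimated_values) != len(true_values):
        raise ValueError("estimated_values and true_values must align")
    if not 2 <= k <= len(estimated_values):
        raise ValueError("k must be between 2 and the number of policies")

    # Rank candidate policies by the estimator's output, best first.
    ranked = sorted(
        range(len(estimated_values)),
        key=lambda i: estimated_values[i],
        reverse=True,
    )
    # The "portfolio" is the set of true values of the top-k policies.
    portfolio = [true_values[i] for i in ranked[:k]]

    best_at_k = max(portfolio)                  # return of the portfolio
    std_at_k = statistics.pstdev(portfolio)     # risk of the portfolio
    if std_at_k == 0:
        raise ValueError("portfolio has zero spread; ratio is undefined")
    return (best_at_k - behavior_value) / std_at_k


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    estimated = [0.9, 0.5, 0.7, 0.1]
    true = [1.0, 2.0, 3.0, 4.0]
    print(sharpe_ratio_at_k(estimated, true, behavior_value=1.5, k=2))
```

Under this assumed form, an estimator scores well when its top-k portfolio contains a policy that clearly beats the behavior policy while the portfolio's true values remain tightly clustered.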
Precision Crystal Calorimetry in High Energy Physics
Crystal calorimetry is widely used in high energy physics because of its
precision. Recent developments in crystal technology have identified two key
issues in reaching and maintaining crystal precision: light response uniformity
and calibration in situ. Crystal radiation damage is understood. While the damage
in alkali halides is found to be caused by oxygen/hydroxyl contamination,
it is structural defects, such as oxygen vacancies, that cause damage in oxides.
Comment: 8 pages with 13 eps Figures, RevTe
Electronic and Magnetic Phase Diagram of a Superconductor, SmFeAsO1-xFx
A crystallographic and magnetic phase diagram of SmFeAsO1-xFx is determined
as a function of x in terms of temperature based on electrical transport and
magnetization, synchrotron powder x-ray diffraction, 57Fe Mössbauer spectra
(MS), and 149Sm nuclear resonant forward scattering (NRFS) measurements. MS
revealed that the magnetic moments of Fe were aligned antiferromagnetically at
~144 K (TN(Fe)). The magnetic moment of Fe (MFe) is estimated to be 0.34
μB/Fe at 4.2 K for undoped SmFeAsO; MFe is quenched in superconducting
F-doped SmFeAsO. 149Sm NRFS spectra revealed that the magnetic moments of Sm
start to order antiferromagnetically at TN(Sm) = 5.6 K (undoped) and 4.4 K (x =
0.069). Results clearly indicate that the antiferromagnetic Sm sublattice
coexists with the superconducting phase in SmFeAsO1-xFx below TN(Sm), while
the antiferromagnetic Fe sublattice does not coexist with the superconducting
phase.
Comment: Accepted in New Journal of Physic
Searching for realistic 4d string models with a Pati-Salam symmetry -- Orbifold grand unified theories from heterotic string compactification on a Z6 orbifold
Motivated by orbifold grand unified theories, we construct a class of
three-family Pati-Salam models in a Z6 abelian symmetric orbifold with two
discrete Wilson lines. These models have marked differences from
previously-constructed three-family models in prime-order orbifolds. In the
limit where one of the six compactified dimensions (which lies in a Z2
sub-orbifold) is large compared to the string length scale, our models
reproduce the supersymmetry and gauge symmetry breaking pattern of 5d orbifold
grand unified theories on an S1/Z2 orbicircle. We find a horizontal 2+1
splitting in the chiral matter spectra -- 2 families of matter are localized on
the Z2 orbifold fixed points, and 1 family propagates in the 5d bulk -- and
identify them as the first two and third families, respectively. Remarkably, the first two
families enjoy a non-abelian dihedral D4 family symmetry, due to the geometric
setup of the compactified space. In all our models, some color triplets,
i.e. (6,1,1) representations of the Pati-Salam group, always survive the
orbifold projections. They could be utilized to spontaneously break the
Pati-Salam symmetry to that of the Standard Model. One model, with a 5d E6
symmetry, may give rise to interesting low energy phenomenology. We study gauge
coupling unification, allowed Yukawa couplings and some of their
phenomenological consequences. The E6 model has a renormalizable Yukawa
coupling only for the third family. It predicts a gauge-Yukawa unification
relation at the 5d compactification scale, and is capable of generating
reasonable quark/lepton masses and mixings. Potential problems are also
addressed; they may point to directions for refining our models.
Comment: 58 pages, 5 figures, 4 tables, revtex4 with ams fonts. Version to
appear in NP
Effect of Multiphase Radiation on Coal Combustion in a Pulverized Coal Jet Flame
The accurate modeling of coal combustion requires detailed radiative heat transfer models for both gaseous combustion products and solid coal particles. A multiphase Monte Carlo ray tracing (MCRT) radiation solver is developed in this work to simulate a laboratory-scale pulverized coal flame. The MCRT solver considers radiative interactions between coal particles and three major combustion products (CO2, H2O, and CO). A line-by-line spectral database for the gas phase and a size-dependent nongray correlation for the solid phase are employed to account for the nongray effects. The flame structure is significantly altered by considering nongray radiation, and the lift-off height of the flame increases by approximately 35% compared to the simulation without radiation. Radiation is also found to affect the evolution of coal particles considerably, as it takes over as the dominant mode of heat transfer for medium-to-large coal particles downstream of the flame. To investigate the respective effects of spectral models for the gas and solid phases, a Planck-mean-based gray gas model and a size-independent gray particle model are applied in a frozen-field analysis of a steady-state snapshot of the flame. The gray gas approximation considerably underestimates the radiative source terms for both the gas phase and the solid phase. The gray coal approximation also leads to under-prediction of the particle emission and absorption. However, the level of under-prediction is not as significant as that resulting from the gray gas model. Finally, the effect of the spectral property of ash on radiation is also investigated and found to be insignificant for the present target flame.
Evidence for orbital ordering in LaCoO3
We present powder and single crystal X-ray diffraction data as evidence for a
monoclinic distortion in the low spin (S=0) and intermediate spin state (S=1)
of LaCoO3. The alternation of short and long bonds in the ab plane indicates
the presence of eg orbital ordering induced by a cooperative Jahn-Teller
distortion. We observe an increase of the Jahn-Teller distortion with
temperature in agreement with a thermally activated behavior of the Co3+ ions
from a low-spin ground state to an intermediate-spin excited state.
Comment: Accepted to Phys. Rev.