    SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

    This paper introduces SCOPE-RL, a comprehensive open-source Python library designed for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and off-policy selection (OPS). Unlike most existing libraries that focus solely on either policy learning or evaluation, SCOPE-RL seamlessly integrates these two key aspects, facilitating flexible and complete implementations of both offline RL and OPE processes. SCOPE-RL puts particular emphasis on its OPE modules, offering a range of OPE estimators and robust evaluation-of-OPE protocols. This approach enables more in-depth and reliable OPE compared to other packages. For instance, SCOPE-RL enhances OPE by estimating the entire reward distribution under a policy rather than its mere point-wise expected value. Additionally, SCOPE-RL provides a more thorough evaluation-of-OPE by presenting the risk-return tradeoff in OPE results, extending beyond mere accuracy evaluations in existing OPE literature. SCOPE-RL is designed with user accessibility in mind. Its user-friendly APIs, comprehensive documentation, and a variety of easy-to-follow examples assist researchers and practitioners in efficiently implementing and experimenting with various offline RL methods and OPE estimators, tailored to their specific problem contexts. The documentation of SCOPE-RL is available at https://scope-rl.readthedocs.io/en/latest/. Comment: preprint, open-source software: https://github.com/hakuhodo-technologies/scope-r

    Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

    Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often used to identify the top-k promising policies for deployment in online A/B tests. Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting the risk-return tradeoff in the subsequent online policy deployment. To address this issue, we draw inspiration from portfolio evaluation in finance and develop a new metric, called SharpeRatio@k, which measures the risk-return tradeoff of policy portfolios formed by an OPE estimator under varying online evaluation budgets (k). We validate our metric in two example scenarios, demonstrating its ability to effectively distinguish between low-risk and high-risk estimators and to accurately identify the most efficient one. Efficiency of an estimator is characterized by its capability to form the most advantageous policy portfolios, maximizing returns while minimizing risks during online deployment, a nuance that existing metrics typically overlook. To facilitate a quick, accurate, and consistent evaluation of OPE via SharpeRatio@k, we have also integrated this metric into the open-source software SCOPE-RL (https://github.com/hakuhodo-technologies/scope-rl). Employing SharpeRatio@k and SCOPE-RL, we conduct comprehensive benchmarking experiments on various estimators and RL tasks, focusing on their risk-return tradeoff. These experiments offer several interesting directions and suggestions for future OPE research. Comment: ICLR202
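
    The abstract above does not spell out the formula for SharpeRatio@k, so the following is only a minimal sketch under a stated assumption: that the metric mirrors the Sharpe ratio from finance, dividing the excess return of the best policy in the top-k portfolio (over a baseline such as the data-collecting policy) by the standard deviation of the true values in that portfolio. The function name, argument names, and the use of the population standard deviation are hypothetical choices for illustration and are not taken from SCOPE-RL's API.

```python
import numpy as np


def sharpe_ratio_at_k(estimated_values, true_values, baseline_value, k):
    """Sketch of a Sharpe-ratio-style metric for an OPE estimator.

    ASSUMPTION: return = best true value among the top-k policies minus a
    baseline value; risk = population std of the true values of those k
    policies. This form is an illustrative guess, not the library's API.
    """
    estimated = np.asarray(estimated_values, dtype=float)
    true = np.asarray(true_values, dtype=float)
    if estimated.shape != true.shape:
        raise ValueError("estimated_values and true_values must align")
    if not 1 <= k <= estimated.size:
        raise ValueError("k must be between 1 and the number of policies")

    # Portfolio: the k policies the estimator ranks highest.
    top_k = np.argsort(-estimated, kind="stable")[:k]
    portfolio = true[top_k]

    excess_return = portfolio.max() - baseline_value
    risk = portfolio.std()  # population standard deviation (ddof=0)
    if risk == 0.0:
        return float("nan")  # ratio is undefined for a risk-free portfolio
    return float(excess_return / risk)


# Toy example with made-up numbers: four candidate policies.
estimated = [0.9, 0.1, 0.5, 0.7]   # estimator's predicted policy values
true = [1.0, 2.0, 3.0, 5.0]        # values observed after online deployment
score = sharpe_ratio_at_k(estimated, true, baseline_value=1.0, k=2)
```

    In this toy example the estimator's top-2 portfolio contains the policies with true values 1.0 and 5.0, so the excess return is 4.0 and the risk is 2.0. An estimator that ranked less dispersed policies first would score differently even when its best pick is the same, which is the kind of distinction the abstract describes.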

    Precision Crystal Calorimetry in High Energy Physics

    Crystal calorimetry is widely used in high energy physics because of its precision. Recent developments in crystal technology have identified two key issues for reaching and maintaining crystal precision: light response uniformity and calibration in situ. Crystal radiation damage is understood. While the damage in alkali halides is found to be caused by oxygen/hydroxyl contamination, in oxides it is structural defects, such as oxygen vacancies, that cause the damage. Comment: 8 pages with 13 eps Figures, RevTe

    Electronic and Magnetic Phase Diagram of a Superconductor, SmFeAsO1-xFx

    A crystallographic and magnetic phase diagram of SmFeAsO1-xFx is determined as a function of x and temperature based on electrical transport and magnetization, synchrotron powder x-ray diffraction, 57Fe Mössbauer spectra (MS), and 149Sm nuclear resonant forward scattering (NRFS) measurements. MS revealed that the magnetic moments of Fe were aligned antiferromagnetically at ~144 K (TN(Fe)). The magnetic moment of Fe (MFe) is estimated to be 0.34 μB/Fe at 4.2 K for undoped SmFeAsO; MFe is quenched in superconducting F-doped SmFeAsO. 149Sm NRFS spectra revealed that the magnetic moments of Sm start to order antiferromagnetically at TN(Sm) = 5.6 K (undoped) and 4.4 K (x = 0.069). Results clearly indicate that the antiferromagnetic Sm sublattice coexists with the superconducting phase in SmFeAsO1-xFx below TN(Sm), while the antiferromagnetic Fe sublattice does not coexist with the superconducting phase. Comment: Accepted in New Journal of Physic

    Searching for realistic 4d string models with a Pati-Salam symmetry -- Orbifold grand unified theories from heterotic string compactification on a Z6 orbifold

    Motivated by orbifold grand unified theories, we construct a class of three-family Pati-Salam models in a Z6 abelian symmetric orbifold with two discrete Wilson lines. These models have marked differences from previously constructed three-family models in prime-order orbifolds. In the limit where one of the six compactified dimensions (which lies in a Z2 sub-orbifold) is large compared to the string length scale, our models reproduce the supersymmetry and gauge symmetry breaking pattern of 5d orbifold grand unified theories on an S1/Z2 orbicircle. We find a horizontal 2+1 splitting in the chiral matter spectra -- 2 families of matter are localized on the Z2 orbifold fixed points, and 1 family propagates in the 5d bulk -- and identify them as the first two and third families. Remarkably, the first two families enjoy a non-abelian dihedral D4 family symmetry, due to the geometric setup of the compactified space. In all our models there are always some color triplets, i.e. (6,1,1) representations of the Pati-Salam group, that survive orbifold projections. They could be utilized to spontaneously break the Pati-Salam symmetry to that of the Standard Model. One model, with a 5d E6 symmetry, may give rise to interesting low energy phenomenology. We study gauge coupling unification, allowed Yukawa couplings and some of their phenomenological consequences. The E6 model has a renormalizable Yukawa coupling only for the third family. It predicts a gauge-Yukawa unification relation at the 5d compactification scale, and is capable of generating reasonable quark/lepton masses and mixings. Potential problems are also addressed; they may point to directions for refining our models. Comment: 58 pages, 5 figures, 4 tables, revtex4 with ams fonts. Version to appear in NP

    Effect of Multiphase Radiation on Coal Combustion in a Pulverized Coal Jet Flame

    The accurate modeling of coal combustion requires detailed radiative heat transfer models for both gaseous combustion products and solid coal particles. A multiphase Monte Carlo ray tracing (MCRT) radiation solver is developed in this work to simulate a laboratory-scale pulverized coal flame. The MCRT solver considers radiative interactions between coal particles and three major combustion products (CO2, H2O, and CO). A line-by-line spectral database for the gas phase and a size-dependent nongray correlation for the solid phase are employed to account for the nongray effects. The flame structure is significantly altered by considering nongray radiation and the lift-off height of the flame increases by approximately 35%, compared to the simulation without radiation. Radiation is also found to affect the evolution of coal particles considerably as it takes over as the dominant mode of heat transfer for medium-to-large coal particles downstream of the flame. To investigate the respective effects of spectral models for the gas and solid phases, a Planck-mean-based gray gas model and a size-independent gray particle model are applied in a frozen-field analysis of a steady-state snapshot of the flame. The gray gas approximation considerably underestimates the radiative source terms for both the gas phase and the solid phase. The gray coal approximation also leads to under-prediction of the particle emission and absorption. However, the level of under-prediction is not as significant as that resulting from the employment of the gray gas model. Finally, the effect of the spectral property of ash on radiation is also investigated and found to be insignificant for the present target flame.

    Evidence for orbital ordering in LaCoO3

    We present powder and single-crystal X-ray diffraction data as evidence for a monoclinic distortion in the low-spin (S=0) and intermediate-spin (S=1) states of LaCoO3. The alternation of short and long bonds in the ab plane indicates the presence of eg orbital ordering induced by a cooperative Jahn-Teller distortion. We observe an increase of the Jahn-Teller distortion with temperature, in agreement with a thermally activated behavior of the Co3+ ions from a low-spin ground state to an intermediate-spin excited state. Comment: Accepted to Phys. Rev.