
    Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size

    Training large neural networks is known to be time-consuming, with training taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training time was not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods that achieve state-of-the-art performance have made this issue more relevant, notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average. Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 202
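As a rough illustration of "scaling the mini-batch size and adjusting the learning rate", the sketch below applies two widely used heuristics (linear and square-root scaling). The function name, its parameters, and the choice of rules are assumptions made for illustration; the abstract does not state which rule the authors use.

```python
import math


def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int,
                        rule: str = "sqrt") -> float:
    """Rescale a learning rate when the mini-batch size changes.

    Hypothetical helper: 'linear' multiplies the learning rate by the
    batch-size ratio, 'sqrt' by the square root of that ratio. Both are
    common heuristics, not the specific rule of the paper above.
    """
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown scaling rule: {rule}")


if __name__ == "__main__":
    # Example: growing the batch from 256 to 1024 samples.
    print(scale_learning_rate(3e-4, 256, 1024, rule="linear"))
    print(scale_learning_rate(3e-4, 256, 1024, rule="sqrt"))
```

Either rule is only a starting point; in practice the rescaled learning rate usually still needs to be validated on the task at hand.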

    Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows

    Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism, i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model (a controller in the latent space) is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets. Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 202

    CORL: Research-oriented Deep Offline Reinforcement Learning Library

    CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern experiment-tracking tool. In CORL, we isolate each method's implementation in a separate single file, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking on commonly employed D4RL datasets, providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance. Comment: Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code at https://github.com/corl-team/COR

    Bessel beam CARS of axially structured samples

    We report on a Bessel beam CARS approach for axial profiling of multi-layer structures. This study presents an experimental implementation for the generation of CARS by Bessel beam excitation using only passive optical elements. Furthermore, an analytical expression is provided describing the anti-Stokes field generated by a homogeneous sample. Based on the concept of coherent transfer functions, the underlying resolving power for axially structured geometries is investigated. It is found that, through the non-linearity of the CARS process in combination with the folded illumination geometry, continuous phase-matching is achieved from homogeneous samples up to spatial sample frequencies of twice that of the pumping electric field wave. The experimental and analytical findings are modeled using the Debye integral and a scalar Green function approach. Finally, the reconstruction of an axially layered sample is demonstrated on the basis of the numerically simulated modulus and phase of the anti-Stokes far-field radiation pattern.

    Ultrafast in cellulo photoinduced dynamics processes of the paradigm molecular light switch [Ru(bpy)2dppz]2+

    An in cellulo study of the ultrafast excited state processes in the paradigm molecular light switch [Ru(bpy)2dppz]2+ by localized pump-probe spectroscopy is reported for the first time. The localization of [Ru(bpy)2dppz]2+ in HepG2 cells is verified by emission microscopy, and the characteristic photoinduced picosecond dynamics of the molecular light switch are observed in cellulo. The observation of the typical phosphorescence stemming from a 3MLCT state suggests that the [Ru(bpy)2dppz]2+ complex intercalates with the DNA in the nucleus. The results presented for this benchmark coordination compound reveal the necessity of studying the photoinduced processes in coordination compounds for intracellular use, e.g. as sensors or as photodrugs, in the actual biological target environment in order to derive a detailed molecular mechanistic understanding of the excited-state properties of the systems.

    Chemical, Physical and Biological Triggers of Evolutionary Conserved Bcl-xL-Mediated Apoptosis

    Background: The evidence that pan-Bcl-2 or Bcl-xL-specific inhibitors prematurely kill virus-infected or RNA/DNA-transfected cells provides a rationale for investigating these apoptotic inducers further. We hypothesized that not only invasive RNA or DNA (biological factors) but also DNA/RNA-damaging chemical or physical factors could trigger apoptosis in cells that have been sensitized with pan-Bcl-2 or Bcl-xL-specific agents; Methods: We tested chemical and physical factors plus the Bcl-xL-specific inhibitor A-1155463 in cells of various origins and in the small roundworm C. elegans; Results: We show that the combination of A-1155463 with a DNA-damaging agent, 4-nitroquinoline-1-oxide (4NQO), prematurely kills cells of various origins as well as C. elegans. The synergistic effect is p53-dependent and associated with the release of Bad and Bax from Bcl-xL, which triggers mitochondrial outer membrane permeabilization. Furthermore, we found that combining Bcl-xL-specific inhibitors with various chemical compounds or physical insults also induced cell death; Conclusions: Thus, we were able to identify several biological, chemical and physical triggers of the evolutionarily conserved Bcl-xL-mediated apoptotic pathway, shedding light on strategies and targets for novel drug development.

    Jet energy measurement with the ATLAS detector in proton-proton collisions at √s = 7 TeV

    The jet energy scale and its systematic uncertainty are determined for jets measured with the ATLAS detector at the LHC in proton-proton collision data at a centre-of-mass energy of √s = 7 TeV corresponding to an integrated luminosity of 38 pb⁻¹. Jets are reconstructed with the anti-kt algorithm with distance parameters R = 0.4 or R = 0.6. Jet energy and angle corrections are determined from Monte Carlo simulations to calibrate jets with transverse momenta pT ≥ 20 GeV and pseudorapidities |η| < 4.5. The jet energy systematic uncertainty is estimated using the single isolated hadron response measured in situ and in test-beams, exploiting the transverse momentum balance between central and forward jets in events with dijet topologies and studying systematic variations in Monte Carlo simulations. The jet energy uncertainty is less than 2.5% in the central calorimeter region (|η| < 0.8) for jets with 60 ≤ pT < 800 GeV, and is maximally 14% for pT < 30 GeV in the most forward region 3.2 ≤ |η| < 4.5. The jet energy is validated for jet transverse momenta up to 1 TeV to the level of a few percent using several in situ techniques by comparing to a well-known reference such as the recoiling photon pT, the sum of the transverse momenta of tracks associated to the jet, or a system of low-pT jets recoiling against a high-pT jet. More sophisticated jet calibration schemes are presented based on calorimeter cell energy density weighting or hadronic properties of jets, aiming for an improved jet energy resolution and a reduced flavour dependence of the jet response. The systematic uncertainty of the jet energy determined from a combination of in situ techniques is consistent with the one derived from single hadron response measurements over a wide kinematic range. The nominal corrections and uncertainties are derived for isolated jets in an inclusive sample of high-pT jets. Special cases such as event topologies with close-by jets, or selections of samples with an enhanced content of jets originating from light quarks, heavy quarks or gluons are also discussed and the corresponding uncertainties are determined. © 2013 CERN for the benefit of the ATLAS collaboration
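For readers unfamiliar with the anti-kt algorithm and its distance parameter R mentioned above, the sketch below computes the standard anti-kt distance measures, d_ij = min(pT_i⁻², pT_j⁻²) ΔR_ij² / R² and d_iB = pT_i⁻². The function names and the plain-number interface are illustrative assumptions; this is not ATLAS reconstruction code and it omits the full clustering loop.

```python
import math


def delta_r_squared(y1: float, phi1: float, y2: float, phi2: float) -> float:
    """Squared angular separation in (rapidity, azimuth), with phi wrapped."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (y1 - y2) ** 2 + dphi ** 2


def antikt_dij(pt_i: float, y_i: float, phi_i: float,
               pt_j: float, y_j: float, phi_j: float,
               radius: float = 0.4) -> float:
    """Anti-kt distance between two objects (hypothetical helper)."""
    inv_pt2 = min(pt_i ** -2, pt_j ** -2)
    return inv_pt2 * delta_r_squared(y_i, phi_i, y_j, phi_j) / radius ** 2


def antikt_dib(pt_i: float) -> float:
    """Anti-kt distance between an object and the beam."""
    return pt_i ** -2


if __name__ == "__main__":
    # A hard 100 GeV object and a soft 10 GeV object separated by 0.2 in rapidity.
    print(antikt_dij(100.0, 0.0, 0.0, 10.0, 0.2, 0.0, radius=0.4))
    print(antikt_dib(100.0))
```

Because the smaller of the two inverse squared momenta enters d_ij, soft objects tend to be merged into nearby hard ones first, which is what gives anti-kt jets their regular, cone-like shape of radius R.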

    Measurement of the inclusive and dijet cross-sections of b-jets in pp collisions at sqrt(s) = 7 TeV with the ATLAS detector

    The inclusive and dijet production cross-sections have been measured for jets containing b-hadrons (b-jets) in proton-proton collisions at a centre-of-mass energy of sqrt(s) = 7 TeV, using the ATLAS detector at the LHC. The measurements use data corresponding to an integrated luminosity of 34 pb^-1. The b-jets are identified using either a lifetime-based method, where secondary decay vertices of b-hadrons in jets are reconstructed using information from the tracking detectors, or a muon-based method where the presence of a muon is used to identify semileptonic decays of b-hadrons inside jets. The inclusive b-jet cross-section is measured as a function of transverse momentum in the range 20 < pT < 400 GeV and rapidity in the range |y| < 2.1. The bbbar-dijet cross-section is measured as a function of the dijet invariant mass in the range 110 < m_jj < 760 GeV, the azimuthal angle difference between the two jets and the angular variable chi in two dijet mass regions. The results are compared with next-to-leading-order QCD predictions. Good agreement is observed between the measured cross-sections and the predictions obtained using POWHEG + Pythia. MC@NLO + Herwig shows good agreement with the measured bbbar-dijet cross-section. However, it does not reproduce the measured inclusive cross-section well, particularly for central b-jets with large transverse momenta. Comment: 10 pages plus author list (21 pages total), 8 figures, 1 table, final version published in European Physical Journal

    Observation of associated near-side and away-side long-range correlations in √sNN = 5.02 TeV proton-lead collisions with the ATLAS detector

    Two-particle correlations in relative azimuthal angle (Δϕ) and pseudorapidity (Δη) are measured in √sNN = 5.02 TeV p+Pb collisions using the ATLAS detector at the LHC. The measurements are performed using approximately 1 μb⁻¹ of data as a function of transverse momentum (pT) and the transverse energy (ΣE_T^Pb) summed over 3.1 < η < 4.9 in the direction of the Pb beam. The correlation function, constructed from charged particles, exhibits a long-range (2 < |Δη| < 5) “near-side” (Δϕ ∼ 0) correlation that grows rapidly with increasing ΣE_T^Pb. A long-range “away-side” (Δϕ ∼ π) correlation, obtained by subtracting the expected contributions from recoiling dijets and other sources estimated using events with small ΣE_T^Pb, is found to match the near-side correlation in magnitude, shape (in Δη and Δϕ) and ΣE_T^Pb dependence. The resultant Δϕ correlation is approximately symmetric about π/2, and is consistent with a dominant cos 2Δϕ modulation for all ΣE_T^Pb ranges and particle pT
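To make the idea of a dominant cos 2Δϕ modulation concrete, the sketch below estimates a second-harmonic amplitude a_2 from a binned Δϕ distribution, assuming the simple model Y(Δϕ) = A [1 + 2 a_2 cos 2Δϕ] on uniform bins covering a full 2π. The function name, the model, and the toy input are assumptions made for illustration; this is not the ATLAS analysis procedure, which also involves the recoil subtraction described in the abstract.

```python
import math


def second_harmonic_amplitude(yields: list[float]) -> float:
    """Estimate a_2 from per-bin yields on uniform bins spanning 2*pi.

    Hypothetical helper: uses a_2 = sum(Y_k cos 2phi_k) / sum(Y_k),
    with phi_k taken at the bin centres.
    """
    n_bins = len(yields)
    if n_bins < 5:
        raise ValueError("need at least 5 bins to resolve the second harmonic")
    width = 2.0 * math.pi / n_bins
    total = sum(yields)
    weighted = sum(y * math.cos(2.0 * (k + 0.5) * width)
                   for k, y in enumerate(yields))
    return weighted / total


if __name__ == "__main__":
    # Toy distribution built from the assumed model with a_2 = 0.05.
    n_bins, norm, a2_true = 36, 100.0, 0.05
    width = 2.0 * math.pi / n_bins
    toy = [norm * (1.0 + 2.0 * a2_true * math.cos(2.0 * (k + 0.5) * width))
           for k in range(n_bins)]
    print(second_harmonic_amplitude(toy))
```

A distribution with equal near-side and away-side peaks, symmetric about π/2, yields a nonzero a_2 under this estimate, while a flat distribution yields zero.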