4,960 research outputs found

    Reinforcement Learning by Guided Safe Exploration

    Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward so that it can adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are no longer allowed. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster. Comment: Accepted at ECAI 202
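The abstract above describes regularizing the student towards the guide and fading that influence out over training. The following is a minimal sketch of that idea, not the authors' implementation: the KL direction, the linear schedule, and the names `kl_divergence`, `guide_weight`, `regularized_loss`, and `decay_steps` are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete action distributions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def guide_weight(step, decay_steps, initial_weight=1.0):
    """Hypothetical linear schedule: full guide influence at step 0,
    no influence from decay_steps onwards."""
    remaining = max(0.0, 1.0 - step / decay_steps)
    return initial_weight * remaining


def regularized_loss(task_loss, student_probs, guide_probs, step, decay_steps):
    """Task loss plus a decaying penalty for deviating from the guide."""
    weight = guide_weight(step, decay_steps)
    return task_loss + weight * kl_divergence(student_probs, guide_probs)


# Early in training the penalty is active; at decay_steps it has vanished.
early = regularized_loss(2.0, [0.9, 0.1], [0.5, 0.5], step=0, decay_steps=100)
late = regularized_loss(2.0, [0.9, 0.1], [0.5, 0.5], step=100, decay_steps=100)
print(early > late, late)
```

A fixed schedule is only the simplest choice; the abstract ties the guide's influence to how reliable the student is, which a step-based schedule does not measure.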

    Large-scale trade in legally protected marine mollusc shells from Java and Bali, Indonesia

    Background: Tropical marine molluscs are traded globally. Larger species with slow life histories are under threat from over-exploitation. We report on the trade in protected marine mollusc shells in and from Java and Bali, Indonesia. Since 1987, twelve species of marine molluscs have been protected under Indonesian law to shield them from overexploitation. Despite this protection they are traded openly in large volumes. Methodology/Principal Findings: We collected data on species composition, origins, volumes and prices at two large open markets (2013), collected data from wholesale traders (2013), and compiled seizure data by the Indonesian authorities (2008–2013). All twelve protected species were observed in trade. Smaller species were traded for […] (>32,000 shells valued at USD500,000), chambered nautilus (Nautilus pompilius) (>3,000 shells, USD60,000) and giant clams (Tridacna spp.) (>2,000 shells, USD45,000) were traded in largest volumes. Two-thirds of this trade was destined for international markets, including in the USA and Asia-Pacific region. Conclusions/Significance: We demonstrated that the trade in protected marine mollusc shells in Indonesia is neither controlled nor monitored, that it involves large volumes, and that networks of shell collectors, traders, middlemen and exporters span the globe. This impedes protection of these species on the ground and calls into question the effectiveness of protected species management in Indonesia; solutions are unlikely to be found only in Indonesia and must involve the cooperation of importing countries.

    Scalable Safe Policy Improvement via Monte Carlo Tree Search

    Algorithms for safely improving policies are important for deploying reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a strategy based on Monte Carlo Tree Search. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.

    Context-dependent costs and benefits of tuberculosis resistance traits in a wild mammalian host

    Disease acts as a powerful driver of evolution in natural host populations, yet individuals in a population often vary in their susceptibility to infection. Energetic trade-offs between immune and reproductive investment lead to the evolution of distinct life history strategies, driven by the relative fitness costs and benefits of resisting infection. However, examples quantifying the cost of resistance outside of the laboratory are rare. Here, we observe two distinct forms of resistance to bovine tuberculosis (bTB), an important zoonotic pathogen, in a free-ranging African buffalo (Syncerus caffer) population. We characterize these phenotypes as “infection resistance,” in which hosts delay or prevent infection, and “proliferation resistance,” in which the host limits the spread of lesions caused by the pathogen after infection has occurred. We found weak evidence that infection resistance to bTB may be heritable in this buffalo population (h² = 0.10) and comes at the cost of reduced body condition and marginally reduced survival once infected, but is also associated with an overall higher reproductive rate. Infection-resistant animals thus appear to follow a “fast” pace-of-life syndrome, in that they reproduce more quickly but die upon infection. In contrast, proliferation resistance had no apparent costs and was associated with measures of positive host health, such as a higher body condition and reproductive rate. This study quantifies striking phenotypic variation in pathogen resistance and provides evidence for a link between life history variation and a disease resistance trait in a wild mammalian host population.

    Parameter-Independent Strategies for pMDPs via POMDPs

    Markov Decision Processes (MDPs) are a popular class of models suitable for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs) that include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances. We study pMDPs with reachability objectives where the parameter values are unknown and impossible to measure directly during execution, but a probability distribution over the parameter values is known. We study, for the first time, the computation of parameter-independent strategies that are expectation optimal, i.e., that optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs. We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol. Comment: Extended version of a QEST 2018 paper

    Improved performance of the LHCb Outer Tracker in LHC Run 2

    The LHCb Outer Tracker is a gaseous detector covering an area of $5\times 6$ m$^2$ with 12 double layers of straw tubes. The performance of the detector is presented based on data of the LHC Run 2 running period from 2015 and 2016. Occupancies and operational experience for data collected in $pp$, pPb and PbPb collisions are described. An updated study of the ageing effects is presented showing no signs of gain deterioration or other radiation damage effects. In addition several improvements with respect to LHC Run 1 data taking are introduced. A novel real-time calibration of the time-alignment of the detector and the alignment of the single monolayers composing detector modules are presented, improving the drift-time and position resolution of the detector by 20%. Finally, a potential use of the improved resolution for the timing of charged tracks is described, showing the possibility to identify low-momentum hadrons with their time-of-flight. Comment: 29 pages, 20 figures, minor changes to match the published version

    Determination of the Michel Parameters rho, xi, and delta in tau-Lepton Decays with tau --> rho nu Tags

    Using the ARGUS detector at the $e^+ e^-$ storage ring DORIS II, we have measured the Michel parameters $\rho$, $\xi$, and $\xi\delta$ for $\tau^{\pm}\to l^{\pm} \nu\bar\nu$ decays in $\tau$-pair events produced at center of mass energies in the region of the $\Upsilon$ resonances. Using $\tau^\mp \to \rho^\mp \nu$ as spin analyzing tags, we find $\rho_{e}=0.68\pm 0.04 \pm 0.08$, $\xi_{e}= 1.12 \pm 0.20 \pm 0.09$, $\xi\delta_{e}= 0.57 \pm 0.14 \pm 0.07$, $\rho_{\mu}= 0.69 \pm 0.06 \pm 0.08$, $\xi_{\mu}= 1.25 \pm 0.27 \pm 0.14$ and $\xi\delta_{\mu}= 0.72 \pm 0.18 \pm 0.10$. In addition, we report the combined ARGUS results on $\rho$, $\xi$, and $\xi\delta$ using this work and previous measurements. Comment: 10 pages, well formatted postscript can be found at http://pktw06.phy.tu-dresden.de/iktp/pub/desy97-194.p

    Semileptonic Branching Fraction of Charged and Neutral B Mesons

    An examination of leptons in $\Upsilon(4S)$ events tagged by reconstructed $B$ decays yields semileptonic branching fractions of $b_-=(10.1 \pm 1.8\pm 1.4)\%$ for charged and $b_0=(10.9 \pm 0.7\pm 1.1)\%$ for neutral $B$ mesons. This is the first measurement for charged $B$. Assuming equality of the charged and neutral semileptonic widths, the ratio $b_-/b_0=0.93 \pm 0.18 \pm 0.12$ is equivalent to the ratio of lifetimes. A postscript version is available through World-Wide-Web in http://w4.lns.cornell.edu/public/CLNS/1994 Comment: 9 pages (in REVTEX format) Preprint CLNS94-1286, CLEO 94-1
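As a quick arithmetic check on the abstract above, the central value and the statistical uncertainty of the ratio follow from the two quoted branching fractions. This sketch assumes the two statistical errors are uncorrelated and combines them in quadrature; that assumption and the helper name `ratio_with_stat_error` are illustrative and do not come from the paper. The systematic uncertainty is left out because it depends on correlations between the two measurements that the abstract does not give.

```python
import math


def ratio_with_stat_error(a, sigma_a, b, sigma_b):
    """Return a/b and its uncertainty, assuming uncorrelated errors
    combined in quadrature (first-order error propagation)."""
    ratio = a / b
    relative = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return ratio, ratio * relative


# Branching fractions in percent, statistical errors only.
ratio, stat = ratio_with_stat_error(10.1, 1.8, 10.9, 0.7)
print(f"{ratio:.2f} +/- {stat:.2f}")  # prints "0.93 +/- 0.18"
```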