Reinforcement Learning by Guided Safe Exploration
Safety is critical to broadening the application of reinforcement learning
(RL). Often, we train RL agents in a controlled environment, such as a
laboratory, before deploying them in the real world. However, the real-world
target task might be unknown prior to deployment. Reward-free RL trains an
agent without the reward to adapt quickly once the reward is revealed. We
consider the constrained reward-free setting, where an agent (the guide) learns
to explore safely without the reward signal. This agent is trained in a
controlled environment, which allows unsafe interactions and still provides the
safety signal. After the target task is revealed, safety violations are not
allowed anymore. Thus, the guide is leveraged to compose a safe behaviour
policy. Drawing from transfer learning, we also regularize a target policy (the
student) towards the guide while the student is unreliable and gradually
eliminate the influence of the guide as training progresses. The empirical
analysis shows that this method can achieve safe transfer learning and helps
the student solve the target task faster.
Comment: Accepted at ECAI 202
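The regularization idea in the abstract above (pull the student towards the guide early on, then fade the guide out) can be illustrated with a minimal sketch. This is not the authors' implementation: the KL penalty, the linear decay schedule, and all names (`guide_weight`, `regularized_objective`, `initial_weight`) are assumptions chosen for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def guide_weight(step, total_steps, initial_weight=1.0):
    """Assumed schedule: decay the guide's influence linearly from initial_weight to 0."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_weight * remaining


def regularized_objective(task_return, student_probs, guide_probs, step, total_steps):
    """Task return minus a KL penalty that pulls the student towards the guide."""
    beta = guide_weight(step, total_steps)
    return task_return - beta * kl_divergence(student_probs, guide_probs)


# Hypothetical action distributions in a single state.
student = [0.7, 0.2, 0.1]
guide = [0.4, 0.4, 0.2]

early = regularized_objective(1.0, student, guide, step=0, total_steps=100)
late = regularized_objective(1.0, student, guide, step=100, total_steps=100)
```

With these numbers, `early` is penalized for deviating from the guide, while `late` equals the raw task return because the guide's weight has decayed to zero.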
Large-scale trade in legally protected marine mollusc shells from Java and Bali, Indonesia
Background
Tropical marine molluscs are traded globally. Larger species with slow life histories are under threat from overexploitation. We report on the trade in protected marine mollusc shells in and from Java and Bali, Indonesia. Since 1987, twelve species of marine molluscs have been protected under Indonesian law to shield them from overexploitation. Despite this protection, they are traded openly in large volumes.
Methodology/Principal Findings
We collected data on species composition, origins, volumes and prices at two large open markets (2013), collected data from wholesale traders (2013), and compiled seizure data by the Indonesian authorities (2008–2013). All twelve protected species were observed in trade. Smaller species were traded for [...] 32,000 shells valued at USD500,000), chambered nautilus (Nautilus pompilius) (>3,000 shells, USD60,000) and giant clams (Tridacna spp.) (>2,000 shells, USD45,000) were traded in largest volumes. Two-thirds of this trade was destined for international markets, including in the USA and the Asia-Pacific region.
Conclusions/Significance
We demonstrated that the trade in protected marine mollusc shells in Indonesia is neither controlled nor monitored, that it involves large volumes, and that networks of shell collectors, traders, middlemen and exporters span the globe. This impedes protection of these species on the ground and calls into question the effectiveness of protected species management in Indonesia; solutions are unlikely to be found only in Indonesia and must involve the cooperation of importing countries.
Scalable Safe Policy Improvement via Monte Carlo Tree Search
Algorithms for safely improving policies are important for deploying reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.
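To make the SPIBB constraint mentioned above concrete, here is a minimal sketch of the per-state improvement step as commonly described for the Pi_b-SPIBB variant: actions seen fewer than a threshold number of times keep their baseline probability, and the remaining probability mass moves to the best well-observed action. It does not implement MCTS-SPIBB itself; the function name, the threshold name `n_wedge`, and the example numbers are assumptions.

```python
def spibb_greedy_step(baseline_probs, q_values, counts, n_wedge):
    """Return an improved action distribution for a single state.

    Actions with counts below n_wedge are treated as uncertain ("bootstrapped")
    and keep their baseline probability. The probability mass of the remaining
    actions is moved to the one with the highest estimated Q-value.
    """
    n_actions = len(baseline_probs)
    trusted = [a for a in range(n_actions) if counts[a] >= n_wedge]

    new_probs = [0.0] * n_actions
    for a in range(n_actions):
        if counts[a] < n_wedge:
            new_probs[a] = baseline_probs[a]  # copy the baseline where data is scarce

    if trusted:
        free_mass = sum(baseline_probs[a] for a in trusted)
        best = max(trusted, key=lambda a: q_values[a])
        new_probs[best] += free_mass  # act greedily only among trusted actions

    return new_probs


# Hypothetical state with three actions; the third has too few samples.
improved = spibb_greedy_step(
    baseline_probs=[0.5, 0.3, 0.2],
    q_values=[1.0, 2.0, 5.0],
    counts=[20, 15, 2],
    n_wedge=10,
)
```

In this example the third action has the highest Q-value estimate but too little data, so it keeps its baseline probability of 0.2 and the remaining mass goes to the second action.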
Context-dependent costs and benefits of tuberculosis resistance traits in a wild mammalian host
Disease acts as a powerful driver of evolution in natural host populations, yet individuals in a population often vary in their susceptibility to infection. Energetic trade-offs between immune and reproductive investment lead to the evolution of distinct life history strategies, driven by the relative fitness costs and benefits of resisting infection. However, examples quantifying the cost of resistance outside of the laboratory are rare. Here, we observe two distinct forms of resistance to bovine tuberculosis (bTB), an important zoonotic pathogen, in a free-ranging African buffalo (Syncerus caffer) population. We characterize these phenotypes as “infection resistance,” in which hosts delay or prevent infection, and “proliferation resistance,” in which the host limits the spread of lesions caused by the pathogen after infection has occurred. We found weak evidence that infection resistance to bTB may be heritable in this buffalo population (h² = 0.10) and comes at the cost of reduced body condition and marginally reduced survival once infected, but also associates with an overall higher reproductive rate. Infection-resistant animals thus appear to follow a “fast” pace-of-life syndrome, in that they reproduce more quickly but die upon infection. In contrast, proliferation resistance had no apparent costs and was associated with measures of positive host health, such as having a higher body condition and reproductive rate. This study quantifies striking phenotypic variation in pathogen resistance and provides evidence for a link between life history variation and a disease resistance trait in a wild mammalian host population.
Parameter-Independent Strategies for pMDPs via POMDPs
Markov Decision Processes (MDPs) are a popular class of models suitable for
solving control decision problems in probabilistic reactive systems. We
consider parametric MDPs (pMDPs) that include parameters in some of the
transition probabilities to account for stochastic uncertainties of the
environment such as noise or input disturbances.
We study pMDPs with reachability objectives where the parameter values are
unknown and impossible to measure directly during execution, but there is a
probability distribution known over the parameter values. We study for the
first time computing parameter-independent strategies that are expectation
optimal, i.e., optimize the expected reachability probability under the
probability distribution over the parameters. We present an encoding of our
problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem
to computing optimal strategies in POMDPs.
We evaluate our method experimentally on several benchmarks: a motivating
(repeated) learner model; a series of benchmarks of varying configurations of a
robot moving on a grid; and a consensus protocol.
Comment: Extended version of a QEST 2018 paper
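The notion of an expectation-optimal, parameter-independent strategy can be illustrated on a toy model. The sketch below is not the paper's POMDP encoding: it simply enumerates the memoryless strategies of a hypothetical two-state pMDP and picks the one with the highest reachability probability averaged over an assumed distribution on the parameter `p`. The model, the distribution, and all names are invented for illustration.

```python
from itertools import product

# Assumed prior over the unknown parameter p: value -> probability.
PARAM_DIST = {0.1: 0.5, 0.9: 0.5}
ACTIONS = ("a", "b")


def reach_probability(strategy, p):
    """Probability of reaching the goal for a fixed strategy and parameter value.

    Hypothetical model: in s0 and s1, action "a" reaches the goal with
    probability p and action "b" with probability 0.5. A failed "a" in s0
    moves to s1; every other failure is final.
    """
    act0, act1 = strategy
    from_s1 = p if act1 == "a" else 0.5
    if act0 == "a":
        return p + (1.0 - p) * from_s1
    return 0.5


def expected_reach(strategy):
    """Reachability probability averaged over the parameter distribution."""
    return sum(w * reach_probability(strategy, p) for p, w in PARAM_DIST.items())


# The strategy may not depend on p, so one choice per state is fixed for all p.
best = max(product(ACTIONS, repeat=2), key=expected_reach)
```

Here the best choice is to try "a" first and fall back to "b" in s1: a failed "a" makes a low value of `p` more likely, which is the kind of belief-based reasoning a POMDP formulation captures.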
Improved performance of the LHCb Outer Tracker in LHC Run 2
The LHCb Outer Tracker is a gaseous detector covering an area of 5 x 6 m² with 12 double layers of straw tubes. The performance of the detector is
presented based on data of the LHC Run 2 running period from 2015 and 2016.
Occupancies and operational experience for data collected in pp, pPb and
PbPb collisions are described. An updated study of the ageing effects is
presented showing no signs of gain deterioration or other radiation damage
effects. In addition several improvements with respect to LHC Run 1 data taking
are introduced. A novel real-time calibration of the time-alignment of the
detector and the alignment of the single monolayers composing detector modules
are presented, improving the drift-time and position resolution of the detector
by 20%. Finally, a potential use of the improved resolution for the timing of
charged tracks is described, showing the possibility to identify low-momentum
hadrons with their time-of-flight.
Comment: 29 pages, 20 figures, minor changes to match the published version
Determination of the Michel Parameters rho, xi, and delta in tau-Lepton Decays with tau --> rho nu Tags
Using the ARGUS detector at the e+e- storage ring DORIS II, we have
measured the Michel parameters rho, xi, and delta for tau-lepton
decays in tau-pair events produced at
center of mass energies in the region of the Upsilon resonances. Using
tau --> rho nu as spin analyzing tags, we find [...]. In addition, we report
the combined ARGUS results on rho, xi, and delta using this work
and previous measurements.
Comment: 10 pages, well formatted postscript can be found at
http://pktw06.phy.tu-dresden.de/iktp/pub/desy97-194.p
Semileptonic Branching Fraction of Charged and Neutral B Mesons
An examination of leptons in events tagged by reconstructed B
decays yields semileptonic branching fractions of [...] for charged and [...] for neutral B mesons.
This is the first measurement for charged B. Assuming equality of the charged
and neutral semileptonic widths, the ratio [...] is
equivalent to the ratio of lifetimes. A postscript version is available through
World-Wide-Web in http://w4.lns.cornell.edu/public/CLNS/1994
Comment: 9 pages (in REVTEX format) Preprint CLNS94-1286, CLEO 94-1