4,725 research outputs found
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agent's belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. Contrary to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to dealing with
continuous action spaces. Experimental results show the potential of Perseus in
large-scale POMDP problems.
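The backup stage described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two-state model (`T`, `O`, `R`), the belief set `B`, and all function names are hypothetical, and the initial value function is assumed to be a single vector at the worst-case discounted return.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def value(b, V):
    """Value of belief b under a set V of alpha vectors."""
    return max(dot(b, alpha) for alpha in V)

def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one alpha vector."""
    n = len(b)
    best = None
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(len(O[a][0])):
            # Project every alpha vector back through action a and observation o.
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in V
            ]
            g_ao = max(projected, key=lambda g: dot(b, g))
            g_a = [x + gamma * y for x, y in zip(g_a, g_ao)]
        if best is None or dot(b, g_a) > dot(b, best):
            best = g_a
    return best

def perseus_stage(B, V, T, O, R, gamma, rng):
    """One backup stage: back up randomly chosen beliefs until no belief
    in B has a lower value than it had under V."""
    old = [value(b, V) for b in B]
    V_new = []
    todo = list(range(len(B)))          # beliefs not yet improved
    while todo:
        i = rng.choice(todo)
        alpha = backup(B[i], V, T, O, R, gamma)
        if dot(B[i], alpha) < old[i]:
            # The backup did not help at this point: keep the best old vector.
            alpha = max(V, key=lambda v: dot(B[i], v))
        V_new.append(alpha)
        todo = [j for j in todo if value(B[j], V_new) < old[j]]
    return V_new

# Hypothetical two-state, two-action, two-observation model.
T = [[[0.9, 0.1], [0.1, 0.9]], [[0.2, 0.8], [0.8, 0.2]]]   # T[a][s][s']
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.8, 0.2], [0.2, 0.8]]]   # O[a][s'][o]
R = [[1.0, 0.0], [0.0, 0.5]]                               # R[a][s]
gamma = 0.9
B = [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
V0 = [[min(min(r) for r in R) / (1.0 - gamma)] * 2]        # assumed lower bound
V1 = perseus_stage(B, V0, T, O, R, gamma, random.Random(0))
```

Because one new vector may raise the value of several beliefs at once, `V1` usually holds fewer vectors than there are beliefs in `B`.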
Scalable Safe Policy Improvement via Monte Carlo Tree Search
Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.
Improved performance of the LHCb Outer Tracker in LHC Run 2
The LHCb Outer Tracker is a gaseous detector covering an area of with 12 double layers of straw tubes. The performance of the detector is
presented based on data of the LHC Run 2 running period from 2015 and 2016.
Occupancies and operational experience for data collected in pp, pPb and
PbPb collisions are described. An updated study of the ageing effects is
presented showing no signs of gain deterioration or other radiation damage
effects. In addition several improvements with respect to LHC Run 1 data taking
are introduced. A novel real-time calibration of the time-alignment of the
detector and the alignment of the single monolayers composing detector modules
are presented, improving the drift-time and position resolution of the detector
by 20%. Finally, a potential use of the improved resolution for the timing of
charged tracks is described, showing the possibility to identify low-momentum
hadrons with their time-of-flight. Comment: 29 pages, 20 figures, minor changes to match the published version
Parameter-Independent Strategies for pMDPs via POMDPs
Markov Decision Processes (MDPs) are a popular class of models suitable for
solving control decision problems in probabilistic reactive systems. We
consider parametric MDPs (pMDPs) that include parameters in some of the
transition probabilities to account for stochastic uncertainties of the
environment such as noise or input disturbances.
We study pMDPs with reachability objectives where the parameter values are
unknown and impossible to measure directly during execution, but there is a
probability distribution known over the parameter values. We study for the
first time computing parameter-independent strategies that are expectation
optimal, i.e., optimize the expected reachability probability under the
probability distribution over the parameters. We present an encoding of our
problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem
to computing optimal strategies in POMDPs.
We evaluate our method experimentally on several benchmarks: a motivating
(repeated) learner model; a series of benchmarks of varying configurations of a
robot moving on a grid; and a consensus protocol. Comment: Extended version of a QEST 2018 paper
Dependence of Intramyocardial Pressure and Coronary Flow on Ventricular Loading and Contractility: A Model Study
The phasic coronary arterial inflow during the normal cardiac cycle has been explained with simple (waterfall, intramyocardial pump) models, emphasizing the role of ventricular pressure. To explain changes in isovolumic and low afterload beats, these models were extended with the effect of three-dimensional wall stress, nonlinear characteristics of the coronary bed, and extravascular fluid exchange. With the associated increase in the number of model parameters, a detailed parameter sensitivity analysis has become difficult. Therefore we investigated the primary relations between ventricular pressure and volume, wall stress, intramyocardial pressure and coronary blood flow, with a mathematical model with a limited number of parameters. The model replicates several experimental observations: the phasic character of coronary inflow is virtually independent of maximum ventricular pressure, the amplitude of the coronary flow signal varies about proportionally with cardiac contractility, and intramyocardial pressure in the ventricular wall may exceed ventricular pressure. A parameter sensitivity analysis shows that the normalized amplitude of coronary inflow is mainly determined by contractility, reflected in ventricular pressure and, at low ventricular volumes, radial wall stress. Normalized flow amplitude is less sensitive to myocardial coronary compliance and resistance, and to the relation between active fiber stress, time, and sarcomere shortening velocity.
Determination of the Michel Parameters rho, xi, and delta in tau-Lepton Decays with tau --> rho nu Tags
Using the ARGUS detector at the storage ring DORIS II, we have
measured the Michel parameters rho, xi, and delta for
decays in tau-pair events produced at
center of mass energies in the region of the resonances. Using
tau --> rho nu as spin analyzing tags, we find , , , , and . In addition, we report
the combined ARGUS results on , , and using this work
and previous measurements. Comment: 10 pages, well formatted postscript can be found at
http://pktw06.phy.tu-dresden.de/iktp/pub/desy97-194.p
Bounded approximations for linear multi-objective planning under uncertainty
Planning under uncertainty poses a complex problem in which multiple objectives often need to be balanced. When dealing with multiple objectives, it is often assumed that the relative importance of the objectives is known a priori. However, in practice human decision makers often find it hard to specify such preferences exactly, and would prefer a decision support system that presents a range of possible alternatives. We propose two algorithms for computing these alternatives for the case of linearly weighted objectives. First, we propose an anytime method, approximate optimistic linear support (AOLS), that incrementally builds up a complete set of ε-optimal plans, exploiting the piecewise-linear and convex shape of the value function. Second, we propose an approximate anytime method, scalarised sample incremental improvement (SSII), that employs weight sampling to focus on the most interesting regions in weight space, as suggested by a prior over preferences. We show empirically that our methods are able to produce (near-)optimal alternative sets orders of magnitude faster than existing techniques, thereby demonstrating that our methods provide sensible approximations in stochastic multi-objective domains.
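The piecewise-linear and convex shape mentioned in this abstract can be illustrated with a small sketch. The two-objective value vectors in `plans` and the function name are hypothetical. The code only evaluates the scalarised value of a fixed set of plans over a few weights and does not implement AOLS or SSII.

```python
def scalarised_value(w, value_vectors):
    """Best linearly weighted value over a set of multi-objective value vectors."""
    return max(sum(wi * vi for wi, vi in zip(w, v)) for v in value_vectors)

# Hypothetical value vectors (objective 1, objective 2) of three alternative plans.
plans = [(10.0, 0.0), (6.0, 6.0), (0.0, 9.0)]

# Sweep the weight on objective 1; the weight on objective 2 is the remainder.
weights = [(w1, 1.0 - w1) for w1 in (0.0, 0.25, 0.5, 0.75, 1.0)]
values = [scalarised_value(w, plans) for w in weights]
# values == [9.0, 6.75, 6.0, 7.5, 10.0]
```

Each plan contributes one line in weight space, and the maximum over those lines is the convex, piecewise-linear function that such methods can exploit.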
Observation of the Isospin-Violating Decay
Using data collected with the CLEO II detector, we have observed the
isospin-violating decay . The decay rate for this mode,
relative to the dominant radiative decay, is found to be . Comment: 8 page uuencoded postscript file, also available through
http://w4.lns.cornell.edu/public/CLN