
    Perseus: Randomized Point-based Value Iteration for POMDPs

    Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
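
    The backup stage described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the array layout (`T[a, s, s']` for transitions, `O[a, s', o]` for observations, `R[s, a]` for rewards), the function names, and the floating-point tolerance are all assumptions made for this example. The value function is held as a matrix `V` of alpha vectors, one per row, and `B` holds the sampled belief points, one per row.

```python
import numpy as np


def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one alpha vector.

    Assumed layout: T[a, s, s'], O[a, s', o], R[s, a], V[i, s].
    """
    n_actions, n_obs = T.shape[0], O.shape[2]
    best_vec, best_val = None, -np.inf
    for a in range(n_actions):
        g_a = R[:, a].astype(float)
        for o in range(n_obs):
            # g[s, i] = sum_{s'} T[a, s, s'] * O[a, s', o] * V[i, s']
            g = (T[a] * O[a][:, o]) @ V.T
            # Keep the back-projected vector that is best at b.
            g_a = g_a + gamma * g[:, np.argmax(b @ g)]
        val = b @ g_a
        if val > best_val:
            best_vec, best_val = g_a, val
    return best_vec


def perseus_stage(B, V, T, O, R, gamma, rng):
    """One randomized backup stage over the belief set B (rows are beliefs).

    Points are backed up in random order until every point in B has a
    value at least as high as under the previous value function V.
    """
    old = (B @ V.T).max(axis=1)          # value of each point under V
    todo = np.arange(len(B))             # points not yet improved
    new_V = []
    while todo.size > 0:
        i = rng.choice(todo)             # sample a not-yet-improved point
        alpha = backup(B[i], V, T, O, R, gamma)
        if B[i] @ alpha < old[i]:
            # The backup did not help at this point: keep its old best vector.
            alpha = V[np.argmax(V @ B[i])]
        new_V.append(alpha)
        new_vals = (B @ np.array(new_V).T).max(axis=1)
        improved = new_vals >= old - 1e-12   # tolerance is an assumption
        improved[i] = True                   # the sampled point is now covered
        todo = todo[~improved[todo]]
    return np.array(new_V)
```

    Because one new vector can raise the value of many beliefs at once, the loop often ends after far fewer backups than there are points in `B`, which is the effect the abstract highlights. A common starting point, assumed here rather than taken from the abstract, is a single vector with every entry equal to `R.min() / (1 - gamma)`.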

    Scalable Safe Policy Improvement via Monte Carlo Tree Search

    Algorithms for safely improving policies are important for deploying reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.

    Improved performance of the LHCb Outer Tracker in LHC Run 2

    The LHCb Outer Tracker is a gaseous detector covering an area of $5\times 6$ m$^2$ with 12 double layers of straw tubes. The performance of the detector is presented based on data from the LHC Run 2 running period in 2015 and 2016. Occupancies and operational experience for data collected in $pp$, pPb and PbPb collisions are described. An updated study of the ageing effects is presented, showing no signs of gain deterioration or other radiation damage effects. In addition, several improvements with respect to LHC Run 1 data taking are introduced. A novel real-time calibration of the time-alignment of the detector and the alignment of the single monolayers composing detector modules are presented, improving the drift-time and position resolution of the detector by 20%. Finally, a potential use of the improved resolution for the timing of charged tracks is described, showing the possibility to identify low-momentum hadrons with their time-of-flight. Comment: 29 pages, 20 figures, minor changes to match the published version

    Parameter-Independent Strategies for pMDPs via POMDPs

    Markov Decision Processes (MDPs) are a popular class of models suitable for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs) that include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances. We study pMDPs with reachability objectives where the parameter values are unknown and impossible to measure directly during execution, but a probability distribution over the parameter values is known. We study for the first time the computation of parameter-independent strategies that are expectation optimal, i.e., that optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs. We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol. Comment: Extended version of a QEST 2018 paper
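
    The notion of an expectation-optimal strategy in this abstract can be written out as a formula. The notation below is assumed for illustration and is not taken from the paper: $\mu$ is the known distribution over parameter valuations $p$, $M[p]$ is the MDP obtained by fixing the parameters to $p$, $G$ is the set of target states, and $\sigma$ ranges over strategies that cannot observe $p$.

```latex
\sigma^{*} \in \arg\max_{\sigma}\;
  \mathbb{E}_{p \sim \mu}\!\left[ \Pr\nolimits^{\sigma}_{M[p]}(\Diamond G) \right]
  \;=\; \arg\max_{\sigma} \int \Pr\nolimits^{\sigma}_{M[p]}(\Diamond G)\, \mathrm{d}\mu(p)
```

    The same strategy $\sigma$ appears for every $p$ inside the expectation. That is the sense in which it is parameter-independent, and it is consistent with the abstract's reduction to a POMDP, where the unknown valuation would play the role of hidden state.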

    Dependence of Intramyocardial Pressure and Coronary Flow on Ventricular Loading and Contractility: A Model Study

    The phasic coronary arterial inflow during the normal cardiac cycle has been explained with simple (waterfall, intramyocardial pump) models, emphasizing the role of ventricular pressure. To explain changes in isovolumic and low afterload beats, these models were extended with the effect of three-dimensional wall stress, nonlinear characteristics of the coronary bed, and extravascular fluid exchange. With the associated increase in the number of model parameters, a detailed parameter sensitivity analysis has become difficult. Therefore, we investigated the primary relations between ventricular pressure and volume, wall stress, intramyocardial pressure and coronary blood flow, using a mathematical model with a limited number of parameters. The model replicates several experimental observations: the phasic character of coronary inflow is virtually independent of maximum ventricular pressure, the amplitude of the coronary flow signal varies approximately in proportion to cardiac contractility, and intramyocardial pressure in the ventricular wall may exceed ventricular pressure. A parameter sensitivity analysis shows that the normalized amplitude of coronary inflow is mainly determined by contractility, reflected in ventricular pressure and, at low ventricular volumes, radial wall stress. Normalized flow amplitude is less sensitive to myocardial coronary compliance and resistance, and to the relation between active fiber stress, time, and sarcomere shortening velocity.

    Determination of the Michel Parameters rho, xi, and delta in tau-Lepton Decays with tau --> rho nu Tags

    Using the ARGUS detector at the $e^+ e^-$ storage ring DORIS II, we have measured the Michel parameters $\rho$, $\xi$, and $\xi\delta$ for $\tau^{\pm}\to l^{\pm} \nu\bar\nu$ decays in $\tau$-pair events produced at center of mass energies in the region of the $\Upsilon$ resonances. Using $\tau^\mp \to \rho^\mp \nu$ as spin analyzing tags, we find $\rho_{e}=0.68\pm 0.04 \pm 0.08$, $\xi_{e}= 1.12 \pm 0.20 \pm 0.09$, $\xi\delta_{e}= 0.57 \pm 0.14 \pm 0.07$, $\rho_{\mu}= 0.69 \pm 0.06 \pm 0.08$, $\xi_{\mu}= 1.25 \pm 0.27 \pm 0.14$ and $\xi\delta_{\mu}= 0.72 \pm 0.18 \pm 0.10$. In addition, we report the combined ARGUS results on $\rho$, $\xi$, and $\xi\delta$ using this work and previous measurements. Comment: 10 pages, well formatted postscript can be found at http://pktw06.phy.tu-dresden.de/iktp/pub/desy97-194.p

    Bounded approximations for linear multi-objective planning under uncertainty.

    Planning under uncertainty poses a complex problem in which multiple objectives often need to be balanced. When dealing with multiple objectives, it is often assumed that the relative importance of the objectives is known a priori. However, in practice human decision makers often find it hard to specify such preferences exactly, and would prefer a decision support system that presents a range of possible alternatives. We propose two algorithms for computing these alternatives for the case of linearly weighted objectives. First, we propose an anytime method, approximate optimistic linear support (AOLS), that incrementally builds up a complete set of $\epsilon$-optimal plans, exploiting the piecewise-linear and convex shape of the value function. Second, we propose an approximate anytime method, scalarised sample incremental improvement (SSII), that employs weight sampling to focus on the most interesting regions in weight space, as suggested by a prior over preferences. We show empirically that our methods are able to produce (near-)optimal alternative sets orders of magnitude faster than existing techniques, thereby demonstrating that our methods provide sensible approximations in stochastic multi-objective domains.

    Observation of the Isospin-Violating Decay $D_s^{*+}\to D_s^+\pi^0$

    Using data collected with the CLEO II detector, we have observed the isospin-violating decay $D_s^{*+}\to D_s^+\pi^0$. The decay rate for this mode, relative to the dominant radiative decay, is found to be $\Gamma(D_s^{*+}\to D_s^+\pi^0)/\Gamma(D_s^{*+}\to D_s^+\gamma)= 0.062^{+0.020}_{-0.018}\pm0.022$. Comment: 8 page uuencoded postscript file, also available through http://w4.lns.cornell.edu/public/CLN