
    Bayesian multitask inverse reinforcement learning

    We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may represent one expert trying to solve a different task, or a different expert trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers. Comment: corrected version; 13 pages, 8 figures.
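    The abstract stays at the level of priors and preference elicitation, but one ingredient it names, a prior on policy optimality, is easy to make concrete. Below is a minimal Python sketch (all values invented, not the paper's model) of the softmax likelihood that commonly encodes such a prior in Bayesian IRL: the inverse temperature beta expresses how near-optimal we believe the expert to be, and the likelihood scores demonstrations against the values a candidate reward induces.

```python
import numpy as np

# Hypothetical sketch: a softmax ("Boltzmann-rational") expert model, a common
# way to encode a prior on policy optimality in Bayesian IRL. The toy Q-values
# and demonstrations are invented; the paper's structured priors are richer.

def softmax_policy(Q, beta):
    """Per-state action probabilities; larger beta = more nearly optimal expert."""
    z = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
    return z / z.sum(axis=1, keepdims=True)

def demo_log_likelihood(Q, demos, beta):
    """Log-probability of observed (state, action) pairs under that expert model."""
    pi = softmax_policy(Q, beta)
    return sum(np.log(pi[s, a]) for s, a in demos)

Q = np.array([[1.0, 0.2],    # toy state-action values induced by one
              [0.1, 0.8]])   # candidate reward function
demos = [(0, 0), (1, 1), (0, 0)]
print(demo_log_likelihood(Q, demos, beta=2.0))  # higher = better fit to the expert
```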

    Regret Bounds for Reinforcement Learning with Policy Advice

    In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present the Reinforcement Learning with Policy Advice (RLPA) algorithm, which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has sub-linear regret of \tilde O(\sqrt{T}) relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action spaces. Our empirical simulations support the theoretical analysis, suggesting that RLPA may offer significant advantages in large domains where some good prior policies are provided.
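    RLPA's guarantee is about competing with the best policy in the provided set. As rough intuition (not the actual RLPA algorithm, whose runs and confidence intervals are structured differently), one can treat each input policy as a bandit arm and select optimistically; the sketch below does exactly that with a generic UCB index, all numbers invented.

```python
import math
import random

# Generic UCB-style selection over a set of input policies: run the policy with
# the highest optimistic estimate of its average reward. A rough stand-in for
# the "learn to use the best policy in the set" idea, not RLPA itself.

def select_best_policy(run_policy, num_policies, total_steps):
    counts = [0] * num_policies
    means = [0.0] * num_policies
    for t in range(1, total_steps + 1):
        ucb = [means[i] + math.sqrt(2 * math.log(t) / counts[i])
               if counts[i] > 0 else float("inf")
               for i in range(num_policies)]
        i = ucb.index(max(ucb))
        r = run_policy(i)                          # reward from one run of policy i
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]     # incremental mean update
    return max(range(num_policies), key=lambda i: means[i])

# Toy demo: three "policies" with mean rewards 0.2, 0.5, 0.9.
best = select_best_policy(lambda i: random.gauss([0.2, 0.5, 0.9][i], 0.1), 3, 2000)
print(best)  # almost surely 2
```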

    Approximating the Termination Value of One-Counter MDPs and Stochastic Games

    One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player and 2-player turn-based zero-sum stochastic games, respectively, played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (respectively, minimize) the probability of hitting counter value 0, starting at a given control state and a given counter value. Recently, we studied qualitative decision problems ("is the optimal termination value = 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value ≥ p?", or "approximate the termination value within epsilon") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems is computable. In this paper we show that all quantitative approximation problems for the termination value of OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG and epsilon > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error epsilon, and we can furthermore compute epsilon-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality to these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes epsilon-optimal for OC-MDPs. Comment: 35 pages, 1 figure; full version of a paper presented at ICALP 2011, invited for submission to Information and Computation.
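    For intuition about the termination objective itself (not the paper's algorithm, which needs LPs and martingale arguments to certify error bounds), the sketch below approximates the termination value of a tiny one-counter chain, i.e. an OC-MDP with a single action, by truncating the counter at N and iterating the fixed-point equations; all numbers are invented.

```python
# Truncated value iteration for the termination value of a toy one-counter
# chain: from counter c > 0 the counter moves down with probability p_down,
# up otherwise; v[c] estimates the probability of ever hitting counter 0.
# Truncation at n_trunc (where we pessimistically set v = 0) is a naive
# approximation; the paper's point is making such error bounds computable.

def termination_value(p_down, n_trunc=200, sweeps=5000):
    v = [0.0] * (n_trunc + 1)
    v[0] = 1.0                         # counter 0: already terminated
    for _ in range(sweeps):
        for c in range(1, n_trunc):    # v[n_trunc] stays 0 (truncation)
            v[c] = p_down * v[c - 1] + (1.0 - p_down) * v[c + 1]
    return v

v = termination_value(p_down=0.6)
print(v[1])   # close to 1.0: with p_down > 1/2 the counter drifts toward 0
```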

    On the Benefit of Inventory-Based Dynamic Pricing Strategies

    We study the optimal pricing and replenishment decisions in an inventory system with price-sensitive demand, focusing on the benefit of an inventory-based dynamic pricing strategy. We find that demand variability impacts the benefit of dynamic pricing not only through the magnitude of the variability but also through its functional form (e.g., whether it is additive, multiplicative, or of some other form). We provide an approach to quantify the profit improvement of dynamic pricing over static pricing without having to solve the dynamic pricing problem. We also demonstrate that dynamic pricing is most effective when it is jointly optimized with inventory replenishment decisions, and that its advantage can be mostly realized by using only one or two price changes over a replenishment cycle. (Peer reviewed; http://deepblue.lib.umich.edu/bitstream/2027.42/78685/1/j.1937-5956.2009.01099.x.pd)
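    As a toy numerical illustration of "one or two price changes capture most of the benefit" (not the paper's model or its quantification approach; all parameters below are invented), one can compare the best single static price with the best preplanned two-price path over a cycle with fixed stock and additive demand D(p) = a - b*p + eps:

```python
import numpy as np

# Compare the best static price against the best preplanned two-price path for
# a two-period cycle with fixed stock and additive demand noise. A preplanned
# path is a crude proxy for dynamic pricing; the paper's policies react to the
# inventory state itself.

rng = np.random.default_rng(0)
a, b, sigma, stock = 100.0, 4.0, 15.0, 90.0     # invented parameters

def expected_revenue(p1, p2, n=4000):
    eps = rng.normal(0.0, sigma, size=(n, 2))
    d1 = np.maximum(0.0, a - b * p1 + eps[:, 0])
    s1 = np.minimum(stock, d1)                  # period-1 sales
    d2 = np.maximum(0.0, a - b * p2 + eps[:, 1])
    s2 = np.minimum(stock - s1, d2)             # period-2 sales from leftover stock
    return float(np.mean(p1 * s1 + p2 * s2))

grid = np.linspace(5.0, 20.0, 16)
static = max(expected_revenue(p, p) for p in grid)
two_price = max(expected_revenue(p1, p2) for p1 in grid for p2 in grid)
print(f"best static: {static:.1f}   best two-price path: {two_price:.1f}")
```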

    Optimal Release of Inventory Using Online Auctions: The Two Item Case

    In this paper we analyze policies for optimally disposing of inventory using online auctions. We assume a seller has a fixed number of items to sell using a sequence of possibly overlapping single-item auctions. The decision the seller must make is when to start each auction. The decision involves a trade-off between a holding cost for each period an item remains unsold, and a higher expected final price when fewer simultaneous auctions are underway. Consequently the seller must weigh the expected marginal gain for the ongoing auctions against the expected marginal cost of further deferring the release of the unreleased items. We formulate the problem as a discrete-time Markov decision problem and consider two cases. In the first case we assume the auctions are guaranteed to be successful, while in the second case we assume there is a positive probability that an auction receives no bids. We consider the two cases separately because they require different analyses. We derive conditions ensuring that the optimal release policy is a control-limit policy in the current price of the ongoing auctions, and provide several illustrations of the results. The paper focuses on the two-item case, which has sufficient complexity to raise challenging questions.
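    To make the control-limit notion concrete, here is a deliberately crude Monte Carlo evaluation of threshold rules, with invented dynamics and numbers rather than the paper's MDP: release the second item as soon as the ongoing auction's current price crosses a threshold, trading holding cost against cannibalized bidding while the auctions overlap.

```python
import numpy as np

# Evaluate control-limit release rules by simulation. One auction runs for T
# periods with nonnegative price increments; while a second auction overlaps
# it, increments shrink by a cannibalization factor. Holding the second item
# costs hold_cost per period. All dynamics and numbers are invented; in this
# linear toy model the best threshold sits at an extreme, whereas the paper
# derives conditions for genuinely price-dependent control limits.

rng = np.random.default_rng(1)
T, hold_cost, cannibalization, base2 = 10, 1.5, 0.5, 20.0

def expected_profit(threshold, n=4000):
    total = 0.0
    for _ in range(n):
        price, release_t = 0.0, T               # release_t == T: not yet released
        for t in range(T):
            if release_t == T and price >= threshold:
                release_t = t                   # control-limit release
            uplift = rng.exponential(2.0)
            if release_t < T:
                uplift *= cannibalization       # overlap depresses bidding
            price += uplift
        total += price + base2 - hold_cost * release_t
    return total / n

for th in (0.0, 5.0, 10.0, 15.0, 1e9):          # 1e9 ~ never overlap
    print(th, round(expected_profit(th), 2))
```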

    Stafne bone cavity: magnetic resonance imaging

    A case of a Stafne bone cavity (SBC) affecting the body of the mandible in a 51-year-old female is reported. The imaging modalities included a panoramic radiograph, computed tomography (CT) and magnetic resonance (MR) imaging. The panoramic radiograph and CT were able to determine the outline of the cavity and its three-dimensional shape, but failed to precisely diagnose the soft-tissue content of the cavity. MR imaging demonstrated that the bony cavity was filled with soft tissue continuous with, and identical in signal to, the submandibular salivary gland. Based on the MR imaging, a diagnosis of SBC was made and no further studies or surgical treatment were initiated. MR imaging should be considered the diagnostic technique in cases where SBC is suspected, and recognition of the lesion should preclude any further treatment or surgical exploration.

    Efficient computation of exact solutions for quantitative model checking

    Quantitative model checkers for Markov decision processes typically use finite-precision arithmetic. If all the coefficients in the process are rational numbers, then the model checking results are rational, and so they can be computed exactly. However, exact techniques are generally too expensive or limited in scalability. In this paper we propose a method for obtaining exact results starting from an approximate solution computed in finite-precision arithmetic. The input of the method is a description of a scheduler, which can be obtained by a model checker using finite precision. Given a scheduler, we show how to obtain a corresponding basis of a linear-programming problem, in such a way that the basis is optimal whenever the scheduler attains the worst-case probability. This correspondence is already known for discounted MDPs; we show how to apply it in the undiscounted case, provided that some preprocessing is done. Using the correspondence, the linear-programming problem can be solved in exact arithmetic starting from the basis obtained. As a consequence, the method finds the worst-case probability even if the scheduler provided by the model checker was not optimal. In our experiments, the calculation of exact solutions from a candidate scheduler is significantly faster than a calculation using the simplex method in exact arithmetic starting from a default basis. Comment: In Proceedings QAPL 2012, arXiv:1207.055
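    The exact-arithmetic step is easy to illustrate in isolation: once a memoryless scheduler is fixed, the induced Markov chain's reachability probabilities solve a linear system with rational coefficients, which can be solved exactly. Below is a self-contained Python sketch using Fraction and plain Gauss-Jordan elimination on an invented 2-state system; the paper itself works at the level of LP bases and the simplex method.

```python
from fractions import Fraction

# Solve A x = b exactly over the rationals by Gauss-Jordan elimination.
# Toy instance: reachability probabilities x0, x1 of an invented 2-state chain
# (goal and sink states already eliminated):
#   x0 = 1/3 * x1 + 1/3
#   x1 = 1/2 * x0 + 1/4

def solve_exact(A, b):
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]                # swap in a nonzero pivot
        p = M[col][col]
        M[col] = [v / p for v in M[col]]               # normalize pivot row
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

F = Fraction
A = [[F(1), F(-1, 3)],
     [F(-1, 2), F(1)]]
b = [F(1, 3), F(1, 4)]
print(solve_exact(A, b))   # [Fraction(1, 2), Fraction(1, 2)] -- exact
```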

    Drill failure during ORIF of the mandible: complication management

    A case of drill breakage during open reduction and internal fixation (ORIF) of a mandibular fracture is reported. The clinical decision-making, diagnosis and surgical management of the complication are described.