1,614 research outputs found

Bayesian multitask inverse reinforcement learning

Author: C.A. Rothkopf
J. Choi
M.L. Puterman
T.S. Ferguson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may represent one expert trying to solve a different task, or different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers. Comment: Corrected version. 13 pages, 8 figures.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

Chalmers Research

Regret Bounds for Reinforcement Learning with Policy Advice

Author: C. Tekin
M.L. Puterman
N. Cesa-Bianchi
R. Ortner
R.S. Sutton
T. Jaksch
Publication date: 01/01/2013

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sub-linear regret of $\tilde{O}(\sqrt{T})$ relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.

arXiv.org e-Print Archive

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server
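
A short worked step on what the "sub-linear regret" in the RLPA abstract above buys. The abstract does not define regret, so this assumes the usual convention: $R(T)$ is the cumulative reward shortfall over $T$ steps relative to the best input policy. The symbol $R(T)$ is introduced here for illustration only.

```latex
% Assumption: R(T) is cumulative regret over T steps (not defined in the abstract).
\[
  R(T) = \tilde{O}\!\left(\sqrt{T}\right)
  \quad\Longrightarrow\quad
  \frac{R(T)}{T} = \tilde{O}\!\left(\frac{1}{\sqrt{T}}\right)
  \xrightarrow[T \to \infty]{} 0 .
\]
```

Under that assumption, the average per-step loss relative to the best input policy vanishes as the horizon grows, which is the sense in which the algorithm "learns to use the best policy in the set".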

Approximating the Termination Value of One-Counter MDPs and Stochastic Games

Author: G.R. Grimmett
J. Lambert
K. Etessami
L.B. White
M.L. Puterman
T. Brázdil
Publication date: 01/01/2011

One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player, and 2-player turn-based zero-sum, stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (minimize, respectively) the probability of hitting counter value 0, starting at a given control state and given counter value. Recently, we studied qualitative decision problems ("is the optimal termination value = 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value ≥ p", or "approximate the termination value within epsilon") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems are computable. In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error epsilon, and furthermore we can compute epsilon-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality on these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes epsilon-optimal for OC-MDPs. Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011, invited for submission to Information and Computation.

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer
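
To make the termination objective in the one-counter entry above concrete, here is a minimal sketch on a toy maximizing OC-MDP with a single control state. It is not the paper's method: it is plain value iteration on a counter truncated at a hypothetical cap `max_counter`, and it comes with no a priori error bound (supplying a computable bound is exactly what the abstract describes as the hard part). The function name, the action set `up_probs`, and all numeric values are assumptions made up for illustration.

```python
def truncated_termination_values(up_probs, max_counter, sweeps):
    """Approximate the max probability of hitting counter value 0.

    Toy model (an assumption, not from the paper): one control state;
    each action p in up_probs moves the counter +1 with probability p
    and -1 with probability 1 - p. Reaching max_counter is treated as
    failure, so the result is a lower bound on the true value.
    """
    # values[n] = current estimate for counter value n
    values = [1.0] + [0.0] * max_counter
    for _ in range(sweeps):
        new = values[:]
        for n in range(1, max_counter):
            # Bellman update: pick the action with the best one-step lookahead
            new[n] = max(p * values[n + 1] + (1 - p) * values[n - 1]
                         for p in up_probs)
        values = new
    return values


# Two actions; the maximizer should prefer the smaller upward drift (0.6).
values = truncated_termination_values(up_probs=(0.6, 0.7),
                                      max_counter=60, sweeps=3000)

# For a single action with up-probability p > 1/2, the termination
# probability from counter n is ((1 - p) / p) ** n, here (2/3) ** n.
closed_form = [(0.4 / 0.6) ** n for n in range(4)]
```

In this toy case the truncated values can be checked against the closed form of the biased random walk. General OC-MDPs have several control states and no such formula, which is why a plain truncation like this one gives no guarantee on its own.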

Associations between childhood adversity and daily suppression and avoidance in response to stress in adulthood: can neurobiological sensitivity help explain this relationship?

Author: Arenander Justine
Bush Nicole
Epel Elissa
Hagan Melissa J
Mendes Wendy Berry
Puterman Eli
Publication venue: eScholarship, University of California
Publication date: 01/03/2017

Background and objectives: Although it has been postulated that psychological responses to stress in adulthood are grounded in childhood experiences in the family environment, evidence has been inconsistent. This study tested whether two putative measures of neurobiological sensitivity (vagal flexibility and attentional capacity) moderated the relation between women's reported exposure to a risky childhood environment and current engagement in suppressive or avoidant coping in response to daily stress. Design and methods: Adult women (N = 158) recruited for a study of stress, coping, and aging reported on early adversity (EA) in their childhood family environment and completed a week-long daily diary in which they described their most stressful event of the day and indicated the degree to which they used suppression or avoidance in response to that event. In addition, women completed a visual tracking task during which heart rate variability and attentional capacity were assessed. Results: Multilevel mixed modeling analyses revealed that greater EA predicted greater suppression and avoidance only among women with higher attentional capacity. Similarly, greater EA predicted greater use of suppression, but only among women with greater vagal flexibility. Conclusion: Childhood adversity may predispose individuals with high neurobiological sensitivity to a lifetime of maladaptive coping.

eScholarship - University of California

On the Benefit of Inventory-Based Dynamic Pricing Strategies

Author: Chan
Chen
Karlin
Puterman
Yano
Publication venue: 'Wiley'
Publication date: 01/05/2010

We study the optimal pricing and replenishment decisions in an inventory system with a price-sensitive demand, focusing on the benefit of the inventory-based dynamic pricing strategy. We find that demand variability impacts the benefit of dynamic pricing not only through the magnitude of the variability but also through its functional form (e.g., whether it is additive, multiplicative, or others). We provide an approach to quantify the profit improvement of dynamic pricing over static pricing without having to solve the dynamic pricing problem. We also demonstrate that dynamic pricing is most effective when it is jointly optimized with inventory replenishment decisions, and that its advantage can be mostly realized by using one or two price changes over a replenishment cycle. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/78685/1/j.1937-5956.2009.01099.x.pd

Crossref

Deep Blue Documents at the University of Michigan

Optimal Release of Inventory Using Online Auctions: The Two Item Case

Author: Odegaard Fredrik
Puterman Martin L.
Publication venue: Scholarship@Western
Publication date: 01/10/2006

In this paper we analyze policies for optimally disposing of inventory using online auctions. We assume a seller has a fixed number of items to sell using a sequence of, possibly overlapping, single-item auctions. The decision the seller must make is when to start each auction. The decision involves a trade-off between a holding cost for each period an item remains unsold, and a higher expected final price the fewer the number of simultaneous auctions underway. Consequently the seller must trade off the expected marginal gain for the ongoing auctions against the expected marginal cost of the unreleased items incurred by further deferring their release. We formulate the problem as a discrete time Markov Decision Problem and consider two cases. In the first case we assume the auctions are guaranteed to be successful, while in the second case we assume there is a positive probability that an auction receives no bids. The reason for considering these two cases is that they require different analyses. We derive conditions to ensure that the optimal release policy is a control limit policy in the current price of the ongoing auctions, and provide several illustrations of the results. The paper focuses on the two item case, which has sufficient complexity to raise challenging questions.

Scholarship@Western

Stafne bone cavity : magnetic resonance imaging

Author: Bodner Lipa
Puterman Max
Segev Yoram
Publication date: 01/01/2006

A case of Stafne bone cavity (SBC) affecting the body of the mandible of a 51-year-old female is reported. The imaging modalities included panoramic radiograph, computed tomography (CT) and magnetic resonance (MR) imaging. Panoramic radiograph and CT were able to determine the outline of the cavity and its three-dimensional shape, but failed to precisely diagnose the soft tissue content of the cavity. MR imaging demonstrated that the bony cavity is filled with soft tissue that is continuous and identical in signal with that of the submandibular salivary gland. Based on the MR imaging, a diagnosis of SBC was made and no further studies or surgical treatment were initiated. MR imaging should be considered the diagnostic technique in cases where SBC is suspected. Recognition of the lesion should preclude any further treatment or surgical exploration.

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Efficient computation of exact solutions for quantitative model checking

Author: Aspnes
Bianco
Bradley
Courcoubetis
D'Epenoux
Fearnley
Forejt
Herbert Wiklicky
Kwiatkowska
Mieke Massink
Puterman
Sergio Giro
Stoelinga
Ye
Publication venue: 'Open Publishing Association'
Publication date: 01/07/2012

Quantitative model checkers for Markov Decision Processes typically use finite-precision arithmetic. If all the coefficients in the process are rational numbers, then the model checking results are rational, and so they can be computed exactly. However, exact techniques are generally too expensive or limited in scalability. In this paper we propose a method for obtaining exact results starting from an approximated solution in finite-precision arithmetic. The input of the method is a description of a scheduler, which can be obtained by a model checker using finite precision. Given a scheduler, we show how to obtain a corresponding basis in a linear-programming problem, in such a way that the basis is optimal whenever the scheduler attains the worst-case probability. This correspondence is already known for discounted MDPs; we show how to apply it in the undiscounted case provided that some preprocessing is done. Using the correspondence, the linear-programming problem can be solved in exact arithmetic starting from the basis obtained. As a consequence, the method finds the worst-case probability even if the scheduler provided by the model checker was not optimal. In our experiments, the calculation of exact solutions from a candidate scheduler is significantly faster than the calculation using the simplex method under exact arithmetic starting from a default basis. Comment: In Proceedings QAPL 2012, arXiv:1207.055

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals
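
The exact model checking entry above rests on the observation that rational coefficients give rational results. The sketch below illustrates only that observation, not the paper's basis-recovery method: it fixes a scheduler on a made-up two-state example, which leaves a Markov chain, and solves the reachability equations exactly with Python's standard `fractions` module. The chain, its probabilities, and the helper name `solve_exact` are all assumptions chosen for illustration.

```python
from fractions import Fraction as F


def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    # Augmented matrix [A | b], with every entry a Fraction
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [v * inv for v in M[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Hypothetical chain induced by a fixed scheduler (transient states 0 and 1):
#   state 0: -> state 1 w.p. 1/2, -> target w.p. 1/3, -> sink w.p. 1/6
#   state 1: -> state 0 w.p. 1/4, -> target w.p. 1/4, -> sink w.p. 1/2
# Reachability probabilities satisfy x = P x + b, i.e. (I - P) x = b.
I_minus_P = [[F(1), F(-1, 2)],
             [F(-1, 4), F(1)]]
b = [F(1, 3), F(1, 4)]

exact = solve_exact(I_minus_P, b)        # exact rational probabilities
approx = [float(x) for x in exact]       # what finite precision would report
```

Here `exact` holds the probabilities of reaching the target from states 0 and 1 as fractions, with no rounding. Evaluating one fixed scheduler this way is cheap; the harder step the abstract describes is turning a possibly suboptimal scheduler into a starting basis for an exact linear-programming solve.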

Drill failure during ORIF of the mandible : complication management

Author: Bodner Lipa
Puterman Max
Woldenberg Yitzhak
Publication date: 01/01/2007

A case of a drill breakage during open reduction and internal fixation (ORIF) of a mandibular fracture is reported. The clinical decision, diagnosis and surgical management of the complication are described.

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

A Mitochondrial Health Index Sensitive to Mood and Caregiving Stress.

Author: Aschbacher Kirstin
Burelle Yan
Coccia Michael
Cuillerier Alexanne
Epel Elissa S
Picard Martin
Prather Aric A
Puterman Eli
Publication venue: eScholarship, University of California
Publication date: 01/07/2018

BACKGROUND: Chronic life stress, such as the stress of caregiving, can promote pathophysiology, but the underlying cellular mechanisms are not well understood. Chronic stress may induce recalibrations in mitochondria leading to changes either in mitochondrial content per cell, or in mitochondrial functional capacity (i.e., quality). METHODS: Here we present a functional index of mitochondrial health (MHI) for human leukocytes that can distinguish between these two possibilities. The MHI integrates nuclear and mitochondrial DNA-encoded respiratory chain enzymatic activities and mitochondrial DNA copy number. We then use the MHI to test the hypothesis that daily emotional states and caregiving stress influence mitochondrial function by comparing healthy mothers of a child with an autism spectrum disorder (high-stress caregivers, n = 46) with mothers of a neurotypical child (control group, n = 45). RESULTS: The MHI outperformed individual mitochondrial function measures. Elevated positive mood at night was associated with higher MHI, and nightly positive mood was also a mediator of the association between caregiving and MHI. Moreover, MHI was correlated to positive mood on the days preceding, but not following the blood draw, suggesting for the first time in humans that mitochondria may respond to proximate emotional states within days. Correspondingly, the caregiver group, which had higher perceived stress and lower positive and greater negative daily affect, exhibited lower MHI. This effect was not explained by a mismatch between nuclear and mitochondrial genomes. CONCLUSIONS: Daily mood and chronic caregiving stress are associated with mitochondrial functional capacity. Mitochondrial health may represent a nexus between psychological stress and health.

eScholarship - University of California