1,614 research outputs found
Bayesian multitask inverse reinforcement learning
We generalise the problem of inverse reinforcement learning to multiple
tasks, from multiple demonstrations. Each demonstration may come from one expert
trying to solve a different task, or from different experts trying to solve the
same task. Our main contribution is to formalise the problem as statistical
preference elicitation, via a number of structured priors, whose form captures
our biases about the relatedness of different tasks or expert policies. In
doing so, we introduce a prior on policy optimality, which is more natural to
specify. We show that our framework allows us not only to learn efficiently
from multiple experts but also to differentiate effectively between the goals
of each. Possible applications include analysing the intrinsic motivations of
subjects in behavioural experiments and learning from multiple teachers.
Comment: Corrected version. 13 pages, 8 figures
Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a set
of input policies, perhaps learned from prior experience or provided by
advisors. We present a reinforcement learning with policy advice (RLPA)
algorithm which leverages this input set and learns to use the best policy in
the set for the reinforcement learning task at hand. We prove that RLPA has a
sub-linear regret of \tilde O(\sqrt{T}) relative to the best input policy, and
that both this regret and its computational complexity are independent of the
size of the state and action space. Our empirical simulations support our
theoretical analysis. This suggests RLPA may offer significant advantages in
large domains where some good prior policies are provided.
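For orientation, regret relative to the best input policy is commonly formalised along the following lines. The symbols here are notation assumed for illustration and do not appear in the abstract: $\Pi$ is the input policy set, $\mu^{\pi}$ the long-run average reward of policy $\pi$, and $r_t$ the reward the learner collects at step $t$.

```latex
\Delta(T) \;=\; T \max_{\pi \in \Pi} \mu^{\pi} \;-\; \sum_{t=1}^{T} r_t ,
\qquad
\Delta(T) = \tilde{O}\!\left(\sqrt{T}\right)
\;\Longrightarrow\;
\frac{\Delta(T)}{T} \to 0 \quad \text{as } T \to \infty .
```

Under this reading, sub-linear regret means the learner's average reward per step approaches that of the best policy in the set.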
Approximating the Termination Value of One-Counter MDPs and Stochastic Games
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs)
are 1-player, and 2-player turn-based zero-sum, stochastic games played on the
transition graph of classic one-counter automata (equivalently, pushdown
automata with a 1-letter stack alphabet). A key objective for the analysis and
verification of these games is the termination objective, where the players aim
to maximize (minimize, respectively) the probability of hitting counter value
0, starting at a given control state and given counter value. Recently, we
studied qualitative decision problems ("is the optimal termination value = 1?")
for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and
coNP, respectively). However, quantitative decision and approximation problems
("is the optimal termination value ≥ p", or "approximate the termination value
within epsilon") are far more challenging. This is so in part because optimal
strategies may not exist, and because even when they do exist they can have a
highly non-trivial structure. It thus remained open even whether any of these
quantitative termination problems are computable. In this paper we show that
all quantitative approximation problems for the termination value for OC-MDPs
and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon >
0, we can compute a value v that approximates the value of the OC-SSG
termination game within additive error epsilon, and furthermore we can compute
epsilon-optimal strategies for both players in the game. A key ingredient in
our proofs is a subtle martingale, derived from solving certain LPs that we can
associate with a maximizing OC-MDP. An application of Azuma's inequality on
these martingales yields a computable bound for the "wealth" at which a "rich
person's strategy" becomes epsilon-optimal for OC-MDPs.
Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011,
invited for submission to Information and Computation
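To make the termination objective concrete, here is a minimal Python sketch for a toy maximizing OC-MDP with a single control state. It is not the algorithm from the paper: it naively truncates the counter at a hypothetical bound `cap` and treats reaching `cap` as failure, so it yields only a lower bound on the termination value with no a priori error guarantee (the paper's contribution is precisely a computable bound of this kind). The function name, the action encoding, and the parameters `cap` and `sweeps` are all assumptions made for illustration.

```python
def termination_lower_bound(actions, start, cap=60, sweeps=2000):
    """Lower-bound the optimal probability of reaching counter value 0.

    actions: list of dicts mapping a counter change (-1, 0 or +1) to its
             probability; the controller picks one action per step.
    start:   initial counter value, with 0 < start < cap.
    cap:     truncation bound; hitting it is counted as failure.
    """
    # v[n] approximates the best achievable termination probability from n.
    v = [0.0] * (cap + 1)
    v[0] = 1.0  # counter 0 means the run has terminated
    for _ in range(sweeps):
        for n in range(1, cap):
            # Bellman update: choose the action with the best expected value.
            v[n] = max(
                sum(p * v[n + d] for d, p in action.items())
                for action in actions
            )
    return v[start]


# Toy instance: one action drifts upward, the other drifts downward.
drift_up = {-1: 1 / 3, +1: 2 / 3}
drift_down = {-1: 0.5, 0: 0.25, +1: 0.25}

only_up = termination_lower_bound([drift_up], start=1)
with_choice = termination_lower_bound([drift_up, drift_down], start=1)
```

With only the upward-drifting action the result should sit near 1/2 (the classical gambler's-ruin value for this walk), while allowing the downward-drifting action pushes it close to 1.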
Associations between childhood adversity and daily suppression and avoidance in response to stress in adulthood: can neurobiological sensitivity help explain this relationship?
Background and objectives: Although it has been postulated that psychological responses to stress in adulthood are grounded in childhood experiences in the family environment, evidence has been inconsistent. This study tested whether two putative measures of neurobiological sensitivity (vagal flexibility and attentional capacity) moderated the relation between women's reported exposure to a risky childhood environment and current engagement in suppressive or avoidant coping in response to daily stress. Design and methods: Adult women (N = 158) recruited for a study of stress, coping, and aging reported on early adversity (EA) in their childhood family environment and completed a week-long daily diary in which they described their most stressful event of the day and indicated the degree to which they used suppression or avoidance in response to that event. In addition, women completed a visual tracking task during which heart rate variability and attentional capacity were assessed. Results: Multilevel mixed modeling analyses revealed that greater EA predicted greater suppression and avoidance only among women with higher attentional capacity. Similarly, greater EA predicted greater use of suppression, but only among women with greater vagal flexibility. Conclusion: Childhood adversity may predispose individuals with high neurobiological sensitivity to a lifetime of maladaptive coping.
On the Benefit of Inventory-Based Dynamic Pricing Strategies
We study the optimal pricing and replenishment decisions in an inventory system with a price-sensitive demand, focusing on the benefit of the inventory-based dynamic pricing strategy. We find that demand variability impacts the benefit of dynamic pricing not only through the magnitude of the variability but also through its functional form (e.g., whether it is additive, multiplicative, or others). We provide an approach to quantify the profit improvement of dynamic pricing over static pricing without having to solve the dynamic pricing problem. We also demonstrate that dynamic pricing is most effective when it is jointly optimized with inventory replenishment decisions, and that its advantage can be mostly realized by using one or two price changes over a replenishment cycle.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/78685/1/j.1937-5956.2009.01099.x.pd
Optimal Release of Inventory Using Online Auctions: The Two Item Case
In this paper we analyze policies for optimally disposing of inventory using online auctions. We assume a seller has a fixed number of items to sell using a sequence of, possibly overlapping, single-item auctions. The decision the seller must make is when to start each auction. The decision involves a trade-off between a holding cost for each period an item remains unsold, and a higher expected final price the fewer the number of simultaneous auctions underway. Consequently, by further deferring the release of items, the seller must trade off the expected marginal gain for the ongoing auctions against the expected marginal cost of the unreleased items. We formulate the problem as a discrete-time Markov Decision Problem and consider two cases. In the first case we assume the auctions are guaranteed to be successful, while in the second case we assume there is a positive probability that an auction receives no bids. We consider these two cases separately because they require different analyses. We derive conditions to ensure that the optimal release policy is a control limit policy in the current price of the ongoing auctions, and provide several illustrations of the results. The paper focuses on the two-item case, which has sufficient complexity to raise challenging questions.
Stafne bone cavity: magnetic resonance imaging
A case of Stafne bone cavity (SBC) affecting the body of the mandible of a 51-year-old female is reported. The imaging modalities included panoramic radiograph, computed tomography (CT) and magnetic resonance (MR) imaging. Panoramic radiograph and CT were able to determine the outline of the cavity and its three-dimensional shape, but failed to precisely diagnose the soft tissue content of the cavity. MR imaging demonstrated that the bony cavity is filled with soft tissue that is continuous with, and identical in signal to, that of the submandibular salivary gland. Based on the MR imaging, a diagnosis of SBC was made and no further studies or surgical treatment were initiated. MR imaging should be considered the diagnostic technique in cases where SBC is suspected. Recognition of the lesion should preclude any further treatment or surgical exploration.
Efficient computation of exact solutions for quantitative model checking
Quantitative model checkers for Markov Decision Processes typically use
finite-precision arithmetic. If all the coefficients in the process are
rational numbers, then the model checking results are rational, and so they can
be computed exactly. However, exact techniques are generally too expensive or
limited in scalability. In this paper we propose a method for obtaining exact
results starting from an approximated solution in finite-precision arithmetic.
The input of the method is a description of a scheduler, which can be obtained
by a model checker using finite precision. Given a scheduler, we show how to
obtain a corresponding basis in a linear-programming problem, in such a way
that the basis is optimal whenever the scheduler attains the worst-case
probability. This correspondence is already known for discounted MDPs; we show
how to apply it in the undiscounted case provided that some preprocessing is
done. Using the correspondence, the linear-programming problem can be solved in
exact arithmetic starting from the basis obtained. As a consequence, the method
finds the worst-case probability even if the scheduler provided by the model
checker was not optimal. In our experiments, the calculation of exact solutions
from a candidate scheduler is significantly faster than the calculation using
the simplex method under exact arithmetic starting from a default basis.
Comment: In Proceedings QAPL 2012, arXiv:1207.055
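As a small illustration of why exact results are attainable once a scheduler is fixed, the Python sketch below solves the reachability equations of the induced Markov chain over the rationals. This is not the LP-basis method proposed in the paper; it only shows exact arithmetic for one candidate scheduler. The function names and the example chain are hypothetical, and the sketch assumes that states which cannot reach the target have already been removed, so that the linear system has a unique solution.

```python
from fractions import Fraction


def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    # Augmented matrix with every entry converted to an exact Fraction.
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero pivot (assumes A is non-singular).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [x / pivot_value for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]


def reach_probabilities(P, to_target):
    """Exact probability of reaching the target under a fixed scheduler.

    P[i][j]:      probability of moving from transient state i to j.
    to_target[i]: one-step probability of moving from i to the target.
    Solves x = P x + to_target, i.e. (I - P) x = to_target.
    """
    n = len(P)
    A = [[(1 if i == j else 0) - Fraction(P[i][j]) for j in range(n)]
         for i in range(n)]
    return solve_exact(A, to_target)


# Hypothetical two-state chain induced by some scheduler; the probability
# mass not listed in a row leads to a sink from which the target is lost.
P = [[Fraction(0), Fraction(1, 2)],
     [Fraction(1, 3), Fraction(0)]]
to_target = [Fraction(1, 4), Fraction(1, 3)]
exact = reach_probabilities(P, to_target)
```

Because every intermediate value is a `Fraction`, the result carries no rounding error, at the cost of arithmetic that can become slow on large models.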
Drill failure during ORIF of the mandible: complication management
A case of a drill breakage during open reduction and internal fixation (ORIF) of a mandibular fracture is reported. The clinical decision, diagnosis and surgical management of the complication are described.
A Mitochondrial Health Index Sensitive to Mood and Caregiving Stress.
BACKGROUND: Chronic life stress, such as the stress of caregiving, can promote pathophysiology, but the underlying cellular mechanisms are not well understood. Chronic stress may induce recalibrations in mitochondria leading to changes either in mitochondrial content per cell, or in mitochondrial functional capacity (i.e., quality). METHODS: Here we present a functional index of mitochondrial health (MHI) for human leukocytes that can distinguish between these two possibilities. The MHI integrates nuclear and mitochondrial DNA-encoded respiratory chain enzymatic activities and mitochondrial DNA copy number. We then use the MHI to test the hypothesis that daily emotional states and caregiving stress influence mitochondrial function by comparing healthy mothers of a child with an autism spectrum disorder (high-stress caregivers, n = 46) with mothers of a neurotypical child (control group, n = 45). RESULTS: The MHI outperformed individual mitochondrial function measures. Elevated positive mood at night was associated with higher MHI, and nightly positive mood was also a mediator of the association between caregiving and MHI. Moreover, MHI was correlated with positive mood on the days preceding, but not following, the blood draw, suggesting for the first time in humans that mitochondria may respond to proximate emotional states within days. Correspondingly, the caregiver group, which had higher perceived stress and lower positive and greater negative daily affect, exhibited lower MHI. This effect was not explained by a mismatch between nuclear and mitochondrial genomes. CONCLUSIONS: Daily mood and chronic caregiving stress are associated with mitochondrial functional capacity. Mitochondrial health may represent a nexus between psychological stress and health.