563 research outputs found

Bayesian multitask inverse reinforcement learning

Author: C.A. Rothkopf
J. Choi
M.L. Puterman
T.S. Ferguson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may represent one expert trying to solve a different task, or different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to differentiate effectively between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers. Comment: Corrected version. 13 pages, 8 figures

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

Chalmers Research

Approximating the Termination Value of One-Counter MDPs and Stochastic Games

Author: G.R. Grimmett
J. Lambert
K. Etessami
L.B. White
M.L. Puterman
T. Brázdil
Publication date: 01/01/2011

One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player, and 2-player turn-based zero-sum, stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (minimize, respectively) the probability of hitting counter value 0, starting at a given control state and given counter value. Recently, we studied qualitative decision problems ("is the optimal termination value = 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value ≥ p", or "approximate the termination value within epsilon") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems are computable. In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error epsilon, and furthermore we can compute epsilon-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality on these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes epsilon-optimal for OC-MDPs. Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011, invited for submission to Information and Computation

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Associations between childhood adversity and daily suppression and avoidance in response to stress in adulthood: can neurobiological sensitivity help explain this relationship?

Author: Arenander Justine
Bush Nicole
Epel Elissa
Hagan Melissa J
Mendes Wendy Berry
Puterman Eli
Publication venue: eScholarship, University of California
Publication date: 19/11/2016

Background and objectives: Although it has been postulated that psychological responses to stress in adulthood are grounded in childhood experiences in the family environment, evidence has been inconsistent. This study tested whether two putative measures of neurobiological sensitivity (vagal flexibility and attentional capacity) moderated the relation between women's reported exposure to a risky childhood environment and current engagement in suppressive or avoidant coping in response to daily stress. Design and methods: Adult women (N = 158) recruited for a study of stress, coping, and aging reported on early adversity (EA) in their childhood family environment and completed a week-long daily diary in which they described their most stressful event of the day and indicated the degree to which they used suppression or avoidance in response to that event. In addition, women completed a visual tracking task during which heart rate variability and attentional capacity were assessed. Results: Multilevel mixed modeling analyses revealed that greater EA predicted greater suppression and avoidance only among women with higher attentional capacity. Similarly, greater EA predicted greater use of suppression, but only among women with greater vagal flexibility. Conclusion: Childhood adversity may predispose individuals with high neurobiological sensitivity to a lifetime of maladaptive coping.

Crossref

eScholarship - University of California

Synchronization and Control in Intrinsic and Designed Computation: An Information-Theoretic Analysis of Competing Models of Stochastic Computation

Author: Christopher J. Ellison
Cover T. M.
Elliott R. J.
Hopcroft J. E.
James P. Crutchfield
John R. Mahoney
Klamka J.
Puterman M. L.
Ryan G. James
Strogatz S.
Publication venue: 'AIP Publishing'
Publication date: 29/07/2010

We adapt tools from information theory to analyze how an observer comes to synchronize with the hidden states of a finitary, stationary stochastic process. We show that synchronization is determined by both the process's internal organization and by an observer's model of it. We analyze these components using the convergence of state-block and block-state entropies, comparing them to the previously known convergence properties of the Shannon block entropy. Along the way, we introduce a hierarchy of information quantifiers as derivatives and integrals of these entropies, which parallels a similar hierarchy introduced for block entropy. We also draw out the duality between synchronization properties and a process's controllability. The tools lead to a new classification of a process's alternative representations in terms of minimality, synchronizability, and unifilarity. Comment: 25 pages, 13 figures, 1 table

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

The Emergence of Norms via Contextual Agreements in Open Societies

Author: AG Barto
D Fudenberg
DJ Watts
J Epstein
JR Kok
M Bowling
ML Puterman
O Sen
R Albert
Y Shoham
Publication date: 24/03/2015

This paper explores the emergence of norms in agents' societies when agents play multiple (even incompatible) roles in their social contexts simultaneously, and have limited interaction ranges. Specifically, this article proposes two reinforcement learning methods for agents to compute agreements on strategies for using common resources to perform joint tasks. The computation of norms when agents play multiple roles in their social contexts has not been studied before. To make the problem even more realistic for open societies, we do not assume that agents share knowledge of their common resources, so they have to compute semantic agreements in order to perform their joint actions. The paper reports on an empirical study of whether and how efficiently societies of agents converge to norms, exploring the proposed social learning processes w.r.t. different society sizes, and the ways agents are connected. The results reported are very encouraging, regarding the speed of the learning process as well as the convergence rate, even in quite complex settings.

arXiv.org e-Print Archive

Crossref

Chronic psychosocial and financial burden accelerates 5-year telomere shortening: findings from the Coronary Artery Risk Development in Young Adults Study.

Author: AJ Schuit
AK Damjanovic
Aric A. Prather
AT Geronimus
B Mezuk
Barbara Sternfeld
C Duggan
C Schaefer
CG Parks
CK Enders
CM Aldwin
DS Lauderdale
E Puterman
E Sahin
EH Blackburn
Eli Puterman
Elissa S. Epel
ES Epel
GD Friedman
GE Miller
GL Schlomer
H Ma
IM Wentzensen
J Campisi
J Deelen
J Humphreys
J Lin
J Svensson
J Zhao
JE Verhoeven
JJW Liu
JP Gouin
JR Piazza
Jue Lin
K Ahola
K Litzelman
L Ala-Mursula
L Bendix
L Rode
LI Pearlin
M Booth
M Hamer
M Jaskelioff
M Kimura
ME Glickman
MF Scheier
MH Schafer
Nancy Adler
OT Njajou
PC Haycock
PG Surtees
RM Cawthon
S Cohen
S Jodczyk
S Richardson
S Yusuf
SE Taylor
SL Bakaysa
ST Charles
T Steenstrup
TC Adam
TE Seeman
Tomás Cabeza de Baca
U Svenson
V Codd
W Chen
W Poortinga
Y Zhan
Publication venue: eScholarship, University of California
Publication date: 01/05/2020

Leukocyte telomere length, a marker of immune system function, is sensitive to exposures such as psychosocial stressors and health-maintaining behaviors. Past research has determined that stress experienced in adulthood is associated with shorter telomere length, but is limited to mostly cross-sectional reports. We test whether repeated reports of chronic psychosocial and financial burden are associated with telomere length change over a 5-year period (years 15 and 20) among 969 participants in the Coronary Artery Risk Development in Young Adults (CARDIA) Study, a longitudinal, population-based cohort, ages 18-30 at time of recruitment in 1985. We further examine whether multisystem resiliency, comprising social connections, health-maintaining behaviors, and psychological resources, mitigates the effects of repeated burden on telomere attrition over 5 years. Our results indicate that adults with high chronic burden do not show decreased telomere length over the 5-year period. However, these effects do vary by level of resiliency, as regression results revealed a significant interaction between chronic burden and multisystem resiliency. For individuals with high repeated chronic burden and low multisystem resiliency (1 SD below the mean), there was a significant 5-year shortening in telomere length, whereas no significant relationships between chronic burden and attrition were evident for those at moderate and higher levels of resiliency. These effects apply similarly across the three components of resiliency. Results imply that interventions should focus on establishing strong social connections, psychological resources, and health-maintaining behaviors when attempting to ameliorate stress-related decline in telomere length among at-risk individuals.

Crossref

eScholarship - University of California

"How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts

Author: Austin J. L.
Bird S.
Bunt H.
Core M. G.
Gasic M.
Kim S. N.
Klüwer T.
Lafferty J. D.
Mohammad S. M.
Puterman M. L.
Sacks H.
Searle J. R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/09/2017

Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained "dialogue acts" frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real-time. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms. Comment: 13 pages, 6 figures, IUI 201

arXiv.org e-Print Archive

Crossref

Probabilistic Model Checking for Energy Analysis in Software Product Lines

Author: Baier C.
Bianco A.
Chatterjee K.
Clements P.
Cordy M.
Dinkelaker T.
Dubslaff C.
Filar J.
Gomaa H.
Haverkort B.
Kang K. C.
Kulkarni V.
Millo J.-V.
Noorian M.
Puterman M.
White J.
Publication date: 30/12/2013

In a software product line (SPL), a collection of software products is defined by their commonalities in terms of features rather than by explicitly specifying all products one by one. Several verification techniques have been adapted to establish temporal properties of SPLs. Symbolic and family-based model checking have proven successful in tackling the combinatorial blow-up that arises when reasoning about several feature combinations. However, most formal verification approaches for SPLs presented in the literature focus on static SPLs, where the features of a product are fixed and cannot be changed during runtime. This is in contrast to dynamic SPLs, which allow the feature combinations of a product to be adapted dynamically after deployment. The main contribution of the paper is a compositional modeling framework for dynamic SPLs, which supports probabilistic and nondeterministic choices and allows for quantitative analysis. We specify the feature changes during runtime within an automata-based coordination component, which enables reasoning about strategies for triggering dynamic feature changes to optimize various quantitative objectives, e.g., energy or monetary costs and reliability. For our framework there is a natural and conceptually simple translation into the input language of the prominent probabilistic model checker PRISM. This facilitates the application of PRISM's powerful symbolic engine to the operational behavior of dynamic SPLs and their family-based analysis against various quantitative queries. We demonstrate the feasibility of our approach with a case study of an energy-aware bonding network device. Comment: 14 pages, 11 figures

arXiv.org e-Print Archive

Crossref

The Impatient May Use Limited Optimism to Minimize Regret

Author: B Aminof
C Reutenauer
CJCH Watkins
E Allender
E Filiot
F Cucker
J Filar
JY Halpern
KR Apt
L Alfaro de
LS Shapley
M Jurdzinski
ML Puterman
P Hunter
R Brenguier
U Zwick
Publication date: 17/11/2018

Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may regret her actions, realizing that a previous choice was suboptimal given the behavior of the environment. The main contribution of this paper is a PSPACE algorithm for computing the minimum possible regret of a given game. To this end, several results of independent interest are shown. (1) We identify a class of regret-minimizing and admissible strategies that first assume that the environment is collaborating, then assume it is adversarial; the precise timing of the switch is key here. (2) Disregarding the computational cost of numerical analysis, we provide an NP algorithm that checks that the regret entailed by a given time-switching strategy exceeds a given value. (3) We show that determining whether a strategy minimizes regret is decidable in PSPACE.

arXiv.org e-Print Archive

Crossref

Institutional Repository Universiteit Antwerpen

DI-fusion

Controller Synthesis for Autonomous Systems Interacting With Human Operators

Author: Cooke N. J.
Cummings M. L.
Donath D.
Humphrey L.
Kwiatkowska M.
Puterman M.
Publication venue: ScholarlyCommons
Publication date: 01/04/2015

We propose an approach to synthesize control protocols for autonomous systems that account for uncertainties and imperfections in interactions with human operators. As an illustrative example, we consider a scenario involving road network surveillance by an unmanned aerial vehicle (UAV) that is controlled remotely by a human operator but also has a certain degree of autonomy. Depending on the type (i.e., probabilistic and/or nondeterministic) of knowledge about the uncertainties and imperfections in the operator-autonomy interactions, we use abstractions based on Markov decision processes and augment these models to stochastic two-player games. Our approach enables the synthesis of operator-dependent optimal mission plans for the UAV, highlighting the effects of operator characteristics (e.g., workload, proficiency, and fatigue) on UAV mission performance; it can also provide informative feedback (e.g., Pareto curves showing the trade-offs between multiple mission objectives), potentially assisting the operator in decision-making.

CiteSeerX

Crossref

ScholarlyCommons@Penn