563 research outputs found
Bayesian multitask inverse reinforcement learning
We generalise the problem of inverse reinforcement learning to multiple
tasks, from multiple demonstrations. Each one may represent one expert trying
to solve a different task, or as different experts trying to solve the same
task. Our main contribution is to formalise the problem as statistical
preference elicitation, via a number of structured priors, whose form captures
our biases about the relatedness of different tasks or expert policies. In
doing so, we introduce a prior on policy optimality, which is more natural to
specify. We show that our framework allows us not only to learn to efficiently
from multiple experts but to also effectively differentiate between the goals
of each. Possible applications include analysing the intrinsic motivations of
subjects in behavioural experiments and learning from multiple teachers.Comment: Corrected version. 13 pages, 8 figure
Approximating the Termination Value of One-Counter MDPs and Stochastic Games
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs)
are 1-player, and 2-player turn-based zero-sum, stochastic games played on the
transition graph of classic one-counter automata (equivalently, pushdown
automata with a 1-letter stack alphabet). A key objective for the analysis and
verification of these games is the termination objective, where the players aim
to maximize (minimize, respectively) the probability of hitting counter value
0, starting at a given control state and given counter value. Recently, we
studied qualitative decision problems ("is the optimal termination value = 1?")
for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and
coNP, respectively). However, quantitative decision and approximation problems
("is the optimal termination value ? p", or "approximate the termination value
within epsilon") are far more challenging. This is so in part because optimal
strategies may not exist, and because even when they do exist they can have a
highly non-trivial structure. It thus remained open even whether any of these
quantitative termination problems are computable. In this paper we show that
all quantitative approximation problems for the termination value for OC-MDPs
and OC-SSGs are computable. Specifically, given a OC-SSG, and given epsilon >
0, we can compute a value v that approximates the value of the OC-SSG
termination game within additive error epsilon, and furthermore we can compute
epsilon-optimal strategies for both players in the game. A key ingredient in
our proofs is a subtle martingale, derived from solving certain LPs that we can
associate with a maximizing OC-MDP. An application of Azuma's inequality on
these martingales yields a computable bound for the "wealth" at which a "rich
person's strategy" becomes epsilon-optimal for OC-MDPs.Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011,
invited for submission to Information and Computatio
Associations between childhood adversity and daily suppression and avoidance in response to stress in adulthood: can neurobiological sensitivity help explain this relationship?
Background and objectivesAlthough it has been postulated that psychological responses to stress in adulthood are grounded in childhood experiences in the family environment, evidence has been inconsistent. This study tested whether two putative measures of neurobiological sensitivity (vagal flexibility and attentional capacity) moderated the relation between women's reported exposure to a risky childhood environment and current engagement in suppressive or avoidant coping in response to daily stress.Design and methodsAdult women (N = 158) recruited for a study of stress, coping, and aging reported on early adversity (EA) in their childhood family environment and completed a week-long daily diary in which they described their most stressful event of the day and indicated the degree to which they used suppression or avoidance in response to that event. In addition, women completed a visual tracking task during which heart rate variability and attentional capacity were assessed.ResultsMultilevel mixed modeling analyses revealed that greater EA predicted greater suppression and avoidance only among women with higher attentional capacity. Similarly, greater EA predicted greater use of suppression, but only among women with greater vagal flexibility.ConclusionChildhood adversity may predispose individuals with high neurobiological sensitivity to a lifetime of maladaptive coping
Synchronization and Control in Intrinsic and Designed Computation: An Information-Theoretic Analysis of Competing Models of Stochastic Computation
We adapt tools from information theory to analyze how an observer comes to
synchronize with the hidden states of a finitary, stationary stochastic
process. We show that synchronization is determined by both the process's
internal organization and by an observer's model of it. We analyze these
components using the convergence of state-block and block-state entropies,
comparing them to the previously known convergence properties of the Shannon
block entropy. Along the way, we introduce a hierarchy of information
quantifiers as derivatives and integrals of these entropies, which parallels a
similar hierarchy introduced for block entropy. We also draw out the duality
between synchronization properties and a process's controllability. The tools
lead to a new classification of a process's alternative representations in
terms of minimality, synchronizability, and unifilarity.Comment: 25 pages, 13 figures, 1 tabl
The Emergence of Norms via Contextual Agreements in Open Societies
This paper explores the emergence of norms in agents' societies when agents
play multiple -even incompatible- roles in their social contexts
simultaneously, and have limited interaction ranges. Specifically, this article
proposes two reinforcement learning methods for agents to compute agreements on
strategies for using common resources to perform joint tasks. The computation
of norms by considering agents' playing multiple roles in their social contexts
has not been studied before. To make the problem even more realistic for open
societies, we do not assume that agents share knowledge on their common
resources. So, they have to compute semantic agreements towards performing
their joint actions. %The paper reports on an empirical study of whether and
how efficiently societies of agents converge to norms, exploring the proposed
social learning processes w.r.t. different society sizes, and the ways agents
are connected. The results reported are very encouraging, regarding the speed
of the learning process as well as the convergence rate, even in quite complex
settings
Chronic psychosocial and financial burden accelerates 5-year telomere shortening: findings from the Coronary Artery Risk Development in Young Adults Study.
Leukocyte telomere length, a marker of immune system function, is sensitive to exposures such as psychosocial stressors and health-maintaining behaviors. Past research has determined that stress experienced in adulthood is associated with shorter telomere length, but is limited to mostly cross-sectional reports. We test whether repeated reports of chronic psychosocial and financial burden is associated with telomere length change over a 5-year period (years 15 and 20) from 969 participants in the Coronary Artery Risk Development in Young Adults (CARDIA) Study, a longitudinal, population-based cohort, ages 18-30 at time of recruitment in 1985. We further examine whether multisystem resiliency, comprised of social connections, health-maintaining behaviors, and psychological resources, mitigates the effects of repeated burden on telomere attrition over 5 years. Our results indicate that adults with high chronic burden do not show decreased telomere length over the 5-year period. However, these effects do vary by level of resiliency, as regression results revealed a significant interaction between chronic burden and multisystem resiliency. For individuals with high repeated chronic burden and low multisystem resiliency (1 SD below the mean), there was a significant 5-year shortening in telomere length, whereas no significant relationships between chronic burden and attrition were evident for those at moderate and higher levels of resiliency. These effects apply similarly across the three components of resiliency. Results imply that interventions should focus on establishing strong social connections, psychological resources, and health-maintaining behaviors when attempting to ameliorate stress-related decline in telomere length among at-risk individuals
"How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts
Given the increasing popularity of customer service dialogue on Twitter,
analysis of conversation data is essential to understand trends in customer and
agent behavior for the purpose of automating customer service interactions. In
this work, we develop a novel taxonomy of fine-grained "dialogue acts"
frequently observed in customer service, showcasing acts that are more suited
to the domain than the more generic existing taxonomies. Using a sequential
SVM-HMM model, we model conversation flow, predicting the dialogue act of a
given turn in real-time. We characterize differences between customer and agent
behavior in Twitter customer service conversations, and investigate the effect
of testing our system on different customer service industries. Finally, we use
a data-driven approach to predict important conversation outcomes: customer
satisfaction, customer frustration, and overall problem resolution. We show
that the type and location of certain dialogue acts in a conversation have a
significant effect on the probability of desirable and undesirable outcomes,
and present actionable rules based on our findings. The patterns and rules we
derive can be used as guidelines for outcome-driven automated customer service
platforms.Comment: 13 pages, 6 figures, IUI 201
Probabilistic Model Checking for Energy Analysis in Software Product Lines
In a software product line (SPL), a collection of software products is
defined by their commonalities in terms of features rather than explicitly
specifying all products one-by-one. Several verification techniques were
adapted to establish temporal properties of SPLs. Symbolic and family-based
model checking have been proven to be successful for tackling the combinatorial
blow-up arising when reasoning about several feature combinations. However,
most formal verification approaches for SPLs presented in the literature focus
on the static SPLs, where the features of a product are fixed and cannot be
changed during runtime. This is in contrast to dynamic SPLs, allowing to adapt
feature combinations of a product dynamically after deployment. The main
contribution of the paper is a compositional modeling framework for dynamic
SPLs, which supports probabilistic and nondeterministic choices and allows for
quantitative analysis. We specify the feature changes during runtime within an
automata-based coordination component, enabling to reason over strategies how
to trigger dynamic feature changes for optimizing various quantitative
objectives, e.g., energy or monetary costs and reliability. For our framework
there is a natural and conceptually simple translation into the input language
of the prominent probabilistic model checker PRISM. This facilitates the
application of PRISM's powerful symbolic engine to the operational behavior of
dynamic SPLs and their family-based analysis against various quantitative
queries. We demonstrate feasibility of our approach by a case study issuing an
energy-aware bonding network device.Comment: 14 pages, 11 figure
The Impatient May Use Limited Optimism to Minimize Regret
Discounted-sum games provide a formal model for the study of reinforcement
learning, where the agent is enticed to get rewards early since later rewards
are discounted. When the agent interacts with the environment, she may regret
her actions, realizing that a previous choice was suboptimal given the behavior
of the environment. The main contribution of this paper is a PSPACE algorithm
for computing the minimum possible regret of a given game. To this end, several
results of independent interest are shown. (1) We identify a class of
regret-minimizing and admissible strategies that first assume that the
environment is collaborating, then assume it is adversarial---the precise
timing of the switch is key here. (2) Disregarding the computational cost of
numerical analysis, we provide an NP algorithm that checks that the regret
entailed by a given time-switching strategy exceeds a given value. (3) We show
that determining whether a strategy minimizes regret is decidable in PSPACE
Controller Synthesis for Autonomous Systems Interacting With Human Operators
We propose an approach to synthesize control protocols for autonomous systems that account for uncertainties and imperfections in interactions with human operators. As an illustrative example, we consider a scenario involving road network surveillance by an unmanned aerial vehicle (UAV) that is controlled remotely by a human operator but also has a certain degree of autonomy. Depending on the type (i.e., probabilistic and/or nondeterministic) of knowledge about the uncertainties and imperfections in the operatorautonomy interactions, we use abstractions based on Markov decision processes and augment these models to stochastic two-player games. Our approach enables the synthesis of operator-dependent optimal mission plans for the UAV, highlighting the effects of operator characteristics (e.g., workload, proficiency, and fatigue) on UAV mission performance; it can also provide informative feedback (e.g., Pareto curves showing the trade-offs between multiple mission objectives), potentially assisting the operator in decision-making
- …