Improving Skin Condition Classification with a Visual Symptom Checker Trained using Reinforcement Learning
We present a visual symptom checker that combines a pre-trained Convolutional
Neural Network (CNN) with a Reinforcement Learning (RL) agent as a Question
Answering (QA) model. This method increases the classification confidence and
accuracy of the visual symptom checker, and decreases the average number of
questions asked to narrow down the differential diagnosis. A Deep Q-Network
(DQN)-based RL agent learns how to ask the patient about the presence of
symptoms in order to maximize the probability of correctly identifying the
underlying condition. The RL agent uses the visual information provided by the
CNN, in addition to the answers to the questions asked, to guide the QA system. We
demonstrate that the RL-based approach increases accuracy by more than 20%
compared to the CNN-only approach, which uses only the visual information to
predict the condition. Moreover, the accuracy gain is up to 10% over an
approach that combines the CNN's visual information with a conventional
decision tree-based QA system. We finally show that the RL-based
approach not only outperforms the decision tree-based approach, but also
narrows down the diagnosis faster in terms of the average number of asked
questions.
Comment: Accepted for the Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 201
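As a concrete illustration of the mechanics, here is a minimal sketch of such a question-asking agent under invented assumptions: the CNN output is simulated, the symptom profiles are random, and a linear Q-function stands in for the paper's DQN. None of the sizes, rewards, or names below come from the paper.

```python
# Minimal sketch (not the authors' implementation): a linear Q-learner stands
# in for the DQN. State = simulated CNN class probabilities plus the answers
# collected so far; actions = ask one of the symptom questions or commit to a
# diagnosis. All sizes, rewards, and symptom profiles are invented.
import numpy as np

rng = np.random.default_rng(0)
N_COND, N_Q = 3, 4                            # conditions, yes/no questions
PROFILES = rng.random((N_COND, N_Q)) < 0.5    # which condition shows which symptom
N_ACT = N_Q + N_COND                          # ask question i, or diagnose j
DIM = N_COND + 2 * N_Q                        # CNN probs + (asked, answer) flags
W = np.zeros((N_ACT, DIM))                    # linear Q(s, a) = W[a] @ s

def features(cnn, asked, answers):
    return np.concatenate([cnn, asked, answers])

def run_episode(eps=0.1, alpha=0.05, gamma=0.99):
    cond = int(rng.integers(N_COND))
    cnn = rng.dirichlet(np.ones(N_COND) + 3 * np.eye(N_COND)[cond])  # noisy "CNN"
    asked, answers = np.zeros(N_Q), np.zeros(N_Q)
    s = features(cnn, asked, answers)
    for t in range(3 * N_Q):
        if t == 3 * N_Q - 1:                          # budget spent: must diagnose
            a = N_Q + int(np.argmax((W @ s)[N_Q:]))
        elif rng.random() < eps:
            a = int(rng.integers(N_ACT))
        else:
            a = int(np.argmax(W @ s))
        if a < N_Q:                                   # ask a symptom question
            asked[a], answers[a] = 1.0, float(PROFILES[cond, a])
            s2, r, done = features(cnn, asked, answers), -0.05, False
        else:                                         # commit to a diagnosis
            s2, r, done = s, float(a - N_Q == cond), True
        target = r + (0.0 if done else gamma * np.max(W @ s2))
        W[a] += alpha * (target - W[a] @ s) * s       # semi-gradient Q update
        s = s2
        if done:
            return r

for _ in range(5000):
    run_episode()
print("accuracy (greedy):",
      np.mean([run_episode(eps=0.0, alpha=0.0) for _ in range(500)]))
```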
The Emergence of Norms via Contextual Agreements in Open Societies
This paper explores the emergence of norms in agents' societies when agents
play multiple, even incompatible, roles in their social contexts
simultaneously, and have limited interaction ranges. Specifically, this article
proposes two reinforcement learning methods for agents to compute agreements on
strategies for using common resources to perform joint tasks. The computation
of norms when agents play multiple roles in their social contexts
has not been studied before. To make the problem even more realistic for open
societies, we do not assume that agents share knowledge on their common
resources. So, they have to compute semantic agreements towards performing
their joint actions. The paper reports on an empirical study of whether and
how efficiently societies of agents converge to norms, exploring the proposed
social learning processes w.r.t. different society sizes and the ways agents
are connected. The results are very encouraging regarding both the speed of
the learning process and the convergence rate, even in quite complex
settings.
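As a rough illustration of this kind of social learning (a generic coordination-game setup, not the paper's methods, which additionally handle semantic agreements), the sketch below has independent Q-learning agents on a ring with a limited interaction range converge on a shared convention.

```python
# Generic norm-emergence sketch: agents on a ring repeatedly play a
# two-strategy coordination game with a random neighbour and learn with
# stateless Q-learning. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
N, RANGE, ACTIONS = 50, 2, 2          # agents, neighbourhood radius, strategies
Q = np.zeros((N, ACTIONS))

def step(eps=0.1, alpha=0.1):
    i = int(rng.integers(N))
    j = (i + int(rng.integers(1, RANGE + 1)) * int(rng.choice([-1, 1]))) % N
    acts = [int(rng.integers(ACTIONS)) if rng.random() < eps
            else int(np.argmax(Q[k])) for k in (i, j)]
    r = 1.0 if acts[0] == acts[1] else 0.0        # payoff for coordinating
    for k, a in zip((i, j), acts):                # stateless Q update
        Q[k, a] += alpha * (r - Q[k, a])

for _ in range(200_000):
    step()
norm = np.argmax(Q, axis=1)
print("fraction following the majority convention:",
      max(np.mean(norm == 0), np.mean(norm == 1)))
```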
Probabilistic Timed Automata with Clock-Dependent Probabilities
Probabilistic timed automata are classical timed automata extended with
discrete probability distributions over edges. We introduce clock-dependent
probabilistic timed automata, a variant of probabilistic timed automata in
which transition probabilities can depend linearly on clock values.
Clock-dependent probabilistic timed automata allow the modelling of a
continuous relationship between time passage and the likelihood of system
events. We show that the problem of deciding whether the maximum probability of
reaching a certain location is above a threshold is undecidable for
clock-dependent probabilistic timed automata. On the other hand, we show that
the maximum and minimum probability of reaching a certain location in
clock-dependent probabilistic timed automata can be approximated using a
region-graph-based approach.
Comment: Full version of a paper published at RP 201
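To see what such an approximation buys, consider a toy single-clock example (my construction, not from the paper): a first edge taken at clock value x succeeds with probability x, and a second edge taken at clock value y >= x (no reset in between) reaches the goal with probability 1 - y. The true maximum reachability probability is sup over x of x(1 - x) = 0.25, and discretizing the clock values approximates it from below.

```python
# Grid-based approximation of max reachability for the toy clock-dependent
# model described above; finer grids tighten the estimate toward 0.25.
import numpy as np

def max_reach(grid_size):
    xs = np.linspace(0.0, 1.0, grid_size)
    best = 0.0
    for i, x in enumerate(xs):       # clock value when the first edge fires
        for y in xs[i:]:             # clock value when the second edge fires
            best = max(best, x * (1.0 - y))
    return best

for n in (2, 4, 6, 101):
    print(f"grid size {n:>3}: max reach prob ~ {max_reach(n):.4f}")
```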
The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes
We study the never-worse relation (NWR) for Markov decision processes with an
infinite-horizon reachability objective. A state q is never worse than a state
p if the maximal probability of reaching the target set of states from p is at
most the same value from q, regardless of the probabilities labelling the
transitions. Extremal-probability states, end components, and essential states
are all special cases of the equivalence relation induced by the NWR. Using the
NWR, states in the same equivalence class can be collapsed. Then, actions
leading to suboptimal states can be removed. We show that the natural decision
problem associated with computing the NWR is coNP-complete. Finally, we extend
a previously known incomplete polynomial-time iterative algorithm to
under-approximate the NWR.
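For context, the quantity the NWR compares is the standard maximal reachability value. The sketch below computes it for one fixed probability labelling of a toy MDP with ordinary value iteration (not the paper's algorithm); since p has an action leading surely to q, v[p] >= v[q] holds under every labelling, so p is never worse than q.

```python
# Max-reachability value iteration on a toy MDP (one fixed labelling).
# mdp[state][action] = list of (probability, successor) branches.
mdp = {
    "p": {"a": [(0.5, "goal"), (0.5, "sink")], "b": [(1.0, "q")]},
    "q": {"a": [(0.9, "goal"), (0.1, "sink")]},
    "goal": {},                      # target, absorbing
    "sink": {},                      # dead end, absorbing
}

def max_reach(mdp, goal, iters=100):
    v = {s: float(s == goal) for s in mdp}
    for _ in range(iters):
        for s, acts in mdp.items():
            if acts:                 # Bellman backup: best action's expectation
                v[s] = max(sum(p * v[t] for p, t in branch)
                           for branch in acts.values())
    return v

print(max_reach(mdp, "goal"))        # v[p] = v[q] = 0.9 for this labelling
```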
The Impatient May Use Limited Optimism to Minimize Regret
Discounted-sum games provide a formal model for the study of reinforcement
learning, where the agent is enticed to get rewards early since later rewards
are discounted. When the agent interacts with the environment, she may regret
her actions, realizing that a previous choice was suboptimal given the behavior
of the environment. The main contribution of this paper is a PSPACE algorithm
for computing the minimum possible regret of a given game. To this end, several
results of independent interest are shown. (1) We identify a class of
regret-minimizing and admissible strategies that first assume that the
environment is collaborating, then assume it is adversarial; the precise
timing of the switch is key here. (2) Disregarding the computational cost of
numerical analysis, we provide an NP algorithm that checks whether the regret
entailed by a given time-switching strategy exceeds a given value. (3) We show
that determining whether a strategy minimizes regret is decidable in PSPACE.
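For reference, the regret notions involved are the standard ones; writing val(σ, τ) for the discounted sum the agent obtains by playing strategy σ against environment behavior τ (notation mine), the quantity computed by the PSPACE algorithm is:

```latex
\[
  \mathrm{reg}(\sigma) \;=\; \sup_{\tau}\,\Bigl(\sup_{\sigma'} \mathrm{val}(\sigma',\tau) \;-\; \mathrm{val}(\sigma,\tau)\Bigr),
  \qquad
  \mathrm{Reg} \;=\; \inf_{\sigma}\, \mathrm{reg}(\sigma).
\]
```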
Maximizing the Conditional Expected Reward for Reaching the Goal
The paper addresses the problem of computing maximal conditional expected
accumulated rewards until reaching a target state (briefly called maximal
conditional expectations) in finite-state Markov decision processes where the
condition is given as a reachability constraint. Conditional expectations of
this type can, e.g., stand for the maximal expected termination time of
probabilistic programs with non-determinism, under the condition that the
program eventually terminates, or for the worst-case expected penalty to be
paid, assuming that at least three deadlines are missed. The main results of
the paper are (i) a polynomial-time algorithm to check the finiteness of
maximal conditional expectations, (ii) PSPACE-completeness for the threshold
problem in acyclic Markov decision processes where the task is to check whether
the maximal conditional expectation exceeds a given threshold, (iii) a
pseudo-polynomial-time algorithm for the threshold problem in the general
(cyclic) case, and (iv) an exponential-time algorithm for computing the maximal
conditional expectation and an optimal scheduler.
Comment: 103 pages, extended version with appendices of a paper accepted at TACAS 201
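A minimal worked instance of such a conditional expectation (my toy example, not from the paper): a program that, at each step, terminates with probability 0.3, aborts with probability 0.1, and loops otherwise; we estimate the expected number of steps conditioned on eventual termination.

```python
# Monte Carlo estimate of E[steps | termination] for the toy program above.
# Exact values: P(termination) = 0.3 / 0.4 = 0.75, E[steps | term.] = 2.5.
import numpy as np

rng = np.random.default_rng(2)

def run():
    steps = 0
    while True:
        steps += 1
        u = rng.random()
        if u < 0.3:
            return steps, True       # program terminates
        if u < 0.4:
            return steps, False      # program aborts: condition violated

samples = [run() for _ in range(200_000)]
terminated = [s for s, ok in samples if ok]
print("P(termination)        ~", len(terminated) / len(samples))
print("E[steps | terminated] ~", np.mean(terminated))
```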
Optimizing Performance of Continuous-Time Stochastic Systems using Timeout Synthesis
We consider a parametric version of fixed-delay continuous-time Markov chains
(or, equivalently, deterministic and stochastic Petri nets, DSPNs) where
fixed-delay transitions are specified by parameters, rather than concrete
values. Our goal is to synthesize values of these parameters that, for a given
cost function, minimise expected total cost incurred before reaching a given
set of target states. We show that under mild assumptions, optimal values of
parameters can be effectively approximated using a translation to a Markov
decision process (MDP) whose actions correspond to discretized values of these
parameters.
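The discretization idea can be illustrated on a made-up model (not one from the paper): a job usually completes after an Exp(1) delay but hangs forever with probability 0.1; a fixed-delay timeout d restarts it at unit cost, and waiting costs one per time unit. The grid over d plays the role of the discretized action space.

```python
# Grid search over the timeout parameter d, with Monte Carlo estimates of the
# expected total cost until the job completes. All numbers are invented.
import numpy as np

rng = np.random.default_rng(3)

def attempt_time():
    # completion time of one attempt; with prob 0.1 the job hangs forever
    return rng.exponential(1.0) if rng.random() < 0.9 else np.inf

def expected_cost(d, runs=4000, restart_cost=1.0):
    total = 0.0
    for _ in range(runs):
        cost = 0.0
        while True:
            t = attempt_time()
            if t <= d:                    # done before the timeout fires
                total += cost + t
                break
            cost += d + restart_cost      # waited d time units, then restarted
    return total / runs

grid = np.linspace(0.5, 8.0, 16)
costs = [expected_cost(d) for d in grid]
print(f"best timeout on this grid: d ~ {grid[int(np.argmin(costs))]:.1f}")
```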
Probabilistic inference for determining options in reinforcement learning
Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.
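For readers unfamiliar with the option framework, the generic execution model looks roughly like the sketch below (textbook form with invented numbers; the paper's actual contribution, inferring these components from data, is not shown).

```python
# Generic option-framework execution loop: each option bundles a sub-policy
# with initiation and termination probabilities; the agent runs one option
# until it terminates, then picks another.
import numpy as np

rng = np.random.default_rng(4)
N_STATES, N_ACTIONS = 6, 2

class Option:
    def __init__(self, policy, init_prob, term_prob):
        self.policy = policy          # per-state action distribution
        self.init_prob = init_prob    # per-state initiation weight
        self.term_prob = term_prob    # per-state termination probability

    def act(self, s):
        return int(rng.choice(N_ACTIONS, p=self.policy[s]))

options = [Option(policy=rng.dirichlet(np.ones(N_ACTIONS), N_STATES),
                  init_prob=rng.random(N_STATES),
                  term_prob=rng.random(N_STATES)) for _ in range(3)]

def env_step(s, a):                   # stand-in dynamics: a step on a ring
    return (s + (1 if a == 1 else -1)) % N_STATES

s, active = 0, None
for t in range(20):
    if active is None:                # choose among options that may initiate
        w = np.array([o.init_prob[s] for o in options])
        active = options[int(rng.choice(len(options), p=w / w.sum()))]
    s = env_step(s, active.act(s))
    if rng.random() < active.term_prob[s]:
        active = None                 # option terminated; pick a new one next
```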
Mean-Payoff Optimization in Continuous-Time Markov Chains with Parametric Alarms
Continuous-time Markov chains with alarms (ACTMCs) allow for alarm events
that can be non-exponentially distributed. Within parametric ACTMCs, the
parameters of alarm-event distributions are not given explicitly and can be
the subject of parameter synthesis. An algorithm solving the ε-optimal
parameter synthesis problem for parametric ACTMCs with long-run average
optimization objectives is presented. Our approach is based on a reduction of
the problem to finding long-run average optimal strategies in semi-Markov
decision processes (semi-MDPs) and a sufficient discretization of the parameter (i.e., action)
space. Since the set of actions in the discretized semi-MDP can be very large,
a straightforward approach based on explicit action-space construction fails to
solve even simple instances of the problem. The presented algorithm uses an
enhanced policy iteration on symbolic representations of the action space. The
soundness of the algorithm is established for parametric ACTMCs with
alarm-event distributions satisfying four mild assumptions that are shown to
hold for uniform, Dirac and Weibull distributions in particular, but are
satisfied for many other distributions as well. An experimental implementation
shows that the symbolic technique substantially improves the efficiency of the
synthesis algorithm and allows instances of realistic size to be solved.
Comment: This article is a full version of a paper accepted to the Conference on Quantitative Evaluation of SysTems (QEST) 201
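Spelled out in standard notation (mine, not necessarily the paper's), with C(t) the cost accumulated up to time t under parameter vector d, the synthesized parameters must be ε-optimal for the long-run average cost:

```latex
\[
  \mathrm{MP}(d) \;=\; \limsup_{t \to \infty} \frac{1}{t}\,\mathbb{E}^{d}\bigl[C(t)\bigr],
  \qquad
  \text{find } d^{*} \text{ such that } \mathrm{MP}(d^{*}) \;\le\; \inf_{d}\,\mathrm{MP}(d) + \varepsilon.
\]
```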
Dynamic programming with approximation function for nurse scheduling
Although dynamic programming could ideally solve any combinatorial optimization problem, the curse of dimensionality of the search space seriously limits its application to large optimization problems. For example, only a few papers in the literature have reported the application of dynamic programming to workforce scheduling problems. This paper investigates approximate dynamic programming to tackle nurse scheduling problems of a size that dynamic programming cannot tackle in practice. Nurse scheduling is one of the problems within workforce scheduling that has been tackled with a considerable number of algorithms, particularly meta-heuristics. Experimental results indicate that approximate dynamic programming is a suitable method to solve this problem effectively.
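A generic skeleton of the approach the abstract describes (illustrative only; the real state space, features, and costs for nurse rostering are domain-specific and much larger): replace the exact value table with a parametric approximation over features and update it from sampled transitions.

```python
# Approximate dynamic programming with a linear value-function approximation:
# greedy one-step lookahead through sampled transitions, plus a TD-style
# regression update of the weights. All dynamics and features are invented.
import numpy as np

rng = np.random.default_rng(5)
N_ACTIONS = 3
w = np.zeros(4)                              # V(s) ~ w . phi(s)

def phi(s):                                  # hypothetical feature map
    return np.array([1.0, s, s * s, np.sin(s)])

def step(s, a):                              # stand-in dynamics and stage cost
    s2 = (s + a * 0.1 + rng.normal(0.0, 0.05)) % 1.0
    return s2, abs(s2 - 0.5)                 # cost: distance from a target level

def adp(sweeps=2000, alpha=0.05, gamma=0.95):
    global w                                 # update the module-level weights
    s = rng.random()
    for _ in range(sweeps):
        lookahead = [(c + gamma * (w @ phi(s2)), s2)
                     for s2, c in (step(s, a) for a in range(N_ACTIONS))]
        target, s_next = min(lookahead)      # greedy w.r.t. the approximation
        w += alpha * (target - w @ phi(s)) * phi(s)   # TD regression step
        s = s_next

adp()
print("fitted value-function weights:", np.round(w, 3))
```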