Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability
In this paper, for POMDPs, we establish the convergence of a Q-learning
algorithm for control policies that use a finite history of past observations and
control actions and, consequently, the near optimality of the resulting
limit Q functions under explicit filter stability conditions. We present
explicit error bounds relating the approximation error to the length of the
finite history window. We establish the convergence of such Q-learning
iterations under mild ergodicity assumptions on the state process during the
exploration phase. We further show that the limit fixed point equation gives an
optimal solution for an approximate belief-MDP. We then provide bounds on the
performance of the policy obtained using the limit Q values compared to the
performance of the optimal policy for the POMDP, where we also present explicit
conditions using recent results on filter stability in controlled POMDPs. While
there exist many experimental results, (i) the rigorous asymptotic convergence
(to an approximate MDP value function) for such finite-memory Q-learning
algorithms, and (ii) the near optimality with an explicit rate of convergence
(in the memory size) are results that are, to our knowledge, new to the literature.
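As a rough illustration of the finite-memory scheme described above, the sketch below runs a tabular Q-learning update in which the learner's "state" is the window of its last few observations and actions; the environment interface, window length, and step sizes are illustrative assumptions and not taken from the paper.

```python
import random
from collections import defaultdict, deque

def finite_memory_q_learning(env, actions, window_len=2, episodes=500,
                             gamma=0.95, alpha=0.1, eps=0.1):
    """Q-learning on a finite window of past observations and actions.

    Assumes `env` exposes reset() -> obs and step(a) -> (obs, reward, done);
    the learner's state is the tuple of the last `window_len` (obs, action)
    pairs together with the current observation.
    """
    Q = defaultdict(float)  # maps ((window, obs), action) -> value

    for _ in range(episodes):
        obs = env.reset()
        hist = deque(maxlen=window_len)  # most recent (obs, action) pairs
        done = False
        while not done:
            w = (tuple(hist), obs)  # finite-memory "state"
            # epsilon-greedy exploration over the finite action set
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(w, x)])
            next_obs, reward, done = env.step(a)
            hist.append((obs, a))
            w_next = (tuple(hist), next_obs)
            best_next = 0.0 if done else max(Q[(w_next, x)] for x in actions)
            # temporal-difference update on the windowed state
            Q[(w, a)] += alpha * (reward + gamma * best_next - Q[(w, a)])
            obs = next_obs
    return Q
```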
Q-Learning for Continuous State and Action MDPs under Average Cost Criteria
For infinite-horizon average-cost criterion problems, we present several
approximation and reinforcement learning results for Markov Decision Processes
with standard Borel spaces. Toward this end, (i) we first provide a
discretization-based approximation method for fully observed Markov Decision
Processes (MDPs) with continuous spaces under average cost criteria, and we
provide error bounds for the approximations when the dynamics are only weakly
continuous under certain ergodicity assumptions. In particular, we relax the
total variation condition given in prior work to weak continuity as well as
Wasserstein continuity conditions. (ii) We provide synchronous and asynchronous
Q-learning algorithms for continuous spaces via quantization, and establish
their convergence. (iii) We show that the convergence is to the optimal Q
values of the finite approximate models constructed via quantization. Our
Q-learning convergence results and the near optimality of their limits are, to our
knowledge, new for continuous spaces, and the proof method is new even for finite spaces.
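As a rough picture of item (ii), the sketch below quantizes a continuous state onto a grid and runs an asynchronous Q-learning update for the average cost criterion in the relative-value-iteration style (subtracting the Q value of a fixed reference pair); the interfaces, grid, and step sizes are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def make_quantizer(low, high, n_bins):
    """Uniform quantizer mapping a scalar continuous state to a grid index
    (a stand-in for any measurable quantizer)."""
    grid = np.linspace(low, high, n_bins)
    return lambda x: int(np.argmin(np.abs(grid - x)))

def quantized_average_cost_q_learning(env, quantize, n_bins, n_actions,
                                      steps=100_000, alpha=0.05, ref=(0, 0)):
    """Asynchronous Q-learning on the quantized (finite) model for the
    average cost criterion, written in the relative-value-iteration style.
    Assumes env.reset() -> x and env.step(a) -> (next_x, cost)."""
    Q = np.zeros((n_bins, n_actions))
    s = quantize(env.reset())
    for _ in range(steps):
        a = np.random.randint(n_actions)          # uniform exploration
        x_next, cost = env.step(a)
        s_next = quantize(x_next)
        target = cost + Q[s_next].min() - Q[ref]  # cost-minimization update
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q
```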
Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments
As a primary contribution, we present a convergence theorem for stochastic
iterations, and in particular, Q-learning iterates, under a general, possibly
non-Markovian, stochastic environment. Our conditions for convergence involve
an ergodicity and a positivity criterion. We provide a precise characterization
on the limit of the iterates and conditions on the environment and
initializations for convergence. As our second contribution, we discuss the
implications and applications of this theorem to a variety of stochastic
control problems with non-Markovian environments involving (i) quantized
approximations of fully observed Markov Decision Processes (MDPs) with
continuous spaces (where quantization breaks down the Markovian structure), (ii)
quantized approximations of belief-MDP-reduced partially observable MDPs
(POMDPs) with weak Feller continuity and a mild version of filter stability
(which requires knowledge of the model by the controller), (iii) finite
window approximations of POMDPs under a uniform controlled filter stability
(which does not require knowledge of the model), and (iv) multi-agent
models, where the convergence of learning dynamics to a new class of equilibria,
subjective Q-learning equilibria, is studied. In addition to the convergence
theorem itself, some of these implications are new to the literature, while
others are interpreted as applications of the theorem. Some open problems are noted.
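For orientation, the sketch below shows the generic asynchronous Q-learning iterates that such a convergence theorem addresses: only the visited pair is updated, with a vanishing step size, and the transition stream need not come from a Markovian environment. Names, the visit-count step size, and the cost-minimization form are illustrative assumptions.

```python
import numpy as np

def asynchronous_q_iterates(transitions, n_states, n_actions, beta=0.95):
    """Generic asynchronous Q-learning iterates.

    `transitions` is any stream of (s, a, cost, s_next) tuples; it need not be
    generated by a Markovian environment (s may itself be a quantized state,
    a belief approximation, or a finite observation window)."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    for s, a, cost, s_next in transitions:
        visits[s, a] += 1
        step = 1.0 / (1.0 + visits[s, a])       # vanishing step size
        target = cost + beta * Q[s_next].min()  # discounted cost-minimization form
        Q[s, a] = (1 - step) * Q[s, a] + step * target
    return Q
```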
Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity
Reinforcement learning algorithms often require finiteness of state and
action spaces in Markov decision processes (MDPs) and various efforts have been
made in the literature towards the applicability of such algorithms for
continuous state and action spaces. In this paper, we show that under very mild
regularity conditions (in particular, involving only weak continuity of the
transition kernel of an MDP), Q-learning for standard Borel MDPs via
quantization of states and actions converges to a limit, and furthermore this
limit satisfies an optimality equation that yields near-optimal policies, with
either explicit performance bounds or guaranteed asymptotic optimality.
Our approach builds on (i) viewing quantization as a measurement
kernel and thus a quantized MDP as a POMDP, (ii) utilizing near optimality and
convergence results of Q-learning for POMDPs, and (iii) finally,
near optimality of finite-state model approximations for MDPs with weakly
continuous kernels, which we show to correspond to the fixed point of the
constructed POMDP. Thus, our paper presents a very general convergence and
approximation result for the applicability of Q-learning to continuous MDPs.
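One way to picture item (i), viewing quantization as a measurement kernel, is a thin wrapper that hides the continuous state behind a quantizer so that any finite-space Q-learning routine runs on bin indices, exactly as it would on a POMDP observation. The sketch below assumes a simple scalar state and a reset/step environment interface; it is an illustration, not the paper's construction.

```python
import numpy as np

class QuantizedObservationWrapper:
    """Treats a state quantizer as a (deterministic) measurement kernel:
    the learner observes only the bin index of the underlying continuous
    state, so the quantized MDP can be handled like a POMDP with this
    observation channel. Assumes env.reset() -> x and
    env.step(a) -> (x, reward, done)."""

    def __init__(self, env, low, high, n_bins):
        self.env = env
        self.grid = np.linspace(low, high, n_bins)

    def _observe(self, x):
        return int(np.argmin(np.abs(self.grid - x)))  # nearest grid bin

    def reset(self):
        return self._observe(self.env.reset())

    def step(self, action):
        x, reward, done = self.env.step(action)
        return self._observe(x), reward, done
```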
Urinary tract infection in the pregnant population: which empirical antimicrobial agent should be specified in each of the three trimesters?
Objective: We aimed to investigate the bacterial profile and the adequacy of antimicrobial treatment in pregnant women with urinary tract infection (UTI). Material and Methods: This retrospective observational study was conducted with 753 pregnant women who required hospitalization because of UTI in each of the three trimesters. Midstream urine cultures and antimicrobial susceptibility tests were evaluated. Results: E. coli was the most frequently isolated bacterial agent (82.2%), followed by Klebsiella spp. (11.2%). In each of the three trimesters, E. coli remained the most frequently isolated bacterium (86%, 82.2%, and 79.5%, respectively), followed by Klebsiella spp. (9%, 11.6%, and 12.2%, respectively). Enterococcus spp. was the third most frequently isolated agent, found in 43 patients (5.7%) across the three trimesters. The isolates were highly sensitive to fosfomycin, with 98-99% sensitivity for E. coli and 88-89% for Klebsiella spp., while Enterococcus spp. showed 93-100% sensitivity to nitrofurantoin in each of the three trimesters. Conclusions: We demonstrated that E. coli and Klebsiella spp. are the most common bacterial agents isolated from urine cultures of pregnant women with UTI in each of the three trimesters. We consider fosfomycin to be the most adequate first-line treatment regimen, given the high sensitivity of the isolates, its ease of use, and its safety in pregnancy.