193 research outputs found

    Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability

    In this paper, for POMDPs, we establish the convergence of a Q-learning algorithm for control policies that use a finite history of past observations and control actions and, consequently, the near optimality of the resulting limit Q functions under explicit filter stability conditions. We present explicit error bounds relating the approximation error to the length of the finite history window. We establish the convergence of such Q-learning iterations under mild ergodicity assumptions on the state process during the exploration phase. We further show that the limit fixed-point equation gives an optimal solution for an approximate belief-MDP. We then provide bounds on the performance of the policy obtained using the limit Q values compared to the performance of the optimal policy for the POMDP, where we also present explicit conditions using recent results on filter stability in controlled POMDPs. While there exist many experimental results, to our knowledge (i) the rigorous asymptotic convergence (to an approximate MDP value function) of such finite-memory Q-learning algorithms and (ii) the near optimality with an explicit rate of convergence (in the memory size) are new to the literature.
    Comment: 32 pages, 12 figures. arXiv admin note: text overlap with arXiv:2010.0745
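
    As a rough illustration of the finite-history idea described above, the sketch below runs tabular Q-learning with the last few observation-action pairs used as the state key. The environment interface (reset/step returning observations, stage costs, and a termination flag), the window length, the random exploration policy, and the 1/n step sizes are assumptions made for this sketch rather than the paper's exact algorithm or conditions.

        from collections import defaultdict, deque
        import random

        def finite_memory_q_learning(env, actions, window=3, gamma=0.95, episodes=500):
            # Q-learning with the last `window` (observation, action) pairs used
            # in place of the unobserved state; observations are assumed hashable.
            # Interface assumption: env.reset() -> obs, env.step(a) -> (obs, cost, done).
            Q = defaultdict(float)       # Q[(history_key, action)]
            visits = defaultdict(int)    # visit counts for decaying step sizes

            for _ in range(episodes):
                obs = env.reset()
                hist = deque([(obs, None)], maxlen=window)
                done = False
                while not done:
                    key = tuple(hist)
                    a = random.choice(actions)            # exploration phase
                    next_obs, cost, done = env.step(a)

                    hist.append((next_obs, a))            # slide the finite window
                    next_key = tuple(hist)
                    target = cost + gamma * min(Q[(next_key, b)] for b in actions)

                    visits[(key, a)] += 1
                    alpha = 1.0 / visits[(key, a)]        # step size 1/n
                    Q[(key, a)] += alpha * (target - Q[(key, a)])
            return Q

    Per the error bounds discussed above, a longer window generally tightens the approximation at the cost of a larger history space.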

    Q-Learning for Continuous State and Action MDPs under Average Cost Criteria

    For infinite-horizon average-cost criterion problems, we present several approximation and reinforcement learning results for Markov Decision Processes with standard Borel spaces. Toward this end, (i) we first provide a discretization-based approximation method for fully observed Markov Decision Processes (MDPs) with continuous spaces under average cost criteria, and we provide error bounds for the approximations when the dynamics are only weakly continuous under certain ergodicity assumptions. In particular, we relax the total variation condition given in prior work to weak continuity as well as Wasserstein continuity conditions. (ii) We provide synchronous and asynchronous Q-learning algorithms for continuous spaces via quantization and establish their convergence. (iii) We show that the convergence is to the optimal Q values of the finite approximate models constructed via quantization. Our Q-learning convergence results, and the near optimality of their limits, are new for continuous spaces, and the proof method is new even for finite spaces, to our knowledge.
    Comment: 3 figures
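
    The sketch below illustrates one way such quantized learning can look under an average-cost criterion: a scalar state in [0, 1], a uniform quantizer, a random exploration policy, and a relative (differential) update normalized at a fixed reference state-action pair. These are illustrative assumptions, not the synchronous/asynchronous algorithms or the ergodicity conditions of the paper.

        import numpy as np

        def quantized_relative_q_learning(env, actions, n_bins=20, steps=100_000):
            # Relative (average-cost) Q-learning on a uniformly quantized state space.
            # Interface assumption: env.reset() -> x0 and env.step(a) -> (x_next, cost),
            # with states lying in [0, 1].
            Q = np.zeros((n_bins, len(actions)))
            visits = np.zeros_like(Q)
            ref = (0, 0)                               # reference state-action pair

            x = env.reset()
            for _ in range(steps):
                i = min(int(x * n_bins), n_bins - 1)   # quantize the current state
                a = np.random.randint(len(actions))    # exploration policy
                x_next, cost = env.step(actions[a])
                j = min(int(x_next * n_bins), n_bins - 1)

                visits[i, a] += 1
                alpha = 1.0 / visits[i, a]
                # differential target: subtract the value at the reference pair
                Q[i, a] += alpha * (cost - Q[ref] + Q[j].min() - Q[i, a])
                x = x_next
            return Q

    In the spirit of item (iii) above, the limit of such iterates is tied to the optimal Q values of the finite model induced by the quantizer.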

    Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments

    As a primary contribution, we present a convergence theorem for stochastic iterations, and in particular Q-learning iterates, under a general, possibly non-Markovian, stochastic environment. Our conditions for convergence involve an ergodicity and a positivity criterion. We provide a precise characterization of the limit of the iterates and conditions on the environment and initializations for convergence. As our second contribution, we discuss the implications and applications of this theorem for a variety of stochastic control problems with non-Markovian environments, involving (i) quantized approximations of fully observed Markov Decision Processes (MDPs) with continuous spaces (where quantization breaks down the Markovian structure), (ii) quantized approximations of belief-MDP reduced partially observable MDPs (POMDPs) with weak Feller continuity and a mild version of filter stability (which requires knowledge of the model by the controller), (iii) finite window approximations of POMDPs under a uniform controlled filter stability (which does not require knowledge of the model), and (iv) multi-agent models, for which convergence of learning dynamics to a new class of equilibria, subjective Q-learning equilibria, is studied. Beyond the convergence theorem itself, some of these implications are new to the literature and others are interpreted as applications of the convergence theorem. Some open problems are noted.
    Comment: 2 figures
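
    Schematically, and with the caveat that the precise assumptions and notation are those of the paper, the iterations covered by such a theorem take the familiar asynchronous Q-learning form

        Q_{t+1}(s_t, a_t) = (1 - alpha_t(s_t, a_t)) Q_t(s_t, a_t)
                            + alpha_t(s_t, a_t) [ c_t + beta * min_b Q_t(s_{t+1}, b) ],

    where the driving process (s_t, a_t, c_t) is generated by a general, possibly non-Markovian, environment; the ergodicity and positivity criteria mentioned above govern both convergence and the characterization of the limit.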

    Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity

    Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs), and various efforts have been made in the literature toward the applicability of such algorithms to continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality, either with explicit performance bounds or with guaranteed asymptotic optimality. Our approach builds on (i) viewing quantization as a measurement kernel and thus the quantized MDP as a POMDP, (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (iii) finally, near optimality of finite state model approximations for MDPs with weakly continuous kernels, which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
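
    To make the "quantization as a measurement kernel" viewpoint concrete, in the sketch below the learner only ever sees the bin index of the continuous state and runs standard discounted Q-learning on those bins. The environment interface, the uniform quantizer on [0, 1], and the exploration policy are assumptions made for illustration; the point of the construction above is that the limit of such iterates corresponds to the optimal Q values of a finite approximate model, which is near optimal under weak continuity.

        import numpy as np

        def quantize(x, n_bins):
            # Uniform quantizer on [0, 1]; from the learner's point of view this is
            # a measurement map y = q(x), so only the bin index is observed.
            return min(int(x * n_bins), n_bins - 1)

        def quantized_q_learning(env, actions, n_bins=20, beta=0.95, steps=200_000):
            # Discounted Q-learning on the quantized observations of a continuous-state
            # MDP.  Interface assumption: env.reset() -> x0, env.step(a) -> (x_next, cost).
            Q = np.zeros((n_bins, len(actions)))
            visits = np.zeros_like(Q)

            x = env.reset()
            for _ in range(steps):
                y = quantize(x, n_bins)                 # observed bin, not x itself
                a = np.random.randint(len(actions))     # exploration policy
                x_next, cost = env.step(actions[a])
                y_next = quantize(x_next, n_bins)

                visits[y, a] += 1
                alpha = 1.0 / visits[y, a]
                Q[y, a] += alpha * (cost + beta * Q[y_next].min() - Q[y, a])
                x = x_next
            return Q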

    Urinary tract infection in pregnant population, which empirical antimicrobial agent should be specified in each of the three trimesters?

    Objective: We aimed to investigate the bacterial profile and the adequacy of antimicrobial treatment in pregnant women with urinary tract infection (UTI). Material and Methods: This retrospective observational study was conducted with 753 pregnant women who required hospitalization because of UTI in each of the three trimesters. Midstream urine cultures and antimicrobial susceptibility tests were evaluated. Results: E. coli was the most frequently isolated bacterial agent (82.2%), followed by Klebsiella spp. (11.2%). In each of the three trimesters, E. coli remained the most frequently isolated bacterium (86%, 82.2%, and 79.5%, respectively), followed by Klebsiella spp. (9%, 11.6%, and 12.2%, respectively). Enterococcus spp. were the third most common isolate, found in 43 patients (5.7%) across the three trimesters. The isolates were highly sensitive to fosfomycin, with 98-99% sensitivity for E. coli and 88-89% for Klebsiella spp.; Enterococcus spp. showed 93-100% sensitivity to nitrofurantoin in each of the three trimesters. Conclusions: We demonstrated that E. coli and Klebsiella spp. are the most common bacterial agents isolated from urine cultures of pregnant women with UTI in each of the three trimesters. We consider fosfomycin to be the most adequate first-line treatment regimen owing to the high sensitivity to the drug, its ease of use, and its safety in pregnancy.