
    Zap Q-Learning for Optimal Stopping Time Problems

    The objective in this paper is to obtain fast-converging reinforcement learning algorithms that approximate solutions to the problem of discounted-cost optimal stopping for an irreducible, uniformly ergodic Markov chain evolving on a compact subset of $\mathbb{R}^n$. We build on the dynamic programming approach taken by Tsitsiklis and Van Roy, who propose a Q-learning algorithm to estimate the optimal state-action value function, which then defines an optimal stopping rule. We provide insight into why the convergence rate of this algorithm can be slow, and propose a fast-converging alternative, the "Zap-Q-learning" algorithm, designed to achieve the optimal rate of convergence. For the first time, we prove convergence of the Zap-Q-learning algorithm in the linear function approximation setting. The proof uses ODE analysis, and the optimal asymptotic variance property of the algorithm is reflected in fast convergence in a finance example.
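To make the Zap recursion concrete, here is a minimal Python sketch under stated assumptions. It uses the tabular special case of linear function approximation (one indicator feature per state, $\psi(x)=e_x$), a hypothetical three-state chain with made-up costs, and an assumed cost formulation: per-step cost $c(x)$ while continuing, cost $c_s(x)$ on stopping, and $Q^*(x) = c(x) + \beta E[\min(c_s(X'), Q^*(X')) \mid X = x]$. Step sizes $\alpha_n = 1/n$ and a more slowly decaying matrix gain $\gamma_n = (n+1)^{-0.85}$ are also assumptions. The matrix estimate $\hat A_n$ tracks the derivative of the Q-learning update, and $\theta$ moves along the Newton-like direction $-\hat A_{n+1}^{-1} f$. The names `zap_q_stopping`, `q_star` and `solve` are illustrative, not taken from the paper.

```python
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def q_star(P, c, c_stop, beta, iters=1000):
    """Value iteration for Q*(x) = c(x) + beta * E[min(c_stop(X'), Q*(X')) | X = x]."""
    d = len(c)
    q = [0.0] * d
    for _ in range(iters):
        q = [c[x] + beta * sum(P[x][y] * min(c_stop[y], q[y]) for y in range(d))
             for x in range(d)]
    return q


def zap_q_stopping(P, c, c_stop, beta, n_steps, rho=0.85, seed=0):
    """Zap Q-learning for discounted-cost stopping with tabular features psi(x) = e_x."""
    rng = random.Random(seed)
    d = len(c)
    theta = [0.0] * d  # Q^theta(x) = theta[x]
    A_hat = [[-1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # assumed start: -I
    x = 0
    for n in range(1, n_steps + 1):
        x_next = rng.choices(range(d), weights=P[x])[0]
        alpha = 1.0 / n              # step size alpha_n
        gamma = (n + 1) ** (-rho)    # matrix gain, decays more slowly than alpha_n
        # Temporal difference: c(x) + beta * min(c_stop(x'), Q(x')) - Q(x)
        td = c[x] + beta * min(c_stop[x_next], theta[x_next]) - theta[x]
        # A_{n+1} = psi(x) [beta * 1{Q(x') < c_stop(x')} psi(x') - psi(x)]^T: only row x is nonzero
        a_row = [0.0] * d
        a_row[x] -= 1.0
        if theta[x_next] < c_stop[x_next]:
            a_row[x_next] += beta
        for i in range(d):
            for j in range(d):
                target = a_row[j] if i == x else 0.0
                A_hat[i][j] += gamma * (target - A_hat[i][j])
        # Zap step: theta <- theta - alpha * A_hat^{-1} psi(x) td
        f = [0.0] * d
        f[x] = td
        step = solve(A_hat, f)
        theta = [t - alpha * s for t, s in zip(theta, step)]
        x = x_next
    return theta


if __name__ == "__main__":
    P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]  # hypothetical chain
    c, c_stop, beta = [1.0, 2.0, 0.5], [6.0, 4.0, 8.0], 0.8
    theta = zap_q_stopping(P, c, c_stop, beta, 20000)
    print("Zap estimate:", [round(t, 3) for t in theta])
    print("Value iteration:", [round(q, 3) for q in q_star(P, c, c_stop, beta)])
    print("Stop in states:", [x for x in range(3) if c_stop[x] <= theta[x]])
```

In this tabular toy case the projected fixed point coincides with the exact Q-function, so the Zap estimate can be checked against value iteration; the resulting rule stops in state x when c_s(x) ≤ θ(x).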

    Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

    This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This motivates the focus on mean-square error bounds for parameter estimates. It is shown that the mean-square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.
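The simplest linear SA recursion of this kind is Monte-Carlo averaging, $\theta_{n+1} = \theta_n + \alpha_{n+1}(c(\Phi_{n+1}) - \theta_n)$. The sketch below assumes a hypothetical two-state chain on {0, 1} and $c(x)=x$, and estimates the scaled error $n\,E[(\theta_n - \pi(c))^2]$ from independent runs. With $\alpha_n = 1/n$ the recursion is exactly the sample mean, so the scaled error should approach the asymptotic variance $\sigma^2 = \mathrm{Var}_\pi(c)\,(1+\lambda)/(1-\lambda)$, where $\lambda$ is the chain's second eigenvalue (a standard two-state formula, not a bound quoted from the paper). The function names are illustrative.

```python
import random

P_01, P_10 = 0.2, 0.3             # hypothetical two-state chain on {0, 1}: P(0->1), P(1->0)
PI_1 = P_01 / (P_01 + P_10)       # stationary mean pi(c) for c(x) = x


def asymptotic_variance():
    """sigma^2 = Var_pi(c) * (1 + lam) / (1 - lam), lam = 1 - P_01 - P_10 (two-state chain)."""
    lam = 1.0 - P_01 - P_10
    return PI_1 * (1.0 - PI_1) * (1.0 + lam) / (1.0 - lam)


def step_chain(phi, rng):
    """One transition of the two-state Markov chain."""
    if phi == 0:
        return 1 if rng.random() < P_01 else 0
    return 0 if rng.random() < P_10 else 1


def linear_sa(n_steps, rng, gain=1.0):
    """theta_n = theta_{n-1} + alpha_n * (c(Phi_n) - theta_{n-1}), with alpha_n = gain / n."""
    phi = 1 if rng.random() < PI_1 else 0  # start the chain in steady state
    theta = 0.0
    for n in range(1, n_steps + 1):
        phi = step_chain(phi, rng)
        theta += (gain / n) * (phi - theta)
    return theta


def scaled_mse(n_steps=1000, reps=400, gain=1.0, seed=0):
    """Monte-Carlo estimate of n * E[(theta_n - pi(c))^2] from independent runs."""
    rng = random.Random(seed)
    total = sum((linear_sa(n_steps, rng, gain) - PI_1) ** 2 for _ in range(reps))
    return n_steps * total / reps


if __name__ == "__main__":
    print("n * MSE (estimate):", round(scaled_mse(), 3))
    print("sigma^2 (limit):   ", round(asymptotic_variance(), 3))
```

The `gain` parameter shows where step-size conditions enter: for this scalar recursion with $\alpha_n = g/n$, standard asymptotic theory predicts $n\,E[(\theta_n-\pi(c))^2] \to g^2\sigma^2/(2g-1)$ when $g > 1/2$, while for $g \le 1/2$ the $O(1/n)$ rate is lost.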

    The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

    The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise: $\theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1})$, $n \ge 0$, in which each $\theta_n \in \Re^d$, $\{\Phi_n\}$ is a Markov chain on a general state space X with stationary distribution $\pi$, and $f: \Re^d \times \text{X} \to \Re^d$. In addition to standard Lipschitz bounds on $f$ and conditions on the vanishing step-size sequence $\{\alpha_n\}$, it is assumed that the associated ODE $\frac{d}{dt}\vartheta_t = \bar f(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar f(\theta) = E[f(\theta, \Phi)]$ with $\Phi \sim \pi$. Moreover, the ODE@$\infty$ defined with respect to the vector field $\bar f_\infty(\theta) := \lim_{r\to\infty} r^{-1} \bar f(r\theta)$, $\theta \in \Re^d$, is asymptotically stable. The main contributions are summarized as follows: (i) The sequence $\{\theta_n\}$ is convergent if $\Phi$ is geometrically ergodic, subject to compatible bounds on $f$. The remaining results are established under a stronger assumption on the Markov chain: a slightly weaker version of the Donsker-Varadhan Lyapunov drift condition known as (DV3). (ii) A Lyapunov function is constructed for the joint process $\{\theta_n, \Phi_n\}$ that implies convergence of $\{\theta_n\}$ in $L_4$. (iii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error $z_n := (\theta_n - \theta^*)/\sqrt{\alpha_n}$. Moment bounds combined with the CLT imply convergence of the normalized covariance, $\lim_{n\to\infty} E[z_n z_n^T] = \Sigma_\theta$, where $\Sigma_\theta$ is the asymptotic covariance appearing in the CLT. (iv) An example is provided where the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded.
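A small simulation can illustrate the normalized error $z_n$ and the limit $E[z_n^2] \to \Sigma_\theta$ in the scalar case. The sketch assumes an example chosen here for illustration, not taken from the paper: $f(\theta,\phi) = \phi - \theta - \tfrac12\sin\theta$, which is Lipschitz and has $\bar f_\infty(\theta) = -\theta$ (so the ODE@$\infty$ is stable), a hypothetical two-state chain, and $\alpha_n = 1/n$. For this step size and linearization $A = \bar f'(\theta^*)$ with $2A < -1$, standard scalar theory gives $\Sigma_\theta = \Sigma_\Delta/(-2A-1)$, where $\Sigma_\Delta$ is the asymptotic variance of $\Phi_n$; $\theta^*$ is found by bisection. All names are illustrative.

```python
import math
import random

P_01, P_10 = 0.2, 0.3             # hypothetical two-state chain on {0, 1}
MU = P_01 / (P_01 + P_10)         # E_pi[Phi]


def f(theta, phi):
    """Illustrative Lipschitz vector field; f_bar_infinity(theta) = -theta."""
    return phi - theta - 0.5 * math.sin(theta)


def f_bar(theta):
    """Mean vector field f_bar(theta) = E_pi[f(theta, Phi)]."""
    return MU - theta - 0.5 * math.sin(theta)


def theta_star():
    """Root of f_bar by bisection (f_bar is strictly decreasing, f_bar(-2) > 0 > f_bar(2))."""
    lo, hi = -2.0, 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_bar(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def predicted_sigma_theta():
    """Scalar CLT variance for alpha_n = 1/n: Sigma_Delta / (-2A - 1), A = f_bar'(theta*)."""
    a = -1.0 - 0.5 * math.cos(theta_star())
    lam = 1.0 - P_01 - P_10
    sigma_delta = MU * (1.0 - MU) * (1.0 + lam) / (1.0 - lam)  # asymptotic variance of Phi_n
    return sigma_delta / (-2.0 * a - 1.0)


def run_sa(n_steps, rng):
    """theta_{n+1} = theta_n + alpha_{n+1} f(theta_n, Phi_{n+1}) with alpha_{n+1} = 1/(n+1)."""
    phi = 1 if rng.random() < MU else 0  # stationary start
    theta = 0.0
    for n in range(n_steps):
        if phi == 0:
            phi = 1 if rng.random() < P_01 else 0
        else:
            phi = 0 if rng.random() < P_10 else 1
        theta += f(theta, phi) / (n + 1)
    return theta


def normalized_second_moment(n_steps=2000, reps=400, seed=1):
    """Monte-Carlo estimate of E[z_n^2], z_n = (theta_n - theta*) / sqrt(alpha_n)."""
    rng = random.Random(seed)
    ts, alpha_n = theta_star(), 1.0 / n_steps
    return sum((run_sa(n_steps, rng) - ts) ** 2 / alpha_n for _ in range(reps)) / reps


if __name__ == "__main__":
    print("theta*:", round(theta_star(), 4))
    print("E[z_n^2] estimate:", round(normalized_second_moment(), 3))
    print("Sigma_theta (predicted):", round(predicted_sigma_theta(), 3))
```

Because the noise here is a geometrically ergodic finite-state chain, the second moment stays bounded; the paper's example (iv) is precisely a case where this kind of moment estimate would fail to converge.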

    Multifunctional gold nanostar conjugates for tumor imaging and combined photothermal and chemo-therapy

    Uniform gold nanostars (Au NS) were conjugated with cyclic RGD (cRGD) and either a near-infrared (NIR) fluorescence probe (MPA) or an anticancer drug (DOX) to obtain the multifunctional nanoconstructs Au-cRGD-MPA and Au-cRGD-DOX, respectively. The NIR contrast agent Au-cRGD-MPA was shown to have low cytotoxicity. In tumor cells and tumor-bearing mice, these imaging nanoparticles demonstrated favorable tumor-targeting capability, mediated by binding of the RGD peptide to its receptor, which is overexpressed on tumor cells. The multi-therapeutic analogue, Au-cRGD-DOX, integrates tumor targeting, chemotherapy and photothermal therapy into a single system. The synergistic effect of photothermal therapy and chemotherapy was demonstrated in different tumor cell lines and in vivo using S180 tumor-bearing mouse models. The viability of MDA-MB-231 cells was only 40% after incubation with Au-cRGD-DOX and irradiation with NIR light. With both tail-vein and intratumoral injection, mice treated with Au-cRGD-DOX showed the slowest tumor growth. These results indicate that the multifunctional nanoconstruct is a promising combined therapeutic agent for tumor-targeted treatment, with the potential to improve anticancer treatment outcomes.

    Initial ablation ratio predicts the recurrence of low-risk papillary thyroid microcarcinomas treated with microwave ablation: a 5-year, single-institution cohort study

    Objective: To assess the long-term efficacy and safety of microwave ablation (MWA) in treating low-risk papillary thyroid microcarcinomas (PTMC) and to identify predictive factors for postoperative local tumor progression of PTMC. Methods: A total of 154 patients with low-risk PTMC who were treated with MWA and followed up for at least 3 months were retrospectively enrolled. Ultrasonography was performed after MWA to assess local tumor progression. Adverse events associated with MWA were recorded. The ablated volume (Va) and initial ablation ratio (IAR) were measured to assess their influence on the recurrence risk of PTMC. Results: The mean tumor volume of PTMC before MWA was 0.071 (0.039, 0.121) cm³, with a maximum diameter of 0.60 ± 0.18 cm. All PTMC patients were followed up for 6 (3, 18) months. Va increased immediately after MWA and then gradually decreased over time, until it was significantly smaller at 12 months than before MWA (P 2.0 mU/L) of PTMC patients were not correlated with local tumor progression. Conclusion: MWA is a safe and effective therapeutic strategy for low-risk PTMC. The maximum tumor diameter and IAR are predictive factors for local tumor progression of PTMC after MWA.

    Overexpression of the Glutathione Peroxidase 5 (RcGPX5) Gene From Rhodiola crenulata Increases Drought Tolerance in Salvia miltiorrhiza

    Excessive cellular accumulation of reactive oxygen species (ROS) due to environmental stresses can critically disrupt plant development and negatively affect productivity. Plant glutathione peroxidases (GPXs) play an important role in ROS scavenging by catalyzing the reduction of H₂O₂ and other organic hydroperoxides, protecting plant cells from oxidative damage. RcGPX5, a member of the GPX gene family, was isolated from the traditional medicinal plant Rhodiola crenulata and constitutively expressed in Salvia miltiorrhiza under the control of the CaMV 35S promoter. Transgenic plants showed increased tolerance to oxidative stress caused by H₂O₂ application and by drought, and produced less malondialdehyde (MDA) than the wild type. Under drought stress, seedlings of the transgenic lines wilted later than the wild type and resumed growth 1 day after re-watering. In addition, reduced glutathione (GSH) and total glutathione (T-GSH) contents were higher in the transgenic lines, as were the activities of glutathione reductase (GR), ascorbate peroxidase (APX) and GPX. These changes prevented the accumulation of H₂O₂ and O₂⁻ in cells of the transgenic lines compared with the wild type. Overexpression of RcGPX5 altered the relative expression levels of multiple endogenous genes in S. miltiorrhiza, including transcription factor genes and genes in the ROS and ABA pathways. In particular, RcGPX5 expression increased the mass of S. miltiorrhiza roots while reducing the concentration of the active ingredients. These results show that heterologous expression of RcGPX5 in S. miltiorrhiza can affect the regulation of multiple biochemical pathways to confer tolerance to drought stress, and that RcGPX5 might act as a competitor with secondary metabolites in the S. miltiorrhiza response to environmental stimuli.