109 research outputs found
Zap Q-Learning for Optimal Stopping Time Problems
The objective in this paper is to obtain fast converging reinforcement
learning algorithms to approximate solutions to the problem of discounted cost
optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on
a compact subset of $\mathbb{R}^n$. We build on the dynamic programming
approach taken by Tsitsiklis and Van Roy, wherein they propose a Q-learning
algorithm to estimate the optimal state-action value function, which then
defines an optimal stopping rule. We provide insights as to why the convergence
rate of this algorithm can be slow, and propose a fast-converging alternative,
the "Zap-Q-learning" algorithm, designed to achieve optimal rate of
convergence. For the first time, we prove the convergence of the Zap-Q-learning
algorithm in the linear function approximation setting. We use
ODE analysis for the proof, and the optimal asymptotic variance property of the
algorithm is reflected via fast convergence in a finance example.
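As a rough illustration of the baseline that the abstract builds on, the sketch below runs a tabular Q-learning recursion for a discounted-cost optimal stopping problem and compares it with the fixed point obtained by direct iteration. It is not the Zap algorithm and not the authors' code. The fixed-point equation Q(x) = c(x) + beta * E[min(c_stop(X'), Q(X'))], the three-state chain, the costs, the discount factor, and all function names are assumptions chosen for the example; the tabular basis is used as a simple special case of linear function approximation.

```python
import numpy as np

# Hypothetical three-state example (all numbers are illustrative assumptions).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])   # transition matrix of an irreducible chain
C = np.array([1.0, 2.0, 3.0])        # per-step cost of continuing
C_STOP = np.array([2.0, 2.0, 2.0])   # one-time cost of stopping
BETA = 0.3                           # discount factor


def continuation_fixed_point(P, c, c_stop, beta, iters=500):
    """Iterate Q <- c + beta * P @ min(c_stop, Q) to (numerical) convergence."""
    q = np.zeros(len(c))
    for _ in range(iters):
        q = c + beta * P @ np.minimum(c_stop, q)
    return q


def q_learning_stopping(P, c, c_stop, beta, n_steps=60_000, seed=0):
    """Tabular Q-learning along one trajectory, with step size 1 / (visit count)."""
    rng = np.random.default_rng(seed)
    n_states = len(c)
    q = np.zeros(n_states)
    visits = np.zeros(n_states)
    x = 0
    for _ in range(n_steps):
        x_next = rng.choice(n_states, p=P[x])
        visits[x] += 1
        # Temporal difference for the "continue" value at state x.
        td = c[x] + beta * min(c_stop[x_next], q[x_next]) - q[x]
        q[x] += td / visits[x]
        x = x_next
    return q


if __name__ == "__main__":
    q_star = continuation_fixed_point(P, C, C_STOP, BETA)
    q_hat = q_learning_stopping(P, C, C_STOP, BETA)
    print("fixed point:", q_star)
    print("Q-learning :", q_hat)
    # Induced stopping rule: stop where stopping is no more costly than continuing.
    print("stop set   :", np.flatnonzero(C_STOP <= q_hat))
```

The small discount factor keeps this scalar-gain recursion well behaved; the slow convergence that the abstract refers to would be expected to show up for discount factors closer to one, which is the regime a matrix-gain method such as Zap is meant to address.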
Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation
This paper concerns error bounds for recursive equations subject to Markovian
disturbances. Motivating examples abound within the fields of Markov chain
Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these
algorithms can be interpreted as special cases of stochastic approximation
(SA). It is argued that it is not possible in general to obtain a Hoeffding
bound on the error sequence, even when the underlying Markov chain is
reversible and geometrically ergodic, such as the M/M/1 queue. This is
motivation for the focus on mean square error bounds for parameter estimates.
It is shown that mean square error achieves the optimal rate of $O(1/n)$,
subject to conditions on the step-size sequence. Moreover, the exact constants
in the rate are obtained, which is of great value in algorithm design.
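To make the rate statement concrete, the following sketch estimates the mean square error of the simplest linear SA recursion with Markovian noise, theta_{n+1} = theta_n + (f(Phi_{n+1}) - theta_n) / (n + 1), which is just a running Monte Carlo average. The two-state chain, its transition probabilities, the choice f(x) = x, and the function name `sa_mse` are assumptions made for illustration and are not taken from the paper. If the error decays like 1/n, the product n * MSE should be roughly the same at both horizons.

```python
import numpy as np


def sa_mse(n_steps, n_runs=2000, p=0.3, q=0.2, seed=0):
    """Empirical MSE of a step-size-1/n SA recursion driven by a two-state chain.

    The chain moves 0 -> 1 with probability p and 1 -> 0 with probability q,
    so its stationary mean is p / (p + q). All runs are simulated in parallel.
    """
    rng = np.random.default_rng(seed)
    target = p / (p + q)
    # Start every run from the stationary distribution.
    phi = (rng.random(n_runs) < target).astype(float)
    theta = np.zeros(n_runs)
    for n in range(1, n_steps + 1):
        u = rng.random(n_runs)
        # One Markov transition per run.
        phi = np.where(phi == 0.0, (u < p).astype(float), (u >= q).astype(float))
        # SA update with step size 1/n (equivalently, a running sample mean).
        theta += (phi - theta) / n
    return float(np.mean((theta - target) ** 2))


if __name__ == "__main__":
    for n in (1000, 4000):
        mse = sa_mse(n)
        print(f"n={n:5d}  mse={mse:.6f}  n*mse={n * mse:.3f}")
```

For this particular chain the asymptotic variance of the sample mean has the closed form pi_0 * pi_1 * (1 + lambda) / (1 - lambda) with lambda = 1 - p - q, which evaluates to 0.72 for the assumed parameters, so n * MSE should settle near that value up to sampling error.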
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
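The paper concerns convergence and asymptotic statistics for stochastic
approximation driven by Markovian noise:
$$\theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1}), \qquad n \ge 0,$$
in which each $\theta_n \in \mathbb{R}^d$, $\{\Phi_n\}$ is a Markov chain on a general state space
X with stationary distribution $\pi$, and $f : \mathbb{R}^d \times \text{X} \to \mathbb{R}^d$. In
addition to standard Lipschitz bounds on $f$, and conditions on the vanishing
step-size sequence $\{\alpha_n\}$, it is assumed that the associated ODE is
globally asymptotically stable with stationary point denoted $\theta^*$, where
$\bar f(\theta) = E[f(\theta, \Phi)]$ with $\Phi \sim \pi$. Moreover, the
ODE@$\infty$ defined with respect to the vector field
$$\bar f_\infty(\theta) := \lim_{r \to \infty} r^{-1} \bar f(r\theta), \qquad \theta \in \mathbb{R}^d,$$
is asymptotically stable. The main contributions are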
summarized as follows:
(i) The sequence $\{\theta_n\}$ is convergent if $\{\Phi_n\}$ is geometrically ergodic,
and subject to compatible bounds on $f$.
The remaining results are established under a stronger assumption on the
Markov chain: A slightly weaker version of the Donsker-Varadhan Lyapunov drift
condition known as (DV3).
(ii) A Lyapunov function is constructed for the joint process
$\{\theta_n, \Phi_n\}$ that implies convergence of $\{\theta_n\}$ in $L_4$.
(iii) A functional CLT is established, as well as the usual one-dimensional
CLT for the normalized error $z_n := (\theta_n - \theta^*)/\sqrt{\alpha_n}$.
Moment bounds combined with the CLT imply convergence of the normalized
covariance,
$$\lim_{n \to \infty} E[z_n z_n^T] = \Sigma_\theta,$$
where $\Sigma_\theta$ is the asymptotic covariance appearing in the CLT.
(iv) An example is provided where the Markov chain is geometrically
ergodic but it does not satisfy (DV3). While the algorithm is convergent, the
second moment is unbounded.
Multifunctional gold nanostar conjugates for tumor imaging and combined photothermal and chemo-therapy
Uniform gold nanostars (Au NS) were conjugated with cyclic RGD (cRGD) and either a near-infrared (NIR) fluorescence probe (MPA) or an anti-cancer drug (DOX) to obtain the multifunctional nanoconstructs Au-cRGD-MPA and Au-cRGD-DOX, respectively. The NIR contrast agent Au-cRGD-MPA was shown to have low cytotoxicity. In tumor cells and tumor-bearing mice, these imaging nanoparticles demonstrated favorable tumor-targeting capability, mediated by binding of the RGD peptide to its receptor, which is over-expressed on the tumor cells. The multi-therapeutic analogue, Au-cRGD-DOX, integrates tumor targeting, chemotherapy, and photothermal therapy into a single system. The synergistic effect of photothermal therapy and chemotherapy was demonstrated in different tumor cell lines and in vivo using S180 tumor-bearing mouse models. The viability of MDA-MB-231 cells was only 40 % after incubation with Au-cRGD-DOX and irradiation with NIR light. With both tail-vein and intratumoral injection, mice treated with Au-cRGD-DOX showed the slowest tumor growth. These results indicate that the multifunctional nanoconstruct is a promising combined therapeutic agent for tumor-targeting treatment, with the potential to enhance anti-cancer treatment outcomes.
Initial ablation ratio predicts the recurrence of low-risk papillary thyroid microcarcinomas treated with microwave ablation: a 5-year, single-institution cohort study
Objective: To assess the long-term efficacy and safety of microwave ablation (MWA) in treating low-risk papillary thyroid microcarcinomas (PTMC) and to identify predictive factors for the postoperative local tumor progression of PTMC.
Methods: A total of 154 low-risk PTMC patients treated with MWA who were followed up for at least 3 months were retrospectively recruited. Ultrasonography was performed after MWA to assess the local tumor progression. Adverse events associated with MWA were recorded. The ablated volume (Va) and initial ablation ratio (IAR) were measured to assess their influences on the recurrence risk of PTMC.
Results: The mean tumor volume of PTMC before MWA was 0.071 (0.039, 0.121) cm³, with a maximum diameter of 0.60 ± 0.18 cm. All PTMC patients were followed up for 6 (3, 18) months. Va increased immediately after MWA, then gradually decreased over time, until it was significantly smaller at 12 months than that before MWA (P 2.0 mU/L) of PTMC patients were not correlated with local tumor progression.
Conclusion: MWA is a safe and effective therapeutic strategy for low-risk PTMC. The maximum tumor diameter and IAR are predictive factors for the local tumor progression of PTMC after MWA.
Overexpression of the Glutathione Peroxidase 5 (RcGPX5) Gene From Rhodiola crenulata Increases Drought Tolerance in Salvia miltiorrhiza
Excessive cellular accumulation of reactive oxygen species (ROS) due to environmental stresses can critically disrupt plant development and negatively affect productivity. Plant glutathione peroxidases (GPXs) play an important role in ROS scavenging by catalyzing the reduction of H2O2 and other organic hydroperoxides to protect plant cells from oxidative stress damage. RcGPX5, a member of the GPX gene family, was isolated from a traditional medicinal plant Rhodiola crenulata and constitutively expressed in Salvia miltiorrhiza under control of the CaMV 35S promoter. Transgenic plants showed increased tolerance to oxidative stress caused by application of H2O2 and drought, and had reduced production of malondialdehyde (MDA) compared with the wild type. Under drought stress, seedlings of the transgenic lines wilted later than the wild type and recovered growth 1 day after re-watering. In addition, the reduced glutathione (GSH) and total glutathione (T-GSH) contents were higher in the transgenic lines, with increased enzyme activities including glutathione reductase (GR), ascorbate peroxidase (APX), and GPX. These changes prevent H2O2 and O2- accumulation in cells of the transgenic lines compared with wild type. Overexpression of RcGPX5 alters the relative expression levels of multiple endogenous genes in S. miltiorrhiza, including transcription factor genes and genes in the ROS and ABA pathways. In particular, RcGPX5 expression increases the mass of S. miltiorrhiza roots while reducing the concentration of the active ingredients. These results show that heterologous expression of RcGPX5 in S. miltiorrhiza can affect the regulation of multiple biochemical pathways to confer tolerance to drought stress, and RcGPX5 might act as a competitor with secondary metabolites in the S. miltiorrhiza response to environmental stimuli.