464 research outputs found
Reinforcement learning in continuous state and action spaces
Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains.
Because analytically computing a good policy from a continuous model can be infeasible, in this chapter we mainly focus on methods that explicitly update a representation of a value function, a policy or both. We discuss considerations in choosing an appropriate representation for these functions and discuss gradient-based and gradient-free ways to update the parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms. Amongst others, we cover gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms and actor-critic methods. We discuss the advantages of different approaches and compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy empirically
Double Q-learning
In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values, which result from a positive bias that is introduced because Q-learning uses the maximum action value as an approximation for the maximum expected action value. We introduce an alternative way to approximate the maximum expected value for any set of random variables. The obtained double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value. We apply the double estimator to Q-learning to construct Double Q-learning, a new off-policy reinforcement learning algorithm. We show the new algorithm converges to the optimal policy and that it performs well in some settings in which Q-learning performs poorly due to its overestimation
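The overestimation described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the equal-means Gaussian setup, and all parameter values are assumptions chosen for illustration. It compares the maximum of sample means (the single estimator) with an estimator that selects the best variable on one half of the data and evaluates it on the other half (the double estimator), and adds a minimal tabular update step in the same spirit.

```python
import random


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def single_estimator(samples):
    """Maximum of the per-variable sample means (the max used by Q-learning)."""
    return max(avg(s) for s in samples)


def double_estimator(samples):
    """Select the argmax on one half of the data, evaluate it on the other half."""
    half_a = [s[: len(s) // 2] for s in samples]
    half_b = [s[len(s) // 2:] for s in samples]
    best = max(range(len(samples)), key=lambda i: avg(half_a[i]))
    return avg(half_b[best])


def double_q_update(q_a, q_b, s, a, r, s_next,
                    alpha=0.1, gamma=0.95, update_a=True):
    """One tabular Double Q-learning step on dict-of-lists tables.

    Hypothetical helper: terminal states are not handled, and the caller
    decides (e.g. by a coin flip) which table to update via `update_a`.
    """
    if not update_a:
        q_a, q_b = q_b, q_a
    # Choose the greedy action with one table, value it with the other.
    best = max(range(len(q_a[s_next])), key=lambda i: q_a[s_next][i])
    target = r + gamma * q_b[s_next][best]
    q_a[s][a] += alpha * (target - q_a[s][a])


def estimate_bias(trials=2000, n_vars=10, n_samples=10, seed=0):
    """Average both estimators over many trials.

    Every variable has true mean 0, so the true maximum expected value is 0
    and each returned average is an estimate of that estimator's bias.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
                   for _ in range(n_vars)]
        single.append(single_estimator(samples))
        double.append(double_estimator(samples))
    return avg(single), avg(double)


if __name__ == "__main__":
    s_bias, d_bias = estimate_bias()
    print(f"single estimator bias: {s_bias:+.3f}")
    print(f"double estimator bias: {d_bias:+.3f}")
```

In this equal-means setup the single estimator should come out clearly positive, while the double estimator should stay close to zero. When the true means differ, the double estimator can fall below the true maximum, which is consistent with the underestimation mentioned in the abstract.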
Community health center efficiency: The role of grant revenue in health center efficiency
Abstract: Objective: To test the relationship between external environments, organizational characteristics, and technical efficiency in federally qualified health centers (FQHCs). We tested the relationship between grant revenue and technical efficiency in FQHCs. Principal Findings: Increased grant revenues did not increase the probability that a health center would be on the efficiency frontier. However, increased grant revenues had a negative association with technical efficiency for health centers that were not fully efficient. Conclusion: If all health centers were operating efficiently, anywhere from 39 to 45 million patient encounters could have been delivered instead of the actual total of 29 million in 2007. Policy makers should consider tying grant revenues to performance indicators, and future work is needed to understand the mechanisms through which diseconomies of scale are present in FQHCs
On the importance of the heterogeneity assumption in the characterization of reservoir geomechanical properties
The geomechanical analysis of a highly compartmentalized reservoir is performed to simulate
the seafloor subsidence due to gas production. The available observations over the hydrocarbon
reservoir consist of bathymetric surveys carried out before and at the end of a 10-yr
production life. The main goal is the calibration of the reservoir compressibility cM, that is,
the main geomechanical parameter controlling the surface response. Two conceptual models
are considered: in one (i) cM varies only with the depth and the vertical effective stress
(heterogeneity due to lithostratigraphic variability); in another (ii) cM varies also in the horizontal
plane, that is, it is spatially distributed within the reservoir stratigraphic units. The latter
hypothesis accounts for a possible partitioning of the reservoir due to the presence of sealing
faults and thrusts that suggests the idea of a block heterogeneous system with the number of
reservoir blocks equal to the number of uncertain parameters. The method applied here relies
on an ensemble-based data assimilation (DA) algorithm (i.e. the ensemble smoother, ES),
which incorporates the information from the bathymetric measurements into the geomechanical
model response to infer and reduce the uncertainty of the parameter cM. The outcome from
conceptual model (i) indicates that DA is effective in reducing the cM uncertainty. However,
the maximum settlement still remains underestimated, while the areal extent of the subsidence
bowl is overestimated. We demonstrate that selecting the heterogeneous conceptual
model (ii) reproduces the observations much better, thus removing a clear bias of
the model structure. DA significantly reduces the cM uncertainty in the five blocks
(out of the seven) characterized by large volume and large pressure decline. Conversely, the
assimilation of land displacements only partially constrains the prior cM uncertainty in the
reservoir blocks marginally contributing to the cumulative seafloor subsidence, that is, blocks
with low pressure
DC-electric-field-induced and low-frequency electromodulation second-harmonic generation spectroscopy of Si(001)-SiO2 interfaces
The mechanism of DC-Electric-Field-Induced Second-Harmonic (EFISH) generation
at weakly nonlinear buried Si(001)-SiO2 interfaces is studied experimentally
in planar Si(001)-SiO2-Cr MOS structures by optical second-harmonic
generation (SHG) spectroscopy with a tunable Ti:sapphire femtosecond laser. The
spectral dependence of the EFISH contribution near the direct two-photon
transition of silicon is extracted. A systematic phenomenological model of the
EFISH phenomenon, including a detailed description of the space charge region
(SCR) at the semiconductor-dielectric interface in accumulation, depletion, and
inversion regimes, has been developed. The influence of surface quantization
effects, interface states, charge traps in the oxide layer, doping
concentration and oxide thickness on nonlocal screening of the DC-electric
field and on breaking of inversion symmetry in the SCR is considered. The model
describes EFISH generation in the SCR using a Green function formalism which
takes into account all retardation and absorption effects of the fundamental
and second harmonic (SH) waves, optical interference between field-dependent
and field-independent contributions to the SH field and multiple reflection
interference in the SiO2 layer. Good agreement between the phenomenological
model and our recent and new EFISH spectroscopic results is demonstrated.
Finally, low-frequency electromodulated EFISH is demonstrated as a useful
differential spectroscopic technique for studies of the Si-SiO2 interface in
silicon-based MOS structures.
Comment: 31 pages, 14 figures, 1 table, figures are also available at
http://kali.ilc.msu.su/articles/50/efish.ht
Bilateral posterior lamellar corneal transplant surgery in an infant of 17 weeks old: Surgical challenges and the added value of intraoperative optical coherence tomography
This study aimed to describe the surgical challenges, management, and value of intraoperative optical coherence tomography in a case of a bilateral Descemet Stripping Automated Endothelial Keratoplasty corneal transplantation at 17 weeks of age for the treatment of severe posterior polymorphous corneal dystrophy resulting from a de novo mutation of the OVOL2-gene
Identification of high-dimensional omics-derived predictors for tumor growth dynamics using machine learning and pharmacometric modeling
Analytical BioScience
Increase in circulating Foxp3+CD4+CD25high regulatory T cells in nasopharyngeal carcinoma patients
Nasopharyngeal carcinoma (NPC) is an Epstein–Barr virus-associated disease with high prevalence in Southern Chinese. Using multiparametric flow cytometry, we identified significant expansions of circulating naïve and memory CD4+CD25high T cells in 56 NPC patients compared with healthy age- and sex-matched controls. These were regulatory T cells (Treg), as they overexpressed Foxp3 and GITR, and demonstrated enhanced suppressive activities against autologous CD4+CD25− T-cell proliferation in functional studies on five patients. Abundant intraepithelial infiltrations of Treg with very high levels of Foxp3 expression and absence of CCR7 expression were also detected in five primary tumours. Our current study is the first to demonstrate an expansion of functional Treg in the circulation of NPC patients and the presence of infiltrating Treg in the tumour microenvironment. As Treg may play an important role in suppressing antitumour immunity, our findings provide critical insights for clinical management of NPC
Understanding acute metabolic decompensation in propionic and methylmalonic acidemias: A deep metabolic phenotyping approach
Background: Pathophysiology of life-threatening acute metabolic decompensations (AMD) in propionic acidemia (PA) and isolated methylmalonic acidemia (MMA) is insufficiently understood. Here, we study the metabolomes of PA and MMA patients over time, to gain insight into which biochemical processes are at play during AMD. Methods: Longitudinal data from clinical chemistry analyses and metabolic assays over the life-course of 11 PA and 13 MMA patients were studied retrospectively. Direct-infusion high-resolution mass spectrometry was performed on 234 and 154 remnant dried blood spot and plasma samples of PA and MMA patients, respectively. In addition, a systematic literature search was performed on reported biomarkers. All results were integrated in an assessment of biochemical processes at play during AMD. Results: We confirmed many of the metabolite alterations reported in literature, including increases of plasma valine and isoleucine during AMD in PA patients. We revealed that plasma leucine and phenylalanine, and urinary pyruvic acid were increased during AMD in PA patients. 3-hydroxyisovaleric acid correlated positively with plasma ammonia. We found that known diagnostic biomarkers were not significantly further increased, while intermediates of the branched-chain amino acid (BCAA) degradation pathway were significantly increased during AMD. Conclusions: We revealed that during AMD in PA and MMA, BCAA and BCAA intermediates accumulate, while known diagnostic biomarkers remain essentially unaltered. This implies that these acidic BCAA intermediates are responsible for metabolic acidosis. Based on this, we suggest measuring plasma 3-hydroxyisovaleric acid and urinary ketones or 3-hydroxybutyric acid for the biochemical follow-up of a patient's metabolic stability