    Reinforcement learning in continuous state and action spaces

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good policy from a continuous model can be infeasible, we mainly focus on methods that explicitly update a representation of a value function, a policy or both. We discuss considerations in choosing an appropriate representation for these functions and discuss gradient-based and gradient-free ways to update the parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms. Amongst others, we cover gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms and actor-critic methods. We discuss the advantages of different approaches and empirically compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy.

    Double Q-learning

    In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values, which result from a positive bias that is introduced because Q-learning uses the maximum action value as an approximation for the maximum expected action value. We introduce an alternative way to approximate the maximum expected value for any set of random variables. The obtained double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value. We apply the double estimator to Q-learning to construct Double Q-learning, a new off-policy reinforcement learning algorithm. We show the new algorithm converges to the optimal policy and that it performs well in some settings in which Q-learning performs poorly due to its overestimation.
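
The idea described in this abstract can be illustrated with a minimal tabular sketch. It assumes two action-value tables stored as dictionaries keyed by `(state, action)`. The function name `double_q_update`, its parameters, and the default step size and discount are hypothetical choices for illustration, not taken from the paper. One table selects the greedy next action and the other table supplies that action's value, which is what decouples selection from evaluation.

```python
import random
from collections import defaultdict


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99, done=False, rng=random):
    """Apply one tabular Double Q-learning update in place (illustrative sketch)."""
    # Pick at random which table is updated; the other one only evaluates.
    if rng.random() < 0.5:
        update, evaluate = q_a, q_b
    else:
        update, evaluate = q_b, q_a

    if done:
        # No bootstrapping from a terminal state.
        target = reward
    else:
        # Select the greedy next action with the table being updated...
        best = max(actions, key=lambda a: update[(next_state, a)])
        # ...but read its value from the other table.
        target = reward + gamma * evaluate[(next_state, best)]

    update[(state, action)] += alpha * (target - update[(state, action)])


# Hypothetical usage: both tables start at zero for every (state, action) pair.
q_a, q_b = defaultdict(float), defaultdict(float)
double_q_update(q_a, q_b, state=0, action=1, reward=1.0,
                next_state=2, actions=[0, 1])
```

Standard Q-learning would instead use the maximum of a single table as the target, which is the source of the positive bias the abstract describes.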

    Community health center efficiency: The role of grant revenue in health center efficiency

    Objective: To test the relationship between external environments, organizational characteristics, and technical efficiency in federally qualified health centers (FQHCs). We tested the relationship between grant revenue and technical efficiency in FQHCs. Principal Findings: Increased grant revenues did not increase the probability that a health center would be on the efficiency frontier. However, increased grant revenues had a negative association with technical efficiency for health centers that were not fully efficient. Conclusion: If all health centers were operating efficiently, anywhere from 39 to 45 million patient encounters could have been delivered instead of the actual total of 29 million in 2007. Policy makers should consider tying grant revenues to performance indicators, and future work is needed to understand the mechanisms through which diseconomies of scale are present in FQHCs.

    On the importance of the heterogeneity assumption in the characterization of reservoir geomechanical properties

    The geomechanical analysis of a highly compartmentalized reservoir is performed to simulate the seafloor subsidence due to gas production. The available observations over the hydrocarbon reservoir consist of bathymetric surveys carried out before and at the end of a 10-yr production life. The main goal is the calibration of the reservoir compressibility cM, that is, the main geomechanical parameter controlling the surface response. Two conceptual models are considered: in one (i) cM varies only with the depth and the vertical effective stress (heterogeneity due to lithostratigraphic variability); in another (ii) cM also varies in the horizontal plane, that is, it is spatially distributed within the reservoir stratigraphic units. The latter hypothesis accounts for a possible partitioning of the reservoir due to the presence of sealing faults and thrusts, which suggests a block heterogeneous system with the number of reservoir blocks equal to the number of uncertain parameters. The method applied here relies on an ensemble-based data assimilation (DA) algorithm (i.e. the ensemble smoother, ES), which incorporates the information from the bathymetric measurements into the geomechanical model response to infer and reduce the uncertainty of the parameter cM. The outcome from conceptual model (i) indicates that DA is effective in reducing the cM uncertainty. However, the maximum settlement still remains underestimated, while the areal extent of the subsidence bowl is overestimated. We demonstrate that selecting the heterogeneous conceptual model (ii) reproduces the observations much better, thus removing a clear bias of the model structure. DA significantly reduces the cM uncertainty in the five blocks (out of seven) characterized by large volume and large pressure decline. Conversely, the assimilation of land displacements only partially constrains the prior cM uncertainty in the reservoir blocks marginally contributing to the cumulative seafloor subsidence, that is, blocks with low pressure.
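
A generic ensemble smoother update of the kind this abstract refers to can be sketched as follows. This is a minimal illustration of the common perturbed-observation form, assuming uncorrelated observation errors with a single variance. It is not the authors' implementation, and the function name `ensemble_smoother_update` and all argument names are hypothetical. Each column of `params` is one ensemble member (for example, one compressibility value per reservoir block), and each column of `predicted` is the corresponding simulated observation vector.

```python
import numpy as np


def ensemble_smoother_update(params, predicted, observed, obs_var, rng):
    """Return the updated parameter ensemble after one ES analysis step.

    params:    (n_param, n_ens) prior parameter ensemble
    predicted: (n_obs, n_ens) model response for each ensemble member
    observed:  (n_obs,) measured data
    obs_var:   scalar observation-error variance (assumed uncorrelated)
    rng:       numpy.random.Generator used to perturb the observations
    """
    n_ens = params.shape[1]

    # Ensemble anomalies (deviations from the ensemble mean).
    dp = params - params.mean(axis=1, keepdims=True)
    dd = predicted - predicted.mean(axis=1, keepdims=True)

    # Sample cross-covariance and data covariance.
    c_pd = dp @ dd.T / (n_ens - 1)
    c_dd = dd @ dd.T / (n_ens - 1)
    r = obs_var * np.eye(observed.size)

    # Each member assimilates its own noisy copy of the observations.
    noise = rng.normal(0.0, np.sqrt(obs_var), size=predicted.shape)
    perturbed = observed[:, None] + noise

    # Gain = C_pd (C_dd + R)^-1, computed with a linear solve.
    gain = np.linalg.solve(c_dd + r, c_pd.T).T
    return params + gain @ (perturbed - predicted)
```

In a setting like the one described, `predicted` would come from running the geomechanical model once per ensemble member, and the spread of the returned ensemble indicates how much the observations constrain each parameter.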

    DC-electric-field-induced and low-frequency electromodulation second-harmonic generation spectroscopy of Si(001)-SiO₂ interfaces

    The mechanism of DC-Electric-Field-Induced Second-Harmonic (EFISH) generation at weakly nonlinear buried Si(001)-SiO₂ interfaces is studied experimentally in planar Si(001)-SiO₂-Cr MOS structures by optical second-harmonic generation (SHG) spectroscopy with a tunable Ti:sapphire femtosecond laser. The spectral dependence of the EFISH contribution near the direct two-photon E₁ transition of silicon is extracted. A systematic phenomenological model of the EFISH phenomenon, including a detailed description of the space charge region (SCR) at the semiconductor-dielectric interface in accumulation, depletion, and inversion regimes, has been developed. The influence of surface quantization effects, interface states, charge traps in the oxide layer, doping concentration and oxide thickness on nonlocal screening of the DC-electric field and on breaking of inversion symmetry in the SCR is considered. The model describes EFISH generation in the SCR using a Green function formalism which takes into account all retardation and absorption effects of the fundamental and second harmonic (SH) waves, optical interference between field-dependent and field-independent contributions to the SH field and multiple reflection interference in the SiO₂ layer. Good agreement between the phenomenological model and our recent and new EFISH spectroscopic results is demonstrated. Finally, low-frequency electromodulated EFISH is demonstrated as a useful differential spectroscopic technique for studies of the Si-SiO₂ interface in silicon-based MOS structures. Comment: 31 pages, 14 figures, 1 table, figures are also available at http://kali.ilc.msu.su/articles/50/efish.ht

    Bilateral posterior lamellar corneal transplant surgery in an infant of 17 weeks old: Surgical challenges and the added value of intraoperative optical coherence tomography

    This study aimed to describe the surgical challenges, management, and value of intraoperative optical coherence tomography in a case of bilateral Descemet Stripping Automated Endothelial Keratoplasty corneal transplantation at 17 weeks of age for the treatment of severe posterior polymorphous corneal dystrophy resulting from a de novo mutation of the OVOL2 gene.

    Increase in circulating Foxp3+CD4+CD25high regulatory T cells in nasopharyngeal carcinoma patients

    Nasopharyngeal carcinoma (NPC) is an Epstein–Barr virus-associated disease with high prevalence in Southern Chinese. Using multiparametric flow cytometry, we identified significant expansions of circulating naïve and memory CD4+CD25high T cells in 56 NPC patients compared with healthy age- and sex-matched controls. These were regulatory T cells (Treg), as they overexpressed Foxp3 and GITR, and demonstrated enhanced suppressive activities against autologous CD4+CD25− T-cell proliferation in functional studies on five patients. Abundant intraepithelial infiltrations of Treg with very high levels of Foxp3 expression and absence of CCR7 expression were also detected in five primary tumours. Our current study is the first to demonstrate an expansion of functional Treg in the circulation of NPC patients and the presence of infiltrating Treg in the tumour microenvironment. As Treg may play an important role in suppressing antitumour immunity, our findings provide critical insights for clinical management of NPC.

    Understanding acute metabolic decompensation in propionic and methylmalonic acidemias: A deep metabolic phenotyping approach

    Background: The pathophysiology of life-threatening acute metabolic decompensations (AMD) in propionic acidemia (PA) and isolated methylmalonic acidemia (MMA) is insufficiently understood. Here, we study the metabolomes of PA and MMA patients over time, to gain insight into which biochemical processes are at play during AMD. Methods: Longitudinal data from clinical chemistry analyses and metabolic assays over the life-course of 11 PA and 13 MMA patients were studied retrospectively. Direct-infusion high-resolution mass spectrometry was performed on 234 and 154 remnant dried blood spot and plasma samples of PA and MMA patients, respectively. In addition, a systematic literature search was performed on reported biomarkers. All results were integrated in an assessment of biochemical processes at play during AMD. Results: We confirmed many of the metabolite alterations reported in the literature, including increases of plasma valine and isoleucine during AMD in PA patients. We revealed that plasma leucine and phenylalanine, and urinary pyruvic acid were increased during AMD in PA patients. 3-hydroxyisovaleric acid correlated positively with plasma ammonia. We found that known diagnostic biomarkers were not significantly further increased, while intermediates of the branched-chain amino acid (BCAA) degradation pathway were significantly increased during AMD. Conclusions: We revealed that during AMD in PA and MMA, BCAA and BCAA intermediates accumulate, while known diagnostic biomarkers remain essentially unaltered. This implies that these acidic BCAA intermediates are responsible for metabolic acidosis. Based on this, we suggest measuring plasma 3-hydroxyisovaleric acid and urinary ketones or 3-hydroxybutyric acid for the biochemical follow-up of a patient's metabolic stability.