
    Finite Sample Analysis of Mean-Volatility Actor-Critic for Risk-Averse Reinforcement Learning

    The goal in the standard reinforcement learning problem is to find a policy that optimizes the expected return. However, such an objective is not adequate in many real-life applications, such as finance, where controlling the uncertainty of the outcome is imperative. The mean-volatility objective penalizes, through a tunable parameter, policies with a high variance of the per-step reward. An interesting property of this objective is that it admits simple linear Bellman equations that resemble, up to a reward transformation, those of the risk-neutral case. However, the required reward transformation is policy-dependent and requires the (usually unknown) expected return of the policy being evaluated. In this work, we propose two general methods for policy evaluation under the mean-volatility objective: the direct method and the factored method. We then extend recent results for finite sample analysis in the risk-neutral actor-critic setting to the mean-volatility case. Our analysis shows that the sample complexity to attain an ϵ-accurate stationary point is the same as that of the risk-neutral version, using either policy evaluation method for training the critic. Finally, we carry out experiments to test the proposed methods in a simple environment that exhibits some trade-off between optimality, in expectation, and uncertainty of outcome.
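
    A minimal formalization of this objective, sketched under the assumption that J(π) denotes the expected per-step reward under the policy's occupancy measure d_π and λ ≥ 0 is the tunable risk parameter (the paper's exact definitions may differ):

        \nu^2_\pi = \mathbb{E}_{(s,a)\sim d_\pi}\left[(r(s,a) - J(\pi))^2\right],
        \qquad
        \eta_\lambda(\pi) = J(\pi) - \lambda\,\nu^2_\pi,
        \qquad
        \tilde{r}_\pi(s,a) = r(s,a) - \lambda\,(r(s,a) - J(\pi))^2,

    so that η_λ(π) is the risk-neutral expected value of the transformed reward \tilde{r}_\pi; since \tilde{r}_\pi depends on J(π), the resulting Bellman equations remain linear but policy-dependent, as noted above.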

    Simultaneously Updating All Persistence Values in Reinforcement Learning

    In reinforcement learning, the performance of learning agents is highly sensitive to the choice of time discretization. Agents acting at high frequencies have the best control opportunities, but also some drawbacks, such as possibly inefficient exploration and vanishing action advantages. The repetition of actions, i.e., action persistence, helps in this respect, as it allows the agent to visit wider regions of the state space and to improve the estimation of the action effects. In this work, we derive a novel operator, the All-Persistence Bellman Operator, which enables an effective use of both low-persistence experience, by decomposition into sub-transitions, and high-persistence experience, thanks to the introduction of a suitable bootstrap procedure. In this way, we employ transitions collected at any time scale to simultaneously update the action values of the considered persistence set. We prove the contraction property of the All-Persistence Bellman Operator and, based on it, we extend classic Q-learning and DQN. After providing a study of the effects of persistence, we experimentally evaluate our approach both in tabular contexts and in more challenging frameworks, including some Atari games.
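
    As a toy illustration of how experience at one time scale can update values at several persistences, consider a tabular Q-function indexed by state, action, and persistence, updated from a segment of one-step transitions collected while the same action was repeated. The names, the bootstrap over all persistences, and the update rule below are illustrative assumptions, not the paper's All-Persistence Bellman Operator:

    # Hypothetical sketch: update Q(s, a, k) for every persistence k up to the segment length.
    # Low-persistence estimates come from truncating the segment; each target bootstraps on the
    # greedy value (over actions and persistences) at the state reached after k steps.
    import numpy as np

    def update_all_persistences(Q, segment, action, gamma=0.99, alpha=0.1):
        """Q has shape (n_states, n_actions, K_max); segment is a list of (s, r, s_next)
        one-step transitions gathered while `action` was repeated at every step."""
        K_max = Q.shape[2]
        s0 = segment[0][0]
        ret = 0.0
        for k, (_, r, s_next) in enumerate(segment[:K_max], start=1):
            ret += gamma ** (k - 1) * r                      # discounted reward of the first k steps
            target = ret + gamma ** k * np.max(Q[s_next])    # bootstrap at the k-th next state
            Q[s0, action, k - 1] += alpha * (target - Q[s0, action, k - 1])
        return Q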

    Multivariate analysis of brain metabolism reveals chemotherapy effects on prefrontal cerebellar system when related to dorsal attention network

    BACKGROUND: Functional brain changes induced by chemotherapy are still not well characterized. We used a novel approach with a multivariate technique to analyze resting-state brain [(18)F]FDG-PET in patients with lymphoma, to explore differences in the cerebral metabolic rate of glucose between chemotherapy-treated and non-treated patients. METHODS: PET/CT scans were performed on 28 patients, 14 of whom had been treated with systemic chemotherapy. We used a support vector machine (SVM) classification, extracting the mean metabolism from the metabolic patterns, or networks, that discriminate the two groups. We calculated the correct classification of the two groups using the mean metabolic values extracted from the networks. RESULTS: The SVM classification analysis yielded clear-cut patterns that discriminate the two groups. The first, hypometabolic in chemotherapy-treated patients, included mostly prefrontal cortex and cerebellar areas (central executive network, CEN, and salience network, SN); the second, which did not differ between groups, included mostly parietal areas and the frontal eye field (dorsal attention network, DAN). Correct classification of patients as chemotherapy-treated or non-treated using a single network ranged from 50% to 68%; however, when all the networks were used together, it reached 80%. CONCLUSIONS: The evidenced networks were related to attention and executive functions, with the CEN and SN more specialized in shifting, inhibition and monitoring, and the DAN in orienting attention. Only by using the DAN as a reference point, indicating global frontal functioning before chemotherapy, could we better classify the subjects. The emerging concept is the importance of investigating intrinsic brain networks and their relations in chemotherapy-induced cognitive changes.
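
    A minimal sketch of this kind of classification analysis, assuming each subject is described by the mean metabolic value of each network and that accuracy is estimated with leave-one-out cross-validation; the feature layout and validation scheme are assumptions for illustration, not the study's exact pipeline:

    # Illustrative sketch: linear SVM separating chemotherapy-treated from non-treated subjects
    # using network-mean metabolic values, with leave-one-out cross-validated accuracy.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    # X: one row per subject, one column per network mean (placeholder values for 28 subjects).
    X = np.random.default_rng(0).normal(size=(28, 3))
    y = np.array([1] * 14 + [0] * 14)        # 1 = chemotherapy-treated, 0 = non-treated

    clf = SVC(kernel="linear")
    accuracy = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
    print(f"leave-one-out accuracy: {accuracy:.2f}")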

    Kinetic models for optimal control of wealth inequalities

    We introduce and discuss optimal control strategies for kinetic models of wealth distribution in a simple market economy, acting to minimize the variance of the wealth density among the population. Our analysis is based on a finite-time-horizon approximation, or model predictive control, of the corresponding control problem for the microscopic agents' dynamics, and results in an alternative theoretical approach to taxation and redistribution policy at a global level. It is shown that in general the control is able to modify the Pareto index of the stationary solution of the corresponding Boltzmann kinetic equation, and that this modification can be exactly quantified. Connections between previous Fokker-Planck based models of taxation-redistribution policies and the present approach are also discussed.
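
    A toy Monte Carlo sketch of a conservative binary-exchange wealth model with a simple variance-reducing control (a flat levy redistributed uniformly); the exchange rule and the control term are illustrative assumptions, not the kinetic model or the model predictive control law studied in the paper:

    # Toy simulation: random pairs trade a fraction of their wealth; a proportional levy
    # redistributed equally acts as a crude control that pulls wealth toward the mean.
    import numpy as np

    rng = np.random.default_rng(0)
    n_agents, n_rounds, trade_frac, tax_rate = 1000, 5000, 0.1, 0.005
    w = np.ones(n_agents)                     # initial wealth; total wealth is conserved

    for _ in range(n_rounds):
        i, j = rng.choice(n_agents, size=2, replace=False)
        pot = trade_frac * (w[i] + w[j])      # amount at stake in the binary trade
        share = rng.random()
        w[i] += share * pot - trade_frac * w[i]
        w[j] += (1 - share) * pot - trade_frac * w[j]
        levy = tax_rate * w                   # control step: flat levy, equal redistribution
        w = w - levy + levy.sum() / n_agents

    print(f"mean wealth {w.mean():.3f}, wealth variance {w.var():.5f}")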

    Census of Blueberry Producers. Partido de San Pedro – 2004

    This study was carried out with the aim of assessing the situation of blueberry production in the Partido de San Pedro. It identifies and territorially locates the production establishments and gathers strategic information on technological indicators, the varietal landscape (degree of adoption and number of plants per variety), and the systems of protection against climatic contingencies (anti-hail nets, frost defenses).
    EEA San Pedro
    Fil: Ros, Patricio Guillermo. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Nicolás; Argentina
    Fil: Hansen, Laura. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria San Pedro; Argentina
    Fil: Marcozzi, Paula. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Pedro; Argentina
    Fil: Gordó, Manuela. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Pedro; Argentina
    Fil: López Serrano, Fernando Alberto. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Pedro; Argentina
    Fil: Heguiabeheri, Adolfo Ricardo. Instituto Nacional de Tecnología Agropecuaria. Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Pedro; Argentina
    Fil: Biglia, Jorge Lorenzo. Instituto Nacional de Tecnología Agropecuaria. Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Pedro; Argentina
    Fil: Bisi, Marcelo Alejandro. Instituto Nacional de Tecnología Agropecuaria. Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Pedro; Argentina
    Fil: Basaldúa, Beatriz Blanca. Instituto Nacional de Tecnología Agropecuaria. Estación Experimental Agropecuaria San Pedro. Agencia de Extensión Rural San Pedro; Argentina

    Efficacy and safety of reparixin in patients with severe covid-19 Pneumonia. A phase 3, randomized, double-blind placebo-controlled study

    Introduction: Polymorphonuclear cell influx into the interstitial and bronchoalveolar spaces is a cardinal feature of severe coronavirus disease 2019 (COVID-19), principally mediated by interleukin-8 (IL-8). We sought to determine whether reparixin, a novel IL-8 pathway inhibitor, could reduce disease progression in patients hospitalized with severe COVID-19 pneumonia. Methods: In this Phase 3, randomized, double-blind, placebo-controlled, multicenter study, hospitalized adult patients with severe COVID-19 pneumonia were randomized 2:1 to receive oral reparixin 1200 mg three times daily or placebo for up to 21 days or until hospital discharge. The primary endpoint was the proportion of patients alive and free of respiratory failure at Day 28, with key secondary endpoints being the proportion of patients free of respiratory failure at Day 60, incidence of intensive care unit (ICU) admission by Day 28, and time to recovery by Day 28. Results: Of 279 patients randomized, 182 received at least one dose of reparixin and 88 received placebo. The proportion of patients alive and free of respiratory failure at Day 28 was similar in the two groups (83.5% versus 80.7%; odds ratio 1.63 [95% confidence interval (CI) 0.75, 3.51]; p = 0.216). There were no statistically significant differences in the key secondary endpoints, but a numerically higher proportion of patients in the reparixin group were alive and free of respiratory failure at Day 60 (88.7% versus 84.6%; p = 0.195), fewer required ICU admission by Day 28 (15.8% versus 21.7%; p = 0.168), and a higher proportion recovered by Day 28 compared with placebo (81.6% versus 74.9%; p = 0.167). Fewer patients experienced adverse events with reparixin than placebo (45.6% versus 54.5%), most of mild or moderate intensity and not related to study treatment. Conclusions: This trial did not meet the primary efficacy endpoint, yet reparixin showed a trend toward limiting disease progression as an add-on therapy in severe COVID-19 pneumonia and was well tolerated. Trial registration: ClinicalTrials.gov: NCT04878055, EudraCT: 2020-005919-51.

    Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

    The choice of the control frequency of a system has a relevant impact on the ability of reinforcement learning algorithms to learn a highly performing policy. In this paper, we introduce the notion of action persistence, which consists in repeating an action for a fixed number of decision steps, with the effect of modifying the control frequency. We start by analyzing how action persistence affects the performance of the optimal policy, and then we present a novel algorithm, Persistent Fitted Q-Iteration (PFQI), that extends FQI with the goal of learning the optimal value function at a given persistence. After providing a theoretical study of PFQI and a heuristic approach to identify the optimal persistence, we present an experimental campaign on benchmark domains to show the advantages of action persistence and to prove the effectiveness of our persistence selection method.
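
    A minimal sketch of fitted Q-iteration run at a fixed persistence k, assuming the batch has already been regrouped into k-step transitions (s, a, R_k, s_k), where R_k is the discounted reward accumulated while repeating a for k steps; the regressor and data layout below are illustrative assumptions, not the paper's PFQI implementation:

    # Sketch: FQI at persistence k. Bootstrapping uses discount gamma**k, since each
    # stored transition spans k primitive decision steps.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    def persistent_fqi(S, A, R_k, S_next, n_actions, k, gamma=0.99, n_iterations=50):
        X = np.column_stack([S, A])                        # regression inputs: state-action pairs
        model = None
        for _ in range(n_iterations):
            if model is None:
                targets = R_k                              # first iteration: Q_0 = 0
            else:
                q_next = np.column_stack([
                    model.predict(np.column_stack([S_next, np.full(len(S_next), a)]))
                    for a in range(n_actions)])
                targets = R_k + gamma ** k * q_next.max(axis=1)
            model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
        return model                                       # greedy policy: argmax_a model([s, a])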