    Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

    The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions.
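As a rough illustration of the constrained viewpoint in this abstract, the sketch below estimates value at risk and conditional value at risk from sampled costs and folds a CVaR constraint into a Lagrangian-style objective. It is a minimal sketch and not the algorithm of the article: the function names, the level `alpha`, the risk bound `kappa`, and the multiplier `lam` are all assumptions made for illustration.

```python
import math

def var_cvar(costs, alpha=0.95):
    """Empirical VaR and CVaR of sampled costs at level alpha (hypothetical helper)."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(costs)
    n = len(ordered)
    # VaR: smallest sample v such that at least alpha * n samples are <= v.
    k = max(math.ceil(alpha * n) - 1, 0)
    var = ordered[k]
    # CVaR via the Rockafellar-Uryasev form: VaR + E[(X - VaR)^+] / (1 - alpha).
    excess = sum(max(c - var, 0.0) for c in ordered) / n
    cvar = var + excess / (1.0 - alpha)
    return var, cvar

def lagrangian(costs, alpha, kappa, lam):
    """Mean cost plus lam times the CVaR constraint violation (assumed form)."""
    mean_cost = sum(costs) / len(costs)
    _, cvar = var_cvar(costs, alpha)
    return mean_cost + lam * (cvar - kappa)
```

In a risk-constrained scheme of this kind, one would typically decrease the Lagrangian in the policy parameters while increasing `lam` whenever the estimated risk exceeds `kappa`; the update rules themselves are outside this sketch.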

    Diseño para operabilidad: Una revisión de enfoques y estrategias de solución

    In recent decades, the chemical engineering research community has devoted considerable attention to the design-for-operability problem. This interest stems from the fact that the operability of a process is determined by its design, which makes it clearly preferable to consider operability issues in the early design stages rather than later, when modifications are less effective and more expensive. The need to integrate design and operability is dictated by the increasing complexity of processes as a result of progressively stringent economic, quality, safety and environmental constraints. Although the design-for-operability problem concerns practically every technical discipline, it has acquired a particular identity within chemical engineering due to the economic magnitude of the processes involved. The body of work on design and analysis for operability in chemical engineering is vast, and a complete review of the papers is beyond the scope of this contribution. Instead, two major approaches are addressed, and the papers that in our view were most significant to the development of the field are described in some detail.
    Affiliation: Blanco, Anibal Manuel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
    Affiliation: Bandoni, Jose Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentin

    Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability

    Internet-of-Things (IoT) envisions an intelligent infrastructure of networked smart devices offering task-specific monitoring and control services. The unique features of IoT include extreme heterogeneity, a massive number of devices, and unpredictable dynamics partially due to human interaction. These call for foundational innovations in network design and management. Ideally, such a design should allow efficient adaptation to changing environments, and low-cost implementation scalable to a massive number of devices, subject to stringent latency constraints. To this end, the overarching goal of this paper is to outline a unified framework for online learning and management policies in IoT through joint advances in communication, networking, learning, and optimization. From the network architecture vantage point, the unified framework leverages a promising fog architecture that enables smart devices to have proximity access to cloud functionalities at the network edge, along the cloud-to-things continuum. From the algorithmic perspective, key innovations target online approaches adaptive to different degrees of nonstationarity in IoT dynamics, and their scalable model-free implementation under limited feedback that motivates blind or bandit approaches. The proposed framework aspires to offer a stepping stone that leads to systematic designs and analysis of task-specific learning and management schemes for IoT, along with a host of new research directions to build on.
    Comment: Submitted on June 15 to Proceeding of IEEE Special Issue on Adaptive and Scalable Communication Network

    Stochastic model predictive control for constrained networked control systems with random time delay

    In this paper, the continuous-time stochastic constrained optimal control problem is formulated for the class of networked control systems, assuming that time delays follow a discrete-time, finite Markov chain. Polytopic overapproximations of the system's trajectories are employed to produce a polyhedral inner approximation of the non-convex constraint set resulting from imposing the constraints in continuous time. The problem is cast in a Markov jump linear systems (MJLS) framework and a stochastic MPC controller is calculated explicitly, offline, coupling dynamic programming with parametric piecewise quadratic (PWQ) optimization. The calculated control law leads to stochastic stability of the closed-loop system in the mean square sense and respects the state and input constraints in continuous time.

    A dynamic programming approach to constrained portfolios

    This paper studies constrained portfolio problems that may involve constraints on the probability or the expected size of a shortfall of wealth or consumption. Our first contribution is that we solve the problems by dynamic programming, which is in contrast to the existing literature that applies the martingale method. More precisely, we construct the non-separable value function by formalizing the optimal constrained terminal wealth to be a (conjectured) contingent claim on the optimal non-constrained terminal wealth. This is relevant by itself, but also opens up the opportunity to derive new solutions to constrained problems. As a second contribution, we thus derive new results for non-strict constraints on the shortfall of intermediate wealth and/or consumption.