
    Discrete-Time Control with Non-Constant Discount Factor

    This paper deals with discrete-time Markov decision processes (MDPs) with Borel state and action spaces, under the total expected discounted cost optimality criterion. We assume that the discount factor is not constant: it may depend on the state and action, and it may even take the extreme values zero and one. We propose sufficient conditions on the data of the model that ensure the existence of optimal control policies and allow the optimal value function to be characterized as a solution to the dynamic programming equation. As a particular case of these MDPs with a varying discount factor, we study MDPs with stopping, together with the corresponding optimal stopping times and contact set. We show applications to switching MDP models and, in particular, we study a pollution accumulation problem.
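    The dynamic programming equation here keeps its familiar fixed-point form, with the discount factor simply evaluated at the current state-action pair. Below is a minimal sketch of value iteration under such a state-action-dependent discount factor; the array layout, cost minimization, and plain tolerance stopping rule are illustrative assumptions rather than the paper's construction (the paper's sufficient conditions are what keep the equation well behaved when the factor touches zero or one).

        import numpy as np

        def value_iteration(P, c, gamma, tol=1e-8, max_iter=10_000):
            """Value iteration for a finite MDP with a state-action-dependent
            discount factor (a toy stand-in for the Borel-space setting).

            P[a, x, y] : probability of moving from state x to y under action a
            c[x, a]    : one-stage cost
            gamma[x, a]: discount factor, allowed to vary over [0, 1]
            """
            n_actions, n_states, _ = P.shape
            V = np.zeros(n_states)
            for _ in range(max_iter):
                # Bellman operator: (TV)(x) = min_a { c(x,a) + gamma(x,a) E[V(X') | x,a] }
                EV = np.einsum("axy,y->ax", P, V).T   # E[V(next state)], shape (states, actions)
                Q = c + gamma * EV
                V_new = Q.min(axis=1)
                if np.max(np.abs(V_new - V)) < tol:
                    break
                V = V_new
            return V, Q.argmin(axis=1)                # value function and greedy policy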

    Partially observable Markov decision processes with partially observable random discount factors

    This paper deals with a class of partially observable discounted Markov decision processes defined on Borel state and action spaces, under an unbounded one-stage cost. The discount rate is a stochastic process evolving according to a difference equation and is itself only partially observable. Introducing a suitable control model and filtering processes, we prove the existence of optimal control policies. In addition, we illustrate our results in a class of GI/GI/1 queueing systems, for which we obtain explicitly the corresponding optimality equation and filtering process.
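    Since the discount rate is hidden, the controller must carry a belief about it and update that belief from observations. Here is a toy Bayes-filter update, assuming (unlike the paper's general difference-equation setting) that the unobserved discount rate takes finitely many values and evolves as a Markov chain; all arrays are illustrative.

        import numpy as np

        def filter_step(belief, T, likelihood):
            """One prediction-correction update of the belief over the hidden rate.

            belief[i]    : current probability that the discount rate has value i
            T[i, j]      : transition probability of the discount-rate process
            likelihood[i]: likelihood of the latest observation given value i
            """
            predicted = belief @ T               # prediction through the dynamics
            posterior = predicted * likelihood   # Bayes correction
            return posterior / posterior.sum()   # normalise

        belief = np.array([0.5, 0.5])            # prior over two candidate rates
        T = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical rate dynamics
        belief = filter_step(belief, T, np.array([0.3, 0.7]))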

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit-Deterministic Büchi Automaton (LDBA), a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on the fly, so that an RL algorithm can synthesise a policy whose traces probabilistically satisfy the linear temporal property. When the state space of the MDP is finite, this probability (a certificate) is calculated in parallel with policy learning, so the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, we provide theoretical guarantees on the convergence of the RL algorithm to an optimal policy maximising this probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and show empirically that the algorithm finds satisfying policies whenever such policies exist. The performance of the proposed framework is evaluated on a set of numerical examples and benchmarks, where we observe an order-of-magnitude improvement in the number of iterations required for policy synthesis, compared to existing approaches whenever available.
    Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
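    The key mechanism is to run learning on the product of the MDP with the LDBA and to hand out reward when the automaton's accepting condition is met. A stripped-down sketch of tabular Q-learning on such a product follows; the environment and automaton interfaces (mdp_reset, mdp_step, ldba_step, accepting) are hypothetical stand-ins, and the flat accepting-state bonus is a simplification of the paper's on-the-fly reward shaping.

        import random
        from collections import defaultdict

        def ql_on_product(mdp_reset, mdp_step, ldba_step, accepting, actions,
                          n_episodes=5000, horizon=200, alpha=0.1, eps=0.1,
                          gamma=0.99, r_acc=1.0):
            """Q-learning over product states (s, q): s from the MDP, q from the LDBA.

            mdp_step(s, a) -> (s2, label) : environment transition plus the label
                                            (atomic propositions) read by the automaton
            ldba_step(q, label) -> q2     : automaton transition
            accepting(q) -> bool          : membership in the accepting set
            """
            Q = defaultdict(float)
            for _ in range(n_episodes):
                s, q = mdp_reset(), 0
                for _ in range(horizon):
                    if random.random() < eps:                       # explore
                        a = random.choice(actions)
                    else:                                           # exploit
                        a = max(actions, key=lambda b: Q[(s, q, b)])
                    s2, label = mdp_step(s, a)
                    q2 = ldba_step(q, label)
                    r = r_acc if accepting(q2) else 0.0             # shaped reward
                    target = r + gamma * max(Q[(s2, q2, b)] for b in actions)
                    Q[(s, q, a)] += alpha * (target - Q[(s, q, a)])
                    s, q = s2, q2
            return Q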

    Theory of Stochastic Optimal Economic Growth

    This paper is a survey of the theory of stochastic optimal economic growth.

    Optimal Control of Parallel Queues for Managing Volunteer Convergence

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/163497/2/poms13224.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/163497/1/poms13224_am.pd

    Growth-optimal portfolios under transaction costs

    This paper studies a portfolio optimization problem in a discrete-time Markovian model of a financial market in which asset price dynamics depend on an external process of economic factors. There are transaction costs, with a structure that covers, in particular, the case of fixed plus proportional costs. We prove that there exists a self-financing trading strategy maximizing the average growth rate of the portfolio wealth, and we show that this strategy has a Markovian form. Our result is obtained by large-deviations estimates on empirical measures of the price process and by a generalization of the vanishing discount method to discontinuous transition operators.
    Comment: 32 pages
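    The criterion being maximized is the long-run average growth rate of wealth, (1/T) log V_T as T grows. Below is a hedged toy estimator of that quantity for a fixed factor-dependent weight rule under purely proportional costs; the function names are assumptions, and the simplified turnover accounting ignores the fixed-cost component and the drift of weights between rebalancing dates.

        import numpy as np

        def average_growth_rate(returns, factors, weights_fn, cost=0.001):
            """Estimate (1/T) log V_T for a self-financing toy strategy.

            returns[t] : gross asset returns over period t, shape (n_assets,)
            factors[t] : value of the external economic factor at time t
            weights_fn : maps the current factor to portfolio weights (summing to 1)
            """
            log_wealth = 0.0
            w_prev = weights_fn(factors[0])
            for t in range(len(returns)):
                w = weights_fn(factors[t])
                turnover = np.abs(w - w_prev).sum()          # proportional-cost base
                log_wealth += np.log(w @ returns[t]) + np.log1p(-cost * turnover)
                w_prev = w
            return log_wealth / len(returns)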