An Iterative Scheme for the Approximate Linear Programming Solution to the Optimal Control of a Markov Decision Process
This paper addresses the computational issues involved in solving an infinite-horizon optimal control problem for a Markov Decision Process (MDP) with a continuous state component and a discrete control input. The optimal Markov policy for the MDP can be determined from the fixed-point solution to the Bellman equation, which can be rephrased as a constrained Linear Program (LP) with an infinite number of constraints and an infinite-dimensional optimization variable (the optimal value function). To compute an (approximate) solution to the LP, an iterative randomized scheme is proposed in which the optimization variable is expressed as a linear combination of basis functions in a given class: at each iteration, the resulting semi-infinite LP is solved via constraint sampling, while the number of basis functions is progressively increased across iterations so as to meet a performance goal. The effectiveness of the proposed scheme is shown on a multi-room heating system example.
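A minimal sketch of an iterative constraint-sampling scheme of this kind, under assumptions that are not from the paper: a scalar state x in [0, 1], two actions with hypothetical deterministic dynamics x' = a_u x (a degenerate special case of a stochastic transition), a hypothetical stage cost x + 0.2u, monomial basis functions, uniform constraint sampling, and box bounds on the weights so that every sampled LP stays bounded. Each iteration solves the sampled LP with SciPy's `linprog` and adds one basis function until the objective stops improving.

```python
import numpy as np
from scipy.optimize import linprog

GAMMA = 0.9                  # discount factor (assumed)
A = {0: 0.9, 1: 0.5}         # hypothetical dynamics: x' = A[u] * x

def stage_cost(x, u):
    # hypothetical cost: state penalty plus control effort
    return x + 0.2 * u

def basis(x, k):
    """Monomial basis 1, x, ..., x^(k-1) evaluated at the points x."""
    return np.vander(x, k, increasing=True)

def solve_sampled_alp(xs, k, bound=100.0):
    """Sampled approximate LP with k basis functions; returns (weights, objective)."""
    phi = basis(xs, k)
    c = -phi.mean(axis=0)            # linprog minimizes, so negate to maximize mean V(x)
    rows, rhs = [], []
    for u, a in A.items():
        # Bellman inequality V(x) - gamma * V(x') <= cost(x, u) at every sampled x
        rows.append(phi - GAMMA * basis(a * xs, k))
        rhs.append(stage_cost(xs, u))
    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
                  bounds=[(-bound, bound)] * k, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, -res.fun

def iterative_alp(n_samples=200, k_max=6, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    xs = rng.uniform(0.0, 1.0, n_samples)   # constraint sampling (uniform, assumed)
    history = []
    for k in range(1, k_max + 1):           # grow the basis one function at a time
        theta, obj = solve_sampled_alp(xs, k)
        history.append(obj)
        if k > 1 and obj - history[-2] < tol:
            break                            # performance goal: negligible improvement
    return xs, theta, history
```

Because the bases are nested and the sample set is fixed, setting the new weight to zero keeps the previous solution feasible, so the objective cannot decrease from one iteration to the next.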
Some numerical methods for solving stochastic impulse control in natural gas storage facilities
The valuation of gas storage facilities is characterized as a stochastic impulse control problem with a finite horizon, resulting in Hamilton-Jacobi-Bellman (HJB) equations for the value function. In this context, the two categories of solution schemes for optimal switching are discussed in a stochastic control framework. We review several numerical methods, including approaches based on partial differential equations (PDEs), Markov chain approximation, nonparametric regression, the quantization method, and some practitioners’ methods. This paper considers the optimal switching problem arising in the valuation of gas storage contracts for leasing storage facilities, and investigates recent developments as well as the advantages and disadvantages of each scheme based on the dynamic programming principle (DPP).
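To make the dynamic programming principle concrete, here is a toy backward-induction sketch in the spirit of the Markov chain approximation approach: the gas price is assumed to follow a finite Markov chain, inventory lives on an integer grid, and at each step the holder injects, holds, or withdraws one unit. The function name `storage_value`, its parameters, and the zero terminal value for leftover gas are hypothetical simplifications, not any of the schemes reviewed in the paper.

```python
import numpy as np

def storage_value(prices, P, capacity, rate=1, cost=0.0, discount=1.0):
    """Backward induction for a toy gas-storage switching problem (hypothetical model).

    prices: (n_steps, n_states) price in each Markov-chain state at each step
    P: (n_states, n_states) transition matrix of the price chain
    capacity: maximum inventory in integer units
    Returns V[q, s]: value at step 0 with inventory q and price state s.
    """
    n_steps, n_states = prices.shape
    levels = np.arange(capacity + 1)
    V_next = np.zeros((capacity + 1, n_states))   # terminal value: leftover gas worth 0 (assumed)
    for t in reversed(range(n_steps)):
        cont = discount * V_next @ P.T             # cont[q, s] = E[V_{t+1}(q, s') | s]
        V = np.full((capacity + 1, n_states), -np.inf)
        for dq in (-rate, 0, rate):                # withdraw, hold, inject
            q_new = levels + dq
            ok = (q_new >= 0) & (q_new <= capacity)
            payoff = -dq * prices[t] - cost * abs(dq)   # sell on withdrawal, pay on injection
            V[ok] = np.maximum(V[ok], payoff[None, :] + cont[q_new[ok]])
        V_next = V
    return V_next
```

For example, with a deterministic price of 1 followed by 3 and one unit of capacity, buying at 1 and selling at 3 gives a value of 2 for an initially empty facility.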
From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming
We consider linear programming (LP) problems in infinite-dimensional spaces that are in general computationally intractable. Under suitable assumptions, we develop an approximation bridge from the infinite-dimensional LP to tractable finite convex programs in which the performance of the approximation is quantified explicitly. To this end, we draw on recent developments in two areas, randomized optimization and first-order methods, leading to a priori as well as a posteriori performance guarantees. We illustrate the generality and implications of our theoretical results in the special case of the long-run average cost and discounted cost optimal control problems for Markov decision processes on Borel spaces. The applicability of the theoretical results is demonstrated through a constrained linear quadratic optimal control problem and a fisheries management problem.
Comment: 30 pages, 5 figures
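For the randomized-optimization side, one common a priori guarantee comes from the scenario approach: for a convex program with d decision variables and N i.i.d. sampled constraints (under standard conditions such as a unique optimal solution), the probability that the sampled solution violates more than an ε-fraction of the constraints is at most Σ_{i<d} C(N, i) ε^i (1 − ε)^(N − i). The sketch below finds the smallest N that pushes this below a confidence level β. It is a generic illustration with hypothetical helper names, not the explicit bound derived in the paper.

```python
from math import comb

def scenario_tail(N, d, eps):
    """Binomial-tail bound on P(violation probability > eps) with N samples, d variables."""
    return sum(comb(N, i) * eps**i * (1 - eps) ** (N - i) for i in range(d))

def min_samples(d, eps, beta):
    """Smallest N (found by linear search) with scenario_tail(N, d, eps) <= beta."""
    N = d
    while scenario_tail(N, d, eps) > beta:
        N += 1
    return N
```

With a single decision variable the bound reduces to (1 − ε)^N, so `min_samples(1, 0.1, 0.01)` returns 44; the required N grows roughly linearly in d.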
Delay-Optimal User Scheduling and Inter-Cell Interference Management in Cellular Network via Distributive Stochastic Learning
In this paper, we propose a distributive queue-aware intra-cell user scheduling and inter-cell interference (ICI) management control design for a delay-optimal cellular downlink system with M base stations (BSs) and K users in each cell. Each BS has K downlink queues for its K users, with heterogeneous arrivals and delay requirements. The ICI management control is adaptive to the joint queue state information (QSI) over a slow time scale, while the user scheduling control is adaptive to both the joint QSI and the joint channel state information (CSI) over a faster time scale. We show that the problem can be modeled as an infinite-horizon average-cost Partially Observed Markov Decision Problem (POMDP), which is NP-hard in general. By exploiting the special structure of the problem, we derive an equivalent Bellman equation to solve the POMDP. To address the distributive requirement and the issues of dimensionality and computational complexity, we derive a distributive online stochastic learning algorithm, which requires only local QSI and local CSI at each of the M BSs. We show that the proposed learning algorithm converges almost surely (with probability 1) and achieves significant gains over various baselines. The proposed solution has only linear complexity, O(MK).
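As a rough illustration of why local learning keeps the complexity linear, the sketch below lets each BS update a small table of per-user relative values from its own queue and channel observations only, using an RVI-style stochastic-approximation update with a diminishing step size. The toy model (Bernoulli arrivals, i.i.d. per-user success probabilities as CSI, queue length as the per-slot cost, a per-user additive value approximation, and the greedy scheduling rule) is assumed for illustration; it omits the slow-timescale ICI control and is not the paper's algorithm or convergence setting.

```python
import numpy as np

def distributive_learning(M=3, K=4, q_max=10, n_slots=5000, arrival_p=0.2, seed=0):
    """Toy distributive online learning of per-user relative values (hypothetical model).

    V[m, k, q] approximates the relative value of user k at BS m holding q packets.
    BS m reads only its own queues Q[m] (local QSI) and channels H[m] (local CSI).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros((M, K, q_max + 1))    # M*K small tables: storage and per-slot work are O(MK)
    Q = np.zeros((M, K), dtype=int)
    users = np.arange(K)
    for n in range(1, n_slots + 1):
        step = n ** -0.7                            # diminishing step size (assumed schedule)
        H = rng.uniform(0.2, 1.0, size=(M, K))      # hypothetical CSI: per-user success probability
        for m in range(M):                          # each BS acts on local information only
            q = Q[m]
            # Reduction in relative value if user k is served (zero for empty queues)
            drop = V[m, users, q] - V[m, users, np.maximum(q - 1, 0)]
            k_star = int(np.argmax(drop * H[m]))
            served = np.zeros(K, dtype=int)
            if q[k_star] > 0 and rng.random() < H[m, k_star]:
                served[k_star] = 1
            arrivals = (rng.random(K) < arrival_p).astype(int)
            q_next = np.clip(q - served + arrivals, 0, q_max)
            # RVI-style update with reference state q = 0; cost q_k is a delay proxy (Little's law)
            td = q + V[m, users, q_next] - V[m, users, 0] - V[m, users, q]
            V[m, users, q] += step * td
            Q[m] = q_next
    return V, Q
```

Each BS touches only its own K entries per slot, so the per-slot work across the network scales as O(MK), which is the property the abstract emphasizes.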