Dynamic programming principle for classical and singular stochastic control with discretionary stopping
We prove the dynamic programming principle (DPP) for a class of problems in which
an agent controls a multi-dimensional diffusive dynamics via both classical and
singular controls and, moreover, is able to terminate the optimisation at a
time of her choosing, prior to a given maturity. The time horizon of the
problem is random: it is the smaller of a fixed terminal time and the
first exit time of the state dynamics from a Borel set.
We consider both the case in which the total fuel available for the singular
control is bounded and the case in which it is unbounded. We build upon existing
proofs of the DPP and extend results available in the traditional literature on
singular control (e.g., Haussmann and Suo, SIAM J. Control Optim., 33, 1995) by
relaxing some key assumptions and by including the discretionary stopping
feature. We also connect with more general versions of the DPP (e.g., Bouchard
and Touzi, SIAM J. Control Optim., 49, 2011) by showing in detail how our class
of problems meets the abstract requirements therein.
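A minimal sketch of the setting described above, with all symbols hypothetical (the paper's exact notation may differ): the agent chooses a classical control \(\nu\), a singular control \(\xi\), and a stopping time \(\tau\), and the DPP relates the value at an intermediate stopping time back to the value function itself.

```latex
% Value function with classical control \nu, singular control \xi, and
% discretionary stopping time \tau; \theta = T \wedge \inf\{s : X_s \notin O\}
% is the random horizon (notation hypothetical):
v(t,x) = \sup_{\nu,\,\xi,\,\tau}
  \mathbb{E}\Big[ \int_t^{\tau\wedge\theta} f(X_s,\nu_s)\,ds
    + \int_{[t,\tau\wedge\theta]} g(X_s)\,d\xi_s
    + h\big(X_{\tau\wedge\theta}\big) \Big].
% The DPP then states, for any stopping time \rho \le \tau\wedge\theta,
v(t,x) = \sup_{\nu,\,\xi,\,\tau}
  \mathbb{E}\Big[ \int_t^{\rho} f(X_s,\nu_s)\,ds
    + \int_{[t,\rho]} g(X_s)\,d\xi_s
    + v\big(\rho, X_{\rho}\big)\,\mathbf{1}_{\{\rho < \tau\wedge\theta\}}
    + h\big(X_{\tau\wedge\theta}\big)\,\mathbf{1}_{\{\rho = \tau\wedge\theta\}} \Big].
```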
Interbank lending with benchmark rates: Pareto optima for a class of singular control games
We analyze a class of stochastic differential games of singular control,
motivated by the study of a dynamic model of interbank lending with benchmark
rates. We describe Pareto optima for this game and show how they may be
achieved through the intervention of a regulator, whose policy is a solution to
a singular stochastic control problem. Pareto optima are characterized in terms
of the solutions to a new class of Skorokhod problems with piecewise-continuous
free boundary.
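The Skorokhod problems mentioned above can be sketched in their simplest one-dimensional form; the notation below is hypothetical and reduces the piecewise-continuous free boundary of the paper to a single reflecting boundary \(b(t)\).

```latex
% One-dimensional Skorokhod problem at an upper boundary b(t)
% (notation hypothetical): find a pair (X, L) such that
X_t = x + W_t - L_t, \qquad X_t \le b(t) \quad \text{for all } t \ge 0,
% where L is nondecreasing, L_0 = 0, and L increases only when X is
% at the boundary:
\int_0^{\infty} \mathbf{1}_{\{X_t < b(t)\}}\, dL_t = 0 .
```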
Pareto optimal policies are shown to correspond to the enforcement of
endogenous bounds on interbank lending rates. Analytical comparison between
Pareto optima and Nash equilibria provides insight into the impact of
regulatory intervention on the stability of interbank rates.
Comment: 31 pages; 1 figure
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
This paper presents the MAXQ approach to hierarchical reinforcement learning
based on decomposing the target Markov decision process (MDP) into a hierarchy
of smaller MDPs and decomposing the value function of the target MDP into an
additive combination of the value functions of the smaller MDPs. The paper
defines the MAXQ hierarchy, proves formal results on its representational
power, and establishes five conditions for the safe use of state abstractions.
The paper presents an online model-free learning algorithm, MAXQ-Q, and proves
that it converges with probability 1 to a kind of locally-optimal policy known
as a recursively optimal policy, even in the presence of the five kinds of
state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q
through a series of experiments in three domains and shows experimentally that
MAXQ-Q (with state abstractions) converges to a recursively optimal policy much
faster than flat Q learning. The fact that MAXQ learns a representation of the
value function has an important benefit: it makes it possible to compute and
execute an improved, non-hierarchical policy via a procedure similar to the
policy improvement step of policy iteration. The paper demonstrates the
effectiveness of this non-hierarchical execution experimentally. Finally, the
paper concludes with a comparison to related work and a discussion of the
design tradeoffs in hierarchical reinforcement learning.
Comment: 63 pages, 15 figures
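The additive decomposition described above, Q(i, s, a) = V(a, s) + C(i, s, a), can be sketched in a few lines of Python. The hierarchy, states, and table values below are invented for illustration and are not from the paper; only the recursive structure follows the MAXQ definition.

```python
# Toy illustration of the MAXQ value decomposition:
#   Q(i, s, a) = V(a, s) + C(i, s, a)
# where V(a, s) is the projected value of subtask a in state s and
# C(i, s, a) is the completion function of parent task i after a finishes.
# All names and numbers below are invented for illustration.

V_primitive = {                  # V(a, s) for primitive actions: expected reward
    ("north", "s0"): -1.0,
    ("pickup", "s0"): 10.0,
}
C = {                            # completion function C(i, s, a)
    ("root", "s0", "navigate"): 5.0,
    ("root", "s0", "pickup"): 0.0,
    ("navigate", "s0", "north"): 2.0,
}
children = {                     # task hierarchy: composite task -> subtasks
    "root": ["navigate", "pickup"],
    "navigate": ["north"],
}

def V(task, s):
    """V(i, s) = max_a Q(i, s, a) for composite tasks; stored expected
    reward for primitive actions."""
    if task not in children:     # primitive action: recursion bottoms out
        return V_primitive[(task, s)]
    return max(Q(task, s, a) for a in children[task])

def Q(task, s, a):
    """Decomposed action value: Q(i, s, a) = V(a, s) + C(i, s, a)."""
    return V(a, s) + C[(task, s, a)]

print(Q("root", "s0", "navigate"))   # V(navigate,s0) + C = (-1 + 2) + 5 = 6.0
print(Q("root", "s0", "pickup"))     # 10 + 0 = 10.0
print(V("root", "s0"))               # max(6.0, 10.0) = 10.0
```

The recursion makes the representational point of the abstract concrete: the value of the root task is assembled from primitive-action values plus completion terms, which is what allows an improved non-hierarchical policy to be computed from the learned tables.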
Numerical investigation of the heterogeneous combustion processes of solid fuels
Two-phase computational modelling based on the Euler–Euler approach was developed to investigate the heterogeneous combustion processes of biomass, in the solid carbon phase, inside a newly designed combustion chamber (Model 1). A transient simulation was carried out for a small amount of carbon powder placed in a cup located at the centre of the combustion chamber. A heat source was provided to initiate combustion, with air supplied by three injection nozzles. The results show that combustion is sustained in the chamber, as evidenced by the flame temperature. An axisymmetric combustion model (Model 2) based on the Euler–Lagrange approach was formulated to model the combustion of pulverized coal. Three cases with three different char oxidation models are presented. The predicted results show good agreement with the available experimental data and indicate that combustion inside the reactor was affected by particle size. A number of simulations were carried out to find the parameter values best suited to predicting NOx pollutants.
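Char oxidation models of the kind compared above are often single-particle rate laws combining a kinetic and a diffusion limit in series; a hedged sketch of such a kinetics/diffusion-limited rate is below. All constants are illustrative placeholders, not values from the paper.

```python
# Sketch of a kinetics/diffusion-limited char oxidation rate, of the kind
# commonly attached to particles in Euler–Lagrange pulverized-coal models.
# All constants (A, E, C_diff) are illustrative, not the paper's values.
import math

def char_burning_rate(d_p, p_ox, T_p, T_g,
                      A=0.002,       # kinetic pre-exponential, kg/(m^2 s Pa)
                      E=7.9e7,       # activation energy, J/kmol
                      R=8314.0,      # universal gas constant, J/(kmol K)
                      C_diff=5e-12): # diffusion-rate constant (illustrative)
    """Mass burning rate (kg/s) of a char particle of diameter d_p (m),
    given oxidant partial pressure p_ox (Pa) and particle/gas temperatures
    T_p, T_g (K). The kinetic and diffusion rates combine harmonically,
    so the slower mechanism controls the overall rate."""
    T_m = 0.5 * (T_p + T_g)                      # film temperature
    D0 = C_diff * T_m**0.75 / d_p                # diffusion rate coefficient
    Rk = A * math.exp(-E / (R * T_p))            # Arrhenius kinetic rate
    area = math.pi * d_p**2                      # particle surface area
    return area * p_ox * (D0 * Rk) / (D0 + Rk)   # series combination

rate = char_burning_rate(d_p=1e-4, p_ox=2.1e4, T_p=1500.0, T_g=1200.0)
print(rate)   # burning rate in kg/s for a 100-micron particle
```

Because the diffusion coefficient scales as 1/d_p, the model reproduces the qualitative particle-size sensitivity the abstract reports: smaller particles burn out faster per unit mass.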