Weakly Coupled Deep Q-Networks
We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement
learning algorithm that enhances performance in a class of structured problems
called weakly coupled Markov decision processes (WCMDP). WCMDPs consist of
multiple independent subproblems connected by an action space constraint, which
is a structural property that frequently emerges in practice. Despite this
appealing structure, WCMDPs quickly become intractable as the number of
subproblems grows. WCDQN employs a single network to train multiple DQN
"subagents", one for each subproblem, and then combine their solutions to
establish an upper bound on the optimal action value. This guides the main DQN
agent towards optimality. We show that the tabular version, weakly coupled
Q-learning (WCQL), converges almost surely to the optimal action value.
Numerical experiments show faster convergence compared to DQN and related
techniques in settings with as many as 10 subproblems, 3^10 total actions,
and a continuous state space.
Comment: To appear in proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
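The core mechanism described above, subagent values summed into an upper bound that clips the main agent's estimate, can be sketched in tabular form. The sizes, the dictionary-backed main Q-table, and the update signature below are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Toy setting (all sizes hypothetical): 3 subproblems, each with its own
# small Q-table over its local state and sub-action; a joint state/action
# is a tuple of the local ones.
n_sub, n_states, n_actions = 3, 4, 2
sub_Q = [np.zeros((n_states, n_actions)) for _ in range(n_sub)]
main_Q = {}  # main agent's action values, keyed by (joint_state, joint_action)

def upper_bound(joint_state, joint_action):
    # Summing the subagents' action values upper-bounds the coupled
    # problem's optimal action value: the linking constraint only
    # removes joint actions, so relaxing it can only increase value.
    return sum(sub_Q[i][joint_state[i], joint_action[i]] for i in range(n_sub))

def wcql_update(s, a, target, alpha=0.1):
    # Standard Q-learning step, then projection below the decomposition
    # bound -- the clipping is what guides the main agent.
    q = main_Q.get((s, a), 0.0)
    q = q + alpha * (target - q)
    main_Q[(s, a)] = min(q, upper_bound(s, a))
    return main_Q[(s, a)]
```

In the deep variant, the same clipping acts as a penalty on the main DQN's loss rather than a hard projection, with one shared network producing all subagent outputs.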
On bounds for network revenue management
The Network Revenue Management problem can be formulated as a stochastic dynamic programming problem (DP, or the "optimal" solution V*) whose exact solution is computationally intractable. Consequently, a number of heuristics have been proposed in the literature, the most popular of which are the deterministic linear programming (DLP) model and a simulation-based method, the randomized linear programming (RLP) model. Both methods give upper bounds on the optimal solution value (the DLP and PHLP bounds, respectively). These bounds are used to provide control values that can be used in practice to make accept/deny decisions for booking requests. Recently, Adelman [1] and Topaloglu [18] have proposed alternative upper bounds, the affine relaxation (AR) bound and the Lagrangian relaxation (LR) bound respectively, and showed that their bounds are tighter than the DLP bound. Tight bounds are of great interest, as it appears from empirical studies and practical experience that models that give tighter bounds also lead to better controls (better in the sense that they lead to more revenue). In this paper we give tightened versions of three bounds, calling them sAR (strong Affine Relaxation), sLR (strong Lagrangian Relaxation), and sPHLP (strong Perfect Hindsight LP), and show relations between them. Specifically, we show that the sPHLP bound is tighter than the sLR bound, and the sAR bound is tighter than the LR bound. The techniques for deriving the sLR and sPHLP bounds can potentially be applied to other instances of weakly-coupled dynamic programming.
Keywords: revenue management, bid-prices, relaxations, bounds
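For context, the DLP bound mentioned above replaces stochastic demand with its mean and solves one linear program: maximize fare revenue over sales quantities y, subject to leg capacities and expected demand. The instance below (2 legs, 3 products, and all numbers in it) is a made-up toy, solved here with scipy for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy instance: 2 flight legs, 3 products.
f = np.array([100.0, 120.0, 180.0])   # fares; product 3 uses both legs
A = np.array([[1, 0, 1],              # A[i, j] = 1 if product j uses leg i
              [0, 1, 1]])
cap = np.array([10.0, 10.0])          # leg capacities
d = np.array([8.0, 6.0, 5.0])         # expected demands E[D_j]

# DLP:  max f'y   s.t.   A y <= cap,   0 <= y <= E[D].
# linprog minimizes, so negate the fares.
res = linprog(-f, A_ub=A, b_ub=cap, bounds=list(zip(np.zeros(3), d)))
dlp_bound = -res.fun  # upper bound on optimal expected revenue
```

The optimal dual prices of the capacity constraints are the bid prices used for accept/deny control: accept a request only if its fare exceeds the sum of bid prices on the legs it uses.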
Optimal index rules for single resource allocation to stochastic dynamic competitors
In this paper we present a generic Markov decision process model of optimal single resource allocation to a collection of stochastic dynamic competitors. The main goal is to identify sufficient conditions under which this problem is optimally solved by an index rule. The main focus is on the frozen-if-not-allocated assumption, which notably appears in problems including the multi-armed bandit problem, the tax problem, the Klimov network, job sequencing, and object search and detection. The problem is approached by a Lagrangian relaxation and decomposed into a collection of normalized parametric single-competitor subproblems, which are then optimally solved by the well-known Gittins index. We show that the problem is equivalent to solving a time sequence of its Lagrangian relaxations. We further show that our approach gives insight into sufficient conditions for optimality of index rules in restless problems (in which the frozen-if-not-allocated assumption is dropped) with a single resource; this paper is the first to prove such conditions.
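The Gittins index invoked above can be computed in several ways; one classical route (not specific to this paper) is Whittle's retirement formulation: offer a lump-sum retirement reward M as the Lagrangian-style alternative to continuing, and find the M at which the two tie. The sketch below assumes a finite-state arm given by rewards r, transition matrix P, and discount beta, and uses value iteration plus bisection; function names and tolerances are illustrative:

```python
import numpy as np

def retirement_value(r, P, beta, M, iters=3000):
    # Value of the option "retire now for lump sum M, or continue":
    # V = max(M, r + beta * P V), solved by value iteration (a
    # beta-contraction, so the fixed point is approached geometrically).
    V = np.full(len(r), M, dtype=float)
    for _ in range(iters):
        V = np.maximum(M, r + beta * (P @ V))
    return V

def gittins_index(r, P, beta, s, tol=1e-7):
    # Bisect on M: the Gittins index of state s is (1 - beta) times the
    # indifference value M* where continuing and retiring tie.
    lo, hi = 0.0, r.max() / (1.0 - beta)   # M* always lies in this range
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = retirement_value(r, P, beta, M)
        if r[s] + beta * (P[s] @ V) > M:   # continuing still beats retiring
            lo = M
        else:
            hi = M
    return (1.0 - beta) * 0.5 * (lo + hi)
```

Sanity check: an arm that pays a constant reward forever has index equal to that reward (for a one-state arm with r = 1, the routine returns a value near 1 for any discount factor). The index rule then simply allocates the resource to the competitor whose current state has the highest index.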