643 research outputs found

    Weakly Coupled Deep Q-Networks

    We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement learning algorithm that enhances performance in a class of structured problems called weakly coupled Markov decision processes (WCMDPs). WCMDPs consist of multiple independent subproblems connected by an action space constraint, a structural property that frequently emerges in practice. Despite this appealing structure, WCMDPs quickly become intractable as the number of subproblems grows. WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value. This bound guides the main DQN agent towards optimality. We show that the tabular version, weakly coupled Q-learning (WCQL), converges almost surely to the optimal action value. Numerical experiments show faster convergence compared to DQN and related techniques in settings with as many as 10 subproblems, 3^{10} total actions, and a continuous state space.
    Comment: To appear in proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
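The abstract does not spell out how the subagents' solutions are combined. As a rough illustration only, the sketch below assumes the standard Lagrangian relaxation for weakly coupled MDPs: each subproblem is solved with its reward penalized by a multiplier `lam` times its resource use, and the bound is `lam * b / (1 - gamma)` plus the sum of the subproblem action values. The two tiny subproblems, the budget, the discount factor, and all function names are hypothetical, and tabular value iteration stands in for learning; this is not the paper's implementation.

```python
import itertools

GAMMA = 0.9  # discount factor (illustrative assumption)


def value_iteration(P, R, iters=500):
    """Tabular value iteration. P[s][a] lists next-state probabilities, R[s][a] is the reward."""
    V = [0.0] * len(R)
    for _ in range(iters):
        Q = [[R[s][a] + GAMMA * sum(p * V[t] for t, p in enumerate(P[s][a]))
              for a in range(len(R[s]))] for s in range(len(R))]
        V = [max(q) for q in Q]
    return Q


def lagrangian_q_bound(subs, budget, lam, iters=500):
    """Return bound(s, a) = lam * budget / (1 - GAMMA) + sum_i Q_i^lam(s_i, a_i)."""
    q_tables = []
    for P, R in subs:
        # Penalized reward r_i(s, a) - lam * d_i(s, a), assuming d_i(s, a) = a.
        R_lam = [[R[s][a] - lam * a for a in range(len(R[s]))] for s in range(len(R))]
        q_tables.append(value_iteration(P, R_lam, iters))
    const = lam * budget / (1.0 - GAMMA)

    def bound(s, a):
        return const + sum(q[si][ai] for q, si, ai in zip(q_tables, s, a))
    return bound


def exact_joint_q(subs, budget, iters=500):
    """Brute-force optimal Q on the joint problem; only feasible for tiny instances."""
    states = list(itertools.product(*[range(len(R)) for _, R in subs]))
    actions = [a for a in itertools.product(*[range(len(R[0])) for _, R in subs])
               if sum(a) <= budget]  # linking constraint on total resource use
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        Q = {}
        for s in states:
            for a in actions:
                r = sum(R[si][ai] for (_, R), si, ai in zip(subs, s, a))
                ev = 0.0
                for t in states:
                    p = 1.0  # subproblems transition independently
                    for (P, _), si, ai, ti in zip(subs, s, a, t):
                        p *= P[si][ai][ti]
                    ev += p * V[t]
                Q[s, a] = r + GAMMA * ev
        V = {s: max(Q[s, a] for a in actions) for s in states}
    return Q


# Two hypothetical 2-state subproblems; action 1 consumes one unit of a shared budget.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
subs = [(P, [[0.0, 1.0], [0.0, 2.0]]),
        (P, [[0.0, 1.5], [0.0, 0.5]])]
BUDGET = 1

q_exact = exact_joint_q(subs, BUDGET)
bounds = [lagrangian_q_bound(subs, BUDGET, lam) for lam in (0.0, 0.5, 1.0, 2.0)]
# Tightest bound over the multipliers tried, for every feasible (state, action) pair.
q_upper = {sa: min(b(*sa) for b in bounds) for sa in q_exact}
gap = min(q_upper[sa] - q_exact[sa] for sa in q_exact)
```

In this toy instance `gap` should be non-negative up to numerical tolerance, meaning the decomposed bound never falls below the exact action value for any feasible action. The point of the decomposition is that each subproblem is solved separately, so the cost grows with the number of subproblems rather than with the size of the joint action space.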

    On bounds for network revenue management

    The Network Revenue Management problem can be formulated as a stochastic dynamic programming problem (DP, or the "optimal" solution V*) whose exact solution is computationally intractable. Consequently, a number of heuristics have been proposed in the literature, the most popular of which are the deterministic linear programming (DLP) model and a simulation-based method, the randomized linear programming (RLP) model. Both methods give upper bounds on the optimal solution value (DLP and PHLP respectively). These bounds are used to provide control values that can be used in practice to make accept/deny decisions for booking requests. Recently Adelman [1] and Topaloglu [18] have proposed alternate upper bounds, the affine relaxation (AR) bound and the Lagrangian relaxation (LR) bound respectively, and showed that their bounds are tighter than the DLP bound. Tight bounds are of great interest because empirical studies and practical experience suggest that models that give tighter bounds also lead to better controls (better in the sense that they lead to more revenue). In this paper we give tightened versions of three bounds, calling them sAR (strong Affine Relaxation), sLR (strong Lagrangian Relaxation) and sPHLP (strong Perfect Hindsight LP), and show relations between them. Specifically, we show that the sPHLP bound is tighter than the sLR bound and the sAR bound is tighter than the LR bound. The techniques for deriving the sLR and sPHLP bounds can potentially be applied to other instances of weakly-coupled dynamic programming.
    Keywords: revenue management, bid-prices, relaxations, bounds

    Optimal index rules for single resource allocation to stochastic dynamic competitors

    In this paper we present a generic Markov decision process model of optimal single resource allocation to a collection of stochastic dynamic competitors. The main goal is to identify sufficient conditions under which this problem is optimally solved by an index rule. The main focus is on the frozen-if-not-allocated assumption, which is notably found in problems including the multi-armed bandit problem, tax problem, Klimov network, job sequencing, and object search and detection. The problem is approached by a Lagrangian relaxation and decomposed into a collection of normalized parametric single-competitor subproblems, which are then optimally solved by the well-known Gittins index. We show that the problem is equivalent to solving a time sequence of its Lagrangian relaxations. We further show that our approach gives insights into sufficient conditions for optimality of index rules in restless problems (in which the frozen-if-not-allocated assumption is dropped) with a single resource; this paper is the first to prove such conditions.