    Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

    An online problem called dynamic resource allocation with capacity constraints (DRACC) is introduced and studied in the realm of posted price mechanisms. This problem subsumes several applications of stateful pricing, including but not limited to posted prices for online job scheduling and matching over a dynamic bipartite graph. Because existing online learning techniques do not yield vanishing regret for this problem, we develop a novel online learning framework over deterministic Markov decision processes with dynamic state transition and reward functions. We then prove, via a reduction to the well-studied problem of online learning with switching costs, that if the Markov decision process admits a chasing oracle (i.e., an oracle that simulates any given policy from any initial state with bounded loss), then the online learning problem can be solved with vanishing regret. Our results for the DRACC problem and its applications are then obtained by devising (randomized and deterministic) chasing oracles that exploit the particular structure of these problems.
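    For intuition only, here is a minimal, hypothetical sketch of the shape of such a reduction: a lazy multiplicative-weights learner over a finite set of candidate policies (resampling rarely, so the number of policy switches stays small) together with an abstract chasing-oracle interface that can catch up to a newly selected policy from the current state. The class names (ChasingOracle, LazyPolicyLearner), the update rule, and the switch probability are illustrative assumptions, not taken from the paper.

    import math
    import random


    class ChasingOracle:
        """Assumed interface for the 'chasing oracle' described in the abstract:
        from any current state, it simulates a given target policy with bounded
        extra loss. A concrete implementation is problem-specific."""

        def chase(self, state, target_policy, t):
            raise NotImplementedError


    class LazyPolicyLearner:
        """Illustrative multiplicative-weights learner over a finite policy set.
        Resampling the active policy only occasionally keeps the number of
        switches small, which is what a switching-cost reduction relies on."""

        def __init__(self, policies, eta=0.1, switch_prob=0.05):
            self.policies = list(policies)
            self.eta = eta
            self.switch_prob = switch_prob
            self.weights = {p: 1.0 for p in self.policies}
            self.active = random.choice(self.policies)

        def update(self, simulated_rewards):
            """simulated_rewards: dict mapping policy -> the reward it would have
            earned this round (obtained, e.g., by simulating via the oracle)."""
            for p, r in simulated_rewards.items():
                self.weights[p] *= math.exp(self.eta * r)
            if random.random() < self.switch_prob:
                total = sum(self.weights.values())
                pick = random.uniform(0.0, total)
                acc = 0.0
                for p, w in self.weights.items():
                    acc += w
                    if pick <= acc:
                        self.active = p
                        break
            return self.active

    In this picture, each time the learner switches policies the chasing oracle would be invoked to steer the real trajectory toward the new policy's trajectory at bounded cost; that bounded cost is what a switching-cost analysis can absorb.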

    Online optimization problems

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2013. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages 149-153).

    In this thesis, we study online optimization problems in routing and allocation applications. Online problems are problems where information is revealed incrementally and decisions must be made before all information is available. We design and analyze algorithms for a variety of online problems, including traveling salesman problems with rejection options, generalized assignment problems, stochastic matching problems, and resource allocation problems. We use worst-case competitive ratios to analyze the performance of the proposed algorithms. We begin our study with online traveling salesman problems with rejection options, where acceptance/rejection decisions are not required to be made explicitly. We propose an online algorithm for arbitrary metric spaces and show that it is the best possible. We then consider problems where acceptance/rejection decisions must be made at the time requests arrive. For different metric spaces, we propose different online algorithms, some of which are asymptotically optimal. Next, we consider generalized online assignment problems with budget constraints and resource constraints. We first prove that all online algorithms are arbitrarily bad in the general case. Then, under some assumptions, we propose, analyze, and empirically compare two online algorithms: a greedy algorithm and a primal-dual algorithm. We also study online stochastic matching problems. Instances with a fixed number of arrivals are studied first; a novel algorithm based on discretization is proposed and analyzed for unweighted problems, and the same algorithm is modified to accommodate vertex-weighted cases. We then consider cases where arrivals follow a Poisson process. Finally, we consider online resource allocation problems. We first consider problems with free but fixed inventory under certain assumptions and present near-optimal algorithms. We then relax some unrealistic assumptions and generalize the technique to problems with flexible inventory and non-decreasing marginal costs.

    by Xin Lu. Ph.D.
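    As a concrete illustration of one of the simpler ingredients mentioned above, here is a hypothetical sketch of a greedy rule for online generalized assignment with budget constraints: each arriving request is assigned to the feasible bin with the highest profit, or rejected if no bin has enough remaining budget. The data layout (Bin, Request, per-bin cost and profit dictionaries) is an assumption made for illustration, not the thesis's formulation.

    from dataclasses import dataclass


    @dataclass
    class Bin:
        id: int
        remaining_budget: float


    @dataclass
    class Request:
        cost: dict    # bin id -> resource consumed if assigned there
        profit: dict  # bin id -> profit collected if assigned there


    def greedy_assign(request, bins):
        """Assign the arriving request to the feasible bin with the largest
        profit; return None (reject) if no bin can still afford it."""
        best, best_profit = None, 0.0
        for b in bins:
            cost = request.cost.get(b.id, float("inf"))
            profit = request.profit.get(b.id, 0.0)
            if cost <= b.remaining_budget and profit > best_profit:
                best, best_profit = b, profit
        if best is not None:
            best.remaining_budget -= request.cost[best.id]
        return best

    A primal-dual method would typically replace the pure profit comparison with profit minus a bin "price" that grows as the bin's budget is consumed; the exact rule used and compared in the thesis may differ from this sketch.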

    The Value-of-Information in Matching with Queues

    We consider the problem of \emph{optimal matching with queues} in dynamic systems and investigate the value-of-information. In such systems, the operators match tasks and resources stored in queues, with the objective of maximizing the system utility of the matching reward profile minus the average matching cost. This problem appears in many practical systems, and the main challenges are the no-underflow constraints and the lack of matching-reward information and system-dynamics statistics. We develop two online matching algorithms, Learning-aided Reward optimAl Matching ($\mathtt{LRAM}$) and Dual-$\mathtt{LRAM}$ ($\mathtt{DRAM}$), to effectively resolve both challenges. Both algorithms are equipped with a learning module for estimating the matching-reward information, while $\mathtt{DRAM}$ incorporates an additional module for learning the system dynamics. We show that both algorithms achieve an $O(\epsilon+\delta_r)$ close-to-optimal utility performance for any $\epsilon>0$, while $\mathtt{DRAM}$ achieves a faster convergence speed and a better delay compared to $\mathtt{LRAM}$, i.e., $O(\delta_z/\epsilon + \log(1/\epsilon)^2)$ delay and $O(\delta_z/\epsilon)$ convergence under $\mathtt{DRAM}$ compared to $O(1/\epsilon)$ delay and convergence under $\mathtt{LRAM}$ ($\delta_r$ and $\delta_z$ are the maximum estimation errors for the reward and the system dynamics, respectively). Our results reveal that information about different system components can play very different roles in algorithm performance, and they provide a systematic way of designing joint learning-control algorithms for dynamic systems.
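    To make the queueing picture concrete, the sketch below shows one plausible shape of a learning-aided matching step, not the paper's LRAM/DRAM: rewards are estimated by empirical means, and a task/resource pair may be matched only if both of its queues are non-empty (the no-underflow constraint), using a score that trades estimated net reward against queue backlogs through a tunable parameter V. All names, the scoring rule, and the one-match-per-slot restriction are illustrative assumptions.

    import itertools
    from collections import defaultdict


    class MatchingController:
        """Illustrative controller: empirical reward estimates plus a
        backpressure-style score; V trades utility against queue backlog."""

        def __init__(self, task_types, resource_types, cost, V=10.0):
            self.queues = defaultdict(int)        # backlog per task/resource type
            self.reward_sum = defaultdict(float)  # running sums for empirical means
            self.reward_cnt = defaultdict(int)
            self.cost = cost                      # (task, resource) -> matching cost
            self.V = V
            self.pairs = list(itertools.product(task_types, resource_types))

        def arrive(self, type_name, count=1):
            self.queues[type_name] += count

        def observe_reward(self, pair, reward):
            self.reward_sum[pair] += reward
            self.reward_cnt[pair] += 1

        def estimated_reward(self, pair):
            n = self.reward_cnt[pair]
            return self.reward_sum[pair] / n if n else 0.0

        def decide(self):
            """Greedily pick at most one pair to match this slot; a pair is
            eligible only if both of its queues are non-empty (no underflow)."""
            best, best_score = None, 0.0
            for task, res in self.pairs:
                if self.queues[task] == 0 or self.queues[res] == 0:
                    continue
                net = self.estimated_reward((task, res)) - self.cost.get((task, res), 0.0)
                score = self.V * net + self.queues[task] + self.queues[res]
                if score > best_score:
                    best, best_score = (task, res), score
            return best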