
    The embedding of the traveling salesman problem in a Markov Decision Process

    In this paper we derive a new LP-relaxation of the Traveling Salesman Problem (TSP, for short). This formulation comes from first embedding the TSP in a Markov Decision Process (MDP, for short), and then perturbing this MDP appropriately.
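
    The abstract does not spell out the construction, but the embedding idea can be illustrated with a deterministic MDP (this sketch is ours, not the paper's LP-relaxation or perturbation): take the state to be the pair (current city, set of visited cities), the action to be the next unvisited city, and the cost the travel distance. Backward induction on this MDP recovers the classical Held-Karp dynamic program.

        from itertools import combinations

        def tsp_dp(dist):
            """Solve the TSP by treating it as a deterministic MDP:
            state = (set of visited cities, current city), action = next
            unvisited city, cost = travel distance. Returns the length of
            the shortest tour that starts and ends at city 0."""
            n = len(dist)
            # best[(S, j)]: min cost to start at 0, visit all of S, end at j
            best = {(frozenset([0]), 0): 0}
            for size in range(2, n + 1):
                for subset in combinations(range(1, n), size - 1):
                    S = frozenset(subset) | {0}
                    for j in subset:
                        best[(S, j)] = min(
                            best[(S - {j}, k)] + dist[k][j]
                            for k in S
                            if k != j and (S - {j}, k) in best
                        )
            full = frozenset(range(n))
            return min(best[(full, j)] + dist[j][0] for j in range(1, n))

        dist = [[0, 2, 9, 10],
                [1, 0, 6, 4],
                [15, 7, 0, 8],
                [6, 3, 12, 0]]
        print(tsp_dp(dist))  # length of the shortest tour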

    Percentile objective criteria in limiting average Markov Control Problems

    Infinite horizon Markov Control Problems, or Markov Decision Processes (MDP's, for short), have been extensively studied since the 1950s. One of the most commonly considered versions is the so-called "limiting average reward" model, in which the controller aims to maximize the expected value of the limit-average ("long-run average") of an infinite stream of single-stage rewards or outputs. There are now a number of good algorithms for computing optimal deterministic policies in limiting average MDP's. In this paper we adopt the point of view that there are many natural situations where the controller is interested in finding a policy that achieves a specified long-run average reward (a target level) with a sufficiently high probability (a percentile).
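
    In symbols (the notation is chosen here, not taken from the paper): writing X_n and A_n for the state and action at stage n and r for the single-stage reward, the percentile criterion asks, for a target level tau and percentile alpha, whether some policy pi satisfies

        P^{\pi}\!\left( \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} r(X_n, A_n) \;\ge\; \tau \right) \;\ge\; \alpha.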

    A weighted Markov decision process

    The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to "neglect" the future, concentrating on short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying their weights. The mathematical implications of the new criterion include the following: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies, and an optimal policy might not exist. We present an iterative algorithm for computing an ε-optimal nonstationary policy with a very simple structure.
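
    One natural way to write such a criterion (the notation is ours) is, for a weight lambda in [0, 1] and discount factor beta in (0, 1):

        V_{\lambda}(\pi) \;=\; \lambda\,(1-\beta)\,\mathbb{E}^{\pi}\!\left[\sum_{n=1}^{\infty} \beta^{\,n-1} r(X_n, A_n)\right] \;+\; (1-\lambda)\,\liminf_{N \to \infty} \frac{1}{N}\,\mathbb{E}^{\pi}\!\left[\sum_{n=1}^{N} r(X_n, A_n)\right].

    The (1 - beta) factor normalizes the discounted part onto the same per-stage scale as the average part, so the weight lambda trades the two off directly.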

    Facility Location with Stochastic Demand and Constraints on Waiting Time

    We analyze the problem of optimal location of a set of facilities in the presence of stochastic demand and congestion. Customers travel to the closest facility to obtain service; the problem is to determine the number, locations, and capacities of the facilities. Under rather general assumptions (spatially distributed continuous demand, general arrival and service processes, and nonlinear location and capacity costs) we show that the problem can be decomposed, and we construct an efficient optimization algorithm. The analysis yields several insights, including the importance of equitable facility configurations (EFCs), the behavior of optimal and near-optimal capacities, and a robust class of solutions that can be constructed for this problem. Keywords: facility location, stochastic demand, queueing, service level.
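
    For intuition, here is a minimal waiting-time feasibility check under assumptions not made in the abstract (Poisson arrivals, exponential service, a single server per facility, i.e. M/M/1 stations); the function names and numbers are illustrative only.

        def mm1_expected_wait(arrival_rate, service_rate):
            """Expected time in queue (Wq) at an M/M/1 station; the
            station is unstable unless arrival_rate < service_rate."""
            if arrival_rate >= service_rate:
                return float("inf")
            rho = arrival_rate / service_rate
            return rho / (service_rate - arrival_rate)

        def feasible(facilities, max_wait):
            """facilities: (demand rate, capacity) pairs, one per open
            facility, after demand is assigned to its closest facility."""
            return all(mm1_expected_wait(lam, mu) <= max_wait
                       for lam, mu in facilities)

        print(feasible([(4.0, 5.0), (2.0, 3.0)], max_wait=1.0))  # True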

    Percentile performance criteria for limiting average Markov Decision Processes

    In this paper we address the following basic feasibility problem for infinite-horizon Markov decision processes (MDP's): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile, and vice versa, are also considered. We present a complete (and discrete) classification of both the maximal achievable target levels and their corresponding percentiles. We also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair. Next we consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems; an LP-based formulation provides constructive solutions for most cases.
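
    With Phi^pi denoting the long-run limiting average reward under policy pi (notation ours), the two optimization variants can be stated as

        \max\{\tau : \exists\,\pi \text{ with } P^{\pi}(\Phi^{\pi} \ge \tau) \ge \alpha\} \quad\text{and}\quad \max\{\alpha : \exists\,\pi \text{ with } P^{\pi}(\Phi^{\pi} \ge \tau) \ge \alpha\},

    the first maximizing the target for a fixed percentile and the second maximizing the percentile for a fixed target.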

    Using strategic idleness to improve customer service experience in service networks

    The most common measure of waiting time is the overall expected waiting time for service. However, in service networks the perception of waiting may also depend on how it is distributed among the different stations, so reducing the probability of a long wait at any one station may be important in improving customers' perception of service quality. In a single-station queue it is known that the policy minimizing both the waiting time and the probability of long waits is nonidling. However, this is not necessarily the case for queueing networks with several stations. We present a family of threshold-based policies (TBPs) that strategically idle some stations. We demonstrate the advantage of strategic idleness by applying a TBP in a network of two single-server queues in tandem. We provide closed-form results for the special case where the first station has infinite capacity and develop efficient algorithms when this is not the case. We compare TBPs with the nonidling and Kanban policies, and we discuss when a TBP is advantageous. Using simulation, we demonstrate that the analytical insights for the two-station case hold for a three-station serial queue as well.
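
    The following toy simulation sketches the threshold idea for two exponential single-server queues in tandem (the rates, threshold rule, and "long queue" cutoff are our illustrative assumptions, not the paper's model): station 1 holds work back whenever station 2's queue is at or above a threshold, so customers do not simply migrate from one wait to another.

        import random

        def simulate(threshold, lam=0.8, mu1=1.2, mu2=1.0,
                     n_events=200_000, seed=1):
            """Two exponential single-server queues in tandem. Station 1
            serves only while station 2's queue is below `threshold`
            (strategic idleness); threshold = float('inf') recovers the
            nonidling policy. Returns the long-run fraction of time
            station 2 holds 4 or more customers, a proxy for long waits."""
            rng = random.Random(seed)
            q1 = q2 = 0            # queue lengths, including in service
            t = long_time = 0.0
            for _ in range(n_events):
                serve1 = q1 > 0 and q2 < threshold   # threshold rule
                serve2 = q2 > 0
                rate = lam + (mu1 if serve1 else 0) + (mu2 if serve2 else 0)
                dt = rng.expovariate(rate)           # time to next event
                if q2 >= 4:
                    long_time += dt
                t += dt
                u = rng.random() * rate
                if u < lam:
                    q1 += 1                          # external arrival
                elif u < lam + (mu1 if serve1 else 0):
                    q1 -= 1; q2 += 1                 # station 1 -> 2
                else:
                    q2 -= 1                          # departure from 2
            return long_time / t

        for k in (2, 3, float("inf")):
            print(k, simulate(k))

    Raising the threshold approaches the nonidling policy; lowering it shifts waiting from station 2 back to station 1, which is the trade-off a threshold-based policy exploits.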