The embedding of the traveling salesman problem in a Markov Decision Process
In this paper we derive a new LP-relaxation of the Traveling Salesman Problem (TSP, for short). This formulation comes from first embedding the TSP in a Markov Decision Process (MDP, for short), and then perturbing this MDP appropriately.
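The TSP-as-MDP view can be illustrated with the classical Held-Karp dynamic program, whose state (set of visited cities, current city) makes the TSP an MDP with deterministic transitions. This is an illustration of that embedding only, not the paper's LP-relaxation; the distance matrix below is made up.

```python
from functools import lru_cache

# Minimal sketch: the Held-Karp dynamic program for the TSP, read as an MDP
# whose state is (bitmask of visited cities, current city) and whose
# transitions are deterministic. Hypothetical 4-city distance matrix.
D = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
n = len(D)

@lru_cache(maxsize=None)
def cost(visited, city):
    # Minimal cost of a path that starts at city 0, visits exactly the
    # cities in the bitmask `visited`, and ends at `city`.
    if visited == (1 << city) | 1:          # only city 0 and `city` visited
        return D[0][city]
    prev = visited & ~(1 << city)
    return min(cost(prev, p) + D[p][city]
               for p in range(1, n) if p != city and prev & (1 << p))

full = (1 << n) - 1
tour = min(cost(full, c) + D[c][0] for c in range(1, n))  # optimal tour length
```

The memoized recursion visits each (subset, city) state once, giving the usual O(n^2 2^n) time instead of factorial enumeration.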
Percentile objective criteria in limiting average Markov Control Problems
Infinite-horizon Markov Control Problems, or Markov Decision Processes (MDPs, for short), have been extensively studied since the 1950s. One of the most commonly considered versions is the so-called "limiting average reward" model, in which the controller aims to maximize the expected value of the limit-average ("long-run average") of an infinite stream of single-stage rewards or outputs. There are now a number of good algorithms for computing optimal deterministic policies in limiting average MDPs. In this paper we adopt the point of view that there are many natural situations where the controller is interested in finding a policy that achieves a sufficiently high long-run average reward (a target level) with a sufficiently high probability (a percentile).
A weighted Markov decision process
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to "neglect" the future, concentrating on the short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of the weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on the short-term versus the long-term rewards by varying their weights. The mathematical implications of the new criterion include: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies; and an optimal policy might not exist. We present an iterative algorithm for computing an ε-optimal nonstationary policy with a very simple structure.
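For a fixed stationary policy the two ingredients of the weighted criterion are easy to evaluate exactly. The following is a minimal sketch on a made-up two-state unichain; it evaluates one policy only and does not reproduce the paper's algorithm for computing near-optimal nonstationary policies.

```python
import numpy as np

# Minimal sketch (hypothetical two-state chain): evaluating a weighted
# combination of the discounted and long-run average rewards for a FIXED
# stationary policy on a unichain MDP.
P = np.array([[0.9, 0.1],           # transition matrix under the policy
              [0.2, 0.8]])
r = np.array([1.0, 0.0])            # single-stage rewards
beta = 0.9                          # discount factor
w = 0.5                             # weight on the discounted part

# Discounted value: v = (I - beta * P)^{-1} r
v = np.linalg.solve(np.eye(2) - beta * P, r)

# Long-run average reward g = pi @ r, with pi the stationary distribution:
# solve one row of pi(P - I) = 0 together with the normalization sum(pi) = 1.
A = np.vstack([(P.T - np.eye(2))[:1], np.ones((1, 2))])
pi = np.linalg.solve(A, np.array([0.0, 1.0]))
g = pi @ r

# Weighted criterion; the discounted part is normalized by (1 - beta) so
# both terms are on a comparable per-stage scale.
weighted = w * (1 - beta) * v + (1 - w) * g
```

Note the normalization by (1 - beta) is a modeling choice made here for comparability, not something taken from the abstract.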
Analysis of supply contracts with minimum total order quantity commitments and non-stationary demands
Facility Location with Stochastic Demand and Constraints on Waiting Time
We analyze the problem of optimal location of a set of facilities in the presence of stochastic demand and congestion. Customers travel to the closest facility to obtain service; the problem is to determine the number, locations, and capacity of the facilities. Under rather general assumptions (spatially distributed continuous demand, general arrival and service processes, and nonlinear location and capacity costs) we show that the problem can be decomposed, and construct an efficient optimization algorithm. The analysis yields several insights, including the importance of equitable facility configurations (EFCs), the behavior of optimal and near-optimal capacities, and a robust class of solutions that can be constructed for this problem.

Keywords: facility location, stochastic demand, queueing, service level
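The interplay between capacity and a waiting-time constraint can be sketched with standard M/M/1 formulas (the paper itself allows general arrival and service processes). All numbers below are hypothetical.

```python
import math

# Minimal sketch using the standard M/M/1 waiting-time tail (not the paper's
# general model): size the service capacity mu of one facility so that the
# probability a customer waits longer than tau stays below alpha, i.e.
#   P(W_q > tau) = (lam / mu) * exp(-(mu - lam) * tau) <= alpha.
def min_capacity(lam, tau, alpha, step=1e-3):
    mu = lam + step                        # capacity must exceed demand rate
    while (lam / mu) * math.exp(-(mu - lam) * tau) > alpha:
        mu += step
    return mu

mu_star = min_capacity(lam=5.0, tau=0.5, alpha=0.1)   # hypothetical inputs
```

Because the tail probability is monotone decreasing in mu, the simple linear scan finds the smallest feasible capacity on the step grid; a bisection would do the same faster.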
Percentile performance criteria for limiting average Markov Decision Processes
In this paper we address the following basic feasibility problem for infinite-horizon Markov decision processes (MDPs): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile, and vice versa, are also considered. We present a complete (and discrete) classification of both the maximal achievable target levels and their corresponding percentiles. We also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair.

Next we consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems. An LP-based formulation provides constructive solutions for most cases.
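The feasibility question can be made concrete for a fixed policy: under a stationary policy the long-run average reward is a random variable determined by which recurrent class the induced chain is absorbed into, so a target-percentile pair (g0, alpha) is met iff the total absorption probability into classes whose average reward is at least g0 reaches alpha. A minimal sketch on a made-up three-state chain (this checks one given policy; it is not the paper's classification algorithm):

```python
import numpy as np

# Minimal sketch (hypothetical chain): transient state 0 feeds two absorbing
# "classes", states 1 and 2. The policy meets target g0 at percentile alpha
# iff P(long-run average reward >= g0) >= alpha.
P = np.array([[0.0, 0.6, 0.4],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
class_avg_reward = {1: 0.8, 2: 0.3}   # long-run average reward in each class
g0, alpha = 0.5, 0.55                 # target level and required percentile

# Absorption probabilities from state 0; absorption happens in one step in
# this toy chain, so they can be read off P directly (a general chain would
# need the absorption equations solved).
absorb = {1: P[0, 1], 2: P[0, 2]}

prob_meet_target = sum(p for c, p in absorb.items()
                       if class_avg_reward[c] >= g0)
feasible = prob_meet_target >= alpha
```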
Using strategic idleness to improve customer service experience in service networks
The most common measure of waiting time is the overall expected waiting time for service. However, in service networks the perception of waiting may also depend on how it is distributed among different stations. Therefore, reducing the probability of a long wait at any station may be important in improving customers' perception of service quality. In a single-station queue it is known that the policy that minimizes the waiting time and the probability of long waits is nonidling. However, this is not necessarily the case for queueing networks with several stations. We present a family of threshold-based policies (TBPs) that strategically idle some stations. We demonstrate the advantage of strategic idling by applying a TBP in a network with two single-server queues in tandem. We provide closed-form results for the special case where the first station has infinite capacity, and develop efficient algorithms when this is not the case. We compare TBPs with the nonidling and Kanban policies, and we discuss when a TBP is advantageous. Using simulation, we demonstrate that the analytical insights for the two-station case hold for a three-station serial queue as well.
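A threshold-based policy of this kind is easy to simulate. The sketch below (all parameters and the specific threshold rule are illustrative assumptions, not taken from the paper) runs a two-station tandem queue in which station 1 holds off starting a new service whenever station 2 already has K or more customers, so that work is not simply pushed downstream.

```python
import heapq
import random

# Minimal discrete-event sketch (hypothetical parameters): a threshold-based
# policy (TBP) in a tandem queue with two single-server stations. Station 1
# strategically idles whenever station 2's occupancy is at least K.
def simulate_tbp(lam=0.8, mu1=1.2, mu2=1.2, K=2, n_customers=500, seed=0):
    rng = random.Random(seed)
    q1 = q2 = 0                         # customers waiting (not in service)
    busy1 = busy2 = False
    arrived = completed = idle_holds = 0
    events, seq = [], 0                 # (time, tie-breaker, kind) heap

    def push(t, kind):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind))
        seq += 1

    def start1(t):
        nonlocal q1, busy1, idle_holds
        if busy1 or q1 == 0:
            return
        if q2 + (1 if busy2 else 0) >= K:   # TBP rule: hold station 1 idle
            idle_holds += 1                 # counts hold decisions
            return
        q1 -= 1
        busy1 = True
        push(t + rng.expovariate(mu1), 'dep1')

    def start2(t):
        nonlocal q2, busy2
        if busy2 or q2 == 0:
            return
        q2 -= 1
        busy2 = True
        push(t + rng.expovariate(mu2), 'dep2')

    push(rng.expovariate(lam), 'arr')
    while events:
        t, _, kind = heapq.heappop(events)
        if kind == 'arr':
            arrived += 1
            q1 += 1
            if arrived < n_customers:       # stop arrivals, then drain
                push(t + rng.expovariate(lam), 'arr')
            start1(t)
        elif kind == 'dep1':
            busy1 = False
            q2 += 1
            start2(t)
            start1(t)
        else:                               # 'dep2'
            busy2 = False
            completed += 1
            start2(t)
            start1(t)                       # station 2 drained: 1 may resume
    return completed, idle_holds

completed, holds = simulate_tbp()
```

Comparing the distribution of per-station waits under this rule against the nonidling policy (K = infinity) is the kind of experiment the abstract describes.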