Optimality of mixed policies for average continuous-time Markov decision processes with constraints
This article concerns the average criteria for continuous-time Markov decision processes with N constraints. We show the following: (a) every extreme point of the space of performance vectors corresponding to the set of stable measures is generated by a deterministic stationary policy; and (b) there exists a mixed optimal policy, where the mixture is over no more than N + 1 deterministic stationary policies.
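One way to read result (b), written out as a sketch: the notation below (policies f_k, weights \lambda_k, average costs J_n, bounds d_n) is assumed for illustration and is not taken from the article. A mixed policy picks one deterministic stationary policy at time zero according to fixed probabilities and follows it from then on, so each expected average cost is a convex combination of the costs of the component policies.

```latex
% Assumed notation: f_1, ..., f_{N+1} are deterministic stationary policies,
% J_0 is the objective and J_1, ..., J_N are the constrained average costs.
\[
  \pi = \sum_{k=1}^{N+1} \lambda_k \,\delta_{f_k},
  \qquad \lambda_k \ge 0,
  \qquad \sum_{k=1}^{N+1} \lambda_k = 1,
\]
\[
  J_n(\pi) = \sum_{k=1}^{N+1} \lambda_k \, J_n(f_k),
  \qquad n = 0, 1, \dots, N,
\]
\[
  \text{minimise } J_0(\pi)
  \quad \text{subject to} \quad
  J_n(\pi) \le d_n, \qquad n = 1, \dots, N.
\]
```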
Computing (R, S) policies with correlated demand
This paper considers the single-item single-stocking non-stationary
stochastic lot-sizing problem under correlated demand. By operating under a
nonstationary (R, S) policy, in which R denotes the reorder period and S the
associated order-up-to level, we introduce a mixed integer linear programming
(MILP) model which can be easily implemented using off-the-shelf optimisation
software. Our modelling strategy can tackle a wide range of time-series-based
demand processes, such as autoregressive (AR), moving average (MA),
autoregressive moving average (ARMA), and autoregressive with autoregressive
conditional heteroskedasticity (AR-ARCH) processes. In an extensive computational
study, we compare the performance of our model against the optimal policy
obtained via stochastic dynamic programming. Our results demonstrate that the
optimality gap of our approach averages 2.28% and that computational
performance is good.
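To make the (R, S) idea concrete, the following is a minimal simulation sketch, not the paper's MILP model. It evaluates a given plan of review periods and order-up-to levels on one demand path. The function names, the AR(1) generator, zero lead time, full backordering, and a fixed cost charged at every review are all assumptions made for illustration.

```python
import random


def ar1_demand(n_periods, mean, phi, sigma, seed=0):
    """Sample one AR(1) path: d_t = mean + phi * (d_{t-1} - mean) + noise."""
    rng = random.Random(seed)
    path, previous = [], mean
    for _ in range(n_periods):
        current = mean + phi * (previous - mean) + rng.gauss(0.0, sigma)
        current = max(0.0, current)  # assumption: negative demand is truncated to zero
        path.append(current)
        previous = current
    return path


def simulate_rs_policy(demand, order_up_to, fixed_cost, holding_cost,
                       penalty_cost, initial_inventory=0.0):
    """Total cost of a given (R, S) plan on one demand path.

    order_up_to maps a review period index to its order-up-to level S.
    Assumptions: zero lead time, full backordering, and the fixed cost
    is charged at every scheduled review.
    """
    inventory, total = initial_inventory, 0.0
    for t, d in enumerate(demand):
        if t in order_up_to:
            total += fixed_cost
            inventory = max(inventory, order_up_to[t])  # raise stock up to S_t
        inventory -= d
        if inventory >= 0:
            total += holding_cost * inventory      # holding cost on leftover stock
        else:
            total += penalty_cost * -inventory     # penalty cost on backorders
    return total
```

For example, `simulate_rs_policy(ar1_demand(12, 50.0, 0.5, 10.0), {0: 120, 3: 160, 6: 160, 9: 160}, 100, 1, 10)` scores one hypothetical plan with reviews every three periods; averaging over many sampled paths would estimate its expected cost.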
Restless bandit marginal productivity indices I: single-project case and optimal control of a make-to-stock M/G/1 queue
This paper develops a framework based on convex optimization and economic ideas to formulate and solve by an index policy the problem of optimal dynamic effort allocation to a generic discrete-state restless bandit (i.e. binary-action: work/rest) project, elucidating a host of issues raised by Whittle's (1988) seminal work on the topic. Our contributions include: (i) a unifying definition of a project's marginal productivity index (MPI), characterizing optimal policies; (ii) a complete characterization of indexability (existence of the MPI) as satisfaction by the project of the law of diminishing returns (to effort); (iii) sufficient indexability conditions based on partial conservation laws (PCLs), extending previous results of the author from the finite to the countable state case; (iv) application to a semi-Markov project, including a new MPI for a mixed long-run-average (LRA)/bias criterion, which exists in relevant queueing control models where the index proposed by Whittle (1988) does not; and (v) optimal MPI policies for service-controlled make-to-order (MTO) and make-to-stock (MTS) M/G/1 queues with convex back order and stock holding cost rates, under discounted and LRA criteria.
Extreme State Aggregation Beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an
environment in observation-reward-action cycles without any (esp. MDP)
assumptions on the environment. State aggregation and more generally feature
reinforcement learning is concerned with mapping histories/raw-states to
reduced/aggregated states. The idea behind both is that the resulting reduced
process (approximately) forms a small stationary finite-state MDP, which can
then be efficiently solved or learnt. We considerably generalize existing
aggregation results by showing that even if the reduced process is not an MDP,
the (q-)value functions and (optimal) policies of an associated MDP with same
state-space size solve the original problem, as long as the solution can
approximately be represented as a function of the reduced states. This implies
an upper bound on the required state space size that holds uniformly for all RL
problems. It may also explain why RL algorithms designed for MDPs sometimes
perform well beyond MDPs.
Comment: 28 LaTeX pages, 8 theorems.
Human-Machine Collaborative Optimization via Apprenticeship Scheduling
Coordinating agents to complete a set of tasks with intercoupled temporal and
resource constraints is computationally challenging, yet human domain experts
can solve these difficult scheduling problems using paradigms learned through
years of apprenticeship. A process for manually codifying this domain knowledge
within a computational framework is necessary to scale beyond the
"single-expert, single-trainee" apprenticeship model. However, human domain
experts often have difficulty describing their decision-making processes,
causing the codification of this knowledge to become laborious. We propose a
new approach for capturing domain-expert heuristics through a pairwise ranking
formulation. Our approach is model-free and does not require enumerating or
iterating through a large state space. We empirically demonstrate that this
approach accurately learns multifaceted heuristics on a synthetic data set
incorporating job-shop scheduling and vehicle routing problems, as well as on
two real-world data sets consisting of demonstrations of experts solving a
weapon-to-target assignment problem and a hospital resource allocation problem.
We also demonstrate that policies learned from human scheduling demonstration
via apprenticeship learning can substantially improve the efficiency of a
branch-and-bound search for an optimal schedule. We employ this human-machine
collaborative optimization technique on a variant of the weapon-to-target
assignment problem. We demonstrate that this technique generates solutions
substantially superior to those produced by human domain experts at a rate up
to 9.5 times faster than an optimization approach and can be applied to
optimally solve problems twice as complex as those solved by a human
demonstrator.
Comment: Portions of this paper were published in the Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI) in 2016 and
in the Proceedings of Robotics: Science and Systems (RSS) in 2016. The paper
consists of 50 pages with 11 figures and 4 tables.
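The pairwise ranking formulation mentioned in this abstract can be illustrated with a small sketch. Each demonstration is assumed to be a list of candidate task feature vectors plus the index the expert chose; the chosen task is paired against every other candidate, and a classifier is trained on the feature differences. A linear perceptron is swapped in here purely for brevity and is not claimed to be the classifier used in the paper; all function names are hypothetical.

```python
def pairwise_examples(demonstrations):
    """Turn (candidate_features, chosen_index) pairs into labelled difference vectors."""
    examples = []
    for candidates, chosen in demonstrations:
        for i, other in enumerate(candidates):
            if i == chosen:
                continue
            diff = [a - b for a, b in zip(candidates[chosen], other)]
            examples.append((diff, 1))                 # chosen task ranked above the other
            examples.append(([-x for x in diff], -1))  # mirrored negative example
    return examples


def train_perceptron(examples, epochs=50):
    """Fit linear weights so that preferred-minus-other differences score positive."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            score = sum(w * v for w, v in zip(weights, x))
            if y * score <= 0:  # misclassified (or on the boundary): perceptron update
                weights = [w + y * v for w, v in zip(weights, x)]
    return weights


def pick_task(weights, candidates):
    """Return the index of the candidate with the highest learned score."""
    scores = [sum(w * v for w, v in zip(weights, c)) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

With a linear scorer, winning every pairwise comparison is the same as having the highest score, which is why `pick_task` only needs one score per candidate. No state space is enumerated: the learner only ever sees the options available at each demonstrated decision.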
Adaptive Matching for Expert Systems with Uncertain Task Types
A matching in a two-sided market often incurs an externality: a matched
resource may become unavailable to the other side of the market, at least for a
while. This is especially an issue in online platforms involving human experts
as the expert resources are often scarce. The efficient utilization of experts
in these platforms is made challenging by the fact that the information
available about the parties involved is usually limited.
To address this challenge, we develop a model of a task-expert matching
system where a task is matched to an expert using not only the prior
information about the task but also the feedback obtained from the past
matches. In our model the tasks arrive online while the experts are fixed and
constrained by a finite service capacity. For this model, we characterize the
maximum task resolution throughput a platform can achieve. We show that the
natural greedy approach, where each expert is assigned a task most suitable to
her skill, is suboptimal, as it does not internalize the above externality. We
develop a throughput-optimal backpressure algorithm which does so by accounting
for the 'congestion' among different task types. Finally, we validate our model
and confirm our theoretical findings with data-driven simulations via logs of
Math.StackExchange, a StackOverflow forum dedicated to mathematics.
Comment: Part of this work was presented at the Allerton Conference 2017. 18 pages.
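The contrast between a greedy rule and a congestion-aware rule can be sketched as follows. This is a generic MaxWeight-style illustration, not the algorithm from the paper: it assumes known task types, a per-type backlog count, and a per-type success probability for the expert who has just become free. Both function names are hypothetical.

```python
def greedy_choice(queues, success_prob):
    """Pick the waiting task type this expert is most likely to resolve."""
    waiting = [k for k, q in enumerate(queues) if q > 0]
    if not waiting:
        return None
    return max(waiting, key=lambda k: success_prob[k])


def backpressure_choice(queues, success_prob):
    """Pick the waiting task type maximising backlog * success probability."""
    waiting = [k for k, q in enumerate(queues) if q > 0]
    if not waiting:
        return None
    return max(waiting, key=lambda k: queues[k] * success_prob[k])
```

With backlogs `[1, 10]` and success probabilities `[0.9, 0.5]`, the greedy rule serves type 0 while the backlog-weighted rule serves the congested type 1, which is the kind of externality the abstract says greedy matching ignores.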