409 research outputs found
An index for dynamic product promotion and the knapsack problem for perishable items
This paper introduces the knapsack problem for perishable items (KPPI), which concerns the optimal dynamic allocation of a limited promotion space to a collection of perishable items. The problem is motivated by applications in a variety of industries where products have an associated lifetime after which they cannot be sold. The paper builds on recent developments in restless bandit indexation and gives an optimal marginal productivity index policy for the dynamic (single) product promotion problem, with closed-form indices that yield structural insights. The performance of the proposed policy for KPPI is investigated in a computational study.
Keywords: Dynamic promotion, Perishable items, Index policies, Knapsack problem, Restless bandits, Finite horizon, Marginal productivity index
Two-stage index computation for bandits with switching penalties I : switching costs
This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed to jointly compute both as the Gittins index of a bandit having 2n states (when the original bandit has n states), which results in an eight-fold increase in the O(n^3) arithmetic operations relative to those needed to compute the continuation index alone. This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n^2+O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.
Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with
unknown reward models. At each time, a player selects one arm to play, aiming
to maximize the total expected reward over a horizon of length T. An approach
based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is
developed for constructing sequential arm selection policies. It is shown that
for all light-tailed reward distributions, DSEE achieves the optimal
logarithmic order of the regret, where regret is defined as the total expected
reward loss against the ideal case with known reward models. For heavy-tailed
reward distributions, DSEE achieves O(T^{1/p}) regret when the moments of the
reward distributions exist up to the pth order for 1<p<=2 and O(T^{1/(1+p/2)})
for p>2. With the knowledge of an upper bound on a finite moment of the
heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret
order. The proposed DSEE approach complements existing work on MAB by providing
corresponding results for general reward distributions. Furthermore, with a
clearly defined tunable parameter, the cardinality of the exploration sequence,
the DSEE approach is easily extendable to variations of MAB, including MAB with
various objectives, decentralized MAB with multiple players and incomplete
reward observations under collisions, MAB with unknown Markov dynamics, and
combinatorial MAB with dependent arms that often arise in network optimization
problems such as the shortest path, the minimum spanning tree, and the dominating
set problems under unknown random weights.
Comment: 22 pages, 2 figures
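The DSEE idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact construction of the exploration sequence, so the code below makes several assumptions: the number of exploration rounds up to time t grows as `n * ceil(w * log(t + 1))`, arms are explored round-robin, and exploitation uses sample means built from exploration rounds only. The function name `dsee`, the parameter `w`, and the arm interface (a callable taking a random generator) are hypothetical and chosen for illustration.

```python
import math
import random


def dsee(arms, horizon, w=5.0, seed=0):
    """Sketch of a deterministic exploration/exploitation schedule.

    arms    : list of callables; arms[i](rng) returns one reward (assumed interface)
    horizon : number of plays T
    w       : hypothetical constant scaling the logarithmic exploration budget
    Returns (total_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    n = len(arms)
    sums = [0.0] * n    # reward sums from exploration rounds only
    counts = [0] * n    # exploration plays per arm
    pulls = [0] * n     # all plays per arm
    explored = 0        # size of the exploration sequence so far
    total = 0.0

    for t in range(1, horizon + 1):
        # The exploration budget depends only on t, never on observed
        # rewards, so the schedule is deterministic.
        budget = n * math.ceil(w * math.log(t + 1))
        if explored < budget:
            arm = explored % n  # round-robin exploration
            reward = arms[arm](rng)
            sums[arm] += reward
            counts[arm] += 1
            explored += 1
        else:
            # Exploitation: play the arm with the largest sample mean.
            # Every arm has been explored at least once by this point.
            arm = max(range(n), key=lambda i: sums[i] / counts[i])
            reward = arms[arm](rng)
        pulls[arm] += 1
        total += reward
    return total, pulls


if __name__ == "__main__":
    # Two Bernoulli arms with means 0.2 and 0.8 (illustrative values).
    bernoulli = lambda p: (lambda rng: 1.0 if rng.random() < p else 0.0)
    total, pulls = dsee([bernoulli(0.2), bernoulli(0.8)], horizon=1000)
    print(total, pulls)
```

Because the exploration budget here is logarithmic in t, the number of plays spent on exploration stays small relative to the horizon. In this sketch `w` is the only tunable quantity, standing in for the cardinality of the exploration sequence that the abstract names as the tunable parameter.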