Incentivizing Exploration with Heterogeneous Value of Money
Recently, Frazier et al. proposed a natural model for crowdsourced
exploration of different a priori unknown options: a principal is interested in
the long-term welfare of a population of agents who arrive one by one in a
multi-armed bandit setting. However, each agent is myopic, so in order to
incentivize him to explore options with better long-term prospects, the
principal must offer the agent money. Frazier et al. showed that a simple class
of policies called time-expanded are optimal in the worst case, and
characterized their budget-reward tradeoff.
The previous work assumed that all agents are equally and uniformly
susceptible to financial incentives. In reality, agents may have different
utility for money. We therefore extend the model of Frazier et al. to allow
agents that have heterogeneous and non-linear utilities for money. The
principal is informed of the agent's tradeoff via a signal that could be more
or less informative.
Our main result is to show that a convex program can be used to derive a
signal-dependent time-expanded policy which achieves the best possible
Lagrangian reward in the worst case. The worst-case guarantee is matched by
so-called "Diamonds in the Rough" instances; the proof that the guarantees
match is based on showing that two different convex programs have the same
optimal solution for these specific instances. These results also extend to the
budgeted case as in Frazier et al. We also show that the optimal policy is
monotone with respect to information, i.e., the approximation ratio of the
optimal policy improves as the signals become more informative.
Comment: WINE 201
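To make the incentive mechanism concrete, here is a minimal sketch of the smallest payment that persuades a myopic agent to pull a chosen arm. It is an illustration, not the policy from the paper. It assumes the agent picks the arm that maximizes expected reward plus his utility for the payment offered, and that this utility is continuous, increasing, and zero at zero. The names `required_payment` and `money_utility` are hypothetical.

```python
from typing import Callable, Sequence


def required_payment(
    expected_rewards: Sequence[float],
    target_arm: int,
    money_utility: Callable[[float], float] = lambda c: c,
    upper_bound: float = 1e6,
    tol: float = 1e-9,
) -> float:
    """Smallest payment c with money_utility(c) >= reward gap (assumed agent model)."""
    # Gap between the arm a myopic agent would pick and the arm we want pulled.
    gap = max(expected_rewards) - expected_rewards[target_arm]
    if gap <= 0:
        return 0.0  # the target arm is already the myopic choice
    if money_utility(upper_bound) < gap:
        raise ValueError("no payment up to upper_bound closes the gap")
    # Bisection, valid because money_utility is assumed increasing.
    lo, hi = 0.0, upper_bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if money_utility(mid) >= gap:
            hi = mid
        else:
            lo = mid
    return hi
```

With a linear utility the payment equals the reward gap. An agent who values money twice as much needs half the payment, and a concave utility such as a square root changes the payment non-linearly.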
On the use of biased-randomized algorithms for solving non-smooth optimization problems
Soft constraints are common in real-life applications. For example, in freight transportation, the fleet size can be enlarged by outsourcing part of the distribution service, and some deliveries to customers can be postponed; in inventory management, stock-outs generated by unexpected demand can be allowed; and in manufacturing processes and project management, some deadlines often cannot be met because of delays in critical steps of the supply chain. However, capacity-, size-, and time-related limitations are included in many optimization problems as hard constraints, although it would usually be more realistic to treat them as soft ones, i.e., constraints that can be violated to some extent by incurring a penalty cost. Most of the time, this penalty cost is nonlinear and even discontinuous, which can make the objective function non-smooth. Despite their many practical applications, non-smooth optimization problems are quite challenging, especially when the underlying optimization problem is NP-hard. In this paper, we propose the use of biased-randomized algorithms as an effective methodology for coping with NP-hard and non-smooth optimization problems in many practical applications. Biased-randomized algorithms extend constructive heuristics by introducing a nonuniform randomization pattern into them. Hence, they can be used to explore promising areas of the solution space without the limitations of gradient-based approaches, which assume that the objective function is smooth. Moreover, biased-randomized algorithms can be easily parallelized, and so can explore a large number of promising regions in short computing times. This paper discusses these concepts in detail, reviews existing work in different application areas, and highlights current trends and open research lines.
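The idea of biasing a constructive heuristic can be illustrated with a small sketch. It is not taken from the paper. It assumes a toy single-machine scheduling problem with soft deadlines, an earliest-due-date greedy rule, and a geometric distribution as the nonuniform randomization pattern. All function names and parameter values (`beta`, `fixed`, `rate`) are hypothetical.

```python
import math
import random


def penalty(tardiness, fixed=10.0, rate=2.0):
    """Discontinuous soft-deadline penalty: zero if on time, else a fixed charge plus a linear term."""
    return 0.0 if tardiness <= 0 else fixed + rate * tardiness


def cost(sequence, jobs):
    """Total penalty of processing jobs (duration, due_date) in the given order."""
    clock, total = 0.0, 0.0
    for j in sequence:
        duration, due = jobs[j]
        clock += duration
        total += penalty(clock - due)
    return total


def biased_pick(n, beta, rng):
    """Index in [0, n) drawn from a geometric distribution wrapped modulo n; index 0 is most likely."""
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - beta)) % n


def biased_randomized_schedule(jobs, beta=0.3, iterations=200, seed=0):
    """Repeat a biased-randomized version of the greedy rule and keep the best sequence."""
    rng = random.Random(seed)
    greedy = sorted(range(len(jobs)), key=lambda j: jobs[j][1])  # earliest due date first
    best_seq, best_cost = greedy, cost(greedy, jobs)
    for _ in range(iterations):
        candidates = list(greedy)  # sorted candidate list, best first
        seq = []
        while candidates:
            # Usually take the top candidate, occasionally a lower-ranked one.
            seq.append(candidates.pop(biased_pick(len(candidates), beta, rng)))
        c = cost(seq, jobs)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq, best_cost
```

Because the search only evaluates `cost`, the jump in `penalty` at zero tardiness causes no difficulty, whereas a gradient-based method would need a smooth objective. Each iteration is independent, so the loop could be split across workers with different seeds.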