648 research outputs found

    Bandit-Based Task Assignment for Heterogeneous Crowdsourcing

    Full text link
    We consider a task assignment problem in crowdsourcing, which is aimed at collecting as many reliable labels as possible within a limited budget. A challenge in this scenario is how to cope with the diversity of tasks and the task-dependent reliability of workers, e.g., a worker may be good at recognizing the name of sports teams, but not be familiar with cosmetics brands. We refer to this practical setting as heterogeneous crowdsourcing. In this paper, we propose a contextual bandit formulation for task assignment in heterogeneous crowdsourcing, which is able to deal with the exploration-exploitation trade-off in worker selection. We also theoretically investigate the regret bounds for the proposed method, and demonstrate its practical usefulness experimentally

    A Survey of Monte Carlo Tree Search Methods

    Get PDF
    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work

    Robust Dynamic Assortment Optimization in the Presence of Outlier Customers

    Full text link
    We consider the dynamic assortment optimization problem under the multinomial logit model (MNL) with unknown utility parameters. The main question investigated in this paper is model mis-specification under the ε\varepsilon-contamination model, which is a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length TT, we assume that customers make purchases according to a well specified underlying multinomial logit choice model in a (1−ε1-\varepsilon)-fraction of the time periods, and make arbitrary purchasing decisions instead in the remaining ε\varepsilon-fraction of the time periods. In this model, we develop a new robust online assortment optimization policy via an active elimination strategy. We establish both upper and lower bounds on the regret, and show that our policy is optimal up to logarithmic factor in T when the assortment capacity is constant. Furthermore, we develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter ε\varepsilon. Our simulation study shows that our policy outperforms the existing policies based on upper confidence bounds (UCB) and Thompson sampling.Comment: 27 pages, 1 figur
    • …
    corecore