Search CORE

8 research outputs found

Ballooning Multi-Armed Bandits

Author: Dhamal Swapnil
Ghalme Ganesh
Gujar Sujit
Jain Shweta
Narahari Y.
Publication date: 01/01/2020

In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a novel extension of the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. In contrast to the classical MAB setting where the regret is computed with respect to the best arm overall, the regret in a BL-MAB setting is computed with respect to the best available arm at each time. We first observe that the existing stochastic MAB algorithms result in linear regret for the BL-MAB model. We prove that, if the best arm is equally likely to arrive at any time instant, a sub-linear regret cannot be achieved. Next, we show that if the best arm is more likely to arrive in the early rounds, one can achieve sub-linear regret. Our proposed algorithm determines (1) the fraction of the time horizon for which the newly arriving arms should be explored and (2) the sequence of arm pulls in the exploitation phase from among the explored arms. Making reasonable assumptions on the arrival distribution of the best arm in terms of the thinness of the distribution's tail, we prove that the proposed algorithm achieves sub-linear instance-independent regret. We further quantify explicit dependence of regret on the arrival distribution parameters. We reinforce our theoretical findings with extensive simulation results. We conclude by showing that our algorithm would achieve sub-linear regret even if (a) the distributional parameters are not exactly known, but are obtained using a reasonable learning mechanism or (b) the best arm is not more likely to arrive early, but a large fraction of arms is likely to arrive relatively early.

Comment: A full version of this paper is accepted in the Journal of Artificial Intelligence (AIJ) of Elsevier. A preliminary version is published as an extended abstract in AAMAS 2020. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. 202
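The regret notion described in the abstract, measured against the best arm available in each round rather than the best arm overall, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `ballooning_regret`, its parameters, and the example numbers are all hypothetical and chosen only for illustration, and the true arm means are assumed known so that expected regret can be computed directly.

```python
def ballooning_regret(arrival_round, means, pulls):
    """Expected regret against the best arm available in each round.

    arrival_round[i] is the 0-based round at which arm i becomes available,
    means[i] is its (assumed known) expected reward, and pulls[t] is the
    index of the arm chosen in round t. All names are illustrative.
    """
    regret = 0.0
    for t, chosen in enumerate(pulls):
        # Only arms that have already arrived can be pulled or compared against.
        available = [i for i, r in enumerate(arrival_round) if r <= t]
        if chosen not in available:
            raise ValueError(f"arm {chosen} has not arrived by round {t}")
        best_mean = max(means[i] for i in available)
        regret += best_mean - means[chosen]
    return regret


# Arms 0 and 1 are present from the start; a better arm 2 arrives in round 3.
arrivals = [0, 0, 3]
means = [0.5, 0.25, 1.0]

# Ignoring the late arrival costs 0.5 per round from round 3 onward.
print(ballooning_regret(arrivals, means, [0, 0, 0, 0, 0, 0]))  # 1.5

# Switching as soon as arm 2 arrives incurs no regret in this example.
print(ballooning_regret(arrivals, means, [0, 0, 0, 2, 2, 2]))  # 0.0
```

In this toy instance, a policy that never looks at newly arrived arms accumulates regret in every round after the better arm arrives, which gives some intuition for why a policy has to keep accounting for new arrivals in this setting.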

arXiv.org e-Print Archive

Chalmers Research

Hyper-Heuristics based on Reinforcement Learning, Balanced Heuristic Selection and Group Decision Acceptance

Author: Carvalho Vinicius Renan de
Santiago Júnior Valdivino Alexandre de
Özcan Ender
Publication venue: 'Elsevier BV'
Publication date: 01/12/2020

In this paper, we introduce a multi-objective selection hyper-heuristic approach combining Reinforcement Learning, (meta)heuristic selection, and group decision-making as acceptance methods, referred to as Hyper-Heuristic based on Reinforcement LearnIng, Balanced Heuristic Selection and Group Decision AccEptance (HRISE), controlling a set of Multi-Objective Evolutionary Algorithms (MOEAs) as Low-Level (meta)Heuristics (LLHs). Along with the use of multiple MOEAs, we believe that having a robust LLH selection method as well as several move acceptance methods at our disposal would lead to an improved general-purpose method producing the most adequate solutions to the problem instances across multiple domains. We present two learning hyper-heuristics based on the HRISE framework for multi-objective optimisation, each embedding a group decision-making acceptance method under a different rule: majority rule (HRISE_M) and responsibility rule (HRISE_R). A third hyper-heuristic is also defined where both a random LLH selection and a random move acceptance strategy are used. We also propose two variants of the late acceptance method and a new quality indicator supporting the initialisation of selection hyper-heuristics using a low computational budget. An extensive set of experiments was performed using 39 multi-objective problem instances from various domains, of which 24 are from four different benchmark function classes and the remaining 15 are from four different real-world problems. The cross-domain search performance of the proposed learning hyper-heuristics indeed turned out to be the best, particularly HRISE_R, when compared to three other selection hyper-heuristics, including a recently proposed one, and all low-level MOEAs each run in isolation.

Repository@Nottingham

A budget feasible peer graded mechanism for iot-based crowdsourcing

Author: A Slivkins
Aniruddh Sharma
F Daniel
Fatos Xhafa
G Chatzimilioudis
JS Lee
LS Shapley
M Kobayashi
N Mazlan
N Nisan
S Jain
Sajal Mukhopadhyay
SS Fatima
T Luo
TH Cormen
Vikash Kumar Singh
Y Gao
Y Li
Z Duan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We develop and extend a line of recent work on mechanism design for the heterogeneous task assignment problem in 'crowdsourcing'. The budgeted market we consider consists of multiple task requesters and multiple IoT devices as task executors. Each task requester is endowed with a single distinct task along with a publicly known budget. Each IoT device has a cost for executing the tasks and a quality, both of which are private. Given such a scenario, the objective is to select a subset of IoT devices for each task such that the total payment made is within the allotted budget while attaining a threshold quality. To determine the unknown quality of the IoT devices, we use peer grading. In this paper, we design a truthful budget feasible mechanism for the problem under investigation that also allows us to obtain true information about the quality of the IoT devices. Further, we extend the set-up to the case where the tasks are divisible and the IoT devices work collaboratively, rather than as single entities, to execute each task. We design budget feasible mechanisms for the extended versions. Simulations are performed to measure the efficacy of our proposed mechanism.

Peer Reviewed. Postprint (author's final draft)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing

Author: Bhat S
Gujar S
Jain S
Narahari Y
Zoeter O
Publication venue: 'Elsevier BV'
Publication date: 08/11/2018

Infoscience - École polytechnique fédérale de Lausanne

A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing

Author: Bhat Satyanath
Gujar Sujit
Jain Shweta
Narahari Y
Zoeter Onno
Publication venue: 10.1016/j.artint.2017.10.001

There are numerous situations when a service requester wishes to expertsource a series of identical but non-trivial tasks from a pool of experts so as to achieve an assured accuracy level for each task, in a cost optimal way. The experts available are typically heterogeneous with unknown but fixed qualities and different service costs. The service costs are usually private to the experts and the experts could be strategic about their costs. The problem is to select for each task an optimal subset of experts so that the outcome obtained after aggregating the opinions from the selected experts guarantees a target level of accuracy. The problem is a challenging one even in a non-strategic setting since the accuracy of an aggregated outcome depends on unknown qualities. We develop a novel multi-armed bandit (MAB) mechanism for solving this problem. First, we propose a framework, Assured Accuracy Bandit (AAB) framework, which leads to a MAB algorithm, Constrained Confidence Bound for Non-Strategic Setting (CCB-NS). We derive an upper bound on the number of time steps this algorithm chooses a sub-optimal set, which depends on the target accuracy and true qualities. A more challenging situation arises when the requester not only has to learn the qualities of the experts but has to elicit their true service costs as well. We modify the CCB-NS algorithm to obtain an adaptive exploration separated algorithm, Constrained Confidence Bound for Strategic Setting (CCB-S). The CCB-S algorithm produces an ex-post monotone allocation rule that can then be transformed into an ex-post incentive compatible and ex-post individually rational mechanism. This mechanism learns the qualities of the experts and guarantees a given target accuracy level in a cost optimal way. We also provide a lower bound on the number of times any algorithm must select a sub-optimal set and we see that the lower bound matches our upper bound up to a constant factor. We provide insights on a practical implementation of this framework through an illustrative example and demonstrate the efficacy of our algorithms through simulations. © 2017 Elsevier B.V. All rights reserved.

Open Access Repository of IISc Research Publications

A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing

Author: Abraham
Agrawal
Auer
Babaioff
Babaioff
Babaioff
Badanidiyuru
Badanidiyuru
Berrang-Ford
Bhat
Bhat
Bubeck
Cavallo
Chen
Chen
Csirik
Das Sarma
Dawid
Devanur
Even-Dar
Fan
Fye
Garg
Gatti
Gujar
Gujar
Ho
Jain
Jain
Kale
Kalyanakrishnan
Karger
Kremer
Kveton
Lai
Lattimore
Li
Mansour
Mavandadi
Myerson
O'Neil
Onno Zoeter
Raykar
Satyanath Bhat
Shweta Jain
Singer
Singla
Streeter
Sujit Gujar
Tran-Thanh
Tran-Thanh
Tran-Thanh
Viappiani
Wen
Witkowski
Y. Narahari
Zhou
Publication venue: 'Elsevier BV'

Crossref