196 research outputs found

Skyline Identification in Multi-Armed Bandits

Authors: Cheu Albert; Sundaram Ravi; Ullman Jonathan
Publication date: 09/01/2018

We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the skyline of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-\delta$,

$$\Theta\bigg(\frac{n}{\varepsilon^2} \cdot \min\bigg\{ \log\bigg(\frac{1}{\varepsilon \delta}\bigg), \log\bigg(\frac{n}{\delta}\bigg) \bigg\} \bigg)$$

samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT'02) and that of approximating the expected reward of every arm.

Comment: 18 pages, 2 Figures; an ALT'18/ISIT'18 submission
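The skyline definition and the naive baseline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the `pull` callback, the assumption that rewards lie in [0, 1], and the Hoeffding-based sample count are all choices made here for illustration. The abstract does not state the paper's exact definition of an $\varepsilon$-skyline, so the sketch only returns the skyline of the empirical means.

```python
import math


def skyline(means):
    """Return the 1-based indices i whose mean is strictly larger than
    the means of all lower-numbered arms (arm 1 always qualifies)."""
    result = []
    best_so_far = -math.inf
    for i, mu in enumerate(means, start=1):
        if mu > best_so_far:
            result.append(i)
            best_so_far = mu
    return result


def samples_per_arm(n, eps, delta):
    """Hoeffding sample count (assumption: rewards in [0, 1]) so that every
    one of the n empirical means is within eps/2 of its true mean with
    probability at least 1 - delta, by a union bound over the arms."""
    return math.ceil((2.0 / eps ** 2) * math.log(2.0 * n / delta))


def naive_skyline(pull, n, eps, delta):
    """Naive baseline: estimate every arm to accuracy eps/2, then take the
    skyline of the estimates. `pull(i)` is a hypothetical callback that
    returns one reward sample from arm i (1-based)."""
    m = samples_per_arm(n, eps, delta)
    estimates = [sum(pull(i) for _ in range(m)) / m for i in range(1, n + 1)]
    return skyline(estimates)
```

For example, `skyline([0.2, 0.5, 0.4, 0.7, 0.7])` keeps arms 1, 2 and 4: arm 3 is beaten by arm 2, and arm 5 only ties arm 4. The naive baseline uses on the order of $(n/\varepsilon^2)\log(n/\delta)$ samples in total, which is the cost the abstract says can be improved when $\varepsilon \gg 1/n$.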

arXiv.org e-Print Archive

Crossref

Vector Optimization with Stochastic Bandit Feedback

Authors: Ararat Çağın; Tekin Cem
Publication date: 08/06/2022

We introduce vector optimization problems with stochastic bandit feedback, which extend the best arm identification problem to vector-valued rewards. We consider $K$ designs with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$. This generalizes the concept of the Pareto set in multi-objective optimization and allows different sets of preferences of decision-makers to be encoded by $C$. Unlike prior work, we define approximations of the Pareto set based on direction-free covering and gap notions. We study an $(\epsilon,\delta)$-PAC Pareto set identification problem where an evaluation of each design yields a noisy observation of the mean reward vector. In order to characterize the difficulty of learning the Pareto set, we introduce the concept of ordering complexity, i.e., geometric conditions on the deviations of empirical reward vectors from their mean under which the Pareto front can be approximated accurately. We show how to compute the ordering complexity of any polyhedral ordering cone. We provide gap-dependent and worst-case lower bounds on the sample complexity and show that in the worst case the sample complexity scales with the square of the ordering complexity. Furthermore, we investigate the sample complexity of the naïve elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and the sampling budget affect the Pareto set, the returned $(\epsilon,\delta)$-PAC Pareto set, and the success of identification.

Comment: 28 pages, 3 tables, 1 figure
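To make the role of the ordering cone concrete, here is a minimal Python sketch of computing a Pareto set from known mean vectors under a polyhedral cone. It rests on assumptions not stated in the abstract: the cone is written as $C = \{x : Wx \ge 0\}$ for a matrix `W`, the cone is pointed, and a design is treated as dominated when some other design's mean minus its own is a nonzero element of $C$. The function names and example data are hypothetical, and no sampling or PAC guarantee is modeled.

```python
def in_cone(W, v, tol=1e-12):
    """True if v lies in the cone {x : W x >= 0} (row-wise, up to tol)."""
    return all(sum(w * x for w, x in zip(row, v)) >= -tol for row in W)


def pareto_set(means, W, tol=1e-12):
    """Return 0-based indices of designs not dominated by any other design.

    Design i is dominated by design j when means[j] - means[i] is a
    nonzero vector in the cone {x : W x >= 0}.
    """
    front = []
    for i, mu_i in enumerate(means):
        dominated = False
        for j, mu_j in enumerate(means):
            if i == j:
                continue
            diff = [a - b for a, b in zip(mu_j, mu_i)]
            nonzero = any(abs(d) > tol for d in diff)
            if nonzero and in_cone(W, diff, tol):
                dominated = True
                break
        if not dominated:
            front.append(i)
    return front


# Example mean vectors for four designs with two objectives (made-up data).
designs = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.3), (0.4, 0.4)]

# Identity rows give the positive orthant, i.e. the usual Pareto ordering.
orthant = [[1.0, 0.0], [0.0, 1.0]]

# A wider cone that accepts a small loss in one objective for a larger
# gain in the other, so more pairs of designs become comparable.
wider = [[2.0, 1.0], [1.0, 2.0]]

standard_front = pareto_set(designs, orthant)
wider_front = pareto_set(designs, wider)
```

With the positive orthant, none of the four example designs dominates another, so all of them are in the Pareto set. Under the wider cone, more pairs become comparable and the set shrinks to the second and third designs, which illustrates how the choice of $C$ encodes different preferences.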

arXiv.org e-Print Archive

Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

Authors: Chen Kwang-Cheng; Hanzo Lajos; Jiang Chunxiao; Ren Yong; Wang Jingjing; Zhang Haijun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/01/2019

Future wireless networks have substantial potential to support a broad range of complex, compelling applications in both military and civilian fields, where users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have had great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims to assist readers in clarifying the motivation and methodology of the various ML algorithms, so that they can be invoked for hitherto unexplored services and scenarios of future wireless networks.

Comment: 46 pages, 22 fig

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Pareto Front Identification with Regret Minimization

Authors: Iyengar Garud; Kim Wonyoung; Zeevi Assaf
Publication date: 31/05/2023

We consider Pareto front identification for linear bandits (PFILin) where the goal is to identify a set of arms whose reward vectors are not dominated by any of the others when the mean reward vector is a linear function of the context. PFILin includes the best arm identification problem and multi-objective active learning as special cases. The sample complexity of our proposed algorithm is $\tilde{O}(d/\Delta^2)$, where $d$ is the dimension of contexts and $\Delta$ is a measure of problem complexity. Our sample complexity is optimal up to a logarithmic factor. A novel feature of our algorithm is that it uses the contexts of all actions. In addition to efficiently identifying the Pareto front, our algorithm also guarantees an $\tilde{O}(\sqrt{d/t})$ bound for instantaneous Pareto regret when the number of samples is larger than $\Omega(d\log dL)$ for $L$-dimensional vector rewards. By using the contexts of all arms, our proposed algorithm simultaneously provides efficient Pareto front identification and regret minimization. Numerical experiments demonstrate that the proposed algorithm successfully identifies the Pareto front while minimizing the regret.

Comment: 25 pages including appendix

arXiv.org e-Print Archive

Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

Authors: Jonker Catholijn M; Linders Sjoerd; Nowé Ann; Roijers Diederik M; Zintgraf Luisa M
Publication date: 01/01/2018

In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam.

Comment: AAMAS 2018, Source code at https://github.com/lmzintgraf/gp_pref_elici

arXiv.org e-Print Archive

VU Research Portal