Skyline Identification in Multi-Armed Bandits
We introduce a variant of the classical PAC multi-armed bandit problem. There
is an ordered set of arms , each with some stochastic
reward drawn from some unknown bounded distribution. The goal is to identify
the skyline of the set , consisting of all arms such that
has larger expected reward than all lower-numbered arms . We
define a natural notion of an -approximate skyline and prove
matching upper and lower bounds for identifying an -skyline.
Specifically, we show that in order to identify an -skyline from
among arms with probability , samples are necessary and sufficient. When , our results improve over the naive algorithm, which draws enough samples
to approximate the expected reward of every arm; the algorithm of (Auer et al.,
AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our
results show that the sample complexity of the skyline problem lies strictly in
between that of best arm identification (Even-Dar et al., COLT'02) and that of
approximating the expected reward of every arm. Comment: 18 pages, 2 Figures; an ALT'18/ISIT'18 submissio
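The skyline definition in the abstract above can be illustrated on known means. This is a minimal sketch of the definition only, not the paper's sampling algorithm: it assumes the expected rewards are already given (in the bandit setting they are unknown and must be estimated), and the function name `skyline` and the example values are hypothetical.

```python
from typing import List, Sequence


def skyline(means: Sequence[float]) -> List[int]:
    """Return 0-based indices of arms whose mean strictly exceeds
    the mean of every lower-numbered arm."""
    result: List[int] = []
    best_so_far = float("-inf")  # largest mean among the arms seen so far
    for i, mu in enumerate(means):
        if mu > best_so_far:
            # Arm i beats all lower-numbered arms, so it is on the skyline.
            result.append(i)
            best_so_far = mu
    return result


# Hypothetical expected rewards for five ordered arms.
example_means = [0.3, 0.2, 0.5, 0.5, 0.9]
skyline_indices = skyline(example_means)
print(skyline_indices)  # [0, 2, 4]
```

The first arm is always on the skyline because it has no lower-numbered arms to beat, and the arm with the highest mean is always included, so the skyline contains the best arm.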
Vector Optimization with Stochastic Bandit Feedback
We introduce vector optimization problems with stochastic bandit feedback,
which extends the best arm identification problem to vector-valued rewards. We
consider designs with multi-dimensional mean reward vectors, which are
partially ordered according to a polyhedral ordering cone . This generalizes
the concept of the Pareto set in multi-objective optimization and allows
different sets of preferences of decision-makers to be encoded by .
Unlike prior work, we define approximations of the Pareto set based on
direction-free covering and gap notions. We study an ()-PAC
Pareto set identification problem where an evaluation of each design yields a
noisy observation of the mean reward vector. In order to characterize the
difficulty of learning the Pareto set, we introduce the concept of
ordering complexity, i.e., geometric conditions on the deviations of empirical
reward vectors from their mean under which the Pareto front can be approximated
accurately. We show how to compute the ordering complexity of any polyhedral
ordering cone. We provide gap-dependent and worst-case lower bounds on the
sample complexity and show that in the worst case the sample complexity scales
with the square of ordering complexity. Furthermore, we investigate the sample
complexity of the naïve elimination algorithm and prove that it nearly
matches the worst-case sample complexity. Finally, we run experiments to verify
our theoretical results and illustrate how and sampling budget affect the
Pareto set, returned ()-PAC Pareto set and the success of
identification. Comment: 28 pages, 3 tables, 1 figur
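The partial order induced by a polyhedral cone can be sketched for known mean vectors. This sketch makes several assumptions that the abstract does not state: the cone is written as the set of vectors z with Wz >= 0 for a matrix `W`, a design is treated as dominated when another design's mean differs from its own by a nonzero vector in the cone, and the means are given exactly rather than estimated from noisy samples. The names `in_cone` and `pareto_set` and all numbers are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def in_cone(W: Sequence[Vector], z: Vector, tol: float = 1e-12) -> bool:
    """Check whether z lies in the cone {z : W z >= 0} (assumed representation)."""
    return all(sum(w * zi for w, zi in zip(row, z)) >= -tol for row in W)


def pareto_set(W: Sequence[Vector], means: Sequence[Vector]) -> List[int]:
    """Return indices of designs not dominated under the cone ordering."""
    keep: List[int] = []
    for i, mu_i in enumerate(means):
        dominated = False
        for j, mu_j in enumerate(means):
            if j == i:
                continue
            diff = [b - a for a, b in zip(mu_i, mu_j)]
            # Design j dominates design i if mu_j - mu_i is a nonzero cone element.
            if any(abs(d) > 1e-12 for d in diff) and in_cone(W, diff):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep


# Hypothetical two-objective mean vectors for four designs.
means = [(1.0, 3.0), (2.0, 2.8), (3.0, 1.0), (1.0, 1.0)]

# Identity rows give the positive orthant, i.e. the usual componentwise order.
orthant = [(1.0, 0.0), (0.0, 1.0)]
# A wider cone that tolerates a small loss in one objective for a larger gain
# in the other.
wide = [(1.0, 0.5), (0.5, 1.0)]

print(pareto_set(orthant, means))  # [0, 1, 2]
print(pareto_set(wide, means))     # [1, 2]
```

With the positive orthant the result is the ordinary Pareto set. Under the wider cone, design 0 becomes dominated by design 1, which shows how the choice of cone can encode different preferences.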
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have had great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to assist readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks. Comment: 46 pages, 22 fig
Pareto Front Identification with Regret Minimization
We consider Pareto front identification for linear bandits (PFILin) where the
goal is to identify a set of arms whose reward vectors are not dominated by any
of the others when the mean reward vector is a linear function of the context.
PFILin includes the best arm identification problem and multi-objective active
learning as special cases. The sample complexity of our proposed algorithm is
, where is the dimension of contexts and is
a measure of problem complexity. Our sample complexity is optimal up to a
logarithmic factor. A novel feature of our algorithm is that it uses the
contexts of all actions. In addition to efficiently identifying the Pareto
front, our algorithm also guarantees bound for
instantaneous Pareto regret when the number of samples is larger than
for dimensional vector rewards. By using the contexts of
all arms, our proposed algorithm simultaneously provides efficient Pareto front
identification and regret minimization. Numerical experiments demonstrate that
the proposed algorithm successfully identifies the Pareto front while
minimizing the regret. Comment: 25 pages including appendi
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
In multi-objective decision planning and learning, much attention is paid to
producing optimal solution sets that contain an optimal policy for every
possible user preference profile. We argue that the step that follows, i.e.,
determining which policy to execute by maximising the user's intrinsic utility
function over this (possibly infinite) set, is under-studied. This paper aims
to fill this gap. We build on previous work on Gaussian processes and pairwise
comparisons for preference modelling, extend it to the multi-objective decision
support scenario, and propose new ordered preference elicitation strategies
based on ranking and clustering. Our main contribution is an in-depth
evaluation of these strategies using computer and human-based experiments. We
show that our proposed elicitation strategies outperform the currently used
pairwise methods, and find that users prefer ranking most. Our experiments
further show that utilising monotonicity information in GPs, by using a linear
prior mean at the start and virtual comparisons to the nadir and ideal points,
increases performance. We demonstrate our decision support framework in a
real-world study on traffic regulation, conducted with the city of Amsterdam. Comment: AAMAS 2018, Source code at
https://github.com/lmzintgraf/gp_pref_elici