Sustainable Cooperative Coevolution with a Multi-Armed Bandit
This paper proposes a self-adaptation mechanism to manage the resources
allocated to the different species comprising a cooperative coevolutionary
algorithm. The proposed approach relies on a dynamic extension to the
well-known multi-armed bandit framework. At each iteration, the dynamic
multi-armed bandit makes a decision on which species to evolve for a
generation, using the history of progress made by the different species to
guide the decisions. We show experimentally, on a benchmark and a real-world
problem, that evolving the different populations at different paces not only
identifies solutions more rapidly, but also improves the capacity of
cooperative coevolution to solve more complex problems. Comment: Accepted at GECCO 201
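Illustrative sketch (not from the paper): the abstract describes a bandit that picks one species to evolve per iteration based on past progress. The code below uses plain UCB1 rather than the paper's dynamic multi-armed bandit, and treats each species as an arm with a hypothetical binary reward ("did this generation improve fitness?"). The names `ucb1_select`, `run` and `progress_probs` are assumptions made for this example only.

```python
import math
import random


def ucb1_select(counts, rewards, t, c=math.sqrt(2)):
    """Return the index of the arm (species) with the highest UCB1 score."""
    # Play every arm once before the formula is defined for it.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        rewards[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run(progress_probs, iterations=2000, seed=0):
    """Simulate species selection; return how often each species was evolved."""
    rng = random.Random(seed)
    k = len(progress_probs)
    counts = [0] * k      # generations spent on each species
    rewards = [0.0] * k   # accumulated progress signal per species
    for t in range(1, iterations + 1):
        arm = ucb1_select(counts, rewards, t)
        # Hypothetical reward: 1 if evolving this species for one
        # generation improved fitness, else 0. A real coevolutionary
        # algorithm would measure this from the populations instead.
        reward = 1.0 if rng.random() < progress_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts


if __name__ == "__main__":
    print(run([0.1, 0.5, 0.9]))
```

In this stationary toy setting the species that most often yields progress receives most of the generations. The dynamic variant in the paper additionally has to react when the rewarding species changes over time, which this sketch does not attempt.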
An Information-Theoretic Analysis of Thompson Sampling
We provide an information-theoretic analysis of Thompson sampling that
applies across a broad range of online optimization problems in which a
decision-maker must learn from partial feedback. This analysis inherits the
simplicity and elegance of information theory and leads to regret bounds that
scale with the entropy of the optimal-action distribution. This strengthens
preexisting results and yields new insight into how information improves
performance.
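Illustrative sketch (not from the paper): for readers unfamiliar with the algorithm being analysed, the code below shows Thompson sampling in its simplest setting, a Bernoulli bandit with independent Beta(1, 1) priors. The function name `thompson_sampling` and the parameter `true_probs` are assumptions made for this example; the paper's analysis covers a much broader class of problems.

```python
import random


def thompson_sampling(true_probs, rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns (pulls, alpha, beta)."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # 1 + observed successes per arm
    beta = [1.0] * k   # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(rounds):
        # Sample one plausible mean per arm from its posterior and
        # act greedily with respect to the sampled means.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Partial feedback: only the chosen arm's reward is observed.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.2, 0.5, 0.8])
    print(pulls)
```

As the posteriors concentrate, the arm with the highest true success probability is sampled as best more and more often, so exploration tapers off without an explicit schedule.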
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have had great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to assist readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks. Comment: 46 pages, 22 fig
A multi-arm bandit neighbourhood search for routing and scheduling problems
Local search based meta-heuristics such as variable neighbourhood search have achieved remarkable success in solving complex combinatorial problems. Local search techniques are becoming increasingly popular and are used in a wide variety of meta-heuristics, such as genetic algorithms. Typically, local search iteratively improves a solution by making a series of small moves. Traditionally these methods do not employ any learning mechanism. We treat the selection of a local search neighbourhood as a dynamic multi-armed bandit (D-MAB) problem where learning techniques for solving the D-MAB can be used to guide the local search process. We present a D-MAB neighbourhood search (D-MABNS) which can be embedded within any meta-heuristic or hyperheuristic framework. Given a set of neighbourhoods, the aim of D-MABNS is to adapt the search sequence, testing promising solutions first. We demonstrate the effectiveness of D-MABNS on two vehicle routing and scheduling problems, the real-world geographically distributed maintenance problem (GDMP) and the periodic vehicle routing problem (PVRP). We present comparisons to benchmark instances and give a detailed analysis of parameters, performance and behaviour. Keywords: Meta-heuristic, Local search, Vehicle routin
Sequential Selection of Correlated Ads by POMDPs
Online advertising has become a key source of revenue for both web search
engines and online publishers. For them, the ability to allocate the right ads to
the right webpages is critical because any mismatched ads would not only harm web
users' satisfaction but also lower the ad income. In this paper, we study how
online publishers could optimally select ads to maximize their ad incomes over
time. The conventional offline, content-based matching between webpages and ads
is a fine start but cannot solve the problem completely because good matching
does not necessarily lead to good payoff. Moreover, with the limited display
impressions, we need to balance the need of selecting ads to learn true ad
payoffs (exploration) with that of allocating ads to generate high immediate
payoffs based on the current belief (exploitation). In this paper, we address
the problem by employing partially observable Markov decision processes
(POMDPs) and discuss how to utilize the correlation of ads to improve the
efficiency of the exploration and increase ad incomes in the long run. Our
mathematical derivation shows that the belief states of correlated ads can be
naturally updated using a formula similar to collaborative filtering. To test
our model, a real world ad dataset from a major search engine is collected and
categorized. Experimenting over the data, we provide an analysis of the effect
of the underlying parameters, and demonstrate that our algorithms significantly
outperform other strong baselines.