Search CORE

1,412 research outputs found

Sustainable Cooperative Coevolution with a Multi-Armed Bandit

Author: De Rainville François-Michel
Gagné Christian
Laurendeau Denis
Schoenauer Marc
Sebag Michèle
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

This paper proposes a self-adaptation mechanism to manage the resources allocated to the different species comprising a cooperative coevolutionary algorithm. The proposed approach relies on a dynamic extension to the well-known multi-armed bandit framework. At each iteration, the dynamic multi-armed bandit makes a decision on which species to evolve for a generation, using the history of progress made by the different species to guide the decisions. We show experimentally, on a benchmark and a real-world problem, that evolving the different populations at different paces allows not only to identify solutions more rapidly, but also improves the capacity of cooperative coevolution to solve more complex problems.Comment: Accepted at GECCO 201

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

An Information-Theoretic Analysis of Thompson Sampling

Author: Russo Daniel
Van Roy Benjamin
Publication venue
Publication date: 01/01/2014
Field of study

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance

arXiv.org e-Print Archive

CiteSeerX

Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

Author: Chen Kwang-Cheng
Hanzo Lajos
Jiang Chunxiao
Ren Yong
Wang Jingjing
Zhang Haijun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/01/2019
Field of study

Future wireless networks have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims for assisting the readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.Comment: 46 pages, 22 fig

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

A multi-arm bandit neighbourhood search for routing and scheduling problems

Author: Chen Yujie
Cowling Peter Ivan
Mourdjis Philip James
Polack Fiona A C
Publication venue
Publication date: 01/01/2016
Field of study

Abstract Local search based meta-heuristics such as variable neighbourhood search have achieved remarkable success in solving complex combinatorial problems. Local search techniques are becoming increasingly popular and are used in a wide variety of meta-heuristics, such as genetic algorithms. Typically, local search iteratively improves a solution by making a series of small moves. Traditionally these methods do not employ any learning mechanism. We treat the selection of a local search neighbourhood as a dynamic multi- armed bandit (D-MAB) problem where learning techniques for solving the D-MAB can be used to guide the local search process. We present a D-MAB neighbourhood search (D-MABNS) which can be embedded within any meta- heuristic or hyperheuristic framework. Given a set of neighbourhoods, the aim of D-MABNS is to adapt the search sequence, testing promising solutions rst. We demonstrate the eectiveness of D-MABNS on two vehicle routing and scheduling problems, the real-world geographically distributed mainte- nance problem (GDMP) and the periodic vehicle routing problem (PVRP). We present comparisons to benchmark instances and give a detailed analysis of parameters, performance and behaviour. Keywords Meta-heuristic Local search Vehicle routin

White Rose Research Online

Sequential Selection of Correlated Ads by POMDPs

Author: Wang Jun
Yuan Shuai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

Online advertising has become a key source of revenue for both web search engines and online publishers. For them, the ability of allocating right ads to right webpages is critical because any mismatched ads would not only harm web users' satisfactions but also lower the ad income. In this paper, we study how online publishers could optimally select ads to maximize their ad incomes over time. The conventional offline, content-based matching between webpages and ads is a fine start but cannot solve the problem completely because good matching does not necessarily lead to good payoff. Moreover, with the limited display impressions, we need to balance the need of selecting ads to learn true ad payoffs (exploration) with that of allocating ads to generate high immediate payoffs based on the current belief (exploitation). In this paper, we address the problem by employing Partially observable Markov decision processes (POMDPs) and discuss how to utilize the correlation of ads to improve the efficiency of the exploration and increase ad incomes in a long run. Our mathematical derivation shows that the belief states of correlated ads can be naturally updated using a formula similar to collaborative filtering. To test our model, a real world ad dataset from a major search engine is collected and categorized. Experimenting over the data, we provide an analyse of the effect of the underlying parameters, and demonstrate that our algorithms significantly outperform other strong baselines

arXiv.org e-Print Archive

Crossref