A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Authors: Clérot Fabrice; Gajane Pratik; Urvoy Tanguy
Publication date: 01/01/2015

We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order $O(\sqrt{K \ln(K)\, T})$ for this algorithm and a general lower bound of order $\Omega(\sqrt{KT})$. Finally, we provide experimental results using real data from information retrieval applications.
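For orientation, the following is a minimal sketch of the standard EXP3 update that the abstract names as the starting point for REX3. It is not the REX3 algorithm itself, which the abstract does not specify. The function name `exp3`, the `reward_fn` callback, and the default `gamma` are illustrative assumptions; rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(num_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Standard EXP3 sketch (not REX3). reward_fn(t, arm) must return a value in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    total_reward = 0.0

    def mix(ws):
        # Mix the normalized weights with uniform exploration at rate gamma.
        total = sum(ws)
        return [(1.0 - gamma) * w / total + gamma / num_arms for w in ws]

    for t in range(horizon):
        probs = mix(weights)
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale to avoid overflow; the sampling distribution is unchanged.
        largest = max(weights)
        weights = [w / largest for w in weights]

    return mix(weights), total_reward


if __name__ == "__main__":
    # Hypothetical environment: arm 2 always pays 1, the others pay 0.
    final_probs, gained = exp3(3, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)
    print(final_probs, gained)
```

In a dueling setting the learner would not observe `reward` directly, only the outcome of a comparison between two arms, which is the gap that a relative variant has to bridge.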

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Mathematical control of complex systems 2013

Authors: Dong H; He X; Hu J; Karimi HR; Shen B; Wang Z
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014

Mathematical control of complex systems has already become an ideal research area for control engineers, mathematicians, computer scientists, and biologists to understand, manage, analyze, and interpret functional information and dynamical behaviours from real-world complex dynamical systems, such as communication systems, process control, environmental systems, intelligent manufacturing systems, transportation systems, and structural systems. This special issue aims to bring together the latest innovative knowledge and advances in mathematics for handling complex systems. Topics include, but are not limited to, the following: control systems theory (behavioural systems, networked control systems, delay systems, distributed systems, infinite-dimensional systems, and positive systems); networked control (channel capacity constraints, control over communication networks, distributed filtering and control, information theory and control, and sensor networks); and stochastic systems (nonlinear filtering, nonparametric methods, particle filtering, partial identification, stochastic control, stochastic realization, and system identification).

Archivio istituzionale della ricerca - Politecnico di Milano

Directory of Open Access Journals

Brunel University Research Archive

Agder University Research Archive

Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

Authors: Alsheikh Mohammad Abu; Hoang Dinh Thai; Lin Shaowei; Niyato Dusit; Tan Hwee-Pink
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/01/2015

Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. To achieve long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
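As a rough illustration of what solving an MDP involves, here is a minimal value-iteration sketch on a hypothetical two-state sensor node (low or high battery) choosing between sleeping and transmitting. The states, actions, transition probabilities, and rewards are invented for illustration and are not taken from the survey; value iteration is only one of many possible solution methods.

```python
def value_iteration(transitions, rewards, discount=0.9, tol=1e-8):
    """transitions[s][a] is a list of (probability, next_state); rewards[s][a] is a float."""
    num_states = len(transitions)
    values = [0.0] * num_states

    def q_values(s, vs):
        # Expected discounted return of each action taken in state s.
        return [
            rewards[s][a] + discount * sum(p * vs[nxt] for p, nxt in transitions[s][a])
            for a in range(len(transitions[s]))
        ]

    while True:
        new_values = [max(q_values(s, values)) for s in range(num_states)]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(num_states):
        qs = q_values(s, values)
        policy.append(qs.index(max(qs)))
    return values, policy


# Hypothetical toy model. States: 0 = low battery, 1 = high battery.
# Actions: 0 = sleep, 1 = transmit.
TRANSITIONS = [
    [[(0.8, 1), (0.2, 0)], [(1.0, 0)]],            # low: sleeping recharges, transmitting does not
    [[(1.0, 1)], [(0.5, 0), (0.5, 1)]],            # high: transmitting may drain the battery
]
REWARDS = [
    [0.0, -1.0],                                   # transmitting on low battery is penalized
    [0.0, 1.0],                                    # transmitting on high battery is rewarded
]

if __name__ == "__main__":
    print(value_iteration(TRANSITIONS, REWARDS))
```

With these made-up numbers the greedy policy sleeps when the battery is low and transmits when it is high.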

arXiv.org e-Print Archive

University of Canberra Research Repository

Reducing Dueling Bandits to Cardinal Bandits

Authors: Ailon Nir; Joachims Thorsten; Karnin Zohar
Publication date: 14/05/2014

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions, named Doubler, MultiSbm and DoubleSbm, provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of DoubleSbm, which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
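To make the contrast between ordinal and cardinal feedback concrete, the sketch below simulates a duel in which the learner sees only which arm won. It assumes a linear link, where the probability that the left arm wins is (1 + u_left - u_right) / 2 for hidden utilities in [0, 1]; this link, the function names, and the utility values are illustrative assumptions, and the sketch does not implement any of the reductions named in the abstract.

```python
import random


def duel(utilities, left, right, rng):
    """Return the index of the winning arm; the utilities themselves stay hidden."""
    # Assumed linear link between the utility gap and the win probability.
    p_left_wins = (1.0 + utilities[left] - utilities[right]) / 2.0
    return left if rng.random() < p_left_wins else right


def estimate_win_rate(utilities, left, right, num_duels=10000, seed=0):
    """Empirical frequency with which `left` beats `right` over repeated duels."""
    rng = random.Random(seed)
    wins = sum(duel(utilities, left, right, rng) == left for _ in range(num_duels))
    return wins / num_duels


if __name__ == "__main__":
    hidden_utilities = [0.9, 0.1]  # hypothetical values, never shown to the learner
    print(estimate_win_rate(hidden_utilities, 0, 1))
```

Under this assumed link, repeated duels reveal only the utility gap between two arms, not their absolute values, which is why cardinal bandit algorithms cannot be applied without some form of reduction.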

arXiv.org e-Print Archive

CiteSeerX