
    An exploration strategy for non-stationary opponents

    The success or failure of any learning algorithm is partially due to the exploration strategy it employs. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent’s objective is to learn a model of the opponent’s strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning, and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model with finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally that using DE outperforms state-of-the-art algorithms explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
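
    The abstract gives no pseudocode for R-max# or drift exploration, so the following is a minimal, hypothetical Python sketch of the core idea as described: an R-max-style learner that treats under-visited or stale state-action pairs optimistically, so it periodically re-explores "known" regions and can notice when the opponent's behavior has changed. The class name, the visit threshold `m`, the staleness horizon `tau`, and the drift-reset rule are all illustrative assumptions, not the authors' actual algorithm.

    ```python
    from collections import defaultdict

    class RMaxSharpSketch:
        """Minimal sketch of an R-max#-style learner with drift exploration (DE).

        All parameters are illustrative assumptions: `r_max` is the optimistic
        reward bound, `m` is the visit count before a state-action pair counts
        as "known", and `tau` is a hypothetical staleness horizon after which
        a known pair is optimistically re-explored to detect opponent switches.
        """

        def __init__(self, actions, r_max=1.0, m=5, tau=50):
            self.actions = list(actions)
            self.r_max, self.m, self.tau = r_max, m, tau
            self.t = 0
            self.counts = defaultdict(int)        # (s, a) -> visit count
            self.reward_sum = defaultdict(float)  # (s, a) -> cumulative reward
            self.last_visit = defaultdict(int)    # (s, a) -> time of last visit

        def value(self, s, a):
            """Optimistic one-step value: unknown or stale pairs look maximal."""
            key = (s, a)
            unknown = self.counts[key] < self.m
            stale = self.t - self.last_visit[key] > self.tau  # drift exploration
            if unknown or stale:
                return self.r_max
            return self.reward_sum[key] / self.counts[key]

        def act(self, s):
            """Greedy action over optimistic values."""
            return max(self.actions, key=lambda a: self.value(s, a))

        def observe(self, s, a, r):
            """Record an outcome; reset a pair whose reward drifts sharply."""
            self.t += 1
            key = (s, a)
            if self.counts[key] >= self.m:
                mean = self.reward_sum[key] / self.counts[key]
                if abs(r - mean) > 0.5 * self.r_max:  # crude switch detector
                    self.counts[key], self.reward_sum[key] = 0, 0.0
            self.counts[key] += 1
            self.reward_sum[key] += r
            self.last_visit[key] = self.t
    ```

    The staleness horizon is the key departure from plain R-max: R-max never revisits a pair once it is known, which is precisely what fails against an opponent that switches strategies.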

    Service Level Agreement-based adaptation management for Internet Service Provider (ISP) using Fuzzy Q-learning

    Internet access is the vital catalyst for online users, and the number of mobile subscribers is predicted to grow dramatically in the next few years. This huge demand is the main issue facing Internet Service Providers (ISPs), who must meet users’ expectations with their current resources. An adaptive mechanism within the ISP architecture is a promising solution to this situation. A Service Level Agreement (SLA) is the legal catalyst for monitoring any contract violation between end users and ISPs, and is embedded within a Quality of Service (QoS) framework. It strengthens and advances the quality of control over the user’s application and network resources, and can be further extended to fulfill the QoS terms through negotiation and re-negotiation. Moreover, the present literature does not address combining rule-based approaches with adaptation to update the established learning repository. Therefore, the main aim of this research, in the context of SLAs, is to fill this gap by addressing the combination of rule-based uncertainty handling and iterative learning. The key to the proposed architecture is the utilization of self-* capabilities designed to provide self-management over uncertainties and self-adaptive interactions. Thus, the Monitor, Analyse, Plan, Execute and Knowledge Base (MAPE-K) approach is able to deal with this problem through the integration of Fuzzy and Q-Learning algorithms. The proposed architecture sits in the context of autonomic computing. An adaptation manager is the main proposed component, updating admission control over the ISP’s current resources and managing SLAs. A general type-2 fuzzy logic methodology is applied to ensure that uncertainties and precise decision-making are well addressed in this research. The proposed solution demonstrates that Q-Learning works adaptively with QoS parameters, e.g. latency, availability and packet loss. With the combination of fuzzy logic and Q-Learning, we demonstrate that the proposed adaptation manager is able to handle uncertainty and to learn. Q-Learning identifies the initial state from successive ISP iterations and updates it with appropriate actions, reflecting the reward configuration. The more iterations are processed, the greater the increase in learning ability, rewards and exploration probability. The research outcomes benefit the SLA framework by incorporating the information for SLA policies and Service Level Objectives (SLOs). Lastly, an important contribution is demonstrating that the MAPE-K approach is a contender for ISP SLA-based frameworks for QoS provision.
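
    As a hedged illustration only: the abstract combines fuzzy inference over QoS parameters (latency, availability, packet loss) with Q-learning inside a MAPE-K loop, but gives neither the rule base nor the reward design. The sketch below therefore assumes crisp membership thresholds, a three-action admission controller, and a simple reward that penalizes SLA violations; every name and number is hypothetical, and the paper itself uses type-2 fuzzy logic rather than the crisp fuzzification shown here.

    ```python
    import random
    from collections import defaultdict

    def fuzzify(latency_ms, availability, packet_loss):
        """Map raw QoS readings to a coarse state label (thresholds are assumptions)."""
        lat = "low" if latency_ms < 50 else "high" if latency_ms > 150 else "med"
        avail = "ok" if availability >= 0.99 else "degraded"
        loss = "ok" if packet_loss <= 0.01 else "lossy"
        return (lat, avail, loss)

    ACTIONS = ["admit", "throttle", "reject"]   # hypothetical admission controls
    q_table = defaultdict(float)                # (state, action) -> Q value
    alpha, gamma, epsilon = 0.1, 0.9, 0.2       # learning rate, discount, exploration

    def choose_action(state):
        """Epsilon-greedy action selection over the fuzzified state (Plan step)."""
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q_table[(state, a)])

    def update(state, action, reward, next_state):
        """Standard Q-learning update (Knowledge Base step)."""
        best_next = max(q_table[(next_state, a)] for a in ACTIONS)
        q_table[(state, action)] += alpha * (reward + gamma * best_next
                                             - q_table[(state, action)])

    # One illustrative MAPE-K cycle; penalizing SLA violations is an assumption.
    s = fuzzify(latency_ms=120, availability=0.995, packet_loss=0.005)
    a = choose_action(s)
    r = -1.0 if (s[0] == "high" or s[1] == "degraded" or s[2] == "lossy") else 1.0
    s_next = fuzzify(latency_ms=60, availability=0.999, packet_loss=0.002)
    update(s, a, r, s_next)
    ```

    The fuzzified tuple keeps the Q-table small while still reflecting the three monitored QoS parameters; repeated cycles of this loop correspond to the iterative learning the abstract describes.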