
18 research outputs found

On the Bitrate Adaptation of Shared Media Experience Services

Authors: Atarashi R; Pavlou G; Psaras I; Tasiopoulos AG
Publication venue: ACM SIGCOMM Workshop on QoE-Based Analysis and Management of Data Communication Networks (Internet QoE)
Publication date: 21/08/2017

In Shared Media Experience Services (SMESs), a group of people consumes a stream in a synchronised way, as in cloud gaming, live streaming, and interactive social applications. However, group synchronisation comes at the expense of other Quality of Experience (QoE) factors, because each group member experiences dynamic and diverse network conditions. This raises the question of whether a group can be kept synchronised while maintaining the highest possible QoE for each of its members. In this work, we first create a Quality Assessment Framework capable of evaluating different SMES improvement approaches with respect to traditional metrics such as media bitrate quality, playback disruption, and end-user desynchronisation. Second, we focus on bitrate adaptation for improving the QoE of SMESs, as an incrementally deployable, end-user-triggered approach, and we formulate the problem in the context of Adaptive Real-Time Dynamic Programming (ARTDP). Finally, we develop and apply a simple QoE-aware bitrate adaptation mechanism, compare it against YouTube live-streaming traces, and find that it improves on YouTube's performance by more than 30%.

Crossref

UCL Discovery

Modular reinforcement learning : a case study in a robot domain

Authors: Kalmár Zsolt; Lőrincz András; Szepesvári Csaba
Publication date: 01/01/2000

The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. Robot-learning domains, on the other hand, are inherently infinite in both time and space, and moreover they are only partially observable. In this article we suggest a systematic design method motivated by the desire to transform the task to be solved into a finite-state, discrete-time, "approximately" Markovian task that is also completely observable. The key idea is to break up the problem into subtasks and design a controller for each subtask. Operating conditions are then attached to the controllers (a controller together with its operating conditions is called a module), and possible additional features are designed to facilitate observability. A new discrete time-counter is introduced at the "module level" that ticks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared, and a model-based approach was found to work best. The learnt switching strategy performed as well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment that could not have been foreseen, which suggests the promising possibility that a learnt controller might outperform a handcrafted switching strategy in the future.

University of Szeged

A parallel implementation of Q-learning based on communication with cache

Authors: Errecalde Marcelo Luis; Montoya Cecilia Inés; Printista Alicia Marcela
Publication date: 01/01/2002

Q-learning is a Reinforcement Learning method for solving sequential decision problems, where the utility of actions depends on a sequence of decisions and there is uncertainty about the dynamics of the environment in which the agent is situated. This general framework has allowed Q-learning and other Reinforcement Learning methods to be applied to a broad spectrum of complex real-world problems in robotics, industrial manufacturing, games, and other areas. Despite its interesting properties, Q-learning is a very slow method that requires a long training period to learn an acceptable policy. To solve or at least reduce this problem, we propose a parallel implementation model of Q-learning that uses a tabular representation and a communication scheme based on cache. This model is applied to a particular problem, and the results obtained with different processor configurations are reported. Finally, a brief discussion of the properties and current limitations of our approach is presented.
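For orientation, the tabular representation mentioned in the abstract can be illustrated with a minimal, single-process Q-learning sketch. This is a generic textbook version, not the authors' parallel cache-based model, which the abstract does not specify. The function `q_learning`, the toy environment `chain_step`, and all parameter values are assumptions made for illustration only.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.95, epsilon=0.1, max_steps=100, seed=0):
    """Generic tabular Q-learning with an epsilon-greedy policy (illustrative)."""
    rng = random.Random(seed)
    # The "tabular representation": one value per (state, action) pair.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0  # assumption: every episode starts in state 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # One-step target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def chain_step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1 and ends the episode.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
```

Even on this tiny chain the table needs many episodes before the values settle, which is the slowness the abstract sets out to reduce.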

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Software tools for the cognitive development of autonomous robots

Author: Jimenez Schlegl Pablo
Publication date: 01/01/2017

Robotic systems are evolving towards higher degrees of autonomy. This paper reviews the cognitive tools currently available to such systems for fulfilling abstract or long-term goals, as well as for learning and modifying their behaviour.

UPCommons. Portal del coneixement obert de la UPC

A parallel implementation of Q-learning based on communication with cache

Authors: Errecalde Marcelo Luis; Montoya Cecilia Inés; Printista Alicia Marcela
Publication date: 09/02/2004

Servicio de Difusión de la Creación Intelectual


A study of model-based average reward reinforcement learning

Author: Ok DoKyeong
Publication venue: 'Oregon State University'

Reinforcement Learning (RL) is the study of learning agents that improve their performance from rewards and punishments. Most reinforcement learning methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this thesis, we introduce a model-based average reward reinforcement learning method called "H-learning" and show that it performs better than other average reward and discounted RL methods in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning which automatically explores the unexplored parts of the state space, while always choosing an apparently best action with respect to the current value function. We show that this "Auto-exploratory H-Learning" performs much better than the original H-learning under many previously studied exploration strategies. To scale H-learning to large state spaces, we extend it to learn action models and reward functions in the form of Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are very effective in significantly reducing the space requirement of H-learning, and in making it converge much faster in the AGV scheduling task. Further, Auto-exploratory H-learning synergistically combines with Bayesian network model learning and value function approximation by local linear regression, yielding a highly effective average reward RL algorithm. We believe that the algorithms presented here have the potential to scale to large applications in the context of average reward optimization.
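The two optimization criteria contrasted in the abstract can be written out as follows. These are the standard textbook definitions, not formulas taken from the thesis. The notation is an assumption: $\pi$ is a policy, $r_t$ the reward at step $t$, $s_0$ the start state, and $\gamma \in [0, 1)$ the discount factor.

```latex
% Discounted total reward: later rewards are down-weighted by \gamma^t.
V^{\pi}_{\gamma}(s_0) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 \right]

% Average reward per time step: every step counts equally in the long run.
\rho^{\pi}(s_0) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{N-1} r_t \;\middle|\; s_0 \right]
```

Under the discounted criterion an agent can prefer a small immediate reward over a larger recurring one, whereas the average-reward criterion depends only on long-run behaviour. This is one reason the latter is described as the natural choice in ongoing tasks such as scheduling.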

ScholarsArchive@OSU

Acta Cybernetica : Volume 14. Number 3.

Publication date: 01/01/2000

University of Szeged


An experimental evaluation of auto-exploratory, average-reward reinforcement learning

Author: Mach Kimberly
Publication venue: 'Oregon State University'

ScholarsArchive@OSU

On monte carlo tree search and reinforcement learning

Authors: Brank Ster; Samothrakis Spyridon; Tom Vodopivec
Publication venue: 'AI Access Foundation'
Publication date: 20/12/2017

Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not yet been thoroughly studied. In this paper we re-examine in depth the close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, of which traditional MCTS is only one variant. We confirm that planning methods inspired by RL in conjunction with online search demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search.

University of Essex Research Repository

Crossref