
    On the Bitrate Adaptation of Shared Media Experience Services

    In Shared Media Experience Services (SMESs), a group of people consumes streamed media in a synchronised way, as in cloud gaming, live streaming, and interactive social applications. However, group synchronisation comes at the expense of other Quality of Experience (QoE) factors, because each group member experiences dynamic and diverse network conditions. A natural question is whether a group can be kept synchronised while each of its members still receives the highest possible QoE. In this work, we first create a Quality Assessment Framework capable of evaluating different SMES improvement approaches with respect to traditional metrics such as media bitrate quality, playback disruption, and end-user desynchronisation. Second, we focus on bitrate adaptation for improving the QoE of SMESs, as an incrementally deployable, end-user-triggered approach, and we formulate the problem in the context of Adaptive Real-Time Dynamic Programming (ARTDP). Finally, we develop and apply a simple QoE-aware bitrate adaptation mechanism that we evaluate against YouTube live-streaming traces, finding that it improves on YouTube's performance by more than 30%.
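    The abstract does not spell out the ARTDP formulation, so the following is only a minimal sketch of how a QoE-aware, end-user-triggered bitrate controller could be cast in ARTDP terms: the state encoding, reward weights, and bitrate levels below are assumptions for illustration, not the paper's actual design.

    ```python
    # Illustrative ARTDP-style bitrate controller (hypothetical states/rewards).
    import random
    from collections import defaultdict

    BITRATES = [1.0, 2.5, 5.0]   # candidate bitrate levels in Mbps (assumed)
    GAMMA = 0.95                 # discount factor

    class ARTDPBitrateController:
        """Learn an empirical model of how the network and group react to
        bitrate choices, and back up values only for states actually visited."""

        def __init__(self):
            self.transitions = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
            self.mean_reward = defaultdict(float)                     # (s, a) -> running mean reward
            self.visits = defaultdict(int)                            # (s, a) -> visit count
            self.value = defaultdict(float)                           # s -> V(s)

        @staticmethod
        def reward(bitrate, stalled, desync_s):
            # Hypothetical QoE trade-off: reward quality, penalise playback
            # disruption and deviation from the group's playback position.
            return bitrate - (10.0 if stalled else 0.0) - 2.0 * desync_s

        def _expected_next_value(self, state, action):
            successors = self.transitions[(state, action)]
            total = sum(successors.values())
            if total == 0:
                return 0.0
            return sum(n / total * self.value[s2] for s2, n in successors.items())

        def choose_bitrate(self, state, epsilon=0.1):
            # Greedy with respect to the learnt model, plus light exploration.
            if random.random() < epsilon:
                return random.choice(BITRATES)
            return max(BITRATES, key=lambda a: self.mean_reward[(state, a)]
                       + GAMMA * self._expected_next_value(state, a))

        def observe(self, state, action, reward, next_state):
            # ARTDP step: refine the empirical model, then run a Bellman
            # backup on the visited state only.
            self.visits[(state, action)] += 1
            self.transitions[(state, action)][next_state] += 1
            n = self.visits[(state, action)]
            self.mean_reward[(state, action)] += (reward - self.mean_reward[(state, action)]) / n
            self.value[state] = max(self.mean_reward[(state, a)]
                                    + GAMMA * self._expected_next_value(state, a)
                                    for a in BITRATES)
    ```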

    Modular reinforcement learning: a case study in a robot domain

    The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov chains. Robot-learning domains, on the other hand, are inherently infinite both in time and in space, and moreover they are only partially observable. In this article we suggest a systematic design method motivated by the desire to transform the task to be solved into a finite-state, discrete-time, "approximately" Markovian task that is also completely observable. The key idea is to break the problem up into subtasks and to design a controller for each subtask. Operating conditions are then attached to the controllers (a controller together with its operating condition is called a module), and additional features may be designed to facilitate observability. A new discrete time counter is introduced at the "module level" that ticks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared, and a model-based approach was found to work best. The learnt switching strategy performed as well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment that could not have been foreseen, suggesting the promising possibility that a learnt controller might outperform a handcrafted switching strategy in the future.
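    The article's robot controllers are not reproduced in the abstract; the sketch below only illustrates the decomposition it describes: controllers bundled with operating conditions into modules, hand-designed features, and a module-level time step that advances only when a feature changes. The environment interface and feature names are assumptions.

    ```python
    # Hypothetical sketch of module-level switching with an event-driven clock.
    from dataclasses import dataclass
    from typing import Callable, Dict, Tuple

    @dataclass
    class Module:
        """A low-level controller bundled with its operating condition."""
        name: str
        controller: Callable[[dict], dict]             # observation -> actuator command
        operating_condition: Callable[[dict], bool]    # True when the module is applicable

    def features(observation: dict) -> Tuple:
        # Hand-designed discrete features intended to make the module-level
        # task approximately Markovian and observable (names are assumed).
        return (observation["object_visible"], observation["gripper_full"])

    def run_episode(modules: Dict[str, Module], switching_policy: Dict[Tuple, str], env):
        """Event-driven loop: the module-level clock ticks only when the
        feature vector changes, and only then is the switching policy consulted."""
        obs = env.reset()
        state = features(obs)
        active = switching_policy.get(state, next(iter(modules)))
        while not env.done():
            obs = env.step(modules[active].controller(obs))
            new_state = features(obs)
            if new_state != state:                     # one module-level time step
                state = new_state
                applicable = [m.name for m in modules.values()
                              if m.operating_condition(obs)]
                active = switching_policy.get(state,
                                              applicable[0] if applicable else active)
    ```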

    A parallel implementation of Q-learning based on communication with cache

    Q-learning is a Reinforcement Learning method for solving sequential decision problems, in which the utility of actions depends on a sequence of decisions and there is uncertainty about the dynamics of the environment the agent is situated in. This general framework has allowed Q-learning and other Reinforcement Learning methods to be applied to a broad spectrum of complex real-world problems such as robotics, industrial manufacturing, and games. Despite its interesting properties, Q-learning is a very slow method that requires a long period of training to learn an acceptable policy. In order to solve, or at least reduce, this problem, we propose a parallel implementation model of Q-learning that uses a tabular representation and a cache-based communication scheme. This model is applied to a particular problem, and the results obtained with different processor configurations are reported. A brief discussion of the properties and current limitations of our approach is finally presented.
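    The abstract leaves the communication scheme abstract, so the sketch below only illustrates the general idea of tabular Q-learning workers exchanging value estimates through a shared cache. The environment interface, synchronisation period, and merge rule are assumptions rather than the paper's design, and Python threads stand in for the multi-processor setting.

    ```python
    # Illustrative tabular Q-learning workers sharing estimates via a cache.
    import random
    import threading
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
    ACTIONS = [0, 1, 2, 3]

    shared_cache = {}             # (state, action) -> Q value, shared by workers
    cache_lock = threading.Lock()

    def q_learning_worker(env, episodes, sync_every=10):
        """One worker: standard tabular Q-learning on a local table, with
        periodic exchange of values through the shared cache."""
        q = defaultdict(float)
        for episode in range(episodes):
            state, done = env.reset(), False
            while not done:
                if random.random() < EPSILON:
                    action = random.choice(ACTIONS)
                else:
                    action = max(ACTIONS, key=lambda a: q[(state, a)])
                next_state, reward, done = env.step(action)
                best_next = max(q[(next_state, a)] for a in ACTIONS)
                q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                               - q[(state, action)])
                state = next_state
            if episode % sync_every == 0:
                with cache_lock:
                    # Push local estimates and pull what other workers learnt;
                    # keeping the larger estimate is an illustrative merge rule.
                    for key, value in q.items():
                        shared_cache[key] = max(shared_cache.get(key, value), value)
                    q.update(shared_cache)
    ```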

    Software tools for the cognitive development of autonomous robots

    Robotic systems are evolving towards higher degrees of autonomy. This paper reviews the cognitive tools currently available for the fulfilment of abstract or long-term goals, as well as for learning and modifying robot behaviour.

    Acta Cybernetica: Volume 14, Number 3.


    On Monte Carlo tree search and reinforcement learning

    Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not yet been thoroughly studied. In this paper we re-examine this close relation between the two fields in depth; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, of which traditional MCTS is only one variant. We confirm that planning methods inspired by RL, in conjunction with online search, demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search.
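    A compact way to see the relation the paper builds on: the standard MCTS backup is an every-visit Monte Carlo average, i.e. an RL value update with step size 1/N, so substituting a bootstrapped target yields TD-flavoured tree-search variants. The sketch below is illustrative only and does not reproduce the paper's specific algorithm.

    ```python
    # Illustrative node backups: Monte Carlo (classic MCTS) vs. TD-style target.
    import math

    class Node:
        def __init__(self):
            self.visits = 0
            self.value = 0.0
            self.children = {}        # action -> Node

    def mc_backup(node, episode_return):
        # Classic MCTS backup: running average of Monte Carlo returns,
        # i.e. the RL update V <- V + (1/N) * (G - V).
        node.visits += 1
        node.value += (episode_return - node.value) / node.visits

    def td_backup(node, reward, next_node, gamma=1.0):
        # The same update with a bootstrapped TD(0) target r + gamma * V(s'):
        # one member of the family of RL-flavoured tree-search backups.
        node.visits += 1
        target = reward + gamma * next_node.value
        node.value += (target - node.value) / node.visits

    def ucb1(parent, child, c=math.sqrt(2)):
        # Standard UCT selection rule, unchanged by the choice of backup.
        if child.visits == 0:
            return float("inf")
        return child.value + c * math.sqrt(math.log(parent.visits) / child.visits)
    ```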