3,450 research outputs found

Genetic Programming for Smart Phone Personalisation

Authors: Cotillon Alban; Haak Aiden; Jurdak Raja; Valencia Philip
Publication date: 01/01/2014

Personalisation in smart phones requires adaptability to dynamic context based on user mobility, application usage and sensor inputs. Current personalisation approaches, which rely on static logic that is developed a priori, do not provide sufficient adaptability to dynamic and unexpected context. This paper proposes genetic programming (GP), which can evolve program logic in real time, as an online learning method to deal with the highly dynamic context in smart phone personalisation. We introduce the concept of collaborative smart phone personalisation through the GP Island Model, in order to exploit shared context among co-located phone users and reduce convergence time. We implement these concepts on real smartphones to demonstrate the capability of personalisation through GP and to explore the benefits of the Island Model. Our empirical evaluations on two example applications confirm that the Island Model can reduce convergence time by up to two-thirds over standalone GP personalisation.
Comment: 43 pages, 11 figures
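The island idea in the abstract above (separate populations that evolve independently and periodically exchange their best individuals) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain bit-string genetic algorithm instead of genetic programming over program trees, and the function name `evolve_islands`, the ring migration topology, and all parameters are assumptions made for illustration only.

```python
import random


def evolve_islands(n_islands, pop_size, genome_len, generations,
                   migrate_every, fitness, seed=0):
    """Toy island-model GA on bit strings (hypothetical helper, not from the paper)."""
    rng = random.Random(seed)
    # Each island holds its own independent population of random bit strings.
    islands = [[[rng.randint(0, 1) for _ in range(genome_len)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(islands):
            pop.sort(key=fitness, reverse=True)
            children = [pop[0]]  # elitism: the island's best always survives
            while len(children) < pop_size:
                # Pick a parent from the fitter half and flip bits at rate 1/L.
                parent = rng.choice(pop[:max(2, pop_size // 2)])
                child = [bit ^ (rng.random() < 1.0 / genome_len) for bit in parent]
                children.append(child)
            islands[i] = children
        if gen % migrate_every == 0:
            # Ring migration: each island receives a copy of its neighbour's
            # best individual, which replaces the island's worst individual.
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = list(bests[(i - 1) % n_islands])
    return max((ind for pop in islands for ind in pop), key=fitness)


if __name__ == "__main__":
    # Assumed toy objective: maximise the number of 1 bits.
    best = evolve_islands(n_islands=3, pop_size=8, genome_len=12,
                          generations=30, migrate_every=5, fitness=sum)
    print(sum(best), best)
```

The sketch only shows the migration mechanism; it says nothing about the convergence-time reduction reported in the abstract, which depends on the paper's own applications and setup.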

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

Authors: Brys Tim; Nowé Ann; Kudenko Daniel; Taylor Matthew
Publication date: 01/01/2014

Biblioteca Digital de la Comunidad de Madrid

White Rose Research Online

Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

Authors: Brys Tim; Kudenko Daniel; Nowé Ann; Taylor Matthew
Publication date: 01/01/2014

Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-) optimal solutions for any objective are (near-) optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems being solved.
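The structure described in the abstract above (one Q-function per objective, updated in parallel, with a confidence score deciding whose estimates drive action selection) can be sketched in tabular form. The abstract does not say how confidence is measured, so the `confidence` method below uses a placeholder (the gap between the two highest Q-values in a state) that is purely an assumption; the class name `MultiObjectiveQ` and its parameters are likewise hypothetical.

```python
from collections import defaultdict


class MultiObjectiveQ:
    """Tabular Q-learner keeping one Q-table per objective (illustrative only)."""

    def __init__(self, n_objectives, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)  # needs at least two actions
        self.alpha = alpha
        self.gamma = gamma
        self.q = [defaultdict(float) for _ in range(n_objectives)]

    def update(self, state, action, rewards, next_state):
        # One temporal-difference update per objective, all from the same transition.
        for q, reward in zip(self.q, rewards):
            best_next = max(q[(next_state, a)] for a in self.actions)
            td_error = reward + self.gamma * best_next - q[(state, action)]
            q[(state, action)] += self.alpha * td_error

    def confidence(self, objective, state):
        # ASSUMPTION: placeholder confidence = gap between the top two Q-values.
        # The paper's actual confidence measure is not given in the abstract.
        values = sorted(self.q[objective][(state, a)] for a in self.actions)
        return values[-1] - values[-2]

    def select_action(self, state):
        # Use the estimates of whichever objective is most confident in this state.
        chosen = max(range(len(self.q)), key=lambda o: self.confidence(o, state))
        q = self.q[chosen]
        return max(self.actions, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    agent = MultiObjectiveQ(n_objectives=2, actions=["left", "right"])
    agent.update("s", "right", rewards=(1.0, 0.1), next_state="t")
    agent.update("s", "left", rewards=(0.0, 0.0), next_state="t")
    print(agent.select_action("s"))
```

Any other per-state confidence score could be dropped into `confidence` without changing the rest of the sketch.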

White Rose Research Online

Association for the Advancement of Artificial Intelligence: AAAI Publications

Fast Damage Recovery in Robotics with the T-Resilience Algorithm

Authors: Cully Antoine; Koos Sylvain; Mouret Jean-Baptiste
Publication venue: SAGE Publications
Publication date: 02/02/2013

Damage recovery is critical for autonomous robots that need to operate for a long time without assistance. Most current methods are complex and costly because they require anticipating each potential damage in order to have a contingency plan ready. As an alternative, we introduce the T-Resilience algorithm, a new algorithm that allows robots to quickly and autonomously discover compensatory behaviors in unanticipated situations. This algorithm equips the robot with a self-model and discovers new behaviors by learning to avoid those that perform differently in the self-model and in reality. Our algorithm thus does not identify the damaged parts but it implicitly searches for efficient behaviors that do not use them. We evaluate the T-Resilience algorithm on a hexapod robot that needs to adapt to leg removal, broken legs and motor failures; we compare it to stochastic local search, policy gradient and the self-modeling algorithm proposed by Bongard et al. The behavior of the robot is assessed on-board thanks to an RGB-D sensor and a SLAM algorithm. Using only 25 tests on the robot and an overall running time of 20 minutes, T-Resilience consistently leads to substantially better results than the other approaches.

arXiv.org e-Print Archive

Crossref

HAL Descartes

Spiral - Imperial College Digital Repository

Hal-Diderot

A Shared Task on Bandit Learning for Machine Translation

Authors: Danchenko Pavel; Fürstenau Hagen; Kreutzer Julia; Riezler Stefan; Sokolov Artem; Sunderland Kellen; Szymaniak Witold
Publication date: 01/01/2017

We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On each of a sequence of rounds, a machine translation system is required to propose a translation for an input, and receives a real-valued estimate of the quality of the proposed translation for learning. This paper describes the shared task's learning and evaluation setup, using services hosted on Amazon Web Services (AWS), the data and evaluation metrics, and the results of various machine translation architectures and learning protocols.
Comment: Conference on Machine Translation (WMT) 2017
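The round-based protocol in the abstract above (propose one translation, receive a single real-valued quality score, learn from that score alone) can be sketched as a short loop. This is a toy under stated assumptions, not any system from the shared task: the learner is a simple epsilon-greedy bandit over a fixed candidate list, and the names `EpsilonGreedyTranslator`, `run_rounds`, and `toy_feedback`, along with the reward values, are hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyTranslator:
    """Toy bandit learner choosing among fixed candidate translations."""

    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.candidates = candidates      # dict: source -> list of candidates
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Optimistic start (1.0) so untried candidates get picked at least once,
        # assuming rewards lie in [0, 1].
        self.value = defaultdict(lambda: 1.0)
        self.count = defaultdict(int)

    def propose(self, source):
        options = self.candidates[source]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)                    # explore
        return max(options, key=lambda t: self.value[(source, t)])  # exploit

    def update(self, source, translation, reward):
        # Running mean of the rewards observed for this (source, translation) pair.
        key = (source, translation)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


def run_rounds(learner, sources, feedback):
    """One propose / feedback / update round per source; returns mean reward."""
    total = 0.0
    for source in sources:
        translation = learner.propose(source)
        reward = feedback(source, translation)  # only a scalar is revealed
        learner.update(source, translation, reward)
        total += reward
    return total / len(sources)


def toy_feedback(source, translation):
    # ASSUMPTION: stand-in for the task's quality estimate; values are made up.
    return 1.0 if translation == "hello world" else 0.2


if __name__ == "__main__":
    learner = EpsilonGreedyTranslator(
        {"hallo welt": ["hi earth", "hello world", "greetings planet"]})
    print(run_rounds(learner, ["hallo welt"] * 50, toy_feedback))
```

The key property the sketch preserves is that the learner never sees a reference translation, only the score for the output it actually proposed.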

arXiv.org e-Print Archive

Crossref

Is morality a gadget? Nature, nurture and culture in moral development

Author: Heyes Cecilia
Publication date: 26/07/2019

Research on ‘moral learning’ examines the roles of domain-general processes, such as Bayesian inference and reinforcement learning, in the development of moral beliefs and values. Alert to the power of these processes and equipped with both the analytic resources of philosophy and the empirical methods of psychology, ‘moral learners’ are ideally placed to discover the contributions of nature, nurture and culture to moral development. However, I argue that to achieve these objectives research on moral learning needs to (1) overcome nativist bias, and (2) distinguish two kinds of social learning: learning from and learning about. An agent learns from others when there is transfer of competence - what the learner learns is similar to, and causally dependent on, what the model knows. When an agent learns about the social world there is no transfer of competence - observable features of other agents are just the content of what-is-learned. Learning from does not require explicit instruction. A novice can learn from an expert who is ‘leaking’ her morality in the form of emotionally charged behaviour or involuntary use of vocabulary. To the extent that moral development depends on learning from other agents, there is the potential for cultural selection of moral beliefs and values.

Fault Recovery in Swarm Robotics Systems using Learning Algorithms

Author: Oladiran Oyinlola OJUOLAPE
Publication venue: University of York
Publication date: 01/12/2019

When faults occur in swarm robotic systems they can have a detrimental effect on collective behaviours, to the point that failed individuals may jeopardise the swarm's ability to complete its task. Although fault tolerance is a desirable property of swarm robotic systems, fault recovery mechanisms have not yet been thoroughly explored. Individual robots may suffer a variety of faults, which will affect collective behaviours in different ways; a recovery process is therefore required that can cope with many different failure scenarios. In this thesis, we propose a novel approach for fault recovery in robot swarms that uses Reinforcement Learning and Self-Organising Maps to select the most appropriate recovery strategy for any given scenario. The learning process is evaluated in both centralised and distributed settings. Additionally, we experimentally evaluate the performance of this approach in comparison to random selection of fault recovery strategies, using simulated collective phototaxis, aggregation and foraging tasks as case studies. Our results show that this machine learning approach outperforms random selection, and allows swarm robotic systems to recover from faults that would otherwise prevent the swarm from completing its mission. This work builds upon existing research in fault detection and diagnosis in robot swarms, with the aim of creating a fully fault-tolerant swarm capable of long-term autonomy.

White Rose E-theses Online