3,450 research outputs found

    Genetic Programming for Smart Phone Personalisation

    Full text link
    Personalisation in smart phones requires adaptability to dynamic context based on user mobility, application usage and sensor inputs. Current personalisation approaches, which rely on static logic that is developed a priori, do not provide sufficient adaptability to dynamic and unexpected context. This paper proposes genetic programming (GP), which can evolve program logic in realtime, as an online learning method to deal with the highly dynamic context in smart phone personalisation. We introduce the concept of collaborative smart phone personalisation through the GP Island Model, in order to exploit shared context among co-located phone users and reduce convergence time. We implement these concepts on real smartphones to demonstrate the capability of personalisation through GP and to explore the benefits of the Island Model. Our empirical evaluations on two example applications confirm that the Island Model can reduce convergence time by up to two-thirds over standalone GP personalisation.Comment: 43 pages, 11 figure

    Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

    Get PDF
    Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-) optimal solutions for any objective are (near-) optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems being solved

    Fast Damage Recovery in Robotics with the T-Resilience Algorithm

    Full text link
    Damage recovery is critical for autonomous robots that need to operate for a long time without assistance. Most current methods are complex and costly because they require anticipating each potential damage in order to have a contingency plan ready. As an alternative, we introduce the T-resilience algorithm, a new algorithm that allows robots to quickly and autonomously discover compensatory behaviors in unanticipated situations. This algorithm equips the robot with a self-model and discovers new behaviors by learning to avoid those that perform differently in the self-model and in reality. Our algorithm thus does not identify the damaged parts but it implicitly searches for efficient behaviors that do not use them. We evaluate the T-Resilience algorithm on a hexapod robot that needs to adapt to leg removal, broken legs and motor failures; we compare it to stochastic local search, policy gradient and the self-modeling algorithm proposed by Bongard et al. The behavior of the robot is assessed on-board thanks to a RGB-D sensor and a SLAM algorithm. Using only 25 tests on the robot and an overall running time of 20 minutes, T-Resilience consistently leads to substantially better results than the other approaches

    A Shared Task on Bandit Learning for Machine Translation

    Full text link
    We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On each of a sequence of rounds, a machine translation system is required to propose a translation for an input, and receives a real-valued estimate of the quality of the proposed translation for learning. This paper describes the shared task's learning and evaluation setup, using services hosted on Amazon Web Services (AWS), the data and evaluation metrics, and the results of various machine translation architectures and learning protocols.Comment: Conference on Machine Translation (WMT) 201

    Is morality a gadget? Nature, nurture and culture in moral development

    Get PDF
    Research on ‘moral learning’ examines the roles of domain-general processes, such as Bayesian inference and reinforcement learning, in the development of moral beliefs and values. Alert to the power of these processes and equipped with both the analytic resources of philosophy and the empirical methods of psychology, ‘moral learners’ are ideally placed to discover the contributions of nature, nurture and culture to moral development. However, I argue that to achieve these objectives research on moral learning needs to 1) overcome nativist bias, and 2) distinguish two kinds of social learning: learning from and learning about. An agent learns from others when there is transfer of competence - what the learner learns is similar to, and causally dependent on, what the model knows. When an agent learns about the social world there is no transfer of competence - observable features of other agents are just the content of what-is-learned. Learning from does not require explicit instruction. A novice can learn from an expert who is ‘leaking’ her morality in the form of emotionally charged behaviour or involuntary use of vocabulary. To the extent that moral development depends on learning from other agents, there is the potential for cultural selection of moral beliefs and values

    Fault Recovery in Swarm Robotics Systems using Learning Algorithms

    Get PDF
    When faults occur in swarm robotic systems they can have a detrimental effect on collective behaviours, to the point that failed individuals may jeopardise the swarm's ability to complete its task. Although fault tolerance is a desirable property of swarm robotic systems, fault recovery mechanisms have not yet been thoroughly explored. Individual robots may suffer a variety of faults, which will affect collective behaviours in different ways, therefore a recovery process is required that can cope with many different failure scenarios. In this thesis, we propose a novel approach for fault recovery in robot swarms that uses Reinforcement Learning and Self-Organising Maps to select the most appropriate recovery strategy for any given scenario. The learning process is evaluated in both centralised and distributed settings. Additionally, we experimentally evaluate the performance of this approach in comparison to random selection of fault recovery strategies, using simulated collective phototaxis, aggregation and foraging tasks as case studies. Our results show that this machine learning approach outperforms random selection, and allows swarm robotic systems to recover from faults that would otherwise prevent the swarm from completing its mission. This work builds upon existing research in fault detection and diagnosis in robot swarms, with the aim of creating a fully fault-tolerant swarm capable of long-term autonomy
    • …
    corecore