14,943 research outputs found

    Knowledge-Based Reward Shaping with Knowledge Revision in Reinforcement Learning

    Reinforcement learning has proven to be a successful artificial intelligence technique when an agent needs to act and improve its behaviour in a given environment. The agent receives feedback about its behaviour in terms of rewards through constant interaction with the environment and, in time, manages to identify which actions are more beneficial in each situation. Typically, reinforcement learning assumes the agent has no prior knowledge about the environment it is acting in. Nevertheless, in many cases (potentially abstract and heuristic) domain knowledge about the reinforcement learning task is available from domain experts and can be used to improve learning performance. One way of imparting knowledge to an agent is through reward shaping, which guides the agent by providing additional rewards. A common assumption when imparting knowledge to an agent is that the domain knowledge is always correct. Given that the provided knowledge is of a heuristic nature, there are cases when this assumption is not met, and it has been shown that when the provided knowledge is wrong, the agent takes longer to learn the optimal policy. As reinforcement learning methods shift towards more informed agents, the assumption that expert domain knowledge is always correct needs to be relaxed in order to scale these methods to more complex, real-life scenarios. To accomplish that, agents need a mechanism to deal with cases where the provided expert knowledge is not perfect. This thesis investigates and documents the adverse effects erroneous knowledge can have on the learning process of an agent if care is not taken. Moreover, it provides a novel approach to dealing with erroneous knowledge through the use of knowledge revision principles, allowing agents to use their experiences to revise the knowledge and thus benefit from more accurate shaping. Empirical evaluation shows that agents able to revise erroneous parts of the provided knowledge reach better policies faster than agents without knowledge revision capabilities.
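    To make the shaping mechanism above concrete, here is a minimal sketch of potential-based reward shaping in tabular Q-learning, where a heuristic potential table stands in for the (possibly erroneous) expert knowledge. The grid-world actions, hyperparameters, and potential table are illustrative assumptions, not the thesis's implementation.

```python
# Hedged sketch: tabular Q-learning with potential-based reward shaping,
# F(s, s') = gamma * phi(s') - phi(s). The potential table `phi` stands in
# for (possibly erroneous) expert knowledge; all names and hyperparameters
# are illustrative assumptions, not the thesis's own code.
import random
from collections import defaultdict

GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1
ACTIONS = [0, 1, 2, 3]            # e.g. up, down, left, right in a toy grid world

Q = defaultdict(float)            # Q[(state, action)] -> action value
phi = defaultdict(float)          # heuristic potential per state, supplied by an "expert"

def epsilon_greedy(state):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def shaped_update(state, action, reward, next_state):
    """One Q-learning step where the shaping term is added to the reward."""
    shaping = GAMMA * phi[next_state] - phi[state]
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = (reward + shaping) + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```

    A revision step in the spirit of the thesis could, for instance, adjust entries of phi whose advice repeatedly disagrees with the returns the agent actually observes; the abstract does not commit to a specific rule, so none is shown here.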

    The Social Shaping of Technology


    A conceptual framework for externally-influenced agents: an assisted reinforcement learning review

    A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, and retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information to improve the agent's performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems.
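    As a concrete illustration of the source-to-learner relationship the taxonomy describes, the following sketch shows one minimal way an external information source could be consulted inside a learning loop. The Advisor protocol, the trust parameter, and the blending rule are hypothetical choices made for illustration, not a method taken from the review.

```python
# Hedged sketch of an "assisted" learner that can consult an external
# information source for action advice. The Advisor interface and the
# blending rule are hypothetical, chosen only to make the idea concrete.
import random
from typing import Optional, Protocol

class Advisor(Protocol):
    def advise(self, state) -> Optional[int]:
        """Return a suggested action for `state`, or None if no advice applies."""

class AssistedLearner:
    def __init__(self, n_actions: int, advisor: Advisor, trust: float = 0.5):
        self.n_actions = n_actions
        self.advisor = advisor
        self.trust = trust            # probability of following advice when it is offered
        self.Q = {}                   # state -> list of action values

    def act(self, state, eps: float = 0.1) -> int:
        advice = self.advisor.advise(state)
        if advice is not None and random.random() < self.trust:
            return advice             # follow the external source
        q = self.Q.setdefault(state, [0.0] * self.n_actions)
        if random.random() < eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=q.__getitem__)
```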

    Psychological aspects of motivation

    The purpose of the article is to raise awareness of what motivation is and what motivating employees means in organizations. The article discusses definitions and sources of motivation, historical and contemporary views on motivation, tools for comprehensive rewarding, remuneration as an instrument of motivational impact, the determinants of success and assessment of the motivation system, the motivational impact of employee assessment, and motivation in the face of organizational change.

    D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning

    While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that behaves optimally with respect to a task reward. If, however, suboptimal demonstrations are provided, a fundamental challenge arises: the demonstration-matching objective of IL conflicts with the return-maximization objective of RL. This paper introduces D-Shape, a new method for combining IL and RL that uses ideas from reward shaping and goal-conditioned RL to resolve this conflict. D-Shape allows learning from suboptimal demonstrations while retaining the ability to find the optimal policy with respect to the task reward. We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy in the presence of suboptimal demonstrations.
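    The following sketch illustrates the general idea of demonstration-shaped, goal-conditioned rewards: progress toward a goal state drawn from a demonstration is rewarded through a potential-based term, which leaves the optimal policy for the task reward unchanged. The distance function and the way the terms are combined are assumptions for illustration, not D-Shape's exact formulation.

```python
# Hedged sketch of goal-conditioned, potential-based shaping toward a
# demonstration-derived goal state. The L1 distance potential and the
# reward combination are illustrative assumptions.
import numpy as np

GAMMA = 0.99

def goal_potential(state: np.ndarray, goal: np.ndarray) -> float:
    """Negative distance to the demonstration goal, used as a potential."""
    return -float(np.linalg.norm(state - goal, ord=1))

def shaped_reward(task_reward: float, s: np.ndarray, s_next: np.ndarray,
                  goal: np.ndarray) -> float:
    """Task reward plus a potential-based shaping term toward the goal.

    Because the shaping term is potential-based, it does not change which
    policy is optimal for the task reward, which is the property the
    abstract relies on when demonstrations are suboptimal.
    """
    shaping = GAMMA * goal_potential(s_next, goal) - goal_potential(s, goal)
    return task_reward + shaping
```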

    Accelerating decision making under partial observability using learned action priors

    Thesis (M.Sc.), University of the Witwatersrand, Faculty of Science, School of Computer Science and Applied Mathematics, 2017. Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework allowing a robot to reason about the consequences of actions and observations with respect to the agent's limited perception of its environment. They allow an agent to plan and act optimally in uncertain environments. Although they have been successfully applied to various robotic tasks, they are infamous for their high computational cost. This thesis demonstrates the use of knowledge transfer, learned from previous experiences, to accelerate the learning of POMDP tasks. We propose that in order for an agent to learn to solve these tasks more quickly, it must be able to generalise from past behaviours and transfer knowledge, learned from solving multiple tasks, between different circumstances. We present a method for accelerating this learning process by learning the statistics of action choices over the lifetime of an agent, known as action priors. Action priors specify the usefulness of actions in particular situations and allow us to bias exploration, which in turn improves the performance of the learning process. Using navigation domains, we study the degree to which transferring knowledge between tasks in this way results in a considerable speed-up in solution times. This thesis therefore makes the following contributions. We provide an algorithm for learning action priors from a set of approximately optimal value functions, and two approaches with which prior knowledge over actions can be used in a POMDP context. As such, we show that considerable gains in speed can be achieved by learning subsequent tasks using prior knowledge rather than learning from scratch. Learning with action priors can be particularly useful in reducing the cost of exploration in the early stages of the learning process, as the priors act as a mechanism that allows the agent to select more useful actions in particular circumstances. Thus, we demonstrate how the initial losses associated with unguided exploration can be alleviated through the use of action priors, which allow for safer exploration. Additionally, we illustrate that action priors can reduce the computation time needed to learn feasible policies.
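    A minimal sketch of the action-prior idea, assuming priors are built by counting which actions are greedy under the Q-functions of previously solved tasks and then used to bias exploratory action selection. The pseudo-counts and data structures are illustrative, not the thesis's algorithm.

```python
# Hedged sketch of action priors: count how often each action is greedy
# under a set of previously solved tasks' Q-functions, then sample
# exploratory actions from the resulting distribution instead of uniformly.
# The Dirichlet-style pseudo-counts are an assumption for illustration.
import random
from collections import defaultdict

def learn_action_priors(q_functions, states, actions, pseudo_count=1.0):
    """Return per-state action counts aggregated over previously solved tasks."""
    counts = defaultdict(lambda: {a: pseudo_count for a in actions})
    for Q in q_functions:                       # Q: dict[(state, action)] -> value
        for s in states:
            greedy = max(actions, key=lambda a: Q.get((s, a), 0.0))
            counts[s][greedy] += 1.0
    return counts

def sample_exploratory_action(counts, state, actions):
    """Bias exploration toward actions that were useful in past tasks."""
    weights = [counts[state][a] for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]
```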

    RePLan: Robotic Replanning with Perception and Language Models

    Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning, and robotics planning. Recently, LLMs have also been able to generate reward functions for low-level robot actions, effectively bridging the interface between high-level planning and low-level robot control. However, the challenge remains that even with syntactically correct plans, robots can still fail to achieve their intended goals due to imperfect plans or unexpected environmental issues. To overcome this, we leverage Vision Language Models (VLMs), which have shown remarkable success in tasks such as visual question answering, and present a novel framework called Robotic Replanning with Perception and Language Models (RePLan) that enables online replanning for long-horizon tasks. The framework uses the physical grounding provided by a VLM's understanding of the world's state to adapt robot actions when the initial plan fails to achieve the desired goal. We developed a Reasoning and Control (RC) benchmark with eight long-horizon tasks to test our approach. We find that RePLan enables a robot to successfully adapt to unforeseen obstacles while accomplishing open-ended, long-horizon goals where baseline models cannot, and that it can be readily applied to real robots. Find more information at https://replan-lm.github.io/replan.github.io
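    The following sketch outlines an online replanning loop in the spirit of the abstract: a planner proposes a plan, the robot executes it step by step, and a perception model verifies each step, triggering a replan on failure. All interfaces shown (Planner, Perception, Robot) are hypothetical stand-ins rather than RePLan's actual API.

```python
# Hedged sketch of an online replanning loop. The Planner, Perception, and
# Robot protocols are hypothetical placeholders for an LLM planner, a VLM
# verifier, and a low-level controller; they are not RePLan's real interfaces.
from typing import List, Protocol

class Planner(Protocol):
    def plan(self, goal: str, observation: str) -> List[str]: ...

class Perception(Protocol):
    def describe(self) -> str: ...
    def step_succeeded(self, step: str) -> bool: ...

class Robot(Protocol):
    def execute(self, step: str) -> None: ...

def run_with_replanning(goal: str, planner: Planner, perception: Perception,
                        robot: Robot, max_replans: int = 5) -> bool:
    for _ in range(max_replans + 1):
        plan = planner.plan(goal, perception.describe())
        for step in plan:
            robot.execute(step)
            if not perception.step_succeeded(step):
                break                    # abandon the rest of the plan and replan
        else:
            return True                  # every step verified; goal assumed reached
    return False                         # replanning budget exhausted
```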

    Psycho-correction of motivation for learning achievements among schoolchildren with cognitive development disorders

    According to the International Classification of Diseases, 10th revision, the general signs of cognitive development disorders (F06.7) include various relatively mild developmental anomalies characterized by immaturity of emotional-volitional functions, a slowed pace of mental development, mild impairments of cognitive activity, and personal immaturity, which differ in structure and qualitative indicators from intellectual disability and tend toward compensation and further development. The organization of education for children with cognitive development disorders depends on how closely their current development approaches the level of readiness to master the school curriculum and to take on the position of a pupil. A child's learning activity is formed through a gradual transition from the dominance of unconscious motivational processes to the dominance of conscious motives of activity. The motivation of schoolchildren with cognitive development disorders rests on needs that do not direct the children's activity toward achieving a set goal; at the same time, due to cerebral or systemic causes, the motivational process in such children proceeds with certain peculiarities and complications. The paper aims to highlight the results of research into the development of qualitative motivational processes determining the performance of cognitive activity, which contributes to raising the level at which the child can realize their capabilities when solving the assigned tasks. Realizing these aims required defining criteria of achievement motivation, which included: relating the given conditions to the imagined expected result; choosing an alternative course of action; forming an intention; controlling the plan; and evaluating the motivational processes after the action is completed (assessing the degree to which the set goal was achieved). The experimental program for developing achievement motivation in schoolchildren involved the adaptation/modification of methods such as "Communicative Attack", "Discussion", "Creating a Problem Situation", and "Creating a Situation of Success" (H. Eysenck, D. McClelland, D. Elkonin, V. Davydov, Y. Tamberg, K. Popper). Quantitative and qualitative analysis of the results led to the conclusion that the designed psycho-corrective program for developing the motivation of academic achievement fostered qualitative changes in the development of all components of self-regulation in schoolchildren with cognitive development disorders.