Knowledge-Based Reward Shaping with Knowledge Revision in Reinforcement Learning
Reinforcement learning has proven to be a successful artificial intelligence technique when an agent needs to act and improve in a given environment. The agent receives feedback about its behaviour in terms of rewards through constant interaction with the environment and in time manages to identify which actions are more beneficial for each situation.
Typically, reinforcement learning assumes the agent has no prior knowledge about the environment in which it acts. Nevertheless, in many cases (potentially abstract and heuristic) domain knowledge of the reinforcement learning task is available from domain experts and can be used to improve learning performance. One way of imparting knowledge to an agent is through reward shaping, which guides an agent by providing additional rewards.
One common assumption when imparting knowledge to an agent is that the domain knowledge is always correct. Given that the provided knowledge is of a heuristic nature, this assumption is not always met, and it has been shown that when the provided knowledge is wrong, the agent takes longer to learn the optimal policy. As reinforcement learning methods shift towards informed agents, the assumption that expert domain knowledge is always correct needs to be relaxed in order to scale these methods to more complex, real-life scenarios. To accomplish that, agents need a mechanism to deal with cases where the provided expert knowledge is not perfect.
This thesis investigates and documents the adverse effects erroneous knowledge can have on the learning process of an agent if care is not taken. Moreover, it provides a novel approach to dealing with erroneous knowledge through the use of knowledge revision principles, allowing agents to use their experiences to revise knowledge and thus benefit from more accurate shaping. Empirical evaluation shows that agents able to revise erroneous parts of the provided knowledge can reach better policies faster than agents without knowledge revision capabilities.
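Reward shaping of the kind described above is commonly realized as potential-based shaping (Ng, Harada and Russell, 1999), where the extra reward is the discounted difference of a potential function over states, which provably leaves the optimal policy unchanged. The following is a minimal sketch under assumed conditions (a gridworld with `(row, col)` states and a hypothetical goal cell); it illustrates the general technique, not this thesis's exact formulation or its revision mechanism:

```python
# Minimal sketch of potential-based reward shaping.
# The potential phi encodes (possibly imperfect) heuristic expert knowledge;
# the shaping term F = gamma * phi(s') - phi(s) is added to the environment
# reward and does not change the optimal policy.

GAMMA = 0.99  # assumed discount factor

def phi(state, goal=(3, 3)):
    """Heuristic potential: negative Manhattan distance to an assumed goal cell."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(env_reward, state, next_state, gamma=GAMMA):
    """Environment reward plus the potential-based shaping term."""
    return env_reward + gamma * phi(next_state) - phi(state)
```

If the heuristic encoded in `phi` is wrong (e.g. the assumed goal cell is not the true goal), the shaping term rewards movement in the wrong direction, which is exactly the kind of erroneous knowledge the thesis's revision mechanism is designed to correct.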
A conceptual framework for externally-influenced agents: an assisted reinforcement learning review
A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent's performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
Psychological aspects of motivation
The purpose of the article is to explain what motivation is and how employees are motivated in organizations. The article discusses definitions and sources of motivation, historical and contemporary views on motivation, tools for comprehensive reward, remuneration as an instrument of motivational impact, success determinants and assessment of the motivation system, the motivational impact of employee assessment, and motivation in the face of organizational changes.
D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning
While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that behaves optimally with respect to a task reward. If, however, suboptimal demonstrations are provided, a fundamental challenge appears in that the demonstration-matching objective of IL conflicts with the return-maximization objective of RL. This paper introduces D-Shape, a new method for combining IL and RL that uses ideas from reward shaping and goal-conditioned RL to resolve the above conflict. D-Shape allows learning from suboptimal demonstrations while retaining the ability to find the optimal policy with respect to the task reward. We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy in the presence of suboptimal demonstrations.
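The general idea behind reconciling the two objectives — shaping toward demonstrations without sacrificing optimality under the task reward — can be sketched with a potential-based bonus derived from demonstration states. This is an illustrative sketch of that idea only, not D-Shape's actual goal-conditioning algorithm; the gridworld demonstration below is hypothetical:

```python
# Illustrative sketch: potential-based shaping toward (possibly suboptimal)
# demonstration states. Because the bonus is potential-based, it biases
# exploration toward the demonstration without changing the optimal policy
# with respect to the task reward.

GAMMA = 0.99  # assumed discount factor
DEMO = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]  # hypothetical demo trajectory

def demo_potential(state):
    """Potential = negative Manhattan distance to the nearest demonstrated state."""
    return -min(abs(state[0] - d[0]) + abs(state[1] - d[1]) for d in DEMO)

def total_reward(task_reward, state, next_state, gamma=GAMMA):
    """Task reward plus a demonstration-derived potential-based bonus."""
    return task_reward + gamma * demo_potential(next_state) - demo_potential(state)
```

Because the bonus telescopes along any trajectory, an agent maximizing `total_reward` still converges to a policy optimal for `task_reward` alone, which is the property D-Shape's guarantee rests on.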
Accelerating decision making under partial observability using learned action priors
Thesis (M.Sc.)--University of the Witwatersrand, Faculty of Science, School of Computer Science and Applied Mathematics, 2017.

Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework allowing a robot to reason about the consequences of actions and observations with respect to the agent's limited perception of its environment. They allow an agent to plan and act optimally in uncertain environments. Although they have been successfully applied to various robotic tasks, they are infamous for their high computational cost. This thesis demonstrates the use of knowledge transfer, learned from previous experiences, to accelerate the learning of POMDP tasks. We propose that in order for an agent to learn to solve these tasks more quickly, it must be able to generalise from past behaviours and transfer knowledge, learned from solving multiple tasks, between different circumstances. We present a method for accelerating this learning process by learning the statistics of action choices over the lifetime of an agent, known as action priors. Action priors specify the usefulness of actions in given situations and allow us to bias exploration, which in turn improves the performance of the learning process. Using navigation domains, we study the degree to which transferring knowledge between tasks in this way results in a considerable speed-up in solution times.

This thesis therefore makes the following contributions. We provide an algorithm for learning action priors from a set of approximately optimal value functions, and two approaches with which prior knowledge over actions can be used in a POMDP context. We show that considerable gains in speed can be achieved in learning subsequent tasks using prior knowledge rather than learning from scratch. Learning with action priors can be particularly useful in reducing the cost of exploration in the early stages of the learning process, as the priors act as a mechanism that allows the agent to select more useful actions in particular circumstances. Thus, we demonstrate how the initial losses associated with unguided exploration can be alleviated through the use of action priors, which allow for safer exploration. Additionally, we illustrate that action priors can reduce the computation time needed to learn feasible policies.
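Biasing exploration with action priors can be sketched concretely: counts of action choices accumulated from previously solved tasks are normalised into a distribution, and exploratory actions are sampled from that distribution rather than uniformly. The pseudo-count smoothing below is an assumed detail for illustration, not necessarily the thesis's exact estimator:

```python
# Sketch of action priors biasing exploration.
# Counts of action choices from previously solved tasks are turned into a
# prior distribution; during exploration the agent samples from this prior
# instead of uniformly, reducing early unguided exploration.

import random

def action_prior(counts, alpha=1.0):
    """Normalise action counts (with pseudo-count alpha) into a distribution."""
    totals = [c + alpha for c in counts]
    z = sum(totals)
    return [t / z for t in totals]

def explore(counts, rng=random):
    """Sample an exploratory action index from the prior rather than uniformly."""
    probs = action_prior(counts)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

An action chosen often in past tasks (a high count) is proposed more often during exploration, while the pseudo-count keeps every action reachable so the agent can still discover task-specific deviations.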
RePLan: Robotic Replanning with Perception and Language Models
Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning and robotics planning. Recently, LLMs have also been able to generate reward functions for low-level robot actions, effectively bridging the interface between high-level planning and low-level robot control. However, the challenge remains that even with syntactically correct plans, robots can still fail to achieve their intended goals due to imperfect plans or unexpected environmental issues. To overcome this, we leverage Vision Language Models (VLMs), which have shown remarkable success in tasks such as visual question answering. Building on the capabilities of VLMs, we present a novel framework called Robotic Replanning with Perception and Language Models (RePLan) that enables online replanning capabilities for long-horizon tasks. This framework utilizes the physical grounding provided by a VLM's understanding of the world's state to adapt robot actions when the initial plan fails to achieve the desired goal. We developed a Reasoning and Control (RC) benchmark with eight long-horizon tasks to test our approach. We find that RePLan enables a robot to successfully adapt to unforeseen obstacles while accomplishing open-ended, long-horizon goals, where baseline models cannot, and can be readily applied to real robots. Find more information at https://replan-lm.github.io/replan.github.io
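The control flow the abstract describes — an LLM proposes a plan, a low-level controller executes it step by step, and a VLM's perception-grounded check triggers replanning on failure — can be sketched as a loop. All function names here (`propose_plan`, `execute_step`, `vlm_verifies`) are illustrative placeholders, not the RePLan API:

```python
# Hypothetical sketch of an online replanning loop in the spirit of RePLan.
# The planner, controller, and verifier are injected as callables so the
# loop itself stays independent of any particular model or robot.

def replanning_loop(goal, propose_plan, execute_step, vlm_verifies, max_rounds=3):
    for _ in range(max_rounds):
        plan = propose_plan(goal)        # high-level LLM planner (placeholder)
        for step in plan:
            execute_step(step)           # low-level robot control (placeholder)
            if not vlm_verifies(step):   # perception-grounded outcome check
                break                    # step failed: replan from current state
        else:
            return True                  # every step verified: goal reached
    return False                         # gave up after max_rounds replans
```

The key design point mirrored from the abstract is that verification happens after each step against the perceived world state, not only at the end of the plan, so imperfect plans are caught and revised online.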
Psycho-correction of motivation for learning achievements among schoolchildren with cognitive development disorders
According to the International Classification of Diseases, 10th revision, the general signs of cognitive development disorders (F06.7) include various relatively mild developmental anomalies characterized by immaturity of emotional-volitional functions, a slowed pace of mental development, mild impairments of cognitive activity, and personal immaturity, which differ in structure and qualitative indicators from intellectual disability and tend towards compensation and development. The organization of education for children with cognitive development disorders depends on how closely their actual development approaches the level of readiness to master the school curriculum and to take on the position of a pupil. As is known, a child's learning activity is formed through a gradual transition from the dominance of unconscious motivational processes to the dominance of conscious motives of activity. The motivation of schoolchildren with cognitive development disorders rests on needs that do not direct the children's activity towards achieving a set goal; at the same time, owing to cerebral or systemic causes, the motivational process in such children proceeds with certain peculiarities and complications. The paper aims to highlight the results of research into the development of qualitative motivational processes determining the performance of cognitive activity, which contributes to raising the level of realizing the child's possibilities when solving the assigned tasks. Pursuing these aims required defining criteria of achievement motivation, which included: relating the given conditions to the imagined expected result; choosing an alternative course of action; forming an intention; control over the plan; and evaluating motivational processes after completing the action (assessing the degree to which the set goal was achieved). The experimental program for developing achievement motivation in schoolchildren included adaptation/modification of such methods as "Communicative Attack", "Discussion", "Creating a Problem Situation", and "Creating a Situation of Success" (H. Eysenck, D. McClelland, D. Elkonin, V. Davydov, Y. Tamberg, K. Popper). Quantitative and qualitative analysis of the results led to the conclusion that the designed psycho-corrective program for creating the motivation of academic achievements fostered qualitative changes in the development of all the components of self-regulation in schoolchildren with cognitive development disorders.