Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Reinforcement Learning algorithms require a large number of samples to solve
complex tasks with sparse and delayed rewards. Complex tasks can often be
hierarchically decomposed into sub-tasks. A step in the Q-function can be
associated with solving a sub-task, where the expectation of the return
increases. RUDDER has been introduced to identify these steps and then
redistribute reward to them, thus immediately giving reward if sub-tasks are
solved. Since the problem of delayed rewards is mitigated, learning is
considerably sped up. However, for complex tasks, current exploration
strategies as deployed in RUDDER struggle with discovering episodes with high
rewards. Therefore, we assume that episodes with high rewards are given as
demonstrations and do not have to be discovered by exploration. Typically, the
number of demonstrations is small, and RUDDER's LSTM model, being a deep
learning method, does not learn well from so few examples. Hence, we introduce Align-RUDDER, which is RUDDER
with two major modifications. First, Align-RUDDER assumes that episodes with
high rewards are given as demonstrations, replacing RUDDER's safe exploration
and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile
model that is obtained from multiple sequence alignment of demonstrations.
Profile models can be constructed from as few as two demonstrations as known
from bioinformatics. Align-RUDDER inherits the concept of reward
redistribution, which considerably reduces the delay of rewards, thus speeding
up learning. Align-RUDDER outperforms competitors on complex artificial tasks
with delayed reward and few demonstrations. On the Minecraft ObtainDiamond
task, Align-RUDDER is able to mine a diamond, though not frequently. Github:
https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-U
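The reward-redistribution idea can be illustrated with a toy sketch: align each episode against a consensus of demonstrated events and hand out the episodic return at the steps where alignment progress increases. The consensus sequence, event names, and scoring below are illustrative assumptions, not the authors' profile-model implementation.

```python
# Hypothetical sketch of Align-RUDDER-style reward redistribution.
# The profile model is simplified to a consensus event sequence scored
# by greedy prefix alignment; everything here is illustrative.

def prefix_score(events, consensus):
    """Score how far a prefix of events progresses along the consensus."""
    score, j = 0, 0
    for e in events:
        if j < len(consensus) and e == consensus[j]:
            score += 1
            j += 1
    return score

def redistribute_reward(episode_events, consensus, episode_return):
    """Assign reward to steps where the alignment score increases."""
    scores = [prefix_score(episode_events[: t + 1], consensus)
              for t in range(len(episode_events))]
    deltas = [scores[0]] + [scores[t] - scores[t - 1]
                            for t in range(1, len(scores))]
    total = sum(deltas) or 1
    # Each step receives a share of the episodic return proportional
    # to its contribution to alignment progress.
    return [episode_return * d / total for d in deltas]

consensus = ["log", "planks", "stick", "pickaxe"]
episode = ["log", "dirt", "planks", "stick", "dirt", "pickaxe"]
rewards = redistribute_reward(episode, consensus, 1.0)
# Steps matching the consensus get reward immediately; the "dirt"
# detours get none, so the delay of the final reward disappears.
```

The delayed episodic return of 1.0 is thus moved onto the four sub-task-completing steps, which is the effect the abstract describes as reducing the delay of rewards.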
End to End Learning in Autonomous Driving Systems
Convolutional neural networks have advanced visual perception significantly in recent years. Two major ingredients that enable this success are the composition of simple modules into a complex network and end-to-end optimization. However, this success has not yet revolutionized robotics as much as vision, even though robotics suffers from similar problems to traditional computer vision, i.e. the imperfection of manually designed system pipelines. This thesis investigates using end-to-end learning for the autonomous driving system, a concrete robotic application. End-to-end learning can produce reasonable driving behaviors, even in complex urban driving scenarios. Representation learning in end-to-end driving models is crucial, and auxiliary vision tasks such as semantic segmentation can help to form a more informative driving representation, especially when training data is limited. Naive convolutional neural networks are usually only capable of reactive control and cannot perform complex reasoning in a particular scenario. This thesis also studies how to handle scene-conditioned driving behavior, which goes beyond the capability of reactive control. Alongside the end-to-end structure, learning methods also play a critical role. Imitation learning methods acquire meaningful behaviors, but the robot usually cannot master the skill. Reinforcement learning, on the contrary, either barely learns anything if the environment is too complex, or masters the skill otherwise. To get the best of both worlds, this thesis proposes an algorithmically unified method to learn from both demonstration data and the environment.
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Large language models (LLMs) have dramatically enhanced the field of language
intelligence, as demonstrably evidenced by their formidable empirical
performance across a spectrum of complex reasoning tasks. Additionally,
theoretical proofs have illuminated their emergent reasoning capabilities,
providing a compelling showcase of their advanced cognitive abilities in
linguistic contexts. Critical to their remarkable efficacy in handling complex
reasoning tasks, LLMs leverage the intriguing chain-of-thought (CoT) reasoning
techniques, obliging them to formulate intermediate steps en route to deriving
an answer. The CoT reasoning approach has not only exhibited proficiency in
amplifying reasoning performance but also in enhancing interpretability,
controllability, and flexibility. In light of these merits, recent research
endeavors have extended CoT reasoning methodologies to nurture the development
of autonomous language agents, which adeptly adhere to language instructions
and execute actions within varied environments. This survey paper orchestrates
a thorough discourse, penetrating vital research dimensions, encompassing: (i)
the foundational mechanics of CoT techniques, with a focus on elucidating the
circumstances and justification behind its efficacy; (ii) the paradigm shift in
CoT; and (iii) the burgeoning of language agents fortified by CoT approaches.
Prospective research avenues envelop explorations into generalization,
efficiency, customization, scaling, and safety. This paper caters to a wide
audience, including beginners seeking comprehensive knowledge of CoT reasoning
and language agents, as well as experienced researchers interested in
foundational mechanics and engaging in cutting-edge discussions on these
topics. A repository for the related papers is available at
https://github.com/Zoeyyao27/CoT-Igniting-Agent
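The core CoT mechanic, obliging the model to emit intermediate steps before the answer, can be sketched as prompt construction alone. The exemplar wording below is illustrative, and no particular LLM API is assumed.

```python
# Minimal illustration of few-shot chain-of-thought (CoT) prompting:
# the exemplar demonstrates intermediate reasoning steps, nudging the
# model to reason step by step before answering. Exemplar text is a
# toy assumption, not drawn from the surveyed papers.

def build_cot_prompt(question):
    exemplar = (
        "Q: Roger has 5 balls. He buys 2 cans of 3 balls each. "
        "How many balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n"
    )
    # The prompt ends at "A:" so the model continues with its own
    # reasoning chain, mirroring the exemplar's format.
    return exemplar + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "A baker had 23 muffins and sold 7. How many are left?")
```

A direct (non-CoT) prompt would omit the worked reasoning in the exemplar; the survey's point is that including it improves both accuracy and the interpretability of the model's output.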
Children's Third-Party Punishment Behaviour: The Roles of Deterrent Motives, Affective States and Moral Domains
Children engage in third-party punishment (3PP) from a young age in response to
harm and fairness violations. However, several aspects of children’s 3PP remain
uninvestigated: their motivations for engaging in 3PP; the emotional consequences of
enacting 3PP; and the effect of moral domains on 3PP.
In order to explore these topics, I developed two computerised paradigms: the
MegaAttack game and the Minecraft Justice System. The former was used with 5- to
11-year-olds in the UK (Experiments 1-2) and Colombia (Experiment 3); the latter with
British, Colombian and Italian 7- to 11-year-olds (Experiment 4). In both paradigms, as
players violated different types of moral norms, children were asked to judge their behaviour
and were offered the opportunity to punish them. Additionally, in the Minecraft paradigm,
children could also compensate the victims.
The type of transgression children watched did not fully predict their choice of 3PP
type in terms of moral domains (Experiments 1-2), but it significantly affected the severity
and endorsement of their 3PP (Experiment 4).
Children did not appear motivated by reputational concerns, as their 3PP severity was
not influenced by an audience, operationalised as cues of observation (Experiment 2) or
accountability (Experiment 3).
Children’s enjoyment of 3PP was generally low, although there were differences
across countries (Experiments 2-3).
In Experiment 4 children enjoyed compensating more than punishing. When asked
whether they endorsed deterrence or retribution as their 3PP motive, children overwhelmingly chose deterrence, irrespective of their country, age and framing
manipulation received. Reported deterrent motives, together with the lack of 3PP enjoyment
and the preference for compensation, suggest that children, unlike adults, are not motivated
by the retributive desire to see wrongdoers suffer.
Results have implications for theoretical accounts of the cognitive and affective
processes involved in 3PP, methodological implications for future research avenues and,
potentially, practical implications for the development of intervention studies.
Multi-task Hierarchical Reinforcement Learning for Compositional Tasks
This thesis presents algorithms for solving multiple compositional tasks with high sample efficiency and strong generalization ability.
Central to this work is the subtask graph, which models the structure of compositional tasks in graph form. We formulate compositional tasks as multi-task and meta-RL problems using the subtask graph and discuss different approaches to tackle the problem.
Specifically, we present four contributions, whose common idea is to exploit the inductive bias in the hierarchical task structure for efficient learning and strong generalization.
The first part of the thesis formally introduces the subtask graph execution problem: a formulation of the compositional task as a multi-task RL problem where the agent is given a task description in graph form as an additional input.
We present a hierarchical architecture where a high-level policy determines the subtask to execute and a low-level policy executes the given subtask. The high-level policy learns a modular neural network that can be dynamically assembled according to the input task description to choose the optimal sequence of subtasks to maximize the reward.
We demonstrate that the proposed method achieves strong zero-shot task generalization, and also improves the search efficiency of existing planning methods when combined with them.
The second part studies the more general setting where the task structure is not available to the agent, so the task must be inferred from the agent's own experience; i.e., the few-shot reinforcement learning setting.
Specifically, we combine meta-reinforcement learning with an inductive logic programming (ILP) method to explicitly infer the latent task structure, in the form of a subtask graph, from the agent's trajectories.
Our empirical study shows that the underlying task structure can be accurately inferred from a small amount of environment interaction without any explicit supervision, in complex 3D environments with high-dimensional state and action spaces.
The third contribution extends the second by transfer-learning the prior over task structures from training tasks to unseen test tasks to achieve faster adaptation. Although the meta-policy learns a general exploration strategy over the distribution of tasks, in the previous part the task structure was inferred independently from scratch for each task. We overcome this limitation by modeling the prior over tasks from the subtask graphs inferred via ILP, and transferring the learned prior when inferring novel test tasks. To achieve this, we propose a novel prior-sampling and posterior-update method that incorporates the knowledge learned from the seen task most relevant to the current task.
The last part investigates a more indirect form of inductive bias, implemented as a constraint on the trajectories rolled out by the policy in an MDP.
We present a theoretical result proving that the proposed constraint preserves optimality while reducing the policy search space.
Empirically, the proposed method improves the sample efficiency of the policy gradient method on a wide range of challenging sparse-reward tasks.
Overall, this work formalizes the hierarchical structure of compositional tasks and provides evidence that such structure exists in many important problems.
In addition, we present diverse principled approaches to exploiting the inductive bias of hierarchical structure in MDPs under different problem settings and assumptions, and demonstrate the usefulness of such inductive bias when tackling compositional tasks.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169951/1/srsohn_1.pd
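The subtask-graph idea above can be illustrated with a toy sketch: subtasks carry precondition edges, and a routine computes which subtasks are currently eligible, i.e. what a high-level policy would choose among. The crafting-style names and dict encoding are assumptions for illustration, not the thesis' formulation.

```python
# Illustrative sketch of a subtask graph: each subtask maps to the
# precondition subtasks that must be completed first. A high-level
# policy would pick among the eligible subtasks; here we only
# compute eligibility.

subtask_graph = {
    "get_wood":     [],
    "get_stone":    [],
    "make_plank":   ["get_wood"],
    "make_stick":   ["make_plank"],
    "make_pickaxe": ["make_stick", "get_stone"],
}

def eligible_subtasks(graph, completed):
    """Subtasks not yet done whose preconditions are all satisfied."""
    return sorted(
        s for s, preconds in graph.items()
        if s not in completed and all(p in completed for p in preconds)
    )

# After collecting wood, the agent may gather stone or craft planks.
elig = eligible_subtasks(subtask_graph, completed={"get_wood"})
```

Executing subtasks in any order consistent with these edges reaches the final subtask; the inductive bias is that the policy only needs to choose among eligible subtasks rather than search the full action space.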
Envisioning the future village: the role of digital technology in supporting more inclusive visions in the neighbourhood planning process
This thesis presents the development of a digitally aided Collaborative Envisioning Framework,
to support disenfranchised young people in contributing to a ‘shared vision’ of their
community’s future. Drawing from the research areas of planning, design, collaboration and
envisioning, this study sought to address the existing democratic deficit in local decision making
activities, by utilising the new potentials of digital technologies.
The research aim was to support communities, particularly disengaged young people, in
becoming involved with decision-making activities, namely generating a shared vision for a
neighbourhood plan. Since the radical policy changes to the National Planning Policy
Framework and Localism Act 2011, members of the public have been handed increased
responsibility and accountability in contributing to the local decisions affecting them.
However, the tools and resources have been criticized for not engaging and including all
sectors of the public, particularly young people (who arguably have the most to gain, or lose,
as a result of decisions made).
Using community and neighbourhood planning as a microcosm of a larger problem, this study
looked towards the potential of digital tools as a way to address this democratic deficit,
and to discover whether they offered anything more than existing tools by helping young
people to contribute to the generation of a ‘shared vision’ (a requisite of a neighbourhood
planning application). It also addressed the assumption that the public had an understanding
of what creating a ‘shared vision’ entailed, and had the skills and knowledge required to create one. It
firstly identified envisioning as a design activity, which needs creativity, imagination, empathy,
collaboration, communication and deliberation, and then identified ‘designable factors’ such as
processes, tools (digital and non-digital), environments, and services which are able to support
these, focusing on which were most suitable for the young audience. The research also explored
behaviour and motivation theories, which guided the design of an envisioning framework.
To achieve this aim, a constructive design research methodology was adopted consisting of a
designed artefact - ‘The Collaborative Envisioning Framework’ which was utilised throughout
numerous workshops. The interactions between the workshop participants and the
envisioning framework generated multiple sets of qualitative data, which were analysed and
interpreted to form the next iteration of the framework. The research demonstrates that
existing tools and resources aimed at supporting inclusivity and meaningful visions for
neighbourhood plans are not, in their current form, adequate to firstly, engage the diverse
groups of people they should be including, and secondly, to support a generative, creative
activity of envisioning, and suggests that the use of digital tools (namely Ageing Booth App,
Morfo App, and Minecraft) offer something new.
The original contributions to knowledge are: an advancement of constructive design research
methodology; contributions to the discourse surrounding the purpose and value of visions within
community planning; and a practical ‘Collaborative Envisioning Framework’ which can be
followed by public sector and private organisations who seek to support communities in
producing ‘visions’ for their community.
Digital Life Project: Autonomous 3D Characters with Social Intelligence
In this work, we present Digital Life Project, a framework utilizing language
as the universal medium to build autonomous 3D characters, who are capable of
engaging in social interactions and expressing with articulated body motions,
thereby simulating life in a digital environment. Our framework comprises two
primary components: 1) SocioMind: a meticulously crafted digital brain that
models personalities with systematic few-shot exemplars, incorporates a
reflection process based on psychology principles, and emulates autonomy by
initiating dialogue topics; 2) MoMat-MoGen: a text-driven motion synthesis
paradigm for controlling the character's digital body. It integrates motion
matching, a proven industry technique to ensure motion quality, with
cutting-edge advancements in motion generation for diversity. Extensive
experiments demonstrate that each module achieves state-of-the-art performance
in its respective domain. Collectively, they enable virtual characters to
initiate and sustain dialogues autonomously, while evolving their
socio-psychological states. Concurrently, these characters can perform
contextually relevant bodily movements. Additionally, a motion captioning
module further allows the virtual character to recognize and appropriately
respond to human players' actions. Homepage: https://digital-life-project.com/
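Motion matching, the industry technique MoMat-MoGen builds on, can be illustrated as nearest-neighbour lookup over a clip database: given the character's current feature, select the clip whose feature is closest. The two-dimensional features and clip names below are toy assumptions, not the paper's implementation.

```python
# Illustrative sketch of motion matching as nearest-neighbour lookup.
# Real systems match high-dimensional pose/trajectory features against
# large captured databases; this toy version keeps the same structure.
import math

motion_db = {
    "wave": [0.9, 0.1],
    "walk": [0.2, 0.8],
    "sit":  [0.1, 0.2],
}

def match_motion(query, db):
    """Return the clip name whose feature vector is nearest the query."""
    return min(db, key=lambda name: math.dist(db[name], query))

clip = match_motion([0.8, 0.2], motion_db)
```

Matching guarantees the quality of the selected motion (it came from real data), while a generative model, as in MoMat-MoGen, is used on top to add diversity beyond the database.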