Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Reinforcement Learning algorithms require a large number of samples to solve
complex tasks with sparse and delayed rewards. Complex tasks can often be
hierarchically decomposed into sub-tasks. A step in the Q-function can be
associated with solving a sub-task, where the expectation of the return
increases. RUDDER has been introduced to identify these steps and then
redistribute reward to them, thus immediately giving reward if sub-tasks are
solved. Since the problem of delayed rewards is mitigated, learning is
considerably sped up. However, for complex tasks, current exploration
strategies as deployed in RUDDER struggle with discovering episodes with high
rewards. Therefore, we assume that episodes with high rewards are given as
demonstrations and do not have to be discovered by exploration. Typically, the
number of demonstrations is small, and RUDDER's LSTM model, being a deep
learning method, does not learn well from so few examples. Hence, we introduce Align-RUDDER, which is RUDDER
with two major modifications. First, Align-RUDDER assumes that episodes with
high rewards are given as demonstrations, replacing RUDDER's safe exploration
and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile
model that is obtained from multiple sequence alignment of demonstrations.
Profile models can be constructed from as few as two demonstrations as known
from bioinformatics. Align-RUDDER inherits the concept of reward
redistribution, which considerably reduces the delay of rewards, thus speeding
up learning. Align-RUDDER outperforms competitors on complex artificial tasks
with delayed reward and few demonstrations. On the Minecraft ObtainDiamond
task, Align-RUDDER is able to mine a diamond, though not frequently. Github:
https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-U
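The reward-redistribution idea can be illustrated with a toy sketch: align each episode against a consensus of demonstrated events and hand out the episodic return at the steps where alignment progress increases. The consensus sequence, event names, and scoring below are illustrative assumptions, not the authors' profile-model implementation.

```python
# Hypothetical sketch of Align-RUDDER-style reward redistribution.
# The profile model is simplified to a consensus event sequence scored
# by greedy prefix alignment; everything here is illustrative.

def prefix_score(events, consensus):
    """Score how far a prefix of events progresses along the consensus."""
    score, j = 0, 0
    for e in events:
        if j < len(consensus) and e == consensus[j]:
            score += 1
            j += 1
    return score

def redistribute_reward(episode_events, consensus, episode_return):
    """Assign reward to steps where the alignment score increases."""
    scores = [prefix_score(episode_events[: t + 1], consensus)
              for t in range(len(episode_events))]
    deltas = [scores[0]] + [scores[t] - scores[t - 1]
                            for t in range(1, len(scores))]
    total = sum(deltas) or 1
    # Each step receives a share of the episodic return proportional
    # to its contribution to alignment progress.
    return [episode_return * d / total for d in deltas]

consensus = ["log", "planks", "stick", "pickaxe"]
episode = ["log", "dirt", "planks", "stick", "dirt", "pickaxe"]
rewards = redistribute_reward(episode, consensus, 1.0)
# Steps matching the consensus get reward immediately; the "dirt"
# detours get none, so the delay of the final reward disappears.
```

The delayed episodic return of 1.0 is thus moved onto the four sub-task-completing steps, which is the effect the abstract describes as reducing the delay of rewards.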
End to End Learning in Autonomous Driving Systems
Convolutional neural networks have advanced visual perception significantly in recent years. Two major ingredients that enable this success are the composition of simple modules into a complex network and end-to-end optimization. However, this success has not yet revolutionized robotics as much as vision, even though robotics suffers from similar problems to traditional computer vision, i.e. the imperfection of manually designed system pipelines. This thesis investigates using end-to-end learning for the autonomous driving system, a concrete robotic application. End-to-end learning can produce reasonable driving behaviors, even in complex urban driving scenarios. Representation learning in end-to-end driving models is crucial, and auxiliary vision tasks such as semantic segmentation can help to form a more informative driving representation, especially when training data is limited. Naive convolutional neural networks are usually only capable of reactive control and cannot perform complex reasoning in a particular scenario. This thesis also studies how to handle scene-conditioned driving behavior, which goes beyond the capability of reactive control. Alongside the end-to-end structure, learning methods also play a critical role. Imitation learning methods acquire meaningful behaviors, but the robot usually cannot master the skill. Reinforcement learning, on the contrary, either barely learns anything if the environment is too complex, or masters the skill otherwise. To get the best of both worlds, this thesis proposes an algorithmically unified method to learn from both demonstration data and the environment.
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Large language models (LLMs) have dramatically enhanced the field of language
intelligence, as demonstrably evidenced by their formidable empirical
performance across a spectrum of complex reasoning tasks. Additionally,
theoretical proofs have illuminated their emergent reasoning capabilities,
providing a compelling showcase of their advanced cognitive abilities in
linguistic contexts. Critical to their remarkable efficacy in handling complex
reasoning tasks, LLMs leverage the intriguing chain-of-thought (CoT) reasoning
techniques, obliging them to formulate intermediate steps en route to deriving
an answer. The CoT reasoning approach has not only exhibited proficiency in
amplifying reasoning performance but also in enhancing interpretability,
controllability, and flexibility. In light of these merits, recent research
endeavors have extended CoT reasoning methodologies to nurture the development
of autonomous language agents, which adeptly adhere to language instructions
and execute actions within varied environments. This survey paper orchestrates
a thorough discourse, penetrating vital research dimensions, encompassing: (i)
the foundational mechanics of CoT techniques, with a focus on elucidating the
circumstances and justification behind its efficacy; (ii) the paradigm shift in
CoT; and (iii) the burgeoning of language agents fortified by CoT approaches.
Prospective research avenues envelop explorations into generalization,
efficiency, customization, scaling, and safety. This paper caters to a wide
audience, including beginners seeking comprehensive knowledge of CoT reasoning
and language agents, as well as experienced researchers interested in
foundational mechanics and engaging in cutting-edge discussions on these
topics. A repository for the related papers is available at
https://github.com/Zoeyyao27/CoT-Igniting-Agent
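The core CoT mechanic, obliging the model to emit intermediate steps before the answer, can be sketched as prompt construction alone. The exemplar wording below is illustrative, and no particular LLM API is assumed.

```python
# Minimal illustration of few-shot chain-of-thought (CoT) prompting:
# the exemplar demonstrates intermediate reasoning steps, nudging the
# model to reason step by step before answering. Exemplar text is a
# toy assumption, not drawn from the surveyed papers.

def build_cot_prompt(question):
    exemplar = (
        "Q: Roger has 5 balls. He buys 2 cans of 3 balls each. "
        "How many balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n"
    )
    # The prompt ends at "A:" so the model continues with its own
    # reasoning chain, mirroring the exemplar's format.
    return exemplar + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "A baker had 23 muffins and sold 7. How many are left?")
```

A direct (non-CoT) prompt would omit the worked reasoning in the exemplar; the survey's point is that including it improves both accuracy and the interpretability of the model's output.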
Children's Third-Party Punishment Behaviour: The Roles of Deterrent Motives, Affective States and Moral Domains
Children engage in third-party punishment (3PP) from a young age in response to
harm and fairness violations. However, several aspects of children’s 3PP remain
uninvestigated: their motivations for engaging in 3PP; the emotional consequences of
enacting 3PP; and the effect of moral domains on 3PP.
In order to explore these topics, I developed two computerised paradigms: the
MegaAttack game and the Minecraft Justice System. The former was used with 5- to
11-year-olds in the UK (Experiments 1-2) and Colombia (Experiment 3); the latter with
British, Colombian and Italian 7- to 11-year-olds (Experiment 4). In both paradigms, as
players violated different types of moral norms, children were asked to judge their behaviour
and were offered the opportunity to punish them. Additionally, in the Minecraft paradigm,
children could also compensate the victims.
The type of transgression children watched did not fully predict their choice of 3PP
type in terms of moral domains (Experiments 1-2), but it significantly affected the severity
and endorsement of their 3PP (Experiment 4).
Children did not appear motivated by reputational concerns, as their 3PP severity was
not influenced by an audience, operationalised as cues of observation (Experiment 2) or
accountability (Experiment 3).
Children’s enjoyment of 3PP was generally low, although there were differences
across countries (Experiments 2-3).
In Experiment 4 children enjoyed compensating more than punishing. When asked
whether they endorsed deterrence or retribution as their 3PP motive, children overwhelmingly chose deterrence, irrespective of their country, age and framing
manipulation received. Reported deterrent motives, together with the lack of 3PP enjoyment
and the preference for compensation, suggest that children, unlike adults, are not motivated
by the retributive desire to see wrongdoers suffer.
Results have implications for theoretical accounts of the cognitive and affective
processes involved in 3PP, methodological implications for future research avenues and,
potentially, practical implications for the development of intervention studies.
Multi-task Hierarchical Reinforcement Learning for Compositional Tasks
This thesis presents algorithms for solving multiple compositional tasks with high sample efficiency and strong generalization ability.
Central to this work is the subtask graph, which models the structure of compositional tasks in graph form. We formulate compositional tasks as multi-task and meta-RL problems using the subtask graph and discuss different approaches to tackle the problem.
Specifically, we present four contributions, whose common idea is to exploit the inductive bias in the hierarchical task structure for efficient learning and strong generalization.
The first part of the thesis formally introduces the subtask graph execution problem: a formulation of the compositional task as a multi-task RL problem where the agent is given a task description in graph form as an additional input.
We present a hierarchical architecture where a high-level policy determines the subtask to execute and a low-level policy executes the given subtask. The high-level policy learns a modular neural network that can be dynamically assembled according to the input task description to choose the optimal sequence of subtasks to maximize the reward.
We demonstrate that the proposed method achieves strong zero-shot task generalization, and also improves the search efficiency of existing planning methods when combined with them.
The second part studies the more general setting where the task structure is not available to the agent, so the task must be inferred from the agent's own experience; i.e., the few-shot reinforcement learning setting.
Specifically, we combine meta-reinforcement learning with an inductive logic programming (ILP) method to explicitly infer the latent task structure, in the form of a subtask graph, from the agent's trajectories.
Our empirical study shows that the underlying task structure can be accurately inferred from a small amount of environment interaction without any explicit supervision, in complex 3D environments with high-dimensional state and action spaces.
The third contribution extends the second by transfer-learning the prior over task structures from training tasks to unseen test tasks to achieve faster adaptation. Although the meta-policy learns a general exploration strategy over the distribution of tasks, in the previous part the task structure was inferred independently from scratch for each task. We overcome this limitation by modeling the prior over tasks from the subtask graphs inferred via ILP, and transferring the learned prior when inferring novel test tasks. To achieve this, we propose a novel prior-sampling and posterior-update method that incorporates the knowledge learned from the seen task most relevant to the current task.
The last part investigates a more indirect form of inductive bias, implemented as a constraint on the trajectories rolled out by the policy in an MDP.
We present a theoretical result proving that the proposed constraint preserves optimality while reducing the policy search space.
Empirically, the proposed method improves the sample efficiency of the policy gradient method on a wide range of challenging sparse-reward tasks.
Overall, this work formalizes the hierarchical structure of compositional tasks and provides evidence that such structure exists in many important problems.
In addition, we present diverse principled approaches to exploiting the inductive bias of hierarchical structure in MDPs under different problem settings and assumptions, and demonstrate the usefulness of such inductive bias when tackling compositional tasks.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169951/1/srsohn_1.pd
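The subtask-graph idea above can be illustrated with a toy sketch: subtasks carry precondition edges, and a routine computes which subtasks are currently eligible, i.e. what a high-level policy would choose among. The crafting-style names and dict encoding are assumptions for illustration, not the thesis' formulation.

```python
# Illustrative sketch of a subtask graph: each subtask maps to the
# precondition subtasks that must be completed first. A high-level
# policy would pick among the eligible subtasks; here we only
# compute eligibility.

subtask_graph = {
    "get_wood":     [],
    "get_stone":    [],
    "make_plank":   ["get_wood"],
    "make_stick":   ["make_plank"],
    "make_pickaxe": ["make_stick", "get_stone"],
}

def eligible_subtasks(graph, completed):
    """Subtasks not yet done whose preconditions are all satisfied."""
    return sorted(
        s for s, preconds in graph.items()
        if s not in completed and all(p in completed for p in preconds)
    )

# After collecting wood, the agent may gather stone or craft planks.
elig = eligible_subtasks(subtask_graph, completed={"get_wood"})
```

Executing subtasks in any order consistent with these edges reaches the final subtask; the inductive bias is that the policy only needs to choose among eligible subtasks rather than search the full action space.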
Envisioning the future village: the role of digital technology in supporting more inclusive visions in the neighbourhood planning process
This thesis presents the development of a digitally aided Collaborative Envisioning Framework,
to support disenfranchised young people in contributing to a ‘shared vision’ of their
community’s future. Drawing from the research areas of planning, design, collaboration and
envisioning, this study sought to address the existing democratic deficit in local decision making
activities, by utilising the new potentials of digital technologies.
The research aim was to support communities, particularly disengaged young people, in
becoming involved with decision-making activities, namely generating a shared vision for a
neighbourhood plan. Since the radical policy changes to the National Planning Policy
Framework and Localism Act 2011, members of the public have been handed increased
responsibility and accountability in contributing to the local decisions affecting them.
However, the tools and resources have been criticized for not engaging and including all
sectors of the public, particularly young people (who arguably have the most to gain, or lose,
as a result of decisions made).
Using community and neighbourhood planning as a microcosm of a larger problem, this study
looked towards the potential of digital tools as a way to address this democratic deficit,
and to discover whether they offered anything more than existing tools by helping young
people to contribute to the generation of a ‘shared vision’ (a requisite of a neighbourhood
planning application). It also addressed the assumption that the public had an understanding
of what creating a ‘shared vision’ entailed, and had the skills and knowledge required to create one. It
firstly identified envisioning as a design activity, which needs creativity, imagination, empathy,
collaboration, communication and deliberation, and then identified ‘designable factors’ such as
processes, tools (digital and non-digital), environments, and services which are able to support
these, focusing on which were most suitable for the young audience. The research also explored
behaviour and motivation theories, which guided the design of an envisioning framework.
To achieve this aim, a constructive design research methodology was adopted consisting of a
designed artefact - ‘The Collaborative Envisioning Framework’ which was utilised throughout
numerous workshops. The interactions between the workshop participants and the
envisioning framework generated multiple sets of qualitative data, which were analysed and
interpreted to form the next iteration of the framework. The research demonstrates that
existing tools and resources aimed at supporting inclusivity and meaningful visions for
neighbourhood plans are not, in their current form, adequate to firstly, engage the diverse
groups of people they should be including, and secondly, to support a generative, creative
activity of envisioning, and suggests that the use of digital tools (namely Ageing Booth App,
Morfo App, and Minecraft) offer something new.
The original contributions to knowledge are: an advancement of constructive design research
methodology; contributions to the discourse surrounding the purpose and value of visions within
community planning; and a practical ‘Collaborative Envisioning Framework’ which can be
followed by public sector and private organisations who seek to support communities in
producing ‘visions’ for their community.
Digital Life Project: Autonomous 3D Characters with Social Intelligence
In this work, we present Digital Life Project, a framework utilizing language
as the universal medium to build autonomous 3D characters, who are capable of
engaging in social interactions and expressing with articulated body motions,
thereby simulating life in a digital environment. Our framework comprises two
primary components: 1) SocioMind: a meticulously crafted digital brain that
models personalities with systematic few-shot exemplars, incorporates a
reflection process based on psychology principles, and emulates autonomy by
initiating dialogue topics; 2) MoMat-MoGen: a text-driven motion synthesis
paradigm for controlling the character's digital body. It integrates motion
matching, a proven industry technique to ensure motion quality, with
cutting-edge advancements in motion generation for diversity. Extensive
experiments demonstrate that each module achieves state-of-the-art performance
in its respective domain. Collectively, they enable virtual characters to
initiate and sustain dialogues autonomously, while evolving their
socio-psychological states. Concurrently, these characters can perform
contextually relevant bodily movements. Additionally, a motion captioning
module further allows the virtual character to recognize and appropriately
respond to human players' actions. Homepage: https://digital-life-project.com/
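Motion matching, the industry technique MoMat-MoGen builds on, can be illustrated as nearest-neighbour lookup over a clip database: given the character's current feature, select the clip whose feature is closest. The two-dimensional features and clip names below are toy assumptions, not the paper's implementation.

```python
# Illustrative sketch of motion matching as nearest-neighbour lookup.
# Real systems match high-dimensional pose/trajectory features against
# large captured databases; this toy version keeps the same structure.
import math

motion_db = {
    "wave": [0.9, 0.1],
    "walk": [0.2, 0.8],
    "sit":  [0.1, 0.2],
}

def match_motion(query, db):
    """Return the clip name whose feature vector is nearest the query."""
    return min(db, key=lambda name: math.dist(db[name], query))

clip = match_motion([0.8, 0.2], motion_db)
```

Matching guarantees the quality of the selected motion (it came from real data), while a generative model, as in MoMat-MoGen, is used on top to add diversity beyond the database.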