8,396 research outputs found

MGHRL: Meta Goal-generation for Hierarchical Reinforcement Learning

Author: Fu Haotian
Tang Hongyao
Hao Jianye
Liu Wulong
Chen Chen
Publication date: 04/03/2020

Most meta reinforcement learning (meta-RL) methods learn to adapt to new tasks by directly optimizing the parameters of policies over the primitive action space. Such algorithms work well when tasks differ only slightly. However, when the task distribution becomes wider, directly learning such a meta-policy becomes quite inefficient. In this paper, we propose a new meta-RL algorithm called Meta Goal-generation for Hierarchical RL (MGHRL). Instead of directly generating policies over the primitive action space for new tasks, MGHRL learns to generate high-level meta strategies over subgoals given past experience and leaves the question of how to achieve those subgoals to independent RL subtasks. Our empirical results on several challenging simulated robotics environments show that our method enables more efficient and generalized meta-learning from past experience.

Comment: Accepted to the ICLR 2020 workshop: Beyond tabula rasa in RL (BeTR-RL)

arXiv.org e-Print Archive

Crossref

What is Strategic Competence and Does it Matter? Exposition of the Concept and a Research Agenda

Author: Hodgkinson Gerard P.
Sparrow Paul R.
Publication venue: DigitalCommons@ILR
Publication date: 01/01/2006

Drawing on a range of theoretical and empirical insights from strategic management and the cognitive and organizational sciences, we argue that strategic competence constitutes the ability of organizations and the individuals who operate within them to work within their cognitive limitations in such a way that they are able to maintain an appropriate level of responsiveness to the contingencies confronting them. Using the language of the resource-based view of the firm, we argue that this meta-level competence represents a confluence of individual and organizational characteristics, suitably configured to enable the detection of those weak signals indicative of the need for change and to act accordingly, thereby minimising the dangers of cognitive bias and cognitive inertia. In an era of unprecedented informational burdens and instability, we argue that this competence is central to the longer-term survival and well-being of the organization. We conclude with a consideration of the major scientific challenges that lie ahead if the ideas contained within this paper are to be validated.

DigitalCommons@ILR

eCommons@Cornell

An Ecological and Longitudinal Perspective

Author: Thoma Anna Isabel
Publication venue: Humboldt-Universität zu Berlin
Publication date: 07/09/2023

From choosing which game to play to deciding how to effectively delay bedtime, making repeated choices is a ubiquitous part of childhood. Two often contrasted paradigmatic choice behaviors are probability matching and maximizing. Maximizing, described as consistently choosing the option with the highest reward probability, has traditionally been considered economically rational. Whether probability matching, described as matching choices proportionately to the underlying reward probabilities, reflects a mistake or an adaptive mechanism is debated. Previous research on the development of probability learning and repeated choice revealed considerable change across childhood and reported the paradoxical finding that younger children are more likely to maximize, outperforming older children, who are thought to be more likely to probability match. However, this line of research largely disregarded the mind's ability to capitalize on the structure of the environment. In this dissertation, I investigate the inter- and intra-individual development of probability learning and repeated choice behavior in childhood under consideration of ecological, cognitive, and methodological aspects. Four empirical chapters demonstrate that the interaction between the maturing mind and characteristics of the learning and choice environment shapes the development of adaptive choice behavior. The development of probability learning and repeated choice behavior in childhood progresses from high persistence but also high inter-individual variability to emerging adaptivity marked by increased diversification and exploration. The present research highlights the benefit of taking an ecological rationality view in research on the development of decision-making abilities.
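The contrast between maximizing and probability matching can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the dissertation: it assumes a two-option task in which one option is rewarded with a fixed probability p on each trial, independently of earlier trials, and the function names are hypothetical. Under these assumptions, always choosing the more likely option is correct with probability max(p, 1 - p), while matching choice frequencies to p is correct with probability p² + (1 - p)².

```python
def maximizing_accuracy(p: float) -> float:
    """Expected hit rate when always choosing the more likely option."""
    return max(p, 1.0 - p)


def matching_accuracy(p: float) -> float:
    """Expected hit rate when each option is chosen as often as it pays off.

    The chooser picks option A with probability p and is right with
    probability p, or picks option B with probability 1 - p and is right
    with probability 1 - p.
    """
    return p * p + (1.0 - p) * (1.0 - p)


if __name__ == "__main__":
    # Compare both strategies for a few illustrative reward probabilities.
    for p in (0.5, 0.7, 0.9):
        print(
            f"p={p:.1f}  maximizing={maximizing_accuracy(p):.2f}  "
            f"matching={matching_accuracy(p):.2f}"
        )
```

For p = 0.7, for example, maximizing yields an expected hit rate of 0.70 and matching yields 0.58, which is why maximizing is treated as the economically rational benchmark in a stationary environment of this kind.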

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

Decision tree learning for intelligent mobile robot navigation

Author: G. Hossein Shah Hamzei (7202189)
Publication date: 01/01/1998

The replication of human intelligence, learning and reasoning by means of computer algorithms is termed Artificial Intelligence (AI), and the interaction of such algorithms with the physical world can be achieved using robotics. The work described in this thesis investigates the applications of concept learning (an approach which takes its inspiration from biological motivations and from survival instincts in particular) to robot control and path planning. The methodology of concept learning has been applied using learning decision trees (DTs), which induce domain knowledge from a finite set of training vectors that in turn systematically describe a physical entity and are used to train a robot to learn new concepts and to adapt its behaviour. To achieve behaviour learning, this work introduces the novel approach of hierarchical learning and knowledge decomposition to the frame of the reactive robot architecture. Following the analogy with survival instincts, the robot is first taught how to survive in very simple and homogeneous environments, namely a world without any disturbances or any kind of "hostility". Once this simple behaviour, named a primitive, has been established, the robot is trained to adapt new knowledge to cope with increasingly complex environments by adding further worlds to its existing knowledge. The repertoire of the robot behaviours in the form of symbolic knowledge is retained in a hierarchy of clustered decision trees (DTs) accommodating a number of primitives. To classify robot perceptions, control rules are synthesised using symbolic knowledge derived from searching the hierarchy of DTs. A second novel concept is introduced, namely that of multi-dimensional fuzzy associative memories (MDFAMs). These are clustered fuzzy decision trees (FDTs) which are trained locally and accommodate specific perceptual knowledge. Fuzzy logic is incorporated to deal with inherent noise in sensory data and to merge conflicting behaviours of the DTs.
In this thesis, the feasibility of the developed techniques is illustrated in the robot applications, and their benefits and drawbacks are discussed.

Loughborough University Institutional Repository

Neurocognitive Mechanisms of Learning and Decision-Making in Adolescent-OCD: A Computational Approach

Author: Aziz Marzuki Aleya
Publication venue: University of Cambridge
Publication date: 22/05/2021

Early-onset obsessive-compulsive disorder (OCD) is substantially less researched than adult-OCD, resulting in prevalent equivocation surrounding the neurocognitive profile of child-OCD. Research into this area is pivotal as population studies report that youths with OCD struggle significantly in academic settings. In the General Introduction of this thesis, I reviewed existing literature and found that, strikingly, young patients do not show impairment on features that are considered both hallmarks of adult OCD and tightly linked to disorder symptomatology, such as response inhibition and cognitive flexibility. Among the characteristics that are thought to be present in children and adolescents with OCD are abnormal decision-making under uncertainty and impaired learning, and I decided to focus on these features as they may be driving poor academic attainment in young people with the disorder. In addition, I sought to investigate other cognitive processes that have not been well-researched in adolescent-OCD but are found to be robustly altered in adult OCD, such as goal-directed/model-based reasoning, meta-cognition, and feedback sensitivity. I aimed to delineate these various processes using a battery of suitably complex cognitive tasks. Moreover, I highlighted that the majority of past studies fail to find differences between young patients and controls due to behavioural signatures being too subtle to be uncovered by standard statistical analyses. Hence, I employed computational modelling of cognitive task data to disentangle latent decision-making processes displayed by adolescents with OCD. In Chapter 2, I modelled data from the Wisconsin Card Sorting task, a frequently used paradigm of cognitive flexibility, and confirmed that youths with OCD show equivalent performance on the task to controls.
Only patients on serotonergic medication showed increased response latencies and a tendency to make unique errors (choosing a deck associated with no rule present on the test card). Next, in Chapter 3, I sought to understand instrumental and Pavlovian learning, and whether adolescents with OCD show increased punishment sensitivity on a novel aversive Pavlovian-to-Instrumental Transfer paradigm. Once again, patient performance was equivalent to that of controls. Hence, the remaining chapters were dedicated to probing behaviour on probabilistic paradigms. In Chapter 4, I formally investigated model-based and model-free learning using a well-validated two-step decision-making task, and fit a reinforcement learning drift diffusion model to both choice and reaction time data. Patients showed increased exploration on the task as well as faster and more erratic decisions compared to controls. Nonetheless, model-based learning was equivalent between groups. In the penultimate chapter, I demonstrate on a predictive-inference task that patients with OCD update their choices more frequently compared to controls independent of prediction error magnitude. Finally, in Chapter 6, I administered a probabilistic reversal learning paradigm to a large sample of 50 adolescent patients and 53 matched controls. Standard analyses revealed a significant reversal learning deficit in patients with OCD, wherein they displayed more errors and a lower propensity to repeat choices following positive feedback during the post-reversal phase. Crucially, computational modelling revealed striking group differences where adolescents with OCD displayed elevated reward learning and lower punishment learning, increased exploration, and decreased perseveration compared to controls. In the General Discussion, I emphasise that atypical learning and decision-making in adolescent-OCD are more pronounced on probabilistic tasks, where task environments are more volatile.
Results are partly discussed in the context of the uncertainty model of OCD, where subjective feelings of doubt experienced by patients drive compulsive behaviours such as checking and certainty-seeking in daily life, alongside excessive exploration on probabilistic tasks. I also consider various explanations for cognitive distinctions between adult and adolescent OCD. More general implications of the findings are discussed for understanding OCD in the context of adolescent development and for treatment/support strategies.

Funding: Wellcome Trust (104631/Z/14/Z)

Apollo (Cambridge)

Hierarchical control over effortful behavior by rodent medial frontal cortex: a computational model

Author: Holroyd Clay
McClure Samuel M.
Publication venue: American Psychological Association (APA)
Publication date: 01/01/2015

The anterior cingulate cortex (ACC) has been the focus of intense research interest in recent years. Although separate theories relate ACC function variously to conflict monitoring, reward processing, action selection, decision making, and more, damage to the ACC mostly spares performance on tasks that exercise these functions, indicating that they are not in fact unique to the ACC. Further, most theories do not address the most salient consequence of ACC damage: impoverished action generation in the presence of normal motor ability. In this study we develop a computational model of the rodent medial prefrontal cortex that accounts for the behavioral sequelae of ACC damage, unifies many of the cognitive functions attributed to it, and provides a solution to an outstanding question in cognitive control research: how the control system determines and motivates what tasks to perform. The theory derives from recent developments in the formal study of hierarchical control and learning that highlight computational efficiencies afforded when collections of actions are represented based on their conjoint goals. According to this position, the ACC utilizes reward information to select tasks that are then accomplished through top-down control over action selection by the striatum. Computational simulations capture animal lesion data that implicate the medial prefrontal cortex in regulating physical and cognitive effort. Overall, this theory provides a unifying theoretical framework for understanding the ACC in terms of the pivotal role it plays in the hierarchical organization of effortful behavior.

Ghent University Academic Bibliography