MGHRL: Meta Goal-generation for Hierarchical Reinforcement Learning
Most meta reinforcement learning (meta-RL) methods learn to adapt to new
tasks by directly optimizing the parameters of policies over primitive action
space. Such algorithms work well when the tasks differ only slightly.
However, when the task distribution becomes wider, it would be quite
inefficient to directly learn such a meta-policy. In this paper, we propose a
new meta-RL algorithm called Meta Goal-generation for Hierarchical RL (MGHRL).
Instead of directly generating policies over primitive action space for new
tasks, MGHRL learns to generate high-level meta strategies over subgoals given
past experience and leaves the achievement of those subgoals to independent
RL subtasks. Our empirical results on several challenging simulated robotics
environments show that our method enables more efficient and generalized
meta-learning from past experience.
Comment: Accepted to the ICLR 2020 workshop: Beyond tabula rasa in RL (BeTR-RL)
What is Strategic Competence and Does it Matter? Exposition of the Concept and a Research Agenda
Drawing on a range of theoretical and empirical insights from strategic management and the cognitive and organizational sciences, we argue that strategic competence constitutes the ability of organizations and the individuals who operate within them to work within their cognitive limitations in such a way that they are able to maintain an appropriate level of responsiveness to the contingencies confronting them. Using the language of the resource-based view of the firm, we argue that this meta-level competence represents a confluence of individual and organizational characteristics, suitably configured to enable the detection of those weak signals indicative of the need for change and to act accordingly, thereby minimising the dangers of cognitive bias and cognitive inertia. In an era of unprecedented informational burdens and instability, we argue that this competence is central to the longer-term survival and well-being of the organization. We conclude with a consideration of the major scientific challenges that lie ahead, if the ideas contained within this paper are to be validated.
An Ecological and Longitudinal Perspective
From deciding on a game to choosing a tactic for delaying bedtime, repeated decisions are ubiquitous for children. Two paradigmatic choice phenomena are probability matching and maximizing. To maximize rewards, a person should exclusively choose the option with the highest reward probability. Maximizing is generally regarded as economically rational behavior. Probability matching describes a person choosing each option with a frequency equal to its underlying probability of reward. Whether probability matching is a fallacy or an adaptive mechanism is disputed. Earlier research on probabilistic learning showed the paradoxical result that younger children are more likely to maximize than older children. Older children, by contrast, are assumed to probability match. However, little consideration was given to the fact that children can use the structure of the environment to their advantage. This dissertation investigates the inter- and intra-individual development of probabilistic learning in childhood from ecological and cognitive perspectives. Four empirical chapters show that the interaction between maturing cognitive functions and features of the learning and decision environment shapes the development of adaptive choice behavior. The development of probabilistic learning passes through several phases in childhood: from high persistence, but also high inter-individual variability, in younger children to growing adaptivity through increasing diversification and exploration in older children.
The results of this dissertation underscore in particular the value of an ecological rationality perspective in research on the development of decision-making ability.
From choosing which game to play to deciding how to effectively delay bedtime, making repeated choices is a ubiquitous part of childhood. Two often contrasted paradigmatic choice behaviors are probability matching and maximizing. Maximizing, described as consistently choosing the option with the highest reward probability, has traditionally been considered economically rational. Probability matching, in contrast, means choosing each option in proportion to its underlying reward probability; whether it reflects a mistake or an adaptive mechanism is debated. Previous research on the development of probability learning and repeated choice revealed considerable change across childhood and reported the paradoxical finding that younger children are more likely to maximize, outperforming older children, who are thought to be more likely to probability match. However, this line of research largely disregarded the mind’s ability to capitalize on the structure of the environment. In this dissertation, I investigate the inter- and intra-individual development of probability learning and repeated choice behavior in childhood under consideration of ecological, cognitive, and methodological aspects. Four empirical chapters demonstrate that the interaction between the maturing mind and characteristics of the learning and choice environment shapes the development of adaptive choice behavior. The development of probability learning and repeated choice behavior in childhood progresses from high persistence but also high inter-individual variability to emerging adaptivity marked by increased diversification and exploration. The present research highlights the benefit of taking an ecological rationality view in research on the development of decision-making abilities.
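As a worked illustration of why maximizing is called economically rational in the abstract above, the following minimal Python sketch compares the two strategies on a two-option task. The function names, the task setup, and the value p = 0.7 are illustrative assumptions and are not taken from the dissertation. If option A is rewarded with probability p on each trial, always choosing the likelier option is correct with probability max(p, 1 - p), while probability matching is correct with probability p² + (1 - p)².

```python
import random


def expected_accuracy_maximizing(p: float) -> float:
    """Expected hit rate when always choosing the more likely option."""
    return max(p, 1.0 - p)


def expected_accuracy_matching(p: float) -> float:
    """Expected hit rate when option A is chosen with probability p.

    A hit occurs when choice and rewarded option coincide:
    both A (p * p) or both B ((1 - p) * (1 - p)).
    """
    return p * p + (1.0 - p) * (1.0 - p)


def simulate(p: float, strategy: str, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the hit rate for 'maximize' or 'match'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rewarded_a = rng.random() < p          # which option pays off this trial
        if strategy == "maximize":
            choose_a = p >= 0.5                # always the likelier option
        elif strategy == "match":
            choose_a = rng.random() < p        # choose A with probability p
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        hits += choose_a == rewarded_a
    return hits / trials


if __name__ == "__main__":
    p = 0.7  # hypothetical reward probability of option A
    print("maximizing:", expected_accuracy_maximizing(p), simulate(p, "maximize"))
    print("matching:  ", expected_accuracy_matching(p), simulate(p, "match"))
```

With p = 0.7 the closed-form values are 0.70 for maximizing and 0.58 for matching. This comparison assumes a stationary environment; it says nothing about settings where reward probabilities change, which is where the abstract suggests matching-like exploration may be adaptive.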
Decision tree learning for intelligent mobile robot navigation
The replication of human intelligence, learning and reasoning by means of computer
algorithms is termed Artificial Intelligence (AI), and the interaction of such
algorithms with the physical world can be achieved using robotics. The work described in
this thesis investigates the applications of concept learning (an approach which takes its
inspiration from biological motivations and from survival instincts in particular) to robot
control and path planning. The methodology of concept learning has been applied using
learning decision trees (DTs) which induce domain knowledge from a finite set of training
vectors which in turn describe systematically a physical entity and are used to train a robot
to learn new concepts and to adapt its behaviour.
To achieve behaviour learning, this work introduces the novel approach of hierarchical
learning and knowledge decomposition to the frame of the reactive robot architecture.
Following the analogy with survival instincts, the robot is first taught how to survive in
very simple and homogeneous environments, namely a world without any disturbances or
any kind of "hostility". Once this simple behaviour, named a primitive, has been established, the robot is trained to adapt new knowledge to cope with increasingly complex
environments by adding further worlds to its existing knowledge. The repertoire of the
robot behaviours in the form of symbolic knowledge is retained in a hierarchy of clustered
decision trees (DTs) accommodating a number of primitives. To classify robot perceptions,
control rules are synthesised using symbolic knowledge derived from searching the
hierarchy of DTs.
A second novel concept is introduced, namely that of multi-dimensional fuzzy associative
memories (MDFAMs). These are clustered fuzzy decision trees (FDTs) which are trained
locally and accommodate specific perceptual knowledge. Fuzzy logic is incorporated to
deal with inherent noise in sensory data and to merge conflicting behaviours of the DTs.
The feasibility of the developed techniques is illustrated in robot
applications, and their benefits and drawbacks are discussed.
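To make the idea of inducing control rules from training vectors concrete, here is a minimal Python sketch of how a decision-tree learner might pick the first perception to split on. The abstract does not state which induction criterion the thesis uses; entropy-based information gain is assumed here, and the feature names, values, and actions are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training vectors: discretised perceptions mapped to an action.
TRAINING_ROWS = [
    {"front": "near", "left": "clear"},
    {"front": "near", "left": "blocked"},
    {"front": "far", "left": "clear"},
    {"front": "far", "left": "blocked"},
]
TRAINING_LABELS = ["turn", "turn", "forward", "forward"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(
        (len(group) / len(labels)) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Feature with the highest information gain; used as the root test."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


if __name__ == "__main__":
    for name in TRAINING_ROWS[0]:
        print(name, information_gain(TRAINING_ROWS, TRAINING_LABELS, name))
    print("split on:", best_feature(TRAINING_ROWS, TRAINING_LABELS))
```

In this toy data the front-sensor reading separates the two actions perfectly, so it would become the root test. A full tree learner would repeat the selection recursively on each branch.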
Neurocognitive Mechanisms of Learning and Decision-Making in Adolescent-OCD: A Computational Approach
Early-onset obsessive-compulsive disorder (OCD) is substantially less researched than adult-OCD, resulting in prevalent equivocation surrounding the neurocognitive profile of child-OCD. Research
into this area is pivotal as population studies report that youths with OCD struggle significantly in
academic settings. In the General Introduction of this thesis, I reviewed existing literature and found that strikingly, young patients do not show impairment on features that are considered both hallmarks
of adult OCD and tightly linked to disorder symptomatology, such as response inhibition and cognitive flexibility. Among the characteristics that are thought to be present in children and adolescents with OCD are abnormal decision-making under uncertainty and impaired learning, and
I decided to focus on these features as they may be driving poor academic attainment in young people with the disorder. In addition, I sought to investigate other cognitive processes that have not been
well-researched in adolescent-OCD but are found to be robustly altered in adult OCD, such as goal-directed/model-based reasoning, meta-cognition, and feedback sensitivity. I aimed to delineate these various processes using a battery of suitably complex cognitive tasks. Moreover, I highlighted that the majority of past studies fail to find differences between young patients and controls because behavioural signatures are too subtle to be uncovered by standard statistical analyses. Hence, I
employed computational modelling of cognitive task data to disentangle latent decision-making processes displayed by adolescents with OCD.
In Chapter 2, I modelled data from the Wisconsin Card Sorting task, a frequently used paradigm of cognitive flexibility, and confirmed that youths with OCD show equivalent performance on the task
to controls. Only patients on serotonergic medication showed increased response latencies and a tendency to make unique errors (choosing a deck associated with no rule present on the test card).
Next, in Chapter 3, I sought to understand instrumental and Pavlovian learning, and whether adolescents with OCD show increased punishment sensitivity on a novel aversive Pavlovian-to-Instrumental Transfer paradigm. Once again, patient performance was equivalent to that of controls. Hence, the remaining chapters were dedicated to probing behaviour on probabilistic paradigms.
In Chapter 4, I formally investigated model-based and model-free learning using a well-validated two-step decision-making task, and fit a reinforcement learning drift diffusion model to both choice and
reaction time data. Patients showed increased exploration on the task as well as faster and more erratic decisions compared to controls. Nonetheless, model-based learning was equivalent between
groups. In the penultimate chapter, I demonstrate on a predictive-inference task that patients with OCD update their choices more frequently compared to controls independent of prediction error
magnitude. Finally, in Chapter 6, I administered a probabilistic reversal learning paradigm to a large sample of 50 adolescent patients and 53 matched controls. Standard analyses revealed a significant
reversal learning deficit in patients with OCD, wherein they displayed more errors and a lower propensity to repeat choices following positive feedback during the post-reversal phase. Crucially, computational modelling revealed striking group differences where adolescents with OCD displayed elevated reward learning and lower punishment learning, increased exploration, and decreased
perseveration compared to controls. In the General Discussion, I emphasise that atypical learning and decision-making in adolescent-OCD are more pronounced on probabilistic tasks, where task environments are more volatile. Results are partly discussed in the context of the uncertainty model of OCD, where subjective feelings of doubt experienced by patients drive compulsive behaviours
such as checking and certainty-seeking in daily life, alongside excessive exploration on probabilistic tasks. I also consider various explanations for cognitive distinctions between adult and adolescent OCD. More general implications of the findings are discussed for understanding OCD in the context of adolescent development and for treatment/support strategies. WELLCOME TRUST (104631/Z/14/Z)
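The parameters named in the abstract above (reward learning, punishment learning, exploration, perseveration) correspond to a common family of reinforcement learning models for reversal-learning data. The following Python sketch shows one generic member of that family; it is an assumption for illustration, not the model fitted in the thesis, and all function and parameter names are hypothetical.

```python
import math


def update_value(q, outcome, alpha_reward, alpha_punish):
    """Delta-rule update with separate learning rates.

    outcome is 1.0 for positive feedback and 0.0 for negative feedback.
    alpha_reward is applied after positive feedback, alpha_punish otherwise.
    """
    alpha = alpha_reward if outcome > 0 else alpha_punish
    return q + alpha * (outcome - q)


def choice_probabilities(q_values, beta, stickiness, last_choice):
    """Softmax over option values plus a bonus for repeating the last choice.

    beta is the inverse temperature: lower beta means more exploration.
    stickiness > 0 favours repeating last_choice (perseveration);
    last_choice is an option index, or None on the first trial.
    """
    logits = [
        beta * q + (stickiness if index == last_choice else 0.0)
        for index, q in enumerate(q_values)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    q = [0.5, 0.5]
    # Option 0 is chosen and rewarded, then chosen again and not rewarded.
    q[0] = update_value(q[0], 1.0, alpha_reward=0.4, alpha_punish=0.1)
    q[0] = update_value(q[0], 0.0, alpha_reward=0.4, alpha_punish=0.1)
    print("values:", q)
    print("choice probabilities:",
          choice_probabilities(q, beta=3.0, stickiness=0.5, last_choice=0))
```

Under this parameterisation, the group differences reported in the abstract would appear as a higher `alpha_reward`, a lower `alpha_punish`, a lower `beta`, and a lower `stickiness` in patients than in controls.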
Hierarchical control over effortful behavior by rodent medial frontal cortex: a computational model
The anterior cingulate cortex (ACC) has been the focus of intense research interest in recent years. Although separate theories relate ACC function variously to conflict monitoring, reward processing, action selection, decision making, and more, damage to the ACC mostly spares performance on tasks that exercise these functions, indicating that they are not in fact unique to the ACC. Further, most theories do not address the most salient consequence of ACC damage: impoverished action generation in the presence of normal motor ability. In this study we develop a computational model of the rodent medial prefrontal cortex that accounts for the behavioral sequelae of ACC damage, unifies many of the cognitive functions attributed to it, and provides a solution to an outstanding question in cognitive control research: how the control system determines and motivates what tasks to perform. The theory derives from recent developments in the formal study of hierarchical control and learning that highlight computational efficiencies afforded when collections of actions are represented based on their conjoint goals. According to this position, the ACC utilizes reward information to select tasks that are then accomplished through top-down control over action selection by the striatum. Computational simulations capture animal lesion data that implicate the medial prefrontal cortex in regulating physical and cognitive effort. Overall, this theory provides a unifying theoretical framework for understanding the ACC in terms of the pivotal role it plays in the hierarchical organization of effortful behavior.