
    MGHRL: Meta Goal-generation for Hierarchical Reinforcement Learning

    Most meta reinforcement learning (meta-RL) methods learn to adapt to new tasks by directly optimizing the parameters of policies over the primitive action space. Such algorithms work well on tasks that differ only slightly. However, when the task distribution becomes wider, directly learning such a meta-policy becomes quite inefficient. In this paper, we propose a new meta-RL algorithm called Meta Goal-generation for Hierarchical RL (MGHRL). Instead of directly generating policies over the primitive action space for new tasks, MGHRL learns to generate high-level meta-strategies over subgoals given past experience, and leaves how to achieve the subgoals as independent RL subtasks. Our empirical results on several challenging simulated robotics environments show that our method enables more efficient and generalizable meta-learning from past experience.
    Comment: Accepted to the ICLR 2020 workshop: Beyond tabula rasa in RL (BeTR-RL).
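The two-level decomposition the abstract describes, a meta-learned high level that proposes subgoals and a low level that treats reaching each subgoal as an ordinary RL subtask, can be sketched schematically. Everything below (class names, the toy state representation, the greedy low-level controller) is illustrative and is not the paper's implementation:

```python
# Minimal sketch of a two-level hierarchy in the spirit of MGHRL.
# Names and dynamics are invented for illustration.

class HighLevelPolicy:
    """Meta level: maps the current task to a subgoal for the low level.

    In MGHRL this is the meta-learned component; here it trivially
    returns the task's target state."""
    def propose_subgoal(self, state, task_goal):
        return list(task_goal)

class LowLevelPolicy:
    """Low level: pursues a given subgoal as an independent subtask.

    Here a toy controller that steps each coordinate toward the subgoal."""
    def act(self, state, subgoal):
        return [1 if g > s else -1 if g < s else 0
                for s, g in zip(state, subgoal)]

def rollout(state, task_goal, steps=5):
    high, low = HighLevelPolicy(), LowLevelPolicy()
    for _ in range(steps):
        subgoal = high.propose_subgoal(state, task_goal)  # meta level
        action = low.act(state, subgoal)                  # primitive level
        state = [s + a for s, a in zip(state, action)]
    return state

print(rollout([0, 0], [3, -2]))  # → [3, -2]
```

The point of the split is that only the subgoal generator needs to transfer across tasks; how subgoals are reached can be relearned or reused independently.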

    What is Strategic Competence and Does it Matter? Exposition of the Concept and a Research Agenda

    Drawing on a range of theoretical and empirical insights from strategic management and the cognitive and organizational sciences, we argue that strategic competence constitutes the ability of organizations, and of the individuals who operate within them, to work within their cognitive limitations in such a way that they are able to maintain an appropriate level of responsiveness to the contingencies confronting them. Using the language of the resource-based view of the firm, we argue that this meta-level competence represents a confluence of individual and organizational characteristics, suitably configured to enable the detection of those weak signals indicative of the need for change and to act accordingly, thereby minimising the dangers of cognitive bias and cognitive inertia. In an era of unprecedented informational burdens and instability, we argue that this competence is central to the longer-term survival and well-being of the organization. We conclude with a consideration of the major scientific challenges that lie ahead if the ideas contained within this paper are to be validated.

    An Ecological and Longitudinal Perspective

    From choosing which game to play to deciding how to effectively delay bedtime, making repeated choices is a ubiquitous part of childhood. Two often contrasted paradigmatic choice behaviors are probability matching and maximizing. Maximizing, described as consistently choosing the option with the highest reward probability, has traditionally been considered economically rational. Probability matching, in contrast, described as proportionately matching choices to the underlying reward probabilities, is debated: does it reflect a mistake or an adaptive mechanism? Previous research on the development of probability learning and repeated choice revealed considerable change across childhood and reported the paradoxical finding that younger children are more likely to maximize, outperforming older children, who are thought to be more likely to probability match. However, this line of research largely disregarded the mind's ability to capitalize on the structure of the environment. In this dissertation, I investigate the inter- and intra-individual development of probability learning and repeated choice behavior in childhood under consideration of ecological, cognitive, and methodological aspects. Four empirical chapters demonstrate that the interaction between the maturing mind and characteristics of the learning and choice environment shapes the development of adaptive choice behavior. The development of probability learning and repeated choice behavior in childhood progresses from high persistence but also high inter-individual variability to emerging adaptivity, marked by increased diversification and exploration. The present research highlights the benefit of taking an ecological rationality view in research on the development of decision-making abilities.
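The expected-reward gap between the two strategies is easy to compute for a two-option task: if the better option pays off with probability p, always choosing it earns p per trial, while matching choice proportions to the reward probabilities earns p² + (1 − p)². A small illustrative calculation (the value p = 0.75 is arbitrary, not taken from the dissertation):

```python
# Expected per-trial reward for a two-option probabilistic task where the
# better option rewards with probability p and the other with 1 - p.

def maximize_expected(p):
    # Always choose the better option.
    return p

def matching_expected(p):
    # Choose each option with probability equal to its reward probability.
    return p * p + (1 - p) * (1 - p)

p = 0.75
print(maximize_expected(p))  # → 0.75
print(matching_expected(p))  # → 0.625
```

For any p above 0.5 the matching strategy earns strictly less in a static environment, which is why matching is often called a fallacy; the ecological-rationality argument is that in structured or changing environments the exploration it entails can pay off.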

    Decision tree learning for intelligent mobile robot navigation

    The replication of human intelligence, learning and reasoning by means of computer algorithms is termed Artificial Intelligence (AI), and the interaction of such algorithms with the physical world can be achieved using robotics. The work described in this thesis investigates the application of concept learning (an approach which takes its inspiration from biological motivations, and from survival instincts in particular) to robot control and path planning. The methodology of concept learning has been applied using learning decision trees (DTs), which induce domain knowledge from a finite set of training vectors that systematically describe a physical entity and are used to train a robot to learn new concepts and to adapt its behaviour. To achieve behaviour learning, this work introduces the novel approach of hierarchical learning and knowledge decomposition within the frame of the reactive robot architecture. Following the analogy with survival instincts, the robot is first taught how to survive in very simple and homogeneous environments, namely a world without any disturbances or any kind of "hostility". Once this simple behaviour, named a primitive, has been established, the robot is trained to cope with increasingly complex environments by adding further worlds to its existing knowledge. The repertoire of robot behaviours, in the form of symbolic knowledge, is retained in a hierarchy of clustered decision trees accommodating a number of primitives. To classify robot perceptions, control rules are synthesised using symbolic knowledge derived from searching the hierarchy of DTs. A second novel concept is introduced, namely that of multi-dimensional fuzzy associative memories (MDFAMs). These are clustered fuzzy decision trees (FDTs) which are trained locally and accommodate specific perceptual knowledge. Fuzzy logic is incorporated to deal with inherent noise in sensory data and to merge conflicting behaviours of the DTs.
    In this thesis, the feasibility of the developed techniques is illustrated in robot applications, and their benefits and drawbacks are discussed.
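The core idea of inducing control rules from a finite set of training vectors can be illustrated with a minimal one-split decision tree (a stump) learned in pure Python. The sensor features, training data, and action labels below are invented for illustration and are far simpler than the thesis's clustered DT hierarchy:

```python
# Toy induction of a one-split decision tree from labelled sensor vectors.
# Feature names, data, and labels are hypothetical.

def _majority(labels):
    """Most frequent label in a list, or None if the list is empty."""
    return max(set(labels), key=labels.count) if labels else None

def learn_stump(samples):
    """samples: list of (feature_vector, label). Pick the single
    feature/threshold split that misclassifies the fewest samples."""
    best = None
    for f in range(len(samples[0][0])):
        for thresh in sorted({x[f] for x, _ in samples}):
            left = [y for x, y in samples if x[f] <= thresh]
            right = [y for x, y in samples if x[f] > thresh]
            errs = (sum(y != _majority(left) for y in left)
                    + sum(y != _majority(right) for y in right))
            if best is None or errs < best[0]:
                best = (errs, f, thresh, _majority(left), _majority(right))
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict(stump, x):
    f, t, left, right = stump
    return left if x[f] <= t else right

# Training vectors: (front_distance, left_distance) -> action
data = [((0.2, 1.0), "turn"), ((0.3, 0.5), "turn"),
        ((1.5, 0.4), "forward"), ((2.0, 1.2), "forward")]
stump = learn_stump(data)
print(predict(stump, (0.25, 0.8)))  # → "turn"
```

A full DT learner recurses on each side of the split; the hierarchical scheme in the thesis goes further by clustering whole trees per behaviour primitive.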

    Hierarchical control over effortful behavior by rodent medial frontal cortex : a computational model

    The anterior cingulate cortex (ACC) has been the focus of intense research interest in recent years. Although separate theories relate ACC function variously to conflict monitoring, reward processing, action selection, decision making, and more, damage to the ACC mostly spares performance on tasks that exercise these functions, indicating that they are not in fact unique to the ACC. Further, most theories do not address the most salient consequence of ACC damage: impoverished action generation in the presence of normal motor ability. In this study we develop a computational model of the rodent medial prefrontal cortex that accounts for the behavioral sequelae of ACC damage, unifies many of the cognitive functions attributed to it, and provides a solution to an outstanding question in cognitive control research: how the control system determines and motivates which tasks to perform. The theory derives from recent developments in the formal study of hierarchical control and learning that highlight the computational efficiencies afforded when collections of actions are represented based on their conjoint goals. According to this position, the ACC utilizes reward information to select tasks that are then accomplished through top-down control over action selection by the striatum. Computational simulations capture animal lesion data that implicate the medial prefrontal cortex in regulating physical and cognitive effort. Overall, this theory provides a unifying theoretical framework for understanding the ACC in terms of the pivotal role it plays in the hierarchical organization of effortful behavior.
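The proposed division of labor, an ACC-like level that selects which task to pursue based on reward net of effort cost, followed by top-down execution at a lower level, can be caricatured in a few lines. The task names and values below are invented and are not fitted to the paper's simulations:

```python
# Schematic sketch of hierarchical task selection by net value.
# Tasks and numbers are hypothetical.

def select_task(tasks):
    """ACC-like level: pick the task with the highest reward minus effort."""
    return max(tasks, key=lambda t: t["reward"] - t["effort"])

def execute(task):
    # Stand-in for top-down biased action selection (striatum in the theory).
    return f"performing {task['name']}"

tasks = [
    {"name": "high_effort_forage", "reward": 10.0, "effort": 6.0},
    {"name": "low_effort_wait",    "reward": 3.0,  "effort": 0.5},
]
chosen = select_task(tasks)
print(execute(chosen))  # net values 4.0 vs 2.5 → "performing high_effort_forage"
```

In this caricature, removing the selection level leaves the motor machinery intact but nothing to commit it to effortful options, loosely mirroring the lesion pattern the abstract describes.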