
    Reinforcement learning and insight in the artificial pigeon

    The phenomenon of insight (also called "Aha!" or "Eureka!" moments) is considered a core component of creative cognition. It is also a puzzle and a challenge for statistics-based approaches to behavior such as associative learning and reinforcement learning. We simulate a classic experiment on insight in pigeons using deep reinforcement learning. We show that prior experience may produce large and rapid performance improvements reminiscent of insights, and we suggest theoretical connections between concepts from machine learning (such as the value function or overfitting) and concepts from psychology (such as feelings-of-warmth and the Einstellung effect). However, the simulated pigeons were slower than the real pigeons at solving the test problem, requiring a greater amount of trial and error: their "insightful" behavior was sudden by comparison with learning from scratch, but slow by comparison with real pigeons. This leaves open the question of whether incremental improvements to reinforcement learning algorithms will be sufficient to produce insightful behavior.
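
    The transfer effect described above can be reproduced at toy scale. The sketch below is a minimal illustration under assumed details, not the authors' simulation (which used deep RL on the original pigeon setup): a tabular Q-learner is pretrained on two sub-skills of a hypothetical one-dimensional "box-and-banana" task and then compared with a from-scratch learner on the composite task.

```python
# Hypothetical toy analogue of the pigeon experiment; every detail of
# the environment is an illustrative assumption, not the paper's setup.
import random
from collections import defaultdict

N, ACTIONS = 7, (-1, +1)        # positions 0..6; the "box" sits at 0

def step(state, a):
    pos = min(max(state[0] + a, 0), N - 1)
    has_box = state[1] or pos == 0          # visiting 0 picks up the box
    return (pos, has_box)

def greedy(Q, s):
    qs = [Q[(s, x)] for x in ACTIONS]
    best = max(qs)
    return random.choice([a for a, q in zip(ACTIONS, qs) if q == best])

def q_learn(Q, start, goal, episodes, eps=0.1, alpha=0.5, gamma=0.95):
    """Eps-greedy tabular Q-learning; returns steps used per episode."""
    lengths = []
    for _ in range(episodes):
        s, t = start, 0
        while not goal(s) and t < 200:
            a = random.choice(ACTIONS) if random.random() < eps else greedy(Q, s)
            s2 = step(s, a)
            r = 1.0 if goal(s2) else 0.0
            boot = 0.0 if goal(s2) else gamma * max(Q[(s2, x)] for x in ACTIONS)
            Q[(s, a)] += alpha * (r + boot - Q[(s, a)])
            s, t = s2, t + 1
        lengths.append(t)
    return lengths

composite = lambda s: s[1] and s[0] == N - 1    # box in hand, at the far end
start = (N // 2, False)

scratch = q_learn(defaultdict(float), start, composite, 20)

Q = defaultdict(float)
q_learn(Q, start, lambda s: s[0] == 0, 300)          # sub-skill: fetch the box
q_learn(Q, (0, True), lambda s: s[0] == N - 1, 300)  # sub-skill: carry it over
transfer = q_learn(Q, start, composite, 20)

print("mean steps over 20 episodes: %.1f from scratch, %.1f pretrained"
      % (sum(scratch) / 20, sum(transfer) / 20))
```

    On typical runs the pretrained learner crosses the task almost directly from its first episode, while the from-scratch learner wanders for much longer, which is the sudden-versus-gradual contrast the paper quantifies.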

    Intentions and Creative Insights: a Reinforcement Learning Study of Creative Exploration in Problem-Solving

    Insight is perhaps the cognitive phenomenon most closely associated with creativity. People engaged in problem-solving sometimes experience a sudden transformation: they see the problem in a radically different manner, and simultaneously feel with great certainty that they have found the right solution. The change of problem representation is called "restructuring", and the affective changes associated with sudden progress are called the "Aha!" experience. Together, restructuring and the "Aha!" experience characterize insight. Reinforcement learning is both a theory of biological learning and a subfield of machine learning. In its psychological and neuroscientific guise, it is used to model habit formation and, increasingly, executive function. In its artificial intelligence guise, it is currently the favored paradigm for modeling agents interacting with an environment. Reinforcement learning, I argue, can serve as a model of insight: its foundation in learning coincides with the role of experience in insight problem-solving; its use of an explicit "value" provides the basis for the "Aha!" experience; and finally, in a hierarchical form, it can achieve a sudden change of representation resembling restructuring. An experiment helps confirm some parallels between reinforcement learning and insight. It shows how transfer from prior tasks results in considerably accelerated learning, and how the increase of the value function resembles the sense of progress corresponding to the "Aha!"-moment. However, a model of insight based on hierarchical reinforcement learning did not display the expected "insightful" behavior. A second model of insight is presented, in which temporal abstraction is based on self-prediction: by predicting its own future decisions, an agent adjusts its course of action on the basis of unexpected events. This kind of temporal abstraction, I argue, corresponds to what we call "intentions", and offers a promising model for biological insight. It explains the "Aha!" experience as resulting from a temporal difference error, whereas restructuring results from an adjustment of the agent's internal state on the basis of either new information or a stochastic interpretation of stimuli. The model is called the actor-critic-intention (ACI) architecture. Finally, the relationship between intentions, insight, and creativity is extensively discussed in light of these models: other works in the philosophical and scientific literature are related to, and sometimes illuminated by, the ACI architecture.
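
    One parallel drawn above, between the "Aha!" experience and a temporal difference error, can be made concrete. The sketch below is a plain tabular actor-critic on a toy chain task, not the ACI architecture itself (it has no intention or self-prediction layer); it merely flags large positive TD errors, the quantity the thesis reads as the affective signal. Such spikes occur when the goal is first reached and fade as the value function converges.

```python
# Plain tabular actor-critic on a chain task; large positive TD errors
# are flagged as candidate "Aha!"-like events. The threshold and the
# task are arbitrary illustrative choices.
import math, random

N_STATES, GOAL, GAMMA, ALPHA = 6, 5, 0.95, 0.1
V = [0.0] * N_STATES                          # critic: state values
prefs = [[0.0, 0.0] for _ in range(N_STATES)] # actor: action preferences

def sample_action(s):
    # softmax over the two actions (0 = left, 1 = right)
    e = [math.exp(p) for p in prefs[s]]
    return 0 if random.random() < e[0] / (e[0] + e[1]) else 1

for episode in range(200):
    s, t = 0, 0
    while s != GOAL and t < 500:
        a = sample_action(s)
        s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        delta = r + GAMMA * (0.0 if s2 == GOAL else V[s2]) - V[s]
        V[s] += ALPHA * delta                 # critic update
        prefs[s][a] += ALPHA * delta          # simplified actor update
        if delta > 0.5:                       # arbitrary "spike" threshold
            print(f"episode {episode}: TD spike {delta:.2f} in state {s}")
        s, t = s2, t + 1
```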

    Pavlov, Skinner, and Other Behaviourists' Contributions to AI

    A version of the definition of intelligent behaviour will be supplied in the context of real and artificial systems. A short presentation of the principles of learning will be given, starting with Pavlov's classical conditioning, moving through the reinforced response and operant conditioning of Thorndike and Skinner, and finishing with the cognitive learning of Tolman and Bandura. The most important figures within behaviourism, especially those who contributed to AI, will be described. Some tools of artificial intelligence that act according to those principles will be presented. An attempt will be made to show how some simple rules for behaviour modification can lead to complex intelligent behaviour.
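
    The learning principles surveyed here have standard computational formulations. As one example, chosen for illustration rather than named in the abstract, Pavlovian conditioning is classically modeled by the Rescorla-Wagner rule, ΔV = αβ(λ − ΣV): the associative strength of each stimulus present on a trial moves toward the outcome in proportion to the shared prediction error. A minimal sketch:

```python
# The Rescorla-Wagner model of classical conditioning. alpha and beta
# are stimulus and outcome saliences; lam is the outcome magnitude.
def rescorla_wagner(trials, alpha=0.3, beta=1.0, lam=1.0):
    """trials: list of sets of co-present stimuli; returns strengths."""
    V = {}
    for stimuli in trials:
        prediction = sum(V.get(s, 0.0) for s in stimuli)
        error = lam - prediction            # shared prediction error
        for s in stimuli:
            V[s] = V.get(s, 0.0) + alpha * beta * error
    return V

# Kamin blocking: pretraining on "bell" alone leaves almost no error
# for "light" to absorb during the compound trials.
print(rescorla_wagner([{"bell"}] * 20 + [{"bell", "light"}] * 20))
```

    The same error-correction structure reappears in temporal-difference learning, which is one reason behaviourist models fed so directly into modern reinforcement learning.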

    Categories, concepts, and calls: auditory perceptual mechanisms and cognitive abilities across different types of birds.

    Although involving different animals, preparations, and objectives, our laboratories (Sturdy's and Cook's) are mutually interested in category perception and concept formation. The Sturdy laboratory has a history of studying perceptual categories in songbirds, while the Cook laboratory has a history of studying abstract concept formation in pigeons. Recently, we undertook a suite of collaborative projects combining our investigations to examine abstract concept formation in songbirds and the perception of songbird vocalizations in pigeons. This talk will include our recent findings on songbird category perception, songbird abstract concept formation (same/different task), and early results from pigeons' processing of songbird vocalizations in a same/different task. Our findings indicate that (1) categorization in birds seems to be most heavily influenced by acoustic, rather than genetic or experiential, factors; (2) songbirds treat their vocalizations as perceptual categories, both at the level of the note and at the level of the species/whole call; (3) chickadees, like pigeons, can perceive abstract same/different relations; and (4) pigeons are not as good at discriminating chickadee vocalizations as songbirds (chickadees and finches) are. Our findings suggest that although there are commonalities in complex auditory processing among birds, there are potentially important comparative differences between songbirds and non-songbirds in their treatment of certain types of auditory objects.

    Cooperation in the iterated prisoner's dilemma is learned by operant conditioning mechanisms

    The prisoner's dilemma (PD) is the leading metaphor for the evolution of cooperative behavior in populations of selfish agents. Although cooperation in the iterated prisoner's dilemma (IPD) has been studied for over twenty years, most of this research has focused on strategies that involve nonlearned behavior. Another approach is to suppose that players' selection of the preferred reply might be enforced in the same way as feeding animals track the best way to feed in changing, nonstationary environments. Learning mechanisms such as operant conditioning enable animals to acquire relevant characteristics of their environment in order to obtain reinforcements and to avoid punishments. In this study, the role of operant conditioning in the learning of cooperation was evaluated in the PD. We found that operant mechanisms allow the learning of IPD play against other strategies. When random moves are allowed in the game, the operant learning model showed low sensitivity to this noise. On the basis of this evidence, it is suggested that operant learning might be involved in reciprocal altruism.
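
    The abstract does not give the model's equations, so the following is only a schematic stand-in, not the authors' operant model: a reward-driven learner whose only "stimulus" is the opponent's last move, playing against tit-for-tat. It illustrates the core point that tracking reinforcement alone can settle on the cooperative reply, provided future payoffs carry enough weight.

```python
# Hypothetical illustration: a value-learning agent in the iterated
# prisoner's dilemma against tit-for-tat. State = the opponent's
# current move, which (against tit-for-tat) equals our previous move.
import random

C, D = "C", "D"
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}  # (mine, theirs)

def play(rounds=20000, alpha=0.1, gamma=0.9, eps=0.1):
    Q = {(s, a): 0.0 for s in (C, D) for a in (C, D)}
    my_last, coop = C, 0
    for _ in range(rounds):
        s = my_last                   # tit-for-tat repeats my previous move
        a = (random.choice((C, D)) if random.random() < eps
             else max((C, D), key=lambda x: Q[(s, x)]))
        r = PAYOFF[(a, s)]            # payoff against the opponent's move s
        s2 = a                        # next round the opponent will copy a
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in (C, D))
                              - Q[(s, a)])
        coop += (a == C)
        my_last = a
    return coop / rounds

print("cooperation rate against tit-for-tat:", play())
```

    With gamma = 0 (a purely myopic learner) the same code converges to mutual defection, since defection always pays more within a single round; valuing future reinforcement (here gamma > 0.5 suffices against tit-for-tat) is what makes cooperation the learned reply.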

    Biological learning and artificial intelligence

    It was once taken for granted that learning in animals and man could be explained with a simple set of general learning rules, but over the last hundred years a substantial amount of evidence has accumulated that points in a quite different direction. In animal learning theory, the laws of learning are no longer considered general. Instead, it has been necessary to explain behaviour in terms of a large set of interacting learning mechanisms and innate behaviours. Artificial intelligence is now on the edge of making the transition from general theories to a view of intelligence based on an amalgam of interacting systems. In the light of the evidence from animal learning theory, such a transition is highly desirable.

    Search, navigation and foraging: an optimal decision-making perspective

    Behavior in its general form can be defined as a mapping between sensory inputs and a pattern of motor actions used to achieve a goal. In recent years, reinforcement learning has emerged as a general framework for analyzing behavior in this general sense. In this thesis, exploiting the techniques of reinforcement learning, we study several phenomena that can be classified as search, navigation, and foraging behaviors. Regarding search, we analyze random walks forced to reach a target in a confined region of space. In this case the problem can be solved analytically, which yields a very efficient way to generate such walks. The navigation problem is inspired by olfactory navigation in homing pigeons; here we propose an algorithm for navigating a noisy environment relying only on local signals. Foraging is analyzed starting from the observation that fossil traces show the evolution of foraging strategies towards highly compact and self-avoiding trajectories. We show how this optimal behavior can emerge in the reinforcement learning framework.
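
    The thesis's analytic solution is not reproduced here, but the idea of efficiently generating walks "forced to reach a target" can be illustrated with a standard construction (a discrete Doob h-transform, chosen as an example): bias each step by the fraction of remaining paths that still end on the target.

```python
# Generate a +-1 random walk of n_steps steps conditioned to end exactly
# at `target`. Each step is reweighted by exact path counts, so every
# sample satisfies the constraint; n_steps and (target - x0) must have
# the same parity.
from math import comb
import random

def n_paths(x, k, target):
    """Number of +-1 walks of k steps from x that end at target."""
    d = target - x
    if abs(d) > k or (k - d) % 2:
        return 0
    return comb(k, (k + d) // 2)

def conditioned_walk(n_steps, target, x0=0):
    x, walk = x0, [x0]
    for k in range(n_steps, 0, -1):
        p_up = n_paths(x + 1, k - 1, target) / n_paths(x, k, target)
        x += 1 if random.random() < p_up else -1
        walk.append(x)
    return walk

w = conditioned_walk(20, 4)
assert w[-1] == 4               # the walk is forced onto the target
print(w)
```

    Sampling this way costs no more than an unconstrained walk, rather than rejecting the overwhelming majority of unconstrained samples that miss the target.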

    A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning

    Hierarchical Reinforcement Learning (HRL) approaches have shown successful results in solving a large variety of complex, structured, long-horizon problems. Nevertheless, a full theoretical understanding of this empirical evidence is currently missing. In the context of the option framework, previous works have conceived provably efficient algorithms for the case in which the options are *fixed* and only the high-level policy selecting among options has to be learned. However, the fully realistic scenario in which both the high-level and the low-level policies are learned has surprisingly been disregarded from a theoretical perspective. This work makes a step towards the understanding of this latter scenario. Focusing on the finite-horizon problem, in this paper we propose a novel meta-algorithm that alternates between two regret minimization algorithms instantiated at different (high and low) temporal abstractions. At the higher level, we look at the problem as a Semi-Markov Decision Process (SMDP), keeping the low-level policies fixed, while at the lower level, we learn the inner option policies by keeping the high-level policy fixed. Then, we specialize the results for a specific choice of algorithms, where we propose a novel provably efficient algorithm for finite-horizon SMDPs, and we use a state-of-the-art regret minimizer for learning the options. We compare the bounds derived with those of state-of-the-art regret minimization algorithms for non-hierarchical finite-horizon problems. The comparison allows us to characterize the class of problems in which a hierarchical approach is provably preferable, even when a set of pre-trained options is not given.
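
    The alternation at the heart of the meta-algorithm can be shown in schematic form. The toy below is only a structural illustration with naive epsilon-greedy learners on a small chain task; it is not the paper's provably efficient regret minimizers. Even phases update only the high-level policy over frozen options; odd phases update only the options' internal policies under the frozen high-level policy.

```python
# Structural toy of the alternating high-level / low-level scheme.
# All learners, tasks, and constants are illustrative placeholders.
import random

N, GOAL, ACTS = 9, 8, (-1, 1)               # chain MDP with reward at 8

class Option:
    """An option whose internal policy is trained to reach its subgoal."""
    def __init__(self, subgoal):
        self.subgoal = subgoal
        self.Q = {(s, a): 0.0 for s in range(N) for a in ACTS}

    def run(self, s, learn):
        for _ in range(30):
            if s == self.subgoal:
                break
            a = (random.choice(ACTS) if random.random() < 0.2
                 else max(ACTS, key=lambda x: self.Q[(s, x)]))
            s2 = min(max(s + a, 0), N - 1)
            if learn:                        # low-level pseudo-reward update
                r = 1.0 if s2 == self.subgoal else 0.0
                boot = (0.0 if s2 == self.subgoal
                        else max(self.Q[(s2, x)] for x in ACTS))
                self.Q[(s, a)] += 0.1 * (r + 0.9 * boot - self.Q[(s, a)])
            s = s2
        return s

options = [Option(4), Option(8)]             # two low-level skills
highQ = {(s, i): 0.0 for s in range(N) for i in (0, 1)}

for phase in range(8):                       # alternate the two phase types
    learn_low = phase % 2 == 1
    for _ in range(200):
        s = 0
        for _ in range(3):                   # an episode = a few option calls
            i = (random.randrange(2) if random.random() < 0.2
                 else max((0, 1), key=lambda j: highQ[(s, j)]))
            s2 = options[i].run(s, learn=learn_low)
            if not learn_low:                # high-level (SMDP-style) update
                r = 1.0 if s2 == GOAL else 0.0
                boot = (0.0 if s2 == GOAL
                        else max(highQ[(s2, j)] for j in (0, 1)))
                highQ[(s, i)] += 0.1 * (r + 0.9 * boot - highQ[(s, i)])
            s = s2
            if s == GOAL:
                break

print("learned start-state value:", max(highQ[(0, j)] for j in (0, 1)))
```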