Circuit motifs for sensory integration, learning, and the initiation of adaptive behavior in Drosophila
Goal-directed behavior is crucial for survival in complex, dynamic environments. It requires the detection of relevant sensory stimuli and the formation of separable neuronal representations. Learning the contingencies between these sensory stimuli and innately positive or negative (reinforcing) stimuli forms associations, allowing the former to cue the latter. This yields cue-based predictions that upgrade the behavioral repertoire from reactive to anticipatory. In this thesis, the triad of sensory integration, learning of contingencies, and initiation of anticipatory behavior is studied in the framework of the olfactory pathway and mushroom body of the fruit fly Drosophila, a higher-order brain center for sensory integration and coincidence detection. We used computational network models representing the mushroom body architecture with varying degrees of abstraction. Additionally, simulations of larval locomotion were employed to investigate how the output of the mushroom body relates to behavior and to foster comparability with animal experiments. We showed that inhibitory feedback within the mushroom body produces sparse stimulus representations, increasing the separability of different sensory stimuli. This separability reduced reinforcement generalization in learning experiments by decreasing the overlap of stimulus representations. Furthermore, we showed that feedback from the valence-signaling output to the reinforcement-signaling dopaminergic neurons that innervate the mushroom body could explain experimentally observed temporal dynamics of the formation of associations between sensory cues and reinforcement. This supports the hypothesis that dopaminergic neurons encode the difference between predicted and received reinforcement, which in turn drives the learning process. These dopaminergic neurons have also been argued to convey an indirect reinforcement signal in second-order learning experiments.
In such experiments, a new sensory cue is paired with an already established one that activates dopaminergic neurons through its association with the reinforcement. We demonstrated how different pathways for feedforward or feedback input from the mushroom body's intrinsic or output neurons can provide an indirect reinforcement signal to the dopaminergic neurons. Any direct or indirect association of sensory cues with reinforcement yielded a reinforcement expectation, biasing the fly's behavioral response towards approach or avoidance of the respective sensory cue. We then showed that the simulated locomotory behavior of individual animals in a virtual environment depends on the biasing output of the mushroom body. In conclusion, our results contribute to understanding the implementation, within the mushroom body, of mechanisms for separable stimulus representations and postulated key features of associative learning, as well as the link between mushroom body output and adaptive behavior, and confirm their explanatory power for animal behavior.
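The hypothesis above, that dopaminergic neurons signal the difference between predicted and received reinforcement, can be illustrated with a minimal delta-rule sketch (not taken from the thesis; function names and the learning rate are illustrative):

```python
# Minimal sketch of prediction-error-driven associative learning, as
# hypothesized for dopaminergic reinforcement signaling: the teaching
# signal is the difference between received and predicted
# reinforcement (a Rescorla-Wagner / TD-style delta rule).

def train(rewards, lr=0.3):
    """Return the cue's predicted reinforcement after each pairing."""
    w = 0.0                # associative strength (reward prediction)
    history = []
    for r in rewards:      # one cue-reinforcement pairing per trial
        delta = r - w      # prediction error: received minus predicted
        w += lr * delta    # the error drives the weight update
        history.append(w)
    return history

preds = train([1.0] * 10)
# the prediction error shrinks as the association forms,
# so learning slows down as the cue comes to predict the reward
```

Because the update is proportional to the prediction error, repeated pairings produce the decelerating acquisition curve typically observed in conditioning experiments.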
Cooperative and Competitive Reinforcement and Imitation Learning for a Mixture of Heterogeneous Learning Modules
This paper proposes Cooperative and competitive Reinforcement And Imitation Learning (CRAIL), a method for selecting an appropriate policy from a set of multiple heterogeneous modules and training all of them in parallel. Each learning module has its own network architecture and improves its policy based on an off-policy reinforcement learning algorithm and behavior cloning from samples collected by a behavior policy that is constructed as a combination of all the policies. Since the mixing weights are determined by each module's performance, a better policy is automatically selected as learning progresses. Experimental results on a benchmark control task show that CRAIL successfully achieves fast learning by allowing modules with complicated network structures to exploit task-relevant samples for training.
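The performance-weighted module selection described above can be sketched as follows (a hypothetical illustration; the paper's exact weighting scheme may differ):

```python
import math
import random

# Sketch of performance-based mixing weights for a set of learning
# modules: a softmax over per-module performance estimates, so the
# behavior policy samples better-performing modules more often as
# learning progresses. The temperature parameter is an assumption.

def mixing_weights(performances, temperature=1.0):
    """Softmax over per-module performance estimates."""
    exps = [math.exp(p / temperature) for p in performances]
    total = sum(exps)
    return [e / total for e in exps]

def select_module(performances, rng=None):
    """Sample a module index according to the mixing weights."""
    rng = rng or random.Random(0)
    w = mixing_weights(performances)
    return rng.choices(range(len(w)), weights=w)[0]

w = mixing_weights([0.1, 0.5, 2.0])
# the best-performing module receives the largest weight
```

Lowering the temperature sharpens the distribution toward the current best module; raising it keeps weaker modules in the mix so they continue to collect training samples.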
Compiler-assisted Adaptive Program Scheduling in big.LITTLE Systems
Energy-aware architectures provide applications with a mix of low (LITTLE)
and high (big) frequency cores. Choosing the best hardware configuration for a
program running on such an architecture is difficult, because program parts
benefit differently from the same hardware configuration. State-of-the-art
techniques to solve this problem adapt the program's execution to dynamic
characteristics of the runtime environment, such as energy consumption and
throughput. We claim that these purely dynamic techniques can be improved if
they are aware of the program's syntactic structure. To support this claim, we
show how to use the compiler to partition source code into program phases:
regions whose syntactic characteristics lead to similar runtime behavior. We
use reinforcement learning to map pairs formed by a program phase and a
hardware state to the configuration that best fits this setup. To demonstrate
the effectiveness of our ideas, we have implemented the Astro system. Astro
uses Q-learning to associate syntactic features of programs with hardware
configurations. As a proof of concept, we provide evidence that Astro
outperforms GTS, the ARM-based Linux scheduler tailored for heterogeneous
architectures, on the parallel benchmarks from Rodinia and Parsec.
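The idea of learning a mapping from (program phase, hardware state) pairs to hardware configurations can be sketched with tabular Q-learning (names, states, and the reward signal here are hypothetical, not Astro's actual implementation; in practice the reward could be, e.g., throughput per joule measured after running the phase):

```python
import random
from collections import defaultdict

# Illustrative tabular Q-learning scheduler in the spirit of the
# approach described above: states are (program phase, hardware
# state) pairs, actions are hardware configurations.

class Scheduler:
    def __init__(self, configs, lr=0.5, gamma=0.9, eps=0.1, seed=0):
        self.q = defaultdict(float)          # Q[(state, config)]
        self.configs = configs
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice of a hardware configuration."""
        if self.rng.random() < self.eps:     # explore
            return self.rng.choice(self.configs)
        return max(self.configs, key=lambda c: self.q[(state, c)])

    def update(self, state, config, reward, next_state):
        """Standard Q-learning update from one observed transition."""
        best_next = max(self.q[(next_state, c)] for c in self.configs)
        td = reward + self.gamma * best_next - self.q[(state, config)]
        self.q[(state, config)] += self.lr * td
```

After enough transitions, the greedy choice for a phase converges to the configuration that maximized the observed reward for that phase.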
Intentions and Creative Insights: a Reinforcement Learning Study of Creative Exploration in Problem-Solving
Insight is perhaps the cognitive phenomenon most closely associated with creativity. People engaged in problem-solving sometimes experience a sudden transformation: they see the problem in a radically different manner, and simultaneously feel with great certainty that they have found the right solution. The change of problem representation is called "restructuring", and the affective changes associated with sudden progress are called the "Aha!" experience. Together, restructuring and the "Aha!" experience characterize insight.
Reinforcement Learning is both a theory of biological learning and a subfield of machine learning. In its psychological and neuroscientific guise, it is used to model habit formation, and, increasingly, executive function. In its artificial intelligence guise, it is currently the favored paradigm for modeling agents interacting with an environment. Reinforcement learning, I argue, can serve as a model of insight: its foundation in learning coincides with the role of experience in insight problem-solving; its use of an explicit "value" provides the basis for the "Aha!" experience; and finally, in a hierarchical form, it can achieve a sudden change of representation resembling restructuring.
An experiment helps confirm some parallels between reinforcement learning and insight. It shows how transfer from prior tasks results in considerably accelerated learning, and how the value function increase resembles the sense of progress corresponding to the "Aha!"-moment. However, a model of insight on the basis of hierarchical reinforcement learning did not display the expected "insightful" behavior.
A second model of insight is presented, in which temporal abstraction is based on self-prediction: by predicting its own future decisions, an agent adjusts its course of action on the basis of unexpected events. This kind of temporal abstraction, I argue, corresponds to what we call "intentions", and offers a promising model for biological insight. It explains the "Aha!" experience as resulting from a temporal difference error, whereas restructuring results from an adjustment of the agent's internal state on the basis of either new information or a stochastic interpretation of stimuli. The model is called the actor-critic-intention (ACI) architecture.
Finally, the relationship between intentions, insight, and creativity is extensively discussed in light of these models: other works in the philosophical and scientific literature are related to, and sometimes illuminated by, the ACI architecture.
Q Learning Behavior on Autonomous Navigation of Physical Robot
Behavior-based architectures give robots fast and reliable action. When a robot has many behaviors, behavior coordination is needed. Subsumption architecture is a behavior coordination method that gives quick and robust responses. Learning mechanisms improve a robot's performance in handling uncertainty. Q learning is a popular reinforcement learning method that has been used in robot learning because it is simple, convergent, and off-policy. In this paper, Q learning is used as the learning mechanism for an obstacle avoidance behavior in autonomous robot navigation. The learning rate of Q learning affects the robot's performance in the learning phase. As a result, the Q learning algorithm was successfully implemented in a physical robot despite its imperfect environment.
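The subsumption-style coordination mentioned above can be sketched as a priority-ordered list of behaviors, where the highest-priority behavior that fires suppresses the ones below it (a toy illustration, not the paper's implementation; behavior names, sensor keys, and thresholds are assumptions):

```python
# Toy sketch of subsumption-style behavior coordination: behaviors
# are ordered by priority, and the highest-priority behavior that
# fires subsumes (suppresses) the lower layers.

def avoid_obstacle(sensors):
    """Higher priority: turn away when an obstacle is close."""
    if sensors["front_dist"] < 0.3:        # threshold is illustrative
        return "turn_left"
    return None                            # does not fire

def wander(sensors):
    """Lowest priority: default forward motion."""
    return "forward"

BEHAVIORS = [avoid_obstacle, wander]       # high to low priority

def coordinate(sensors):
    """Return the action of the highest-priority firing behavior."""
    for behavior in BEHAVIORS:
        action = behavior(sensors)
        if action is not None:             # subsume lower layers
            return action
```

In the learning setting described by the abstract, the decision rule inside a layer such as `avoid_obstacle` would be replaced by a Q-learned policy rather than a fixed threshold.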
Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework
In this paper, we argue that the future of Artificial Intelligence research
resides in two keywords: integration and embodiment. We support this claim by
analyzing the recent advances of the field. Regarding integration, we note that
the most impactful recent contributions have been made possible through the
integration of recent Machine Learning methods (based in particular on Deep
Learning and Recurrent Neural Networks) with more traditional ones (e.g.
Monte-Carlo tree search, goal babbling exploration or addressable memory
systems). Regarding embodiment, we note that the traditional benchmark tasks
(e.g. visual classification or board games) are becoming obsolete as
state-of-the-art learning algorithms approach or even surpass human performance
in most of them, having recently encouraged the development of first-person 3D
game platforms embedding realistic physics. Building upon this analysis, we
first propose an embodied cognitive architecture integrating heterogeneous
sub-fields of Artificial Intelligence into a unified framework. We demonstrate
the utility of our approach by showing how major contributions of the field can
be expressed within the proposed framework. We then claim that benchmarking
environments need to reproduce ecologically-valid conditions for bootstrapping
the acquisition of increasingly complex cognitive skills through the concept of
a cognitive arms race between embodied agents.
Comment: Updated version of the paper accepted to the ICDL-Epirob 2017 conference (Lisbon, Portugal).