    Circuit motifs for sensory integration, learning, and the initiation of adaptive behavior in Drosophila

    Goal-directed behavior is crucial for survival in complex, dynamic environments. It requires the detection of relevant sensory stimuli and the formation of separable neuronal representations. Learning the contingencies of these sensory stimuli with stimuli of innately positive or negative valence (reinforcement) forms associations, allowing the former to cue the latter. This yields cue-based predictions that upgrade the behavioral repertoire from reactive to anticipatory. In this thesis, the triad of sensory integration, learning of contingencies, and the initiation of anticipatory behavior is studied in the framework of the olfactory pathway and mushroom body of the fruit fly Drosophila. The mushroom body is a higher-order brain center for integrating sensory input and detecting coincidence. We used computational network models that represent the mushroom body architecture at varying degrees of abstraction. Additionally, simulations of larval locomotion were employed to investigate how the output of the mushroom body relates to behavior and to foster comparability with animal experiments. We showed that inhibitory feedback within the mushroom body produces sparse stimulus representations, increasing the separability of different sensory stimuli. This separability reduced reinforcement generalization in learning experiments through the decreased overlap of stimulus representations. Furthermore, we showed that feedback from the valence-signaling output to the reinforcement-signaling dopaminergic neurons that innervate the mushroom body could explain experimentally observed temporal dynamics of the formation of associations between sensory cues and reinforcement. This supports the hypothesis that dopaminergic neurons encode the difference between predicted and received reinforcement, which in turn drives the learning process. These dopaminergic neurons have also been argued to convey an indirect reinforcement signal in second-order learning experiments. In these experiments, a new sensory cue is paired with an already established one that activates dopaminergic neurons due to its association with the reinforcement. We demonstrated how different pathways for feedforward or feedback input from the mushroom body’s intrinsic or output neurons can provide an indirect reinforcement signal to the dopaminergic neurons. Any direct or indirect association of sensory cues with reinforcement yielded a reinforcement expectation, biasing the fly’s behavioral response towards the approach or avoidance of the respective sensory cue. We then showed that the simulated locomotory behavior of individual animals in a virtual environment depends on the biasing output of the mushroom body. In conclusion, our results contribute to understanding how the mushroom body implements mechanisms for separable stimulus representations, postulated key features of associative learning, and the link between mushroom body (MB) output and adaptive behavior, and they confirm the explanatory power of these mechanisms for animal behavior.
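
The prediction-error hypothesis mentioned in the abstract above can be illustrated with a minimal delta-rule sketch. This is a generic Rescorla-Wagner-style update for a single cue, not the thesis's network model; the function name, the learning rate, and the reinforcement sequence are assumptions chosen for illustration.

```python
def train_prediction_error(reinforcements, learning_rate=0.2):
    """Update a scalar reinforcement prediction for one cue, trial by trial.

    Hypothetical helper: the error term stands in for the signal that the
    abstract attributes to dopaminergic neurons.
    """
    prediction = 0.0
    history = []
    for received in reinforcements:
        # Difference between received and predicted reinforcement.
        error = received - prediction
        # The prediction moves toward the received value in proportion to the error.
        prediction += learning_rate * error
        history.append((prediction, error))
    return history


# Thirty pairings of the cue with a reinforcement of magnitude 1.0 (assumed values).
history = train_prediction_error([1.0] * 30)
```

In this sketch the error shrinks as the prediction approaches the received reinforcement, so learning slows down once the cue reliably predicts the outcome.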

    Cooperative and Competitive Reinforcement and Imitation Learning for a Mixture of Heterogeneous Learning Modules

    This paper proposes Cooperative and competitive Reinforcement And Imitation Learning (CRAIL) for selecting an appropriate policy from a set of multiple heterogeneous modules and training all of them in parallel. Each learning module has its own network architecture and improves its policy using an off-policy reinforcement learning algorithm and behavior cloning from samples collected by a behavior policy that is constructed as a combination of all the policies. Since the mixing weights are determined by the performance of each module, a better policy is automatically selected based on the learning progress. Experimental results on a benchmark control task show that CRAIL successfully achieves fast learning by allowing modules with complicated network structures to exploit task-relevant samples for training.
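
A minimal sketch of performance-based mixing follows. The abstract only states that the mixing weights depend on module performance; the softmax form, the temperature parameter, and the function names below are assumptions, not the CRAIL formulation.

```python
import math
import random


def mixing_weights(performances, temperature=1.0):
    """Softmax over per-module performance scores (assumed weighting scheme)."""
    peak = max(performances)  # subtract the maximum for numerical stability
    exps = [math.exp((p - peak) / temperature) for p in performances]
    total = sum(exps)
    return [e / total for e in exps]


def select_module(weights, rng=random):
    """Sample the index of the module whose policy acts next."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Three hypothetical modules with increasing performance estimates.
weights = mixing_weights([1.0, 2.0, 4.0])
chosen = select_module(weights, random.Random(0))
```

Under this assumption, a module whose performance estimate rises is sampled more often, while weaker modules still contribute occasional samples that every module can train on.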

    Compiler-assisted Adaptive Program Scheduling in big.LITTLE Systems

    Energy-aware architectures provide applications with a mix of low-frequency (LITTLE) and high-frequency (big) cores. Choosing the best hardware configuration for a program running on such an architecture is difficult, because program parts benefit differently from the same hardware configuration. State-of-the-art techniques to solve this problem adapt the program's execution to dynamic characteristics of the runtime environment, such as energy consumption and throughput. We claim that these purely dynamic techniques can be improved if they are aware of the program's syntactic structure. To support this claim, we show how to use the compiler to partition source code into program phases: regions whose syntactic characteristics lead to similar runtime behavior. We use reinforcement learning to map pairs formed by a program phase and a hardware state to the configuration that best fits this setup. To demonstrate the effectiveness of our ideas, we have implemented the Astro system. Astro uses Q-learning to associate syntactic features of programs with hardware configurations. As a proof of concept, we provide evidence that Astro outperforms GTS, the ARM-based Linux scheduler tailored for heterogeneous architectures, on the parallel benchmarks from Rodinia and Parsec.
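
The mapping from (program phase, hardware state) pairs to configurations can be sketched with a small tabular Q-learner. This is an illustration of the general technique, not Astro's implementation: the configuration names, the phase and state labels, the reward value, and the `ConfigLearner` class are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical hardware configurations; the abstract does not list them.
CONFIGS = ["little_only", "big_only", "all_cores"]


class ConfigLearner:
    """Tabular Q-learning over ((phase, hw_state), config) entries."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # unseen entries start at 0.0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def best(self, state):
        """Configuration with the highest learned value for this state."""
        return max(CONFIGS, key=lambda c: self.q[(state, c)])

    def choose(self, state):
        """Epsilon-greedy choice: explore occasionally, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(CONFIGS)
        return self.best(state)

    def update(self, state, config, reward, next_state):
        """One-step Q-learning update toward reward plus discounted best next value."""
        target = reward + self.gamma * max(self.q[(next_state, c)] for c in CONFIGS)
        key = (state, config)
        self.q[key] += self.alpha * (target - self.q[key])


learner = ConfigLearner()
state = ("compute_bound_phase", "low_load")  # assumed phase and hardware-state labels
learner.update(state, "all_cores", reward=1.0, next_state=state)
```

In a real system the reward would presumably combine measurements such as energy consumption and throughput; here it is a placeholder constant.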

    Intentions and Creative Insights: a Reinforcement Learning Study of Creative Exploration in Problem-Solving

    Insight is perhaps the cognitive phenomenon most closely associated with creativity. People engaged in problem-solving sometimes experience a sudden transformation: they see the problem in a radically different manner, and simultaneously feel with great certainty that they have found the right solution. The change of problem representation is called "restructuring", and the affective changes associated with sudden progress are called the "Aha!" experience. Together, restructuring and the "Aha!" experience characterize insight. Reinforcement Learning is both a theory of biological learning and a subfield of machine learning. In its psychological and neuroscientific guise, it is used to model habit formation, and, increasingly, executive function. In its artificial intelligence guise, it is currently the favored paradigm for modeling agents interacting with an environment. Reinforcement learning, I argue, can serve as a model of insight: its foundation in learning coincides with the role of experience in insight problem-solving; its use of an explicit "value" provides the basis for the "Aha!" experience; and finally, in a hierarchical form, it can achieve a sudden change of representation resembling restructuring. An experiment helps confirm some parallels between reinforcement learning and insight. It shows how transfer from prior tasks results in considerably accelerated learning, and how the value function increase resembles the sense of progress corresponding to the "Aha!"-moment. However, a model of insight on the basis of hierarchical reinforcement learning did not display the expected "insightful" behavior. A second model of insight is presented, in which temporal abstraction is based on self-prediction: by predicting its own future decisions, an agent adjusts its course of action on the basis of unexpected events. This kind of temporal abstraction, I argue, corresponds to what we call "intentions", and offers a promising model for biological insight. It explains the "Aha!" experience as resulting from a temporal difference error, whereas restructuring results from an adjustment of the agent's internal state on the basis of either new information or a stochastic interpretation of stimuli. The model is called the actor-critic-intention (ACI) architecture. Finally, the relationship between intentions, insight, and creativity is extensively discussed in light of these models: other works in the philosophical and scientific literature are related to, and sometimes illuminated by, the ACI architecture.

    Q Learning Behavior on Autonomous Navigation of Physical Robot

    A behavior-based architecture gives a robot fast and reliable action. If a robot has many behaviors, behavior coordination is needed. The subsumption architecture is a behavior coordination method that gives quick and robust responses. A learning mechanism improves the robot’s performance in handling uncertainty. Q-learning is a popular reinforcement learning method that has been used in robot learning because it is simple, convergent, and off-policy. In this paper, Q-learning is used as the learning mechanism for the obstacle avoidance behavior in autonomous robot navigation. The learning rate of Q-learning affects the robot’s performance in the learning phase. As a result, the Q-learning algorithm is successfully implemented in a physical robot with its imperfect environment.
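
For reference, the standard one-step Q-learning update is given below, which makes the role of the learning rate explicit. The notation is the usual textbook one and is not taken from the paper: $s_t$ and $a_t$ are the state and action at step $t$, $r_{t+1}$ is the reward received, $\gamma$ is the discount factor, and $\alpha$ is the learning rate.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

With $\alpha$ close to 0 the stored value changes slowly and learning takes many trials; with $\alpha$ close to 1 each new sample largely overwrites the previous estimate, which can make the values unstable when sensor readings are noisy.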

    Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework

    In this paper, we argue that the future of Artificial Intelligence research resides in two keywords: integration and embodiment. We support this claim by analyzing the recent advances of the field. Regarding integration, we note that the most impactful recent contributions have been made possible through the integration of recent Machine Learning methods (based in particular on Deep Learning and Recurrent Neural Networks) with more traditional ones (e.g. Monte-Carlo tree search, goal babbling exploration or addressable memory systems). Regarding embodiment, we note that the traditional benchmark tasks (e.g. visual classification or board games) are becoming obsolete as state-of-the-art learning algorithms approach or even surpass human performance in most of them, which has recently encouraged the development of first-person 3D game platforms embedding realistic physics. Building upon this analysis, we first propose an embodied cognitive architecture integrating heterogeneous sub-fields of Artificial Intelligence into a unified framework. We demonstrate the utility of our approach by showing how major contributions of the field can be expressed within the proposed framework. We then claim that benchmarking environments need to reproduce ecologically-valid conditions for bootstrapping the acquisition of increasingly complex cognitive skills through the concept of a cognitive arms race between embodied agents. Comment: Updated version of the paper accepted to the ICDL-Epirob 2017 conference (Lisbon, Portugal).
