
    Evolution of Memory in Reactive Artificial Neural Networks

    In the neuronal circuits of natural and artificial agents, memory is usually implemented with recurrent connections, since recurrence allows past agent state to affect present, ongoing behavior. An interesting question arises in the context of evolution: how could reactive agents have evolved into cognitive ones with internalized memory? This study seeks an answer to that question by simulating neuroevolution on artificial neural networks, with the hypothesis that internalization of external material interaction is a plausible evolutionary path leading to a fully internalized memory system.

    A series of computational experiments was performed to verify this hypothesis step by step. The first experiment demonstrated that external materials can serve as memory aids for memoryless reactive artificial agents in a simple 1-dimensional environment: the agents used environmental markers as memory references to succeed in a ball-catching task that requires memory. Motivated by this result, an extended experiment tackled a more complex memory problem using the same principle of external material interaction. This time, the reactive artificial agents were tasked with remembering the locations of food items and the nest in a 2-dimensional environment; the resulting path-following behavior is a simple foraging strategy of lower animals such as ants and fish. The final experiment was designed to show the evolution of internal recurrence. Here, I showed the evolutionary advantage of external material interaction by comparing results from neural network topology evolution algorithms with and without the material interaction mechanism. The results confirmed that agents with external material interaction learned to solve the memory task faster and more accurately.

    Together, these experiments provide insights into a possible evolutionary route to internalized memory. External material interaction lets reactive artificial agents go beyond the functional limits imposed by their simple network structure, and it converges much faster, with higher accuracy, than topological evolution alone. These results suggest one plausible evolutionary path from reactive behavior, through external material interaction, to recurrent structure.
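
    The abstract does not give implementation details, so the following is only a minimal, hand-coded sketch of the core idea in the first experiment: a purely reactive (stateless) policy can solve a delayed "ball-catching" task if it is allowed to drop and read an external marker. The environment, dynamics, and names below are assumptions for illustration; the actual study evolved neural network controllers rather than hand-coding the policy.

```python
"""Hypothetical 1-D world: a ball's landing spot is visible only briefly.
A reactive agent keeps no internal state, but may drop a marker in the
world and sense it later, using the environment as external memory."""

import random


def run_episode(steps=30, use_marker=True):
    world_size = 20
    ball_pos = random.randint(0, world_size - 1)   # where the ball will land
    agent_pos = random.randint(0, world_size - 1)
    marker_pos = None                               # external "material" in the world
    visible_for = 5                                 # ball is visible only at the start

    for t in range(steps):
        # Sensory input available at this instant (no internal state is carried over).
        ball_seen = ball_pos if t < visible_for else None
        marker_seen = marker_pos if use_marker else None

        # Purely reactive policy: the action depends only on the current input.
        if ball_seen is not None:
            target = ball_seen
            if use_marker:
                marker_pos = ball_seen              # drop a marker as a memory aid
        elif marker_seen is not None:
            target = marker_seen                    # marker stands in for the hidden ball
        else:
            target = agent_pos                      # no information: stay put

        # Move one cell toward the target.
        if target > agent_pos:
            agent_pos += 1
        elif target < agent_pos:
            agent_pos -= 1

    return agent_pos == ball_pos


if __name__ == "__main__":
    for flag in (False, True):
        wins = sum(run_episode(use_marker=flag) for _ in range(1000))
        print(f"marker={flag}: caught {wins}/1000")
```

    Without the marker, the memoryless policy can only succeed when it happens to start near the ball; with the marker, the missing recurrence is replaced by a trace left in the environment, which is the effect the dissertation attributes to external material interaction.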

    Exploiting Multimodal Information in Deep Learning

    Humans are good at using multimodal information, such as visual, auditory, and kinesthetic signals, to perceive and interact with the world. Despite the advances in single-modality deep learning over the past decade, relatively few works have focused on multimodal learning, and even existing multimodal deep learning works mostly consider a small number of modalities. This dissertation investigates three distinct forms of multimodal learning: multiple visual modalities as input, audio-visual multimodal input, and visual and proprioceptive (kinesthetic) multimodal input. Specifically, the first project investigates synthesizing light fields from a single image and estimated depth; the second investigates face recognition for unconstrained videos with audio-visual multimodal inputs; and the third investigates learning to construct and use tools with visual, proprioceptive, and kinesthetic multimodal inputs.

    In the first task, we investigate synthesizing light fields from a single RGB image and its estimated depth. Synthesizing novel views (light fields) from a single image is very challenging because the depth information, which is crucial for view synthesis, is lost. We propose to use a pre-trained model to estimate the depth and then fuse the depth information with the RGB image to generate the light fields. Our experiments showed that the multimodal input (RGB image and depth) significantly improves performance over the single-image input.

    In the second task, we focus on face recognition for low-quality videos. For low-quality videos such as low-resolution online videos and surveillance footage, recognizing faces from video frames alone is very challenging. We propose to use the audio information in the video clip to aid the face recognition task. To this end, we propose the Audio-Visual Aggregation Network (AVAN), which aggregates audio features and visual features using an attention mechanism. Empirical results show that our approach, using both visual and audio information, significantly improves face recognition accuracy on unconstrained videos.

    Finally, in the third task, we propose to use visual, proprioceptive, and kinesthetic inputs to learn to construct and use tools. Tool use in animals indicates a high level of cognitive capability and, aside from humans, is observed only in a small number of higher mammals and avian species; constructing novel tools is an even more challenging task. Learning this task with only visual input is difficult, so we propose to use visual and proprioceptive (kinesthetic) inputs to accelerate the learning. We build a physically simulated environment for the tool construction task and introduce a hierarchical reinforcement learning approach that learns to construct tools and reach the target without any prior knowledge.

    The main contribution of this dissertation is the investigation of multiple scenarios in which multimodal processing leads to enhanced performance. We expect the specific methods developed in this work, such as the extraction of hidden modalities (depth), the use of attention, and hierarchical rewards, to help us better understand multimodal processing in deep learning.
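
    The abstract states that AVAN aggregates audio and visual features with attention but does not describe the architecture, so the sketch below shows one generic way such attention-based aggregation could look: per-frame visual features and per-segment audio features are each attention-pooled and then concatenated into a single clip-level embedding. All dimensions, layer choices, and names are assumptions for illustration, not the dissertation's actual design.

```python
import torch
import torch.nn as nn


class AttentiveAVFusion(nn.Module):
    """Attention-pool variable-length visual and audio feature sequences,
    then fuse them into one fixed-size embedding (illustrative only)."""

    def __init__(self, visual_dim=512, audio_dim=192, embed_dim=256):
        super().__init__()
        self.vis_proj = nn.Linear(visual_dim, embed_dim)
        self.aud_proj = nn.Linear(audio_dim, embed_dim)
        self.score = nn.Linear(embed_dim, 1)   # scalar attention score per item

    def pool(self, feats):
        # feats: (batch, num_items, embed_dim) -> attention-weighted sum over items
        weights = torch.softmax(self.score(feats), dim=1)   # (batch, num_items, 1)
        return (weights * feats).sum(dim=1)

    def forward(self, visual_feats, audio_feats):
        # visual_feats: (batch, num_frames, visual_dim), e.g. from a face CNN (assumed)
        # audio_feats:  (batch, num_segments, audio_dim), e.g. from a speaker encoder (assumed)
        v = self.pool(self.vis_proj(visual_feats))
        a = self.pool(self.aud_proj(audio_feats))
        fused = torch.cat([v, a], dim=-1)                    # simple concatenation fusion
        return nn.functional.normalize(fused, dim=-1)        # unit-norm clip embedding


if __name__ == "__main__":
    model = AttentiveAVFusion()
    vis = torch.randn(2, 16, 512)    # 16 face frames per clip
    aud = torch.randn(2, 4, 192)     # 4 audio segments per clip
    print(model(vis, aud).shape)     # torch.Size([2, 512])
```

    The attention weights let the pooling emphasize informative frames or segments (e.g. frontal, well-lit faces or clean speech) over degraded ones, which is the kind of benefit the abstract reports for combining audio with low-quality video frames.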