
    A Deep Incremental Boltzmann Machine for Modeling Context in Robots

    Context is an essential capability for robots that are to be as adaptive as possible in challenging environments. Although there are many context modeling efforts, they assume a fixed structure and number of contexts. In this paper, we propose an incremental deep model that extends Restricted Boltzmann Machines. Our model receives one scene at a time and gradually extends the contextual model when necessary, either by adding a new context or by adding a new context layer to form a hierarchy. We show on a scene classification benchmark that our method converges to a good estimate of the contexts of the scenes, and performs better than or on par with other incremental and non-incremental models on several tasks. Comment: 6 pages, 5 figures, International Conference on Robotics and Automation (ICRA 2018).
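
    As a minimal sketch of the incremental idea (not the authors' implementation), the snippet below grows an RBM-like context model one scene at a time: when the current hidden units ("contexts") reconstruct a scene poorly, a new context unit is appended. The sizes, noise level, and growth threshold are illustrative assumptions.

```python
# Hedged sketch: an RBM whose hidden units act as contexts; a unit is added
# when a new scene is poorly explained. Not the paper's model or parameters.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class IncrementalRBM:
    def __init__(self, n_visible, n_hidden=1, lr=0.05):
        self.W = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr

    def reconstruct(self, v):
        h = sigmoid(v @ self.W + self.b_h)
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v):
        # One contrastive-divergence update on a single scene vector.
        h0 = sigmoid(v @ self.W + self.b_h)
        v1 = self.reconstruct(v)
        h1 = sigmoid(v1 @ self.W + self.b_h)
        self.W += self.lr * (np.outer(v, h0) - np.outer(v1, h1))
        self.b_v += self.lr * (v - v1)
        self.b_h += self.lr * (h0 - h1)

    def maybe_grow(self, v, threshold=0.2):
        # Add a new context unit when reconstruction error is high.
        if np.mean((v - self.reconstruct(v)) ** 2) > threshold:
            self.W = np.hstack([self.W, rng.normal(0, 0.01, (self.W.shape[0], 1))])
            self.b_h = np.append(self.b_h, 0.0)

# Scenes arrive one at a time, drawn from a few hypothetical "contexts".
prototypes = (rng.random((3, 20)) > 0.5).astype(float)
rbm = IncrementalRBM(n_visible=20)
for _ in range(300):
    noise = (rng.random(20) < 0.05).astype(float)
    scene = np.abs(prototypes[rng.integers(3)] - noise)  # noisy context sample
    rbm.maybe_grow(scene)
    rbm.cd1_step(scene)
print("contexts discovered:", rbm.W.shape[1])
```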

    CINet: A Learning Based Approach to Incremental Context Modeling in Robots

    There have been several attempts at modeling context in robots. However, these attempts either assume a fixed number of contexts or use a rule-based approach to determine when to increment the number of contexts. In this paper, we pose the task of when to increment as a learning problem, which we solve using a Recurrent Neural Network. We show that the network successfully learns to predict when to increment (with 98% testing accuracy), and demonstrate, in a scene modeling problem (where the correct number of contexts is not known), that the robot increments the number of contexts in an expected manner (i.e., the entropy of the system is reduced). We also present how the incremental model can be used for various scene reasoning tasks. Comment: The first two authors contributed equally. 6 pages, 8 figures, International Conference on Intelligent Robots and Systems (IROS 2018).
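
    The increment decision can be pictured as a sequence classifier. The sketch below is a minimal stand-in for CINet, not the actual network: a vanilla RNN reads a sequence of hypothetical model-state features and outputs the probability that a new context should be added. The feature choice, sizes, and (untrained) weights are all assumptions.

```python
# Hedged sketch of "when to increment" as sequence classification.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 4, 16

# Randomly initialised weights stand in for trained parameters.
W_xh = rng.normal(0, 0.1, (n_in, n_hidden))
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))
w_out = rng.normal(0, 0.1, n_hidden)

def increment_probability(feature_seq):
    """Run the RNN over a (T, n_in) feature sequence; return P(increment)."""
    h = np.zeros(n_hidden)
    for x in feature_seq:
        h = np.tanh(x @ W_xh + h @ W_hh)
    logit = h @ w_out
    return 1.0 / (1.0 + np.exp(-logit))

seq = rng.random((10, n_in))  # ten time steps of hypothetical features
p = increment_probability(seq)
print("increment" if p > 0.5 else "keep current contexts", p)
```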

    Methods and Apparatus for Autonomous Robotic Control

    Sensory processing of visual, auditory, and other sensor information (e.g., visual imagery, LIDAR, RADAR) is conventionally based on "stovepiped," or isolated, processing, with little interaction between modules. Biological systems, on the other hand, fuse multi-sensory information to identify nearby objects of interest more quickly, more efficiently, and with higher signal-to-noise ratios. Similarly, examples of the OpenSense technology disclosed herein use neurally inspired processing to identify and locate objects in a robot's environment. This enables the robot to navigate its environment more quickly and with lower computational and power requirements.
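
    As a generic illustration of why fusion helps (not the OpenSense method itself), inverse-variance weighting combines two noisy single-sensor estimates into one with lower variance than either alone; the sensor readings below are hypothetical.

```python
# Hedged sketch: fusing two Gaussian estimates of an object's bearing.
def fuse(mu_a, var_a, mu_b, var_b):
    """Inverse-variance fusion; the fused variance is smaller than either input's."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)
    return mu, 1.0 / (w_a + w_b)

# Hypothetical bearings (degrees) from a camera and a LIDAR module.
mu, var = fuse(mu_a=31.0, var_a=4.0, mu_b=28.5, var_b=1.0)
print(f"fused bearing: {mu:.2f} deg, variance: {var:.2f}")
```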

    Computational principles for an autonomous active vision system

    Vision research has uncovered computational principles that generalize across species and brain areas. However, these biological mechanisms are not frequently implemented in computer vision algorithms. In this thesis, models suitable for application in computer vision were developed to address the benefits of two biologically inspired computational principles: multi-scale sampling and active, space-variant vision.
    The first model investigated the role of multi-scale sampling in motion integration. It is known that receptive fields of different spatial and temporal scales exist in the visual cortex; however, models addressing how this basic principle is exploited by species are sparse and do not adequately explain the data. The developed model showed that the solution to a classical problem in motion integration, the aperture problem, can be reframed as an emergent property of multi-scale sampling facilitated by fast, parallel, bi-directional connections at different spatial resolutions.
    Humans and most other mammals actively move their eyes to sample a scene (active vision); moreover, the resolution of detail in this sampling process is not uniform across spatial locations (space-variant). It is known that these eye movements are not simply guided by image saliency, but are also influenced by factors such as spatial attention, scene layout, and task-relevance. However, it is seldom questioned how previous eye movements shape how one learns and recognizes an object in a continuously learning system. To explore this question, a model (CogEye) was developed that integrates active, space-variant sampling with eye-movement selection (the where visual stream) and object recognition (the what visual stream). The model hypothesizes that a signal from the recognition system helps the where stream select fixation locations that best disambiguate object identity between competing alternatives.
    The third study used eye tracking coupled with an object disambiguation psychophysics experiment to validate the second model, CogEye. While humans outperformed the model in recognition accuracy, when the model used information from the recognition pathway to help select future fixations, it was more similar to human eye-movement patterns than when it relied on image saliency alone. Taken together, these results show that computational principles in the mammalian visual system can be used to improve computer vision models.
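
    The CogEye hypothesis, that recognition feeds back to guide fixation selection, can be sketched as choosing the fixation with the lowest expected posterior entropy over competing object identities. The toy model below is one illustration under that reading, not the thesis's implementation; the likelihood table and sizes are assumptions.

```python
# Hedged sketch: pick the fixation expected to disambiguate identity most.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(2)
n_objects, n_fixations = 3, 5
belief = np.full(n_objects, 1.0 / n_objects)  # uniform prior over identities

# p_obs[f, o]: chance a diagnostic feature is visible at fixation f given object o.
p_obs = rng.random((n_fixations, n_objects))

def expected_posterior_entropy(f):
    # Average entropy over the two outcomes (feature present / absent) at f.
    h = 0.0
    for present in (1, 0):
        like = p_obs[f] if present else 1.0 - p_obs[f]
        joint = like * belief
        p_evidence = joint.sum()
        if p_evidence > 0:
            h += p_evidence * entropy(joint / p_evidence)
    return h

best = min(range(n_fixations), key=expected_posterior_entropy)
print("fixate location", best)
```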

    A temporally and spatially local spike-based backpropagation algorithm to enable training in hardware

    Spiking Neural Networks (SNNs) have emerged as a hardware-efficient architecture for classification tasks. The challenge of spike-based encoding has been the lack of a universal training mechanism performed entirely using spikes. There have been several attempts to adopt the powerful backpropagation (BP) technique used in non-spiking artificial neural networks (ANNs): (1) SNNs can be trained by externally computed numerical gradients; (2) a major advancement towards native spike-based learning has been the use of approximate backpropagation using spike-timing-dependent plasticity (STDP) with phased forward/backward passes. However, the transfer of information between such phases for gradient and weight update calculation necessitates external memory and computational access, which is a challenge for standard neuromorphic hardware implementations. In this paper, we propose a stochastic SNN-based Back-Prop (SSNN-BP) algorithm that utilizes a composite neuron to simultaneously compute the forward-pass activations and backward-pass gradients explicitly with spikes. Although signed gradient values are a challenge for spike-based representation, we tackle this by splitting the gradient signal into positive and negative streams. We show that our method approaches the BP ANN baseline with sufficiently long spike trains. Finally, we show that the well-performing softmax cross-entropy loss function can be implemented through inhibitory lateral connections enforcing a Winner-Take-All (WTA) rule. Our two-layer SNN shows excellent generalization through performance comparable to ANNs with equivalent architecture and regularization parameters on static image datasets like MNIST, Fashion-MNIST, and Extended MNIST, and on temporally encoded image datasets like Neuromorphic MNIST. Thus, SSNN-BP enables BP compatible with purely spike-based neuromorphic hardware.
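
    The positive/negative stream split can be illustrated with rate-coded spike trains: a signed gradient value is carried as two non-negative firing rates whose difference recovers the value, with accuracy improving for longer trains (mirroring the paper's observation about spike-train length). This sketch shows only that representation, not the SSNN-BP algorithm.

```python
# Hedged sketch: a signed value as two Bernoulli spike streams.
import numpy as np

rng = np.random.default_rng(3)

def encode(g, T=1000, max_rate=0.5):
    """Encode signed g in [-1, 1] as two spike trains of length T."""
    plus = rng.random(T) < max_rate * max(g, 0.0)
    minus = rng.random(T) < max_rate * max(-g, 0.0)
    return plus, minus

def decode(plus, minus, max_rate=0.5):
    # Spike-count difference, rescaled back to the original range.
    T = len(plus)
    return (plus.sum() - minus.sum()) / (T * max_rate)

g = -0.37
plus, minus = encode(g)
print(f"true {g:+.2f}, decoded {decode(plus, minus):+.2f}")  # improves with larger T
```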

    Connecting the Dots for People with Autism: A Data-driven Approach to Designing and Evaluating a Global Filter

    "Social communication is the use of language in social contexts. It encompasses social interaction, social cognition, pragmatics, and language processing" [3]. One presumed prerequisite of social communication is visual attention, the focus of this work. "Visual attention is a process that directs a tiny fraction of the information arriving at primary visual cortex to high-level centers involved in visual working memory and pattern recognition" [7]. This process involves the integration of two streams: the global stream, which rapidly processes the scene, and the local stream, which processes details. This integration is important to social communication in that attending to both the global and local features of a scene is necessary to grasp the overall meaning. For people with autism spectrum disorder (ASD), the integration of these two streams can be disrupted by the tendency to privilege details (local processing) over seeing the big picture (global processing) [66]. Consequently, people with ASD may have challenges integrating visual attention, which may disrupt their social communication.
    This doctoral work explores the hypothesis that visual attention can be redirected to the features of an image that contain holistic information about a scene, which when highlighted might enable people with ASD to see the forest as well as the trees (i.e., to see a scene as a whole rather than as parts). It manipulates visual stimuli in an effort to shift the visual attention of people with ASD, focusing on 1) designing a global filter that can shift visual attention from local details to global features, and 2) evaluating the performance of a global filter by leveraging eye-tracking technology. The work comprises two development life cycles (design, develop, evaluate): 1) a low-fidelity filter and 2) a high-fidelity filter. The low-fidelity cycle covered the design of four low-fidelity filters for an initial experiment, tested with one adult participant with ASD; the performance of each filter was evaluated using verbal responses and eye-tracking data in terms of visual analysis, fixation analysis, and saccade analysis. The results from this cycle informed the design of a high-fidelity filter in the second cycle, in which ten children with ASD participated; the high-fidelity filter was evaluated using both verbal responses and eye-tracking data in terms of eye-gaze behaviors.
    Results indicate that baseline conditions slightly outperform global filters in terms of verbal responses and eye-gaze behaviors. To unpack the results beyond group comparisons, three analyses of image characteristics (luminance, chroma, and spatial frequency) were performed to ascertain which aspects contribute to filter performance. The results indicate no significant correlations between the image characteristics and filter performance, although, among the three characteristics, spatial frequency is the factor most correlated with filter performance. Additional analyses using neural networks, specifically a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), were also explored; the results show that a CNN is more predictive of the relationship between an image and visual attention than an MLP. This serves as a proof of concept that neural networks can be employed to select images for future experiments, avoiding variance or bias from unbalanced image characteristics across the experimental image pool.
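
    Since the global stream is commonly associated with low spatial frequencies, one plausible reading of a "global filter" is a low-pass filter that preserves scene layout while suppressing local detail. The FFT-based sketch below illustrates that idea only; it is not the dissertation's actual filter, and the cutoff value is an assumption.

```python
# Hedged sketch: a low-pass "global filter" in the frequency domain.
import numpy as np

def global_filter(image, cutoff=0.1):
    """Keep spatial frequencies below `cutoff` (as a fraction of Nyquist)."""
    F = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    F[radius > cutoff] = 0  # drop high-frequency (local-detail) content
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

image = np.random.default_rng(4).random((64, 64))  # stand-in for a scene image
smoothed = global_filter(image)
print(f"detail (std): original {image.std():.3f} -> filtered {smoothed.std():.3f}")
```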

    Statistical modelling of neuronal population activity: from data analysis to network function

    The term statistical modelling refers to a number of abstract models designed to reproduce and understand the statistical properties of the activity of neuronal networks at the population level. Large-scale recordings by multielectrode arrays (MEAs) have now made it possible to scale their use to larger groups of neurons. The initial step in this work focused on improving the data analysis pipeline that leads from the experimental protocol used in dense MEA recordings to a clean dataset of sorted spike times, to be used in model training. In collaboration with experimentalists, I contributed to developing a fast and scalable algorithm for spike sorting, which is based on action potential shapes and on the estimated location of each spike. Using the resulting datasets, I investigated the use of restricted Boltzmann machines (RBMs) in the analysis of neural data, finding that they can be used as a tool for detecting neural ensembles or low-dimensional activity subspaces. I further studied the physical properties of RBMs fitted to neural activity, finding that they exhibit signatures of criticality, as observed before in similar models. I discuss possible connections between this phenomenon and the "dynamical" criticality often observed in neuronal networks that exhibit emergent behaviour. Finally, I applied what I found about the structure of the parameter space in statistical models to the discovery of a learning rule that helps long-term storage of previously learned memories in Hopfield networks during sequential learning tasks. Overall, this work aims to contribute to the computational tools used for analysing and modelling large neuronal populations on different levels: starting from raw experimental recordings and gradually proceeding towards theoretical aspects.
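
    To ground the Hopfield-network point, the sketch below shows the baseline setting the thesis improves upon: memories stored with a plain Hebbian rule and recalled by iterating the sign dynamics. The improved sequential-learning rule itself is not reproduced here; sizes and noise levels are illustrative.

```python
# Hedged sketch: Hebbian storage and recall in a Hopfield network.
import numpy as np

rng = np.random.default_rng(5)
N = 64
patterns = np.sign(rng.normal(size=(3, N)))  # three random binary memories

# Hebbian weights summed over patterns, with zero self-connections.
W = sum(np.outer(p, p) for p in patterns) / N
np.fill_diagonal(W, 0.0)

def recall(state, steps=20):
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1  # break ties deterministically
    return state

# Corrupt 10% of a stored pattern and check recovery.
probe = patterns[0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
probe[flip] *= -1
overlap = recall(probe) @ patterns[0] / N
print(f"overlap with stored memory: {overlap:.2f}")  # near 1.0 means recovered
```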

    Neural dynamics at successive stages of the ventral visual stream are consistent with hierarchical error signals

    Ventral visual stream neural responses are dynamic, even for static image presentations. However, dynamical neural models of visual cortex are lacking, as most progress has been made modeling static, time-averaged responses. Here, we studied population neural dynamics during face detection across three cortical processing stages. Remarkably, ~30 milliseconds after the initially evoked response, we found that neurons in intermediate-level areas decreased their responses to typical configurations of their preferred face parts relative to their responses to atypical configurations, even while neurons in higher areas achieved and maintained a preference for typical configurations. These hierarchical neural dynamics were inconsistent with standard feedforward circuits. Rather, recurrent models computing prediction errors between stages captured the observed temporal signatures. This model of neural dynamics, which simply augments the standard feedforward model of online vision, suggests that neural responses to static images may encode top-down prediction errors in addition to bottom-up feature estimates. Funding: National Institutes of Health (U.S.) (Grants R01-EY014970, K99-EY022671, F32-EY019609, F32-EY022845); United States Office of Naval Research (MURI-114407); McGovern Institute for Brain Research at MIT.
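
    A generic two-stage predictive-coding sketch (not the paper's fitted model) reproduces the qualitative signature described above: once the higher stage "explains away" the static input, the lower stage's error-carrying responses decrease. All constants here are illustrative assumptions.

```python
# Hedged sketch: recurrent prediction-error dynamics between two stages.
import numpy as np

rng = np.random.default_rng(6)
n_low, n_high = 8, 4
W = rng.normal(0, 0.3, (n_high, n_low))   # feedforward weights
x = rng.random(n_low)                     # static bottom-up input (an "image")

r_low = np.zeros(n_low)
r_high = np.zeros(n_high)
dt = 0.1
for t in range(200):
    prediction = W.T @ r_high             # top-down prediction of stage 1
    error = x - prediction                # prediction error at stage 1
    r_low += dt * (error - r_low)         # stage-1 units carry the error
    r_high += dt * (W @ r_low)            # stage 2 integrates the error
# Over time, stage-1 responses fall as stage 2 accounts for the input,
# mirroring the dynamic response decrease reported for intermediate areas.
print(np.linalg.norm(x - W.T @ r_high))   # residual unexplained input
```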