102 research outputs found

    Humans and deep networks largely agree on which kinds of variation make object recognition harder

    View-invariant object recognition is a challenging problem, which has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g. 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts more and more invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNN), which are currently the best algorithms for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs at view-invariant object recognition using the same images and controlling for both the kinds of transformation as well as their magnitude. We used four object categories and images were rendered from 3D computer models. In total, 89 human subjects participated in 10 experiments in which they had to discriminate between two or four categories after rapid presentation with backward masking. We also tested two recent DCNNs on the same tasks. We found that humans and DCNNs largely agreed on the relative difficulties of each kind of variation: rotation in depth is by far the hardest transformation to handle, followed by scale, then rotation in plane, and finally position. This suggests that humans recognize objects mainly through 2D template matching, rather than by constructing 3D object models, and that DCNNs are not too unreasonable models of human feed-forward vision. Also, our results show that the variation levels in rotation in depth and scale strongly modulate both humans' and DCNNs' recognition performances. We thus argue that these variations should be controlled in the image datasets used in vision research

    Bio-Inspired Multi-Layer Spiking Neural Network Extracts Discriminative Features from Speech Signals

    Spiking neural networks (SNNs) enable power-efficient implementations due to their sparse, spike-based coding scheme. This paper develops a bio-inspired SNN that uses unsupervised learning to extract discriminative features from speech signals, which can subsequently be used in a classifier. The architecture consists of a spiking convolutional/pooling layer followed by a fully connected spiking layer for feature discovery. The convolutional layer of leaky integrate-and-fire (LIF) neurons represents primary acoustic features. The fully connected layer is equipped with a probabilistic spike-timing-dependent plasticity learning rule and represents the discriminative features through probabilistic LIF neurons. To assess the discriminative power of the learned features, they are used in a hidden Markov model (HMM) for spoken digit recognition. The experimental results show performance above 96%, which compares favorably with popular statistical feature extraction methods. Our results provide a novel demonstration of unsupervised feature acquisition in an SNN.
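
The abstract above relies on leaky integrate-and-fire (LIF) neurons. As a rough illustration only, a single deterministic LIF unit can be sketched as follows. The function name `simulate_lif`, the Euler update, and all parameter values are assumptions chosen for this example; they are not taken from the paper, which additionally uses probabilistic neurons and a plasticity rule not shown here.

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=20.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Euler simulation of one leaky integrate-and-fire neuron.

    All parameter values are illustrative assumptions.
    Returns the list of spike time indices and the voltage trace.
    """
    v = v_rest
    spikes = []
    trace = np.empty(len(input_current))
    for t, i_t in enumerate(input_current):
        # Leak pulls v toward v_rest; the input current drives it upward.
        v += dt / tau * (-(v - v_rest) + i_t)
        if v >= v_thresh:
            # Threshold crossing: record a spike and reset the membrane.
            spikes.append(t)
            v = v_reset
        trace[t] = v
    return spikes, trace

# A constant drive above threshold produces regular spiking;
# a drive below threshold produces none.
spikes, trace = simulate_lif(np.full(200, 2.0))
quiet, _ = simulate_lif(np.full(200, 0.5))
print(len(spikes), len(quiet))
```

With these assumed parameters the membrane settles at the input value when it never reaches threshold, so an input of 0.5 stays silent while an input of 2.0 fires at a fixed interval.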

    NatCSNN: A Convolutional Spiking Neural Network for recognition of objects extracted from natural images

    Biological image processing is performed by complex neural networks composed of thousands of neurons interconnected via thousands of synapses, some of which are excitatory and others inhibitory. Spiking neural models are distinguished from classical neurons by being biologically plausible and exhibiting the same dynamics as those observed in biological neurons. This paper proposes a Natural Convolutional Neural Network (NatCSNN), which is a 3-layer bio-inspired Convolutional Spiking Neural Network (CSNN), for classifying objects extracted from natural images. A two-stage training algorithm is proposed using unsupervised Spike Timing Dependent Plasticity (STDP) learning (phase 1) and ReSuMe supervised learning (phase 2). The NatCSNN was trained and tested on the CIFAR-10 dataset and achieved an average testing accuracy of 84.7%, which is an improvement over the 2-layer neural networks previously applied to this dataset.

    UAV detection : a STDP trained deep convolutional spiking neural network retina-neuromorphic approach

    The Dynamic Vision Sensor (DVS) has many attributes, such as sub-millisecond response time and a good low-light dynamic range, that make it well suited to the task of UAV detection. This paper proposes a system that exploits the features of an event camera solely for UAV detection while combining it with a Spiking Neural Network (SNN) trained using the unsupervised approach of Spike Time-Dependent Plasticity (STDP), to create an asynchronous, low-power system with low computational overhead. Utilising the unique features of both the sensor and the network results in a system that is robust to a wide variety of lighting conditions, has a high temporal resolution, and propagates only the minimal amount of information through the network, while training using the equivalent of 43,000 images. The network returns a 91% detection rate when shown other objects and can detect a UAV with less than 1% of pixels on the sensor being used for processing.

    Parkinson’s Disease Detection Using Isosurfaces-Based Features and Convolutional Neural Networks

    Computer-aided diagnosis systems based on brain imaging are an important tool to assist in the diagnosis of Parkinson’s disease; their ultimate goal is detection through automatic recognition of patterns that characterize the disease. In recent times Convolutional Neural Networks (CNN) have proved to be highly useful for that task. The drawback, however, is that 3D brain images contain a huge amount of information, which leads to complex CNN architectures. When these architectures become too complex, classification performance often degrades because of the limitations of the training algorithm and overfitting. Thus, this paper proposes the use of isosurfaces as a way to reduce the amount of data while keeping the most relevant information. These isosurfaces are then used to implement a classification system which uses two of the most well-known CNN architectures, LeNet and AlexNet, to classify DaTScan images with an average accuracy of 95.1% and AUC = 97%, obtaining comparable (slightly better) values to those obtained for most of the recently proposed systems. It can be concluded therefore that the computation of isosurfaces reduces the complexity of the inputs significantly, resulting in high classification accuracies with reduced computational burden. Funding: MINECO/FEDER under the TEC2015-64718-R, PSI2015-65848-R, PGC2018-098813-B-C32, and RTI2018-098913-B-100 projects.

    A general-purpose mechanism of visual feature association in visual word identification and beyond

    As writing systems are a relatively novel invention (slightly over 5 kya) [1], they could not have influenced the evolution of our species. Instead, reading might recycle evolutionarily older mechanisms that originally supported other tasks [2,3] and preceded the emergence of written language. Accordingly, it has been shown that baboons and pigeons can be trained to distinguish words from nonwords based on orthographic regularities in letter co-occurrence [4,5]. This suggests that part of what is usually considered reading-specific processing could be performed by domain-general visual mechanisms. Here, we tested this hypothesis in humans: if the reading system relies on domain-general visual mechanisms, some of the effects that are often found with orthographic material should also be observable with non-orthographic visual stimuli. We performed three experiments using the exact same design but with visual stimuli that progressively departed from orthographic material. Subjects were passively familiarized with a set of composite visual items and tested in an oddball paradigm for their ability to detect novel stimuli. Participants showed robust sensitivity to the co-occurrence of features (“bigram” coding) with strings of letter-like symbols but also with made-up 3D objects and sinusoidal gratings. This suggests that the processing mechanisms involved in the visual recognition of novel words also support the recognition of other novel visual objects. These mechanisms would allow the visual system to capture statistical regularities in the visual environment [6–9]. We hope that this work will inspire models of reading that, although addressing its unique aspects, place it within the broader context of vision. Vidal et al. show that an effect usually studied in the context of reading (sensitivity to bigram frequencies) is also found when participants are presented with images of objects and circular sinusoidal gratings. This suggests that some mechanisms involved in the processing of novel words are in fact general-purpose.

    A review of learning in biologically plausible spiking neural networks

    Artificial neural networks have been used as a powerful processing tool in various areas such as pattern recognition, control, robotics, and bioinformatics. Their wide applicability has encouraged researchers to improve artificial neural networks by investigating the biological brain. Neurological research has significantly progressed in recent years and continues to reveal new characteristics of biological neurons. New technologies can now capture temporal changes in the internal activity of the brain in more detail and help clarify the relationship between brain activity and the perception of a given stimulus. This new knowledge has led to a new type of artificial neural network, the Spiking Neural Network (SNN), that draws more faithfully on biological properties to provide higher processing abilities. A review of recent developments in learning of spiking neurons is presented in this paper. First the biological background of SNN learning algorithms is reviewed. The important elements of a learning algorithm such as the neuron model, synaptic plasticity, information encoding and SNN topologies are then presented. Then, a critical review of the state-of-the-art learning algorithms for SNNs using single and multiple spikes is presented. Additionally, deep spiking neural networks are reviewed, and challenges and opportunities in the SNN field are discussed

    How biological attention mechanisms improve task performance in a large-scale visual system model

    How does attentional modulation of neural activity enhance performance? Here we use a deep convolutional neural network as a large-scale model of the visual system to address this question. We model the feature similarity gain model of attention, in which attentional modulation is applied according to neural stimulus tuning. Using a variety of visual tasks, we show that neural modulations of the kind and magnitude observed experimentally lead to performance changes of the kind and magnitude observed experimentally. We find that, at earlier layers, attention applied according to tuning does not successfully propagate through the network, and has a weaker impact on performance than attention applied according to values computed for optimally modulating higher areas. This raises the question of whether biological attention might be applied at least in part to optimize function rather than strictly according to tuning. We suggest a simple experiment to distinguish these alternatives
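
The idea of applying attentional modulation "according to neural stimulus tuning" can be illustrated with a minimal sketch. It assumes a multiplicative gain of the form 1 + strength × tuning, where tuning is positive for units that prefer the attended feature and negative for units that do not. The function name `apply_feature_gain`, the exact gain formula, the clipping of the gain at zero, and the numbers used are all assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def apply_feature_gain(activity, tuning, strength=0.5):
    """Scale unit activity by a tuning-dependent attentional gain.

    activity: non-negative unit responses (e.g. post-ReLU feature maps).
    tuning:   per-unit preference for the attended feature, assumed here
              to lie roughly in [-1, 1] (hypothetical normalisation).
    strength: overall attention strength (illustrative value).
    """
    gain = 1.0 + strength * np.asarray(tuning, dtype=float)
    # Assumption: clip the gain at zero so activity never changes sign.
    gain = np.maximum(gain, 0.0)
    return gain * np.asarray(activity, dtype=float)

# Units that prefer the attended feature are enhanced,
# units that prefer other features are suppressed.
activity = np.array([1.0, 1.0, 1.0])
tuning = np.array([1.0, 0.0, -1.0])
print(apply_feature_gain(activity, tuning))  # → [1.5 1.  0.5]
```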