
    S3TC: Spiking Separated Spatial and Temporal Convolutions with Unsupervised STDP-based Learning for Action Recognition

    Video analysis is a major computer vision task that has received a lot of attention in recent years. The current state-of-the-art performance for video analysis is achieved with Deep Neural Networks (DNNs) that have high computational costs and need large amounts of labeled data for training. Spiking Neural Networks (SNNs) have significantly lower computational costs (by a factor of thousands) than regular non-spiking networks when implemented on neuromorphic hardware. They have been used for video analysis with methods like 3D Convolutional Spiking Neural Networks (3D CSNNs). However, these networks have a significantly larger number of parameters than 2D CSNNs. This not only increases the computational costs, but also makes these networks more difficult to implement on neuromorphic hardware. In this work, we use CSNNs trained in an unsupervised manner with the Spike Timing-Dependent Plasticity (STDP) rule, and we introduce, for the first time, Spiking Separated Spatial and Temporal Convolutions (S3TCs) to reduce the number of parameters required for video analysis. This unsupervised learning has the advantage of not needing large amounts of labeled data for training. Factorizing a single spatio-temporal spiking convolution into a spatial and a temporal spiking convolution decreases the number of parameters of the network. We test our network with the KTH, Weizmann, and IXMAS datasets, and we show that S3TCs successfully extract spatio-temporal information from videos, while increasing the output spiking activity and outperforming spiking 3D convolutions.
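
    The parameter saving from factorizing a spatio-temporal convolution can be illustrated with a simple weight count. The sketch below is not taken from the paper: the function names, the channel counts, and the kernel sizes are all hypothetical, and biases are ignored. It only compares one full 3D kernel bank against a spatial (1 x k_h x k_w) convolution followed by a temporal (k_t x 1 x 1) convolution.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Weights in one full spatio-temporal (3D) convolution, biases ignored."""
    return c_in * c_out * k_t * k_h * k_w


def separated_params(c_in, c_mid, c_out, k_t, k_h, k_w):
    """Weights in a spatial convolution followed by a temporal convolution."""
    spatial = c_in * c_mid * k_h * k_w   # 1 x k_h x k_w kernels
    temporal = c_mid * c_out * k_t       # k_t x 1 x 1 kernels
    return spatial + temporal


# Hypothetical layer: 64 input, intermediate, and output channels, 3 x 5 x 5 kernel.
full = conv3d_params(64, 64, 3, 5, 5)
sep = separated_params(64, 64, 64, 3, 5, 5)
print(full, sep)  # 307200 114688
```

    How much is saved depends on the channel counts and kernel sizes; with very few input channels the separated form can even need more weights than the full 3D kernel.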

    2D versus 3D Convolutional Spiking Neural Networks Trained with Unsupervised STDP for Human Action Recognition

    Current advances in technology have highlighted the importance of video analysis in the domain of computer vision. However, video analysis has considerably high computational costs with traditional artificial neural networks (ANNs). Spiking neural networks (SNNs) are third-generation, biologically plausible models that process information in the form of spikes. Unsupervised learning with SNNs using the spike timing-dependent plasticity (STDP) rule has the potential to overcome some bottlenecks of regular artificial neural networks, but STDP-based SNNs are still immature and their performance is far behind that of ANNs. In this work, we study the performance of SNNs when challenged with the task of human action recognition, because this task has many real-time applications in computer vision, such as video surveillance. In this paper we introduce a multilayered 3D convolutional SNN model trained with unsupervised STDP. We compare the performance of this model to that of a 2D STDP-based SNN when challenged with the KTH and Weizmann datasets. We also compare single-layer and multi-layer versions of these models in order to get an accurate assessment of their performance. We show that STDP-based convolutional SNNs can learn motion patterns using 3D kernels, thus enabling motion-based recognition from videos. Finally, we give evidence that 3D convolution is superior to 2D convolution with STDP-based SNNs, especially when dealing with long video sequences.
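
    For readers unfamiliar with STDP, the following is a minimal sketch of the textbook pair-based rule with exponential windows. It is an assumption for illustration only: the abstract does not state which STDP variant the authors use, and the function name, learning rates, and time constants below are hypothetical.

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms).

    Pre before post (dt >= 0) strengthens the synapse; post before pre
    weakens it. Treating dt == 0 as potentiation is a convention chosen here.
    """
    dt = t_post - t_pre
    if dt >= 0:
        return a_plus * math.exp(-dt / tau_plus)
    return -a_minus * math.exp(dt / tau_minus)


print(stdp_delta_w(10.0, 15.0) > 0)  # pre fires first: potentiation
print(stdp_delta_w(15.0, 10.0) < 0)  # post fires first: depression
```

    The update needs only spike times local to the synapse and no class labels, which is why the rule is described as unsupervised.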

    Spiking Neural Networks Trained with Unsupervised STDP for Video Analysis

    Current advances in technology have highlighted the importance of video analysis in the domain of computer vision. Traditional artificial neural networks (ANNs) have considerably high computational costs with video analysis, and many modern applications such as autonomous vehicles have limited computational resources. Spiking neural networks (SNNs) are third-generation, biologically plausible models that are seen as hypothetical solutions for the bottlenecks of ANNs, such as energy efficiency. However, current SNN-specific methods that achieve good classification rates, such as ANN-to-SNN conversion and back-propagation, depend on labeled data, which requires costly human intervention. Meanwhile, unsupervised learning with SNNs using the spike timing-dependent plasticity (STDP) rule has the potential to overcome some bottlenecks of regular artificial neural networks. However, STDP-based SNNs are still immature. SNNs trained in an unsupervised manner with STDP can hypothetically surpass ANNs in energy efficiency, and thus must be studied and improved. In this work, we study the performance of these networks with human action recognition tasks, and we focus on the motion found in videos in order to recognise the actions. In particular, we study the effects that different motion modeling techniques can have on the spatio-temporal features extracted by a spiking neural network trained with unsupervised STDP.

    A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP

    There has been an increasing interest in spiking neural networks (SNNs) in recent years. SNNs are seen as hypothetical solutions for the bottlenecks of artificial neural networks (ANNs) in pattern recognition, such as energy efficiency. But current methods such as ANN-to-SNN conversion and back-propagation do not take full advantage of these networks, and unsupervised methods have not yet reached a success comparable to advanced artificial neural networks. It is important to study the behavior of SNNs trained with unsupervised learning methods such as spike timing-dependent plasticity (STDP) on video classification tasks, including mechanisms to model motion information using spikes, as this information is critical for video understanding. This paper presents multiple methods of transposing temporal information into a static format, and then transforming the visual information into spikes using latency coding. These methods are paired with two types of temporal fusion known as early and late fusion, and are used to help the spiking neural network capture the spatio-temporal features from videos. In this paper, we rely on the network architecture of a convolutional spiking neural network trained with STDP, and we test the performance of this network when challenged with action recognition tasks. Understanding how a spiking neural network responds to different methods of movement extraction and representation can help reduce the performance gap between SNNs and ANNs. We show the effect of the similarity in the shape and speed of certain actions on action recognition with spiking neural networks, and we highlight the effectiveness of some methods compared to others.
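
    The two steps named in this abstract, turning temporal information into a static image and then latency-coding it, can be sketched as follows. This is an illustrative assumption, not the paper's pipeline: absolute frame differencing stands in for whichever motion representations the authors evaluate, and the function names, the linear intensity-to-time mapping, and the `t_max` and `threshold` parameters are hypothetical.

```python
def frame_difference(prev, curr):
    """Static motion image: absolute per-pixel difference of two frames."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def latency_encode(image, t_max=1.0, threshold=0.0):
    """Latency coding: stronger values spike earlier.

    Values are expected in [0, 1]. A pixel at or below the threshold
    emits no spike (None); otherwise it spikes at (1 - value) * t_max.
    """
    return [[(1.0 - x) * t_max if x > threshold else None for x in row]
            for row in image]


prev_frame = [[0.0, 0.5], [0.25, 1.0]]
curr_frame = [[1.0, 0.5], [0.75, 0.75]]
motion = frame_difference(prev_frame, curr_frame)
spike_times = latency_encode(motion)
print(spike_times)  # [[0.0, None], [0.5, 0.75]]
```

    Each pixel emits at most one spike, so pixels with strong motion fire first and static pixels stay silent.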
