
    Making sense of neuromorphic event data for human action recognition

    Neuromorphic vision sensors provide low-power sensing and capture salient spatio-temporal events. The majority of existing neuromorphic sensing work focuses on object detection. However, because these sensors record only events, they also provide an efficient signal domain for privacy-aware surveillance tasks. This paper explores how neuromorphic vision sensor data streams can be analysed for human action recognition, a challenging application. The proposed method is based on handcrafted features. It consists of a pre-processing step that removes noisy events, followed by the extraction of handcrafted local and global feature vectors corresponding to the underlying human action. The local features are extracted from a set of high-order descriptive statistics of the spatio-temporal events in a time-window slice, while the global features are extracted from the frequencies of occurrence of the temporal event sequences. Low-complexity classifiers, such as support vector machines (SVMs) and K-nearest neighbours (KNN), are then trained on these feature vectors. The method is evaluated on three groups of datasets: emulator-based, re-recording-based, and native NVS-based. It outperforms existing methods in human action recognition accuracy by 0.54%, 19.3%, and 25.61% on the E-KTH, E-UCF11, and E-HMDB51 datasets, respectively. This paper also reports results for three further datasets, E-UCF50, R-UCF50, and N-Actions, which are reported for the first time for human action recognition in the neuromorphic vision sensor domain.
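
    The local-feature step described above can be illustrated with a short sketch. The Python code below is not the paper's exact pipeline: it assumes events arrive as (x, y, t, polarity) rows with timestamps in microseconds, slices them into fixed time windows, computes high-order descriptive statistics per window, and pools them into one feature vector per recording for an SVM. The window length and the particular statistics are assumptions chosen for clarity.

        # Illustrative sketch (not the paper's exact pipeline): high-order
        # statistics over spatio-temporal events in fixed time windows,
        # pooled into a single feature vector suitable for an SVM.
        import numpy as np
        from scipy.stats import skew, kurtosis
        from sklearn.svm import SVC

        def local_features(events, window_us=50_000):
            """events: (N, 4) array of (x, y, t, polarity); t in microseconds."""
            t0, t1 = events[:, 2].min(), events[:, 2].max()
            feats = []
            for start in np.arange(t0, t1, window_us):
                w = events[(events[:, 2] >= start) & (events[:, 2] < start + window_us)]
                if len(w) == 0:
                    continue
                stats = []
                for col in (0, 1):  # x and y coordinates
                    v = w[:, col]
                    # High-order descriptive statistics of the event positions.
                    stats += [v.mean(), v.std(), skew(v), kurtosis(v)]
                stats.append(len(w))          # event count in the window
                stats.append(w[:, 3].mean())  # polarity balance
                feats.append(stats)
            return np.asarray(feats).mean(axis=0)  # pool windows into one vector

        # X: one pooled feature vector per recording, y: action labels.
        # clf = SVC(kernel='rbf').fit(X_train, y_train)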

    From Pixels to Spikes: Efficient Multimodal Learning in the Presence of Domain Shift

    Computer vision aims to provide computers with a conceptual understanding of images or video by learning a high-level representation. This representation is typically derived from the pixel domain (i.e., RGB channels) for tasks such as image classification or action recognition. In this thesis, we explore how RGB inputs can either be pre-processed or supplemented with other compressed visual modalities in order to improve the accuracy-complexity trade-off for various computer vision tasks. Beginning with RGB-domain data only, we propose a multi-level, Voronoi-based spatial partitioning of images, whose cells are individually processed by a convolutional neural network (CNN), to improve the scale invariance of the embedding. We combine this with a novel and efficient approach for optimal bit allocation within the quantized cell representations. We evaluate this proposal on content-based image retrieval, the task of finding images in a dataset that are similar to a given query. We then move to the more challenging domain of action recognition, where a video sequence is classified according to its constituent action. In this case, we demonstrate how the RGB modality can be supplemented with a flow modality comprising motion vectors extracted directly from the video codec. The motion vectors (MVs) are used both as input to a CNN and as an activity sensor that enables selective macroblock (MB) decoding of RGB frames instead of full-frame decoding. We independently train two CNNs on RGB and MV correspondences and then fuse their scores during inference, demonstrating faster end-to-end processing and classification accuracy competitive with recent work. To explore the use of more efficient sensing modalities, we replace the MV stream with a neuromorphic vision sensing (NVS) stream for action recognition. NVS hardware mimics the biological retina and operates with substantially lower power and at significantly higher sampling rates than conventional active pixel sensing (APS) cameras. Owing to the lack of training data in this domain, we generate emulated NVS frames directly from consecutive RGB frames and use them to train a teacher-student framework that additionally leverages the abundance of optical flow training data. In the final part of this thesis, we introduce a novel unsupervised domain adaptation method for further minimizing the domain shift between the emulated (source) and real (target) NVS data domains.
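
    The late-fusion step of the two-stream approach admits a compact illustration. The sketch below assumes two independently trained CNNs that emit per-class logits for the RGB and motion-vector streams, whose softmaxed scores are averaged at inference; the equal weighting and clip-averaging scheme are assumptions, not the thesis's exact configuration.

        # Minimal sketch of two-stream late fusion: softmax each stream's
        # per-clip class scores, average them, then pool over clips.
        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        def fuse_scores(rgb_logits, mv_logits, w_rgb=0.5):
            """rgb_logits, mv_logits: (num_clips, num_classes) CNN outputs."""
            p = w_rgb * softmax(rgb_logits) + (1.0 - w_rgb) * softmax(mv_logits)
            return p.mean(axis=0).argmax()  # average over clips, pick the action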

    Cognitive and Brain-inspired Processing Using Parallel Algorithms and Heterogeneous Chip Multiprocessor Architectures

    This thesis explores how neuromorphic engineering approaches can be used to speed up computations and reduce power consumption on neuromorphic hardware systems. These hardware designs are not well suited to conventional algorithms, so new approaches must be used to take advantage of the parallel nature of these architectures. Background on probabilistic graphical models is presented, along with brain-inspired ways to perform inference in Bayesian networks. A spiking neuron implementation is developed on two general-purpose parallel neuromorphic hardware devices, the SpiNNaker and the Parallella. Scalability results are shown, along with speed improvements over mainstream desktop processors. General vector-matrix multiplication at various levels of precision is also explored using IBM's TrueNorth Neurosynaptic System. TrueNorth contains highly configurable hardware neurons and axons connected via crossbar arrays and consumes very little power, but it is less flexible than a more general-purpose neuromorphic system such as the SpiNNaker. Nevertheless, the techniques described here enable useful computations to be performed on such crossbar arrays with spiking neurons, including computing word similarities from trained word vector embeddings. Another technique shows how to perform computations using only one column of the crossbar array at a time, despite the fact that incoming spikes normally affect all columns of the array. A method for cognitive audio-visual beamforming is also presented. Using two systems, each containing a spherical microphone array, sounds are localized via spherical harmonic beamforming. Combining the microphone arrays with 360-degree cameras provides an opportunity to overlay the sound localization on the visual data and create a combined audio-visual salience map. Cognitive computations can be performed on the audio signals to localize specific sounds while ignoring others based on their spectral characteristics. Finally, an ARM Cortex-M0 processor design is presented that bootstraps and coordinates other processing units on a chip developed in the lab for the DARPA Unconventional Processing of Signals for Intelligent Data Exploitation (UPSIDE) program. This design includes a bootloader that provides full programmability each time the chip is booted, and the processor interfaces with other hardware modules to access the Networks-on-Chip and main memory.
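
    The idea of vector-matrix multiplication on a spiking crossbar can be sketched in simulation. The Python code below is purely illustrative and is not TrueNorth's actual programming interface: it rate-codes the input vector as Bernoulli spike trains, routes each tick's spikes through a binary crossbar, and counts output spikes so that the output rates approximate x @ W. The tick count and coding scheme are assumptions.

        # Illustrative simulation of a rate-coded spiking vector-matrix
        # multiply on a binary crossbar (not the TrueNorth toolchain).
        import numpy as np

        rng = np.random.default_rng(0)

        def spiking_vmm(x, W, ticks=256):
            """x: (n,) inputs in [0, 1]; W: (n, m) binary crossbar weights."""
            counts = np.zeros(W.shape[1])
            for _ in range(ticks):
                spikes = rng.random(x.shape) < x  # Bernoulli rate coding
                counts += spikes @ W              # spikes fan out through crossbar
            return counts / ticks                 # output rates approximate x @ W

        x = rng.random(8)
        W = (rng.random((8, 4)) > 0.5).astype(float)
        print(np.allclose(spiking_vmm(x, W, ticks=20_000), x @ W, atol=0.05))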