43 research outputs found

    Deep spiking neural networks with applications to human gesture recognition

    Spiking neural networks (SNNs), the third generation of artificial neural networks (ANNs), are a class of event-driven neuromorphic algorithms with a potentially wide range of application domains, and they can be deployed on a variety of extremely low-power neuromorphic hardware. The work presented in this thesis addresses the challenges of human gesture recognition using novel SNN algorithms. It discusses the design of these algorithms for both visual- and auditory-domain human gesture recognition, as well as event-based pre-processing toolkits for audio signals. On the visual side, a novel SNN-based event-driven hand gesture recognition system is proposed. The system's spiking recurrent convolutional neural network (SCRNN) design, shown to be effective in a hand gesture recognition experiment, combines a purpose-designed convolution operation with recurrent connectivity to maintain spatial and temporal relations in address-event-representation (AER) data. The SCRNN architecture can achieve arbitrary temporal resolution, meaning it can exploit temporal correlations between event collections. The design uses a backpropagation-based training algorithm and does not suffer from vanishing/exploding gradients. On the audio side, a novel end-to-end spiking speech emotion recognition (SER) system is proposed. It employs MFCCs as its main speech feature extractor together with a self-designed latency coding algorithm to efficiently convert the raw signal into AER input suitable for an SNN. A two-layer spiking recurrent architecture is proposed to capture temporal correlations between spike trains. The robustness of the system is supported by experiments on several open public datasets, which demonstrate state-of-the-art recognition accuracy alongside significant reductions in network size, computational cost, and training time. Beyond its direct contribution to neuromorphic SER, the thesis proposes a novel speech-coding algorithm based on the working mechanism of the human auditory system. The algorithm mimics the functionality of the cochlea and provides an alternative method of event-data acquisition for audio. It is then further simplified and extended into a speech-enhancement application used jointly with the proposed SER system. This speech-enhancement method uses a lateral inhibition mechanism as a frequency coincidence detector to remove uncorrelated noise in the time-frequency spectrum, and experiments show it to be effective for up to six types of noise.
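    The latency coding step above converts MFCC features into AER-style spike times. As a rough illustration of the general idea (stronger features fire earlier), a minimal Python sketch might look like the following; the linear mapping and the function name are assumptions for illustration, not the thesis algorithm.

    import numpy as np

    def latency_encode(mfcc, t_max=100.0):
        # Map feature magnitudes to spike latencies: larger values fire
        # earlier (assumed linear code; the thesis algorithm may differ).
        lo, hi = mfcc.min(), mfcc.max()
        norm = (mfcc - lo) / (hi - lo + 1e-9)   # normalise to [0, 1]
        return t_max * (1.0 - norm)             # strongest -> t = 0 ms

    # Dummy 2-frame, 3-coefficient MFCC matrix; real input would come
    # from an MFCC extractor.
    spike_times = latency_encode(np.array([[1.0, 0.2, 0.5],
                                           [0.0, 0.8, 0.3]]))
    print(spike_times)  # large coefficients spike early, small ones late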

    A Posture Sequence Learning System for an Anthropomorphic Robotic Hand

    The paper presents a cognitive architecture for posture learning in an anthropomorphic robotic hand. Our approach aims to allow the robotic system to perform complex perceptual operations, to interact with a human user, and to integrate its perceptions into a cognitive representation of the scene and the observed actions. The anthropomorphic robotic hand imitates the gestures acquired by the vision system in order to learn meaningful movements, to build its knowledge from different conceptual spaces, and to perform complex interactions with the human operator.

    Sensory integration model inspired by the superior colliculus for multimodal stimuli localization

    Sensory information processing is an important feature of robotic agents that must interact with humans or the environment. For example, numerous attempts have been made to develop robots capable of interactive communication. In most cases, individual sensory streams are processed separately, and an output action is performed on that basis. In many robotic applications, visual and audio sensors are used to emulate human-like communication. The Superior Colliculus, located in the midbrain, carries out a similar integration of audio and visual stimuli in both humans and animals. In recent years, numerous researchers have attempted biologically inspired integration of sensory information. A common focus lies in generating a single output state (i.e. a multimodal output) that can localize the source of the audio and visual stimuli. This research addresses that problem and attempts to find an effective solution by investigating the computational and biological mechanisms involved in generating multimodal output. A primary goal is to develop a biologically inspired computational architecture using artificial neural networks. The advantage of this approach is that it mimics the behaviour of the Superior Colliculus, which has the potential to enable more effective human-like communication with robotic agents. The thesis describes the design and development of the architecture, which is constructed from artificial neural networks using radial basis functions. The primary inspiration came from emulating the function of the top and deep layers of the Superior Colliculus, due to their visual and audio stimuli localization mechanisms, respectively. The integration experiments have successfully demonstrated the key issues, including low-level multimodal stimuli localization, dimensionality reduction of the audio and visual input space without affecting stimulus strength, and stimulus localization with enhancement and depression phenomena. Comparisons have been made between computational and neural-network-based methods, and between unimodal and multimodal integrated outputs, in order to determine the effectiveness of the approach.
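    As a toy illustration of the radial-basis-function approach, the sketch below fuses an audio and a visual cue on a shared map of azimuth-tuned RBF units; the Gaussian tuning width, map size, and additive fusion rule are assumptions for illustration, not the thesis architecture.

    import numpy as np

    def rbf_map(stimulus_angle, centres, sigma=10.0):
        # Population response of RBF units tuned to azimuth (degrees).
        return np.exp(-((centres - stimulus_angle) ** 2) / (2 * sigma ** 2))

    # A shared map of units tuned from -90 to +90 degrees azimuth.
    centres = np.linspace(-90, 90, 37)

    visual = rbf_map(stimulus_angle=20.0, centres=centres)  # visual cue
    audio = rbf_map(stimulus_angle=25.0, centres=centres)   # audio cue

    # Simple additive fusion: the peak of the combined map is read out
    # as the localized source direction.
    multimodal = visual + audio
    estimate = centres[np.argmax(multimodal)]
    print(f"localized source at {estimate:.1f} degrees")

    In such a scheme, coincident cues yield a sharper, stronger peak while disparate cues flatten it, loosely mirroring the enhancement and depression phenomena the thesis reports.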

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
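    The paper's optimized streaming algorithms are not reproduced here, but the general MR streaming pattern for one generation of discrete life can be sketched as a Hadoop Streaming mapper/reducer pair in Python; the cell key format and the per-generation job structure are assumptions for illustration.

    #!/usr/bin/env python3
    # mapper.py: reads one live cell per line ("x y"), announces the cell
    # itself and contributes a count of 1 to each of its eight neighbours.
    import sys
    for line in sys.stdin:
        x, y = map(int, line.split())
        print(f"{x},{y}\tALIVE")
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    print(f"{x + dx},{y + dy}\t1")

    #!/usr/bin/env python3
    # reducer.py: sums neighbour counts per cell (keys arrive grouped)
    # and applies Conway's rules: birth on 3, survival on 2 or 3.
    import sys
    def emit(cell, count, alive):
        if count == 3 or (alive and count == 2):
            print(cell.replace(",", " "))
    cell, count, alive = None, 0, False
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != cell:
            if cell is not None:
                emit(cell, count, alive)
            cell, count, alive = key, 0, False
        if value == "ALIVE":
            alive = True
        else:
            count += int(value)
    if cell is not None:
        emit(cell, count, alive)

    The strip partitioning optimization evaluated in the paper, which the abstract credits with a 64% reduction in execution time for continuous life, is omitted from this sketch.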

    Mind out of matter: topics in the physical foundations of consciousness and cognition

    This dissertation begins with an exploration of a brand of dual aspect monism and some problems deriving from the distinction between a first person and third person point of view. I continue with an outline of one way in which the conscious experience of the subject might arise from organisational properties of a material substrate. With this picture to hand, I first examine theoretical features at the level of brain organisation which may be required to support conscious experience and then discuss what bearing some actual attributes of biological brains might have on such experience. I conclude the first half of the dissertation with comments on information processing and with artificial neural networks meant to display simple varieties of the organisational features initially described abstractly. While the first half begins with a view of conscious experience and infers downwards in the organisational hierarchy to explore neural features suggested by the view, attention in the second half shifts towards analysing low-level dynamical features of material substrates and inferring upwards to possible effects on experience. There is particular emphasis on clarifying the role of chaotic dynamics, and I discuss relationships between levels of description of a cognitive system and comment on issues of complexity, computability, and predictability before returning to the topic of representation, which earlier played a central part in isolating features of brain organisation which may underlie conscious experience. Some themes run throughout the dissertation, including an emphasis on understanding experience from both the first person and the third person points of view and on analysing the latter at different levels of description. Other themes include a sustained effort to integrate the picture offered here with existing empirical data and to situate current problems in the philosophy of mind within the new framework, as well as an appeal to tools from mathematics, computer science, and cognitive science to complement the more standard philosophical repertoire.