Biologically inspired speaker verification
Speaker verification is an active research problem that has been addressed using a variety of classification techniques. In general, however, methods inspired by the human auditory system tend to show better verification performance than other methods. In this thesis, three biologically inspired speaker verification algorithms are presented.
Deep spiking neural networks with applications to human gesture recognition
Spiking neural networks (SNNs), the third generation of artificial neural networks (ANNs), are a class of event-driven neuromorphic algorithms with a potentially wide range of application domains, and they can be deployed on a variety of extremely low-power neuromorphic hardware. The work presented in this thesis addresses the challenges of human gesture recognition using novel SNN algorithms. It discusses the design of these algorithms for human gesture recognition in both the visual and auditory domains, as well as event-based pre-processing toolkits for audio signals.
From the visual gesture recognition aspect, a novel SNN-based event-driven hand gesture recognition system is proposed. This system is shown to be effective in a hand gesture recognition experiment with its spiking recurrent convolutional neural network (SCRNN) design, which combines a convolution operation with recurrent connectivity to maintain spatial and temporal relations in address-event representation (AER) data. The proposed SCRNN architecture can achieve arbitrary temporal resolution, which means it can exploit temporal correlations between event collections. The design uses a backpropagation-based training algorithm and does not suffer from vanishing or exploding gradients.
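At the core of any such spiking architecture is a spiking neuron model. As a point of reference, here is a minimal leaky integrate-and-fire (LIF) neuron in plain Python; the time constant, threshold, and reset values are illustrative assumptions, and this generic textbook sketch is not the SCRNN itself:

```python
def lif_step(v, input_current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron.

    Returns the updated membrane potential and 1 if the neuron spiked,
    else 0. The membrane potential is reset after a spike."""
    v = v + (dt / tau) * (-v + input_current)
    if v >= v_thresh:
        return v_reset, 1
    return v, 0

def run_lif(currents):
    """Simulate a single neuron over a sequence of input currents and
    return the binary spike train."""
    v, spikes = 0.0, []
    for i in currents:
        v, s = lif_step(v, i)
        spikes.append(s)
    return spikes
```

Because the neuron only emits discrete spike events, layers built from such units process AER streams natively rather than dense frames.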
From the audio perspective, a novel end-to-end spiking speech emotion recognition (SER) system is proposed. This system employs MFCCs as its main speech features, together with a self-designed latency coding algorithm that efficiently converts the raw signal into AER input usable by an SNN. A two-layer spiking recurrent architecture is proposed to capture temporal correlations between spike trains. The robustness of this system is demonstrated on several open public datasets, with state-of-the-art recognition accuracy and significant reductions in network size, computational cost, and training time.
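The general idea of latency coding can be sketched as follows: each feature value is mapped to a single spike time, with stronger features firing earlier. This linear first-spike mapping and its `t_max` parameter are illustrative assumptions, not the thesis's own coding algorithm:

```python
import numpy as np

def latency_encode(features, t_max=100.0):
    """Map a feature vector (e.g. one frame of MFCC coefficients) to
    spike times in [0, t_max]: stronger features fire earlier."""
    f = np.asarray(features, dtype=float)
    f = (f - f.min()) / (f.max() - f.min() + 1e-12)  # normalise to [0, 1]
    return (1.0 - f) * t_max  # invert so large values spike first
```

Each frame of the feature sequence then contributes one spike per channel, which is what makes the representation compatible with event-driven processing.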
In addition to directly contributing to neuromorphic SER, this thesis proposes a novel speech-coding algorithm based on the working mechanism of the human auditory system. The algorithm mimics the functionality of the cochlea and provides an alternative method of event-data acquisition for audio data. The algorithm is then further simplified and extended into a speech-enhancement application that is used jointly with the proposed SER system. This speech-enhancement method uses a lateral inhibition mechanism as a frequency coincidence detector to remove uncorrelated noise from the time-frequency spectrum. Experiments show the method to be effective for up to six types of noise.
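The coincidence-detection idea behind such enhancement can be sketched as follows: a time-frequency bin is kept only if an adjacent frequency channel is active at the same instant, so isolated (uncorrelated) noise bins are suppressed. The fixed threshold and the single-neighbour rule are simplifying assumptions, not the thesis's actual lateral inhibition model:

```python
import numpy as np

def coincidence_denoise(spec, thresh=0.5):
    """Suppress isolated time-frequency bins in a magnitude spectrum
    (shape: frequency x time). A bin survives only if an adjacent
    frequency channel is active at the same time step, so lone noise
    bins with no active frequency neighbours are zeroed out."""
    active = spec > thresh
    up = np.roll(active, 1, axis=0)
    up[0, :] = False          # no wrap-around at the band edges
    down = np.roll(active, -1, axis=0)
    down[-1, :] = False
    return spec * (active & (up | down))
```

A harmonic component, which excites several neighbouring channels at once, passes through unchanged, while a spurious single-bin click is removed.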
Sensory integration model inspired by the superior colliculus for multimodal stimuli localization
Sensory information processing is an important feature of robotic agents that must interact with humans or the environment. For example, numerous attempts have been made to develop robots capable of interactive communication. In most cases, individual sensory information is processed and, based on this, an output action is performed. In many robotic applications, visual and audio sensors are used to emulate human-like communication. The Superior Colliculus, located in the midbrain region of the nervous system, carries out this kind of audio and visual stimulus integration in both humans and animals. In recent years, numerous researchers have attempted integration of sensory information using biological inspiration. A common focus lies in generating a single output state (i.e. a multimodal output) that can localize the source of the audio and visual stimuli. This research addresses the problem and attempts to find an effective solution by investigating various computational and biological mechanisms involved in the generation of multimodal output. A primary goal is to develop a biologically inspired computational architecture using artificial neural networks. The advantage of this approach is that it mimics the behaviour of the Superior Colliculus, which has the potential of enabling more effective human-like communication with robotic agents. The thesis describes the design and development of the architecture, which is constructed from artificial neural networks using radial basis functions. The primary inspiration for the architecture came from emulating the function of the top and deep layers of the Superior Colliculus, owing to their visual and audio stimuli localization mechanisms, respectively.
The integration experiments have successfully demonstrated the key issues, including low-level multimodal stimuli localization, dimensionality reduction of the audio and visual input spaces without affecting stimulus strength, and stimuli localization with enhancement and depression phenomena. Comparisons have been made between computational and neural-network-based methods, and between unimodal and multimodal integrated outputs, in order to determine the effectiveness of the approach.
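The localization readout can be sketched with a toy one-dimensional population of radial basis function units, each tuned to a preferred position; the Gaussian width, the additive combination of modalities, and the argmax readout are illustrative assumptions rather than the thesis's actual architecture:

```python
import numpy as np

def rbf_population(stimulus_pos, centers, sigma=1.0):
    """Activity of a population of radial basis function units whose
    preferred positions are given by `centers`."""
    return np.exp(-((centers - stimulus_pos) ** 2) / (2.0 * sigma ** 2))

def localize_multimodal(audio_pos, visual_pos, centers, sigma=1.0):
    """Sum the audio and visual population codes and read out the
    preferred position of the most active unit."""
    combined = (rbf_population(audio_pos, centers, sigma)
                + rbf_population(visual_pos, centers, sigma))
    return centers[np.argmax(combined)]
```

When the two cues are close but not identical, the summed population code peaks between them, which is one simple way a single multimodal output state can emerge from two unimodal maps.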
Using MapReduce Streaming for Distributed Life Simulation on the Cloud
Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
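The map and reduce phases for one generation of Conway's life can be sketched in plain Python, with the shuffle simulated locally by grouping mapper output in a dictionary. This is an illustrative single-machine sketch of the standard MR formulation of life, not the paper's Elastic MR streaming implementation, and it omits the strip-partitioning optimisation:

```python
def life_map(live_cells):
    """Mapper: each live cell emits an ALIVE marker for itself plus a
    count of 1 to each of its eight neighbours (key = cell coordinate)."""
    for (x, y) in live_cells:
        yield (x, y), "ALIVE"
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    yield (x + dx, y + dy), 1

def life_reduce(grouped):
    """Reducer: apply Conway's rules from the grouped values.
    A cell is live next generation with exactly 3 live neighbours,
    or with 2 if it was already alive."""
    next_gen = set()
    for cell, values in grouped.items():
        alive = "ALIVE" in values
        n = sum(v for v in values if v != "ALIVE")
        if n == 3 or (alive and n == 2):
            next_gen.add(cell)
    return next_gen

def life_step(live_cells):
    """One generation: simulate the shuffle by grouping mapper output
    by key, then reduce."""
    grouped = {}
    for key, val in life_map(live_cells):
        grouped.setdefault(key, []).append(val)
    return life_reduce(grouped)
```

In a real streaming deployment the mapper and reducer would read and write key-value lines on stdin/stdout, and strip partitioning would assign contiguous bands of rows to workers so that almost all neighbour traffic stays local.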
EVA London 2022: Electronic Visualisation and the Arts
The Electronic Visualisation and the Arts London 2022 Conference (EVA London 2022) is co-sponsored by the Computer Arts Society (CAS) and BCS, the Chartered Institute for IT, of which the CAS is a Specialist Group. Of course, this has been a difficult time for all conferences, with the Covid-19 pandemic. For the first time since 2019, the EVA London 2022 Conference is a physical conference. It is also an online conference, as it was in the previous two years. We continue with publishing the proceedings, both online, with open access via ScienceOpen, and also in our traditional printed form, for the second year in full colour. Over recent decades, the EVA London Conference on Electronic Visualisation and the Arts has established itself as one of the United Kingdom’s most innovative and interdisciplinary conferences. It brings together a wide range of research domains to celebrate a diverse set of interests, with a specialised focus on visualisation. The long and short papers in this volume cover varied topics concerning the arts, visualisations, and IT, including 3D graphics, animation, artificial intelligence, creativity, culture, design, digital art, ethics, heritage, literature, museums, music, philosophy, politics, publishing, social media, and virtual reality, as well as other related interdisciplinary areas.
The EVA London 2022 proceedings present a wide spectrum of papers, demonstrations, Research Workshop contributions, other workshops, and, for the seventh year, the EVA London Symposium, in the form of an opening morning session with three invited contributors. The conference includes a number of other associated evening events, including ones organised by the Computer Arts Society, Art in Flux, and EVA International. As in previous years, there are Research Workshop contributions in this volume, aimed at encouraging participation by postgraduate students and early-career artists, accepted either through the peer-review process or directly by the Research Workshop chair. The Research Workshop contributors are offered bursaries to aid participation. In particular, EVA London liaises with Art in Flux, a London-based group of digital artists. The EVA London 2022 proceedings include long papers and short “poster” papers from international researchers inside and outside academia, from graduate artists, PhD students, industry professionals, established scholars, and senior researchers, who value EVA London for its interdisciplinary community. The conference also features keynote talks. A special feature this year is support for Ukrainian culture after the invasion of Ukraine earlier in the year. This publication has resulted from a selective peer-review process, fitting as many excellent submissions as possible into the proceedings.
This year, submission numbers were lower than in previous years, most likely due to the pandemic and a new requirement to submit drafts of long papers for review as well as abstracts. It is still pleasing to have so many good proposals from which to select the papers that have been included. EVA London is part of a larger network of EVA international conferences. EVA events have been held in Athens, Beijing, Berlin, Brussels, California, Cambridge (both UK and USA), Canberra, Copenhagen, Dallas, Delhi, Edinburgh, Florence, Gifu (Japan), Glasgow, Harvard, Jerusalem, Kiev, Laval, London, Madrid, Montreal, Moscow, New York, Paris, Prague, St Petersburg, Thessaloniki, and Warsaw. Further venues for EVA conferences are very much encouraged by the EVA community. As noted earlier, this volume is a record of accepted submissions to EVA London 2022. Associated online presentations are in general recorded and made available online after the conference.
Mind out of matter: topics in the physical foundations of consciousness and cognition
This dissertation begins with an exploration of a brand of dual
aspect monism and some problems deriving from the distinction between
a first person and third person point of view. I continue with an outline
of one way in which the conscious experience of the subject might arise
from organisational properties of a material substrate. With this picture to
hand, I first examine theoretical features at the level of brain organisation
which may be required to support conscious experience and then discuss
what bearing some actual attributes of biological brains might have on
such experience. I conclude the first half of the dissertation with
comments on information processing and with artificial neural networks
meant to display simple varieties of the organisational features initially
described abstractly.While the first half begins with a view of conscious experience and
infers downwards in the organisational hierarchy to explore neural
features suggested by the view, attention in the second half shifts towards
analysing low level dynamical features of material substrates and inferring
upwards to possible effects on experience. There is particular emphasis on
clarifying the role of chaotic dynamics, and I discuss relationships between
levels of description of a cognitive system and comment on issues of
complexity, computability, and predictability before returning to the topic
of representation which earlier played a central part in isolating features of
brain organisation which may underlie conscious experience.Some themes run throughout the dissertation, including an
emphasis on understanding experience from both the first person and the
third person points of view and on analysing the latter at different levels
of description. Other themes include a sustained effort to integrate the
picture offered here with existing empirical data and to situate current
problems in the philosophy of mind within the new framework, as well as
an appeal to tools from mathematics, computer science, and cognitive
science to complement the more standard philosophical repertoire