1,089 research outputs found
Novel Hybrid-Learning Algorithms for Improved Millimeter-Wave Imaging Systems
Increasing attention is being paid to millimeter-wave (mmWave), 30 GHz to 300
GHz, and terahertz (THz), 300 GHz to 10 THz, sensing applications including
security sensing, industrial packaging, medical imaging, and non-destructive
testing. Traditional methods for perception and imaging are challenged by novel
data-driven algorithms that offer improved resolution, localization, and
detection rates. Over the past decade, deep learning technology has garnered
substantial popularity, particularly in perception and computer vision
applications. Whereas conventional signal processing techniques are more easily
generalized to various applications, hybrid approaches where signal processing
and learning-based algorithms are interleaved offer a promising compromise
between performance and generalizability. Furthermore, such hybrid algorithms
improve model training by leveraging the known characteristics of radio
frequency (RF) waveforms, thus yielding more efficiently trained deep learning
algorithms and offering higher performance than conventional methods. This
dissertation introduces novel hybrid-learning algorithms for improved mmWave
imaging systems applicable to a host of problems in perception and sensing.
Various problem spaces are explored, including static and dynamic gesture
classification; precise hand localization for human-computer interaction;
high-resolution near-field mmWave imaging using forward synthetic aperture
radar (SAR); SAR under irregular scanning geometries; mmWave image
super-resolution using deep neural network (DNN) and Vision Transformer (ViT)
architectures; and data-level multiband radar fusion using a novel
hybrid-learning architecture. Furthermore, we introduce several novel
approaches for deep learning model training and dataset synthesis.
Comment: PhD Dissertation Submitted to UTD ECE Department
Deep spiking neural networks with applications to human gesture recognition
Spiking neural networks (SNNs), the third generation of artificial neural networks (ANNs), are a class of event-driven neuromorphic algorithms that potentially have a wide range of application domains and are applicable to a variety of extremely low-power neuromorphic hardware. The work presented in this thesis addresses the challenges of human gesture recognition using novel SNN algorithms. It discusses the design of these algorithms for human gesture recognition in both the visual and auditory domains, as well as event-based pre-processing toolkits for audio signals.
For visual gesture recognition, a novel SNN-based event-driven hand gesture recognition system is proposed. An experiment on hand gesture recognition shows the system to be effective with its spiking recurrent convolutional neural network (SCRNN) design, which combines a purpose-designed convolution operation with recurrent connectivity to maintain spatial and temporal relations in address-event-representation (AER) data. The proposed SCRNN architecture can achieve arbitrary temporal resolution, which means it can exploit temporal correlations between event collections. This design utilises a backpropagation-based training algorithm and does not suffer from vanishing or exploding gradients.
For audio, a novel end-to-end spiking speech emotion recognition (SER) system is proposed. This system employs Mel-frequency cepstral coefficients (MFCCs) as its main speech features, together with a self-designed latency coding algorithm that efficiently converts the raw signal to AER input usable by an SNN. A two-layer spiking recurrent architecture is proposed to address temporal correlations between spike trains. The robustness of this system is supported by results on several public datasets, which demonstrate state-of-the-art recognition accuracy and significant reductions in network size, computational cost, and training time.
In addition to directly contributing to neuromorphic SER, this thesis proposes a novel speech-coding algorithm based on the working mechanism of the human auditory system. The algorithm mimics the functionality of the cochlea and provides an alternative method of event-data acquisition for audio data. The algorithm is then simplified and extended into a speech-enhancement application that is used jointly with the proposed SER system. This speech-enhancement method uses the lateral inhibition mechanism as a frequency coincidence detector to remove uncorrelated noise in the time-frequency spectrum. Experiments show the method to be effective for up to six types of noise.
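The abstract above mentions a latency coding step that turns speech features into AER input but does not describe the thesis's own algorithm. As a rough illustration only, the sketch below shows a generic time-to-first-spike encoding in which larger feature values fire earlier. The function name `latency_encode`, the parameter `t_max`, and the min-max normalisation are assumptions made for this example, not details taken from the thesis.

```python
def latency_encode(features, t_max=100):
    """Generic latency (time-to-first-spike) coding, for illustration only.

    Each feature channel emits exactly one spike; larger values fire earlier.
    Returns an AER-style list of (timestamp, channel_address) pairs.
    """
    lo, hi = min(features), max(features)
    span = hi - lo
    events = []
    for channel, value in enumerate(features):
        # Min-max normalise to [0, 1]; a constant frame maps to 0 everywhere.
        norm = (value - lo) / span if span > 0 else 0.0
        # Strongest feature spikes at t = 0, weakest at t = t_max.
        spike_time = round((1.0 - norm) * t_max)
        events.append((spike_time, channel))
    # Sort by timestamp so the stream is ordered like an event stream.
    return sorted(events)


# Hypothetical three-coefficient feature frame.
print(latency_encode([0.0, 5.0, 10.0]))  # [(0, 2), (50, 1), (100, 0)]
```

In this toy frame the largest coefficient (channel 2) spikes first and the smallest (channel 0) spikes last, which is the basic property a latency code relies on.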
OR Residual Connection Achieving Comparable Accuracy to ADD Residual Connection in Deep Residual Spiking Neural Networks
Spiking Neural Networks (SNNs) have garnered substantial attention in
brain-like computing for their biological fidelity and the capacity to execute
energy-efficient spike-driven operations. As the demand for heightened
performance in SNNs surges, the trend towards training deeper networks becomes
imperative, while residual learning stands as a pivotal method for training
deep neural networks. In our investigation, we identified that the SEW-ResNet,
a prominent representative of deep residual spiking neural networks,
incorporates non-event-driven operations. To rectify this, we introduce the OR
Residual connection (ORRC) to the architecture. Additionally, we propose the
Synergistic Attention (SynA) module, an amalgamation of the Inhibitory
Attention (IA) module and the Multi-dimensional Attention (MA) module, to
offset energy loss stemming from high quantization. When integrating SynA into
the network, we observed the phenomenon of "natural pruning", where after
training, some or all of the shortcuts in the network naturally drop out
without affecting the model's classification accuracy. This significantly
reduces computational overhead and makes it more suitable for deployment on
edge devices. Experimental results on various public datasets confirmed that
the SynA-enhanced OR-Spiking ResNet achieved single-sample classification with
as little as 0.8 spikes per neuron. Moreover, when compared to other spike
residual models, it exhibited higher accuracy and lower power consumption.
Code is available at https://github.com/Ym-Shan/ORRC-SynA-natural-pruning.
Comment: 16 pages, 8 figures and 11 tables
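One way to read the contrast between ADD and OR residual connections in the abstract above is with binary spike vectors: summing two spike trains can produce values greater than 1, whereas an element-wise OR keeps the output binary. The sketch below is a toy illustration under that reading and is not the authors' implementation (their code is at the repository linked above); the helper names `add_residual` and `or_residual` are hypothetical.

```python
def add_residual(branch, shortcut):
    """ADD-style shortcut: element-wise sum of two binary spike vectors."""
    return [b + s for b, s in zip(branch, shortcut)]


def or_residual(branch, shortcut):
    """OR-style shortcut: element-wise logical OR, so outputs stay in {0, 1}."""
    return [b | s for b, s in zip(branch, shortcut)]


# Toy spike vectors for one time step (1 = spike, 0 = no spike).
branch = [0, 1, 1, 0]
shortcut = [0, 0, 1, 1]

print(add_residual(branch, shortcut))  # [0, 1, 2, 1]  (2 is no longer a spike)
print(or_residual(branch, shortcut))   # [0, 1, 1, 1]  (still binary)
```

Where both inputs spike, the sum yields 2, which downstream layers must treat as a multi-valued activation rather than a spike; the OR output remains a valid spike vector.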