    Noise Types Adaptation for Speech Enhancement with Recurrent Neural Network

    Speech enhancement is a critical component of automatic speech recognition systems. With the recent development of deep learning techniques, speech enhancement systems trained with neural networks have improved significantly. Although many of the latest speech enhancement systems are effective at maximizing the perceptual quality of noisy signals, they perform poorly when the test signals contain noise types that never appeared during training: performance on signals with unseen noise is markedly worse than on signals with seen noise. Such a mismatch between training and testing conditions can cause a serious performance decline in a deep learning task. In this work, a new method is proposed to address the noise type problem. The framework has three parts: an autoencoder, gradient reverse layers, and recurrent neural networks. The proposed framework weakens the influence of noise type when handling random noisy signals. This work shows that the new method outperforms the baseline models in unseen-noise situations.
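
The role of the gradient reverse layers can be illustrated with a minimal sketch. It assumes the layer behaves like the gradient reversal layer used in domain-adversarial training: identity on the forward pass, gradient multiplied by a negative factor on the backward pass, so that the shared features are pushed to carry less information about the noise type. The class name `GradientReversal` and the scale `lam` are hypothetical and are not taken from the paper.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in the backward pass.

    Assumption: mirrors the gradient reversal layer from domain-adversarial
    training. `lam` is a hypothetical scaling factor, not a value from the paper.
    """

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam

    def forward(self, features: List[float]) -> List[float]:
        # Features pass through unchanged to the noise-type classifier.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The classifier's gradient is negated before it reaches the encoder,
        # so the encoder is updated to make noise types harder to tell apart.
        return [-self.lam * g for g in grad_output]


layer = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
grad_from_classifier = [1.0, -2.0, 4.0]

forward_out = layer.forward(features)
backward_out = layer.backward(grad_from_classifier)
```

In a deep learning framework the same behaviour would normally be written as a custom autograd operation; the plain-Python version only makes the sign flip explicit.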

    DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score

    We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores are widely used for sound-quality evaluation, constructing DNNs to increase OSQA scores would be better than using the minimum-MSE criterion to create high-quality output signals. However, since most OSQA scores are not analytically tractable, i.e., they are black boxes, the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of the OSQA-based objective function, we formulated a DNN optimization scheme on the basis of black-box optimization, which is also used for training computers to play games. For the black-box-optimization scheme, we adopt the policy gradient method, which calculates the gradient on the basis of a sampling algorithm. To simulate output signals using the sampling algorithm, DNNs are used to estimate the probability-density function of the output signals that maximize OSQA scores. The OSQA scores are calculated from the simulated output signals, and the DNNs are trained to increase the probability of generating the simulated output signals that achieve high OSQA scores. Through several experiments, we found that OSQA scores significantly increased by applying the proposed method, even though the MSE was not minimized.
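
The policy-gradient step described in this abstract rests on the log-derivative identity, sketched here under assumed notation (none of these symbols come from the paper): $x$ is the noisy input, $\hat{s}$ is a simulated output signal sampled from the DNN-parameterised density $p_\theta(\hat{s} \mid x)$, $B(\hat{s})$ is the black-box OSQA score, and $K$ is the number of samples.

```latex
\nabla_\theta \, \mathbb{E}_{\hat{s} \sim p_\theta(\cdot \mid x)}\big[ B(\hat{s}) \big]
  = \mathbb{E}_{\hat{s} \sim p_\theta(\cdot \mid x)}\big[ B(\hat{s}) \, \nabla_\theta \log p_\theta(\hat{s} \mid x) \big]
  \approx \frac{1}{K} \sum_{k=1}^{K} B\big(\hat{s}^{(k)}\big) \, \nabla_\theta \log p_\theta\big(\hat{s}^{(k)} \mid x\big),
  \qquad \hat{s}^{(k)} \sim p_\theta(\cdot \mid x).
```

The score $B$ only has to be evaluated on sampled signals, never differentiated, which is why a non-differentiable score such as PESQ can drive the update.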

    ERBM-SE: Extended Restricted Boltzmann Machine for Multi-Objective Single-Channel Speech Enhancement

    Supervised single-channel speech enhancement based on machine learning has attracted considerable research interest compared with conventional approaches. In this paper, an extended Restricted Boltzmann Machine (RBM) is proposed for spectral masking-based enhancement of noisy speech. In a conventional RBM, the acoustic features for the speech enhancement task are extracted layer by layer, and the feature compression may result in the loss of vital information during network training. In order to exploit the important information in the raw data, an extended RBM is proposed for acoustic feature representation and speech enhancement. In the proposed RBM, the acoustic features are progressively extracted by multiple stacked RBMs during the pre-training phase. The hidden acoustic features from the previous RBM are combined with the raw input data, and together they serve as the new inputs to the present RBM. By adding the raw data to the RBMs, the layer-wise features related to the raw data are progressively extracted, which helps to mine valuable information in the raw data. The results on the TIMIT database showed that the proposed method successfully attenuated the noise and improved speech quality and intelligibility. STOI, PESQ, and SDR are improved by 16.86%, 25.01%, and 3.84 dB, respectively, over the unprocessed noisy speech.

    Medical Image Classification using Deep Learning Techniques and Uncertainty Quantification

    The emergence of medical image analysis using deep learning techniques has introduced multiple challenges in developing robust and trustworthy systems for automated grading and diagnosis. Several works have been presented to improve classification performance. However, these methods lack the ability to capture different levels of contextual information among image regions, strategies to introduce diversity in learning through ensemble-based techniques, or uncertainty measures for the predictions generated by automated systems. Consequently, the presented methods provide sub-optimal results that are not sufficient for clinical practice. To enhance classification performance and introduce trustworthiness, deep learning techniques and uncertainty quantification methods are required to provide diversity in contextual learning and an initial stage of explainability, respectively. This thesis aims to explore and develop novel deep learning techniques accompanied by uncertainty quantification for developing actionable automated grading and diagnosis systems. More specifically, the thesis provides the following three main contributions. First, it introduces a novel entropy-based elastic ensemble of Deep Convolutional Neural Networks (DCNNs), termed 3E-Net, for classifying grades of invasive breast carcinoma microscopic images. 3E-Net is based on a patch-wise network for feature extraction and image-wise networks for final image classification, and uses an elastic ensemble based on Shannon entropy as an uncertainty quantification method for measuring the level of randomness in image predictions. As the second contribution, the thesis presents a novel multi-level context and uncertainty-aware deep learning architecture named MCUa for the classification of breast cancer microscopic images.
MCUa consists of multiple feature extractors and multi-level context-aware models in a dynamic ensemble fashion to learn the spatial dependencies among image patches and enhance learning diversity. The architecture also uses Monte Carlo (MC) dropout to measure the uncertainty of image predictions and to decide, based on the generated uncertainty score, whether the prediction for an input image is accurate. The third contribution of the thesis introduces a novel model-agnostic method (AUQantO) that establishes an actionable strategy for optimising uncertainty quantification for deep learning architectures. AUQantO optimises a hyperparameter threshold, which is compared against uncertainty scores from Shannon entropy and MC dropout. The optimal threshold is obtained from single- and multi-objective functions, which are optimised using multiple optimisation methods. A comprehensive set of experiments has been conducted using multiple medical imaging datasets and multiple novel evaluation metrics to demonstrate the effectiveness of the three contributions for clinical practice. First, the 3E-Net versions achieved accuracies of 96.15% and 99.50% on the invasive breast carcinoma dataset. The second contribution, MCUa, achieved an accuracy of 98.11% on the Breast cancer histology images dataset. Lastly, AUQantO showed significant improvements in the performance of state-of-the-art deep learning models, with average accuracy improvements of 1.76% and 2.02% on the Breast cancer histology images dataset and of 5.67% and 4.24% on the Skin cancer dataset using two uncertainty quantification techniques. AUQantO also demonstrated the ability to determine the optimal number of images to exclude from a particular dataset.
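
The idea of comparing an uncertainty score against a threshold can be sketched in a few lines. This is an illustration only, not the thesis implementation: the function names and the fixed threshold of 1.0 bit are hypothetical, whereas AUQantO is described as optimising the threshold rather than fixing it.

```python
import math
from typing import List, Sequence, Tuple


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def split_by_uncertainty(
    predictions: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[int], List[int]]:
    """Return (kept, excluded) image indices.

    An image is excluded when the entropy of its prediction is strictly
    above `threshold` (a hypothetical, hand-picked value here).
    """
    kept: List[int] = []
    excluded: List[int] = []
    for index, probs in enumerate(predictions):
        if shannon_entropy(probs) > threshold:
            excluded.append(index)
        else:
            kept.append(index)
    return kept, excluded


# Made-up softmax outputs for three images and three classes.
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],  # spread-out prediction, high entropy
    [0.50, 0.50, 0.00],  # entropy of exactly 1 bit
]
kept, excluded = split_by_uncertainty(predictions, threshold=1.0)
```

Raising the threshold keeps more images but admits less certain predictions, which is the trade-off a threshold optimiser has to balance.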

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments demonstrate that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25%, and 95.09% across those same environments.
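
How an attention layer can summarise a sequence of BiLSTM hidden states is sketched below. The abstract does not specify the attention variant, so this assumes simple dot-product scoring against a query vector; `attention_pool`, `query`, and the toy hidden states are all hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Weight each time step by its dot product with `query`, then average.

    Assumption: dot-product scoring; the paper's attention may differ.
    Returns (context_vector, attention_weights).
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


# Three time steps of a toy 2-dimensional hidden state.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # favours time steps with a large first component
context, weights = attention_pool(hidden_states, query)
```

Time steps whose hidden state aligns with the query receive larger weights, so the context vector emphasises the parts of the CSI sequence that matter most for the activity.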

    Efficient Multi-Objective NeuroEvolution in Computer Vision and Applications for Threat Identification

    Concealed threat detection is at the heart of critical security systems designed to ensure public safety. Currently, methods for threat identification and detection are primarily manual, but there is a recent vision to automate the process. Problematically, developing computer vision models capable of operating in a wide range of settings, such as the ones arising in threat detection, is a challenging task involving multiple (and often conflicting) objectives. Automated machine learning (AutoML) is a flourishing field which endeavours to discover and optimise models and hyperparameters autonomously, providing an alternative to classic, effort-intensive hyperparameter search. However, existing approaches typically show significant downsides, like their (1) high computational cost/greediness in resources, (2) limited (or absent) scalability to custom datasets, (3) inability to provide competitive alternatives to expert-designed and heuristic approaches and (4) common consideration of a single objective. Moreover, most existing studies focus on standard classification tasks and thus cannot address a plethora of problems in threat detection and, more broadly, in a wide variety of compelling computer vision scenarios. This thesis leverages state-of-the-art convolutional autoencoders and semantic segmentation (Chapter 2) to develop effective multi-objective AutoML strategies for neural architecture search. These strategies are designed for threat detection and provide insights into some quintessential computer vision problems. To this end, the thesis first introduces two new models, a practical Multi-Objective Neuroevolutionary approach for Convolutional Autoencoders (MONCAE, Chapter 3) and a Resource-Aware model for Multi-Objective Semantic Segmentation (RAMOSS, Chapter 4).
Interestingly, these approaches reached state-of-the-art results using a fraction of the computational resources required by competing systems (0.33 GPU days compared to 3150), while allowing multiple objectives (e.g., performance and number of parameters) to be optimised simultaneously. This drastic speed-up was possible through the coalescence of neuroevolution algorithms with a new heuristic technique termed Progressive Stratified Sampling. The presented methods are evaluated on a range of benchmark datasets and then applied to several threat detection problems, outperforming previous attempts in balancing multiple objectives. The final chapter of the thesis focuses on threat detection, exploiting these two models and novel components. It first presents a new modification of specialised proxy scores to be embedded in RAMOSS, enabling the AutoML process to be accelerated even further while maintaining avant-garde performance (above 85% precision for SIXray). This approach rendered a new automatic evolutionary Multi-objEctive method for cOncealed Weapon detection (MEOW), which outperforms state-of-the-art models for threat detection in key datasets: a gold standard benchmark (SIXray) and a security-critical, proprietary dataset. Finally, the thesis shifts the focus from neural architecture search to identifying the most representative data samples. Specifically, the Multi-objectIve Core-set Discovery through evolutionAry algorithMs in computEr vision approach (MIRA-ME) showcases how the new neural architecture search techniques developed in previous chapters can be adapted to operate on data space. MIRA-ME offers supervised and unsupervised ways to select maximally informative, compact sets of images via dataset compression. This operation can offset the computational cost further (above 90% compression), with a minimal sacrifice in performance (less than 5% for MNIST and less than 13% for SIXray).
Overall, this thesis proposes novel model- and data-centred approaches towards a more widespread use of AutoML as an optimal tool for architecture and coreset discovery. With the presented and future developments, the work suggests that AutoML can effectively operate in real-time and performance-critical settings such as in threat detection, even fostering interpretability by uncovering more parsimonious optimal models. More widely, these approaches have the potential to provide effective solutions to challenging computer vision problems that nowadays are typically considered unfeasible for AutoML settings.
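
Optimising several objectives at once, such as performance and number of parameters, is commonly framed in terms of Pareto dominance. The sketch below shows only that dominance test, not the neuroevolution procedure of the thesis; the candidate names and their error and parameter values are invented for illustration.

```python
from typing import List, Tuple

# (name, validation error, number of parameters); both objectives are minimised.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical architectures with made-up scores.
candidates: List[Candidate] = [
    ("A", 0.10, 5_000_000),
    ("B", 0.12, 1_000_000),
    ("C", 0.15, 2_000_000),  # worse than B on both objectives
    ("D", 0.08, 9_000_000),
]
front = [name for name, _, _ in pareto_front(candidates)]
```

The front contains every architecture that represents a distinct trade-off, so a practitioner can pick the smallest model that still meets an accuracy requirement.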

    Ensembles of Pruned Deep Neural Networks for Accurate and Privacy Preservation in IoT Applications

    The emergence of the AIoT (Artificial Intelligence of Things) represents the powerful convergence of Artificial Intelligence (AI) with the expansive realm of the Internet of Things (IoT). By integrating AI algorithms with the vast network of interconnected IoT devices, we open new doors for intelligent decision-making and edge data analysis, transforming various domains from healthcare and transportation to agriculture and smart cities. However, this integration raises pivotal questions: How can we ensure deep learning models are aptly compressed and quantised to operate seamlessly on devices constrained by computational resources, without compromising accuracy? How can these models be effectively tailored to cope with the challenges of statistical heterogeneity and the uneven distribution of class labels inherent in IoT applications? Furthermore, in an age where data is a currency, how do we uphold the sanctity of privacy for the sensitive data that IoT devices incessantly generate while also ensuring the unhampered deployment of these advanced deep learning models? Addressing these intricate challenges forms the crux of this thesis, with its contributions delineated as follows: Ensyth: A novel approach designed to synthesise pruned ensembles of deep learning models, which not only makes optimal use of limited IoT resources but also ensures a notable boost in predictability. Experimental evidence gathered from the CIFAR-10, CIFAR-5, and MNIST-FASHION datasets supports its merit, especially given its capacity to achieve high predictability. MicroNets: A multi-phase pruning pipeline that fuses the principles of weight pruning and channel pruning. Its objective is to foster efficient deep ensemble learning, specially crafted for IoT devices.
Benchmark tests conducted on the CIFAR-10 and CIFAR-100 datasets demonstrate its prowess, highlighting a compression ratio of nearly 92%, with these pruned ensembles surpassing the accuracy metrics set by conventional models. FedNets: Recognising the challenges of statistical heterogeneity in federated learning and the ever-growing concerns of data privacy, this innovative federated learning framework is introduced. It facilitates edge devices in their collaborative quest to train ensembles of pruned deep neural networks. More than just training, it ensures data privacy remains uncompromised. Evaluations conducted on the Federated CIFAR-100 dataset offer a testament to its efficacy. In this thesis, substantial contributions have been made to the AIoT application domain. Ensyth, MicroNets, and FedNets collaboratively tackle the challenges of efficiency, accuracy, statistical heterogeneity arising from distributed class labels, and privacy concerns inherent in deploying AI applications on IoT devices. The experimental results underscore the effectiveness of these approaches, paving the way for their practical implementation in real-world scenarios. By offering an integrated solution that satisfies multiple key requirements simultaneously, this research brings us closer to the realisation of effective and privacy-preserved AIoT systems.
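
Weight pruning, one of the two pruning principles named for MicroNets, can be sketched as magnitude-based pruning of a flat weight list. This is a generic illustration under stated assumptions, not the thesis pipeline: channel pruning is not shown, the function names and weights are hypothetical, and "compression ratio" is assumed here to mean the fraction of weights set to zero, which the abstract does not define.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:n_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


def zero_fraction(weights: List[float]) -> float:
    """Fraction of weights equal to zero (assumed stand-in for compression ratio)."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Made-up weights of a single layer.
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2, -0.02, 0.6, 0.04, -0.1]
pruned = magnitude_prune(weights, sparsity=0.5)
ratio = zero_fraction(pruned)
```

Unstructured pruning of this kind reduces storage only when the zeros are stored sparsely, which is one reason it is often paired with structured approaches such as channel pruning.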

    Deep neural network generation for image classification within resource-constrained environments using evolutionary and hand-crafted processes

    Constructing Convolutional Neural Network (CNN) models is a manual process requiring expert knowledge and trial and error. Background research highlights the following knowledge gaps. 1) Existing efficiency-focused CNN models make design choices that impact model performance. Better ways are needed to construct accurate models for resource-constrained environments that lack graphics processing units (GPUs) to speed up model inference, such as CCTV cameras and IoT devices. 2) Existing methods for automatically designing CNN architectures do not explore the search space effectively for the best solution. 3) Existing methods for automatically designing CNN architectures do not exploit modern architecture design patterns such as residual connections. The lack of residual connections means the model depth is limited owing to the vanishing gradient problem. Furthermore, existing methods for automatically designing CNN architectures adopt search strategies that make them vulnerable to local minima traps. Better techniques to construct efficient CNN models, and automated approaches that can produce accurate deep model constructions, would advance many areas such as hazard detection, medical diagnosis and robotics in both academia and industry. The contributions of this research are 1) an efficient and accurate CNN architecture for resource-constrained environments, based on a novel block structure containing 1x3 and 3x1 convolutions to save computational cost, 2) a particle swarm optimization (PSO) method for automatically constructing efficient deep CNN architectures with greater accuracy, based on a novel encoding and search strategy, and 3) a PSO-based method for automatically constructing deeper CNN models with improved accuracy, based on a novel encoding scheme that employs residual connections and a novel search mechanism that follows the global and neighbouring best leaders.
The main findings of this research are as follows. 1) The proposed efficiency-focused CNN model outperformed MobileNetV2 by 13.43% with respect to accuracy and by 39.63% with respect to efficiency, measured in floating-point operations. A reduction in floating-point operations means the model has the potential for faster inference, which benefits applications in resource-constrained environments without GPUs, such as CCTV cameras. 2) The proposed automatic CNN generation technique outperformed existing methods by 7.58% with respect to accuracy and improved search time efficiency by 63%, owing to the proposal of more efficient architectures that speed up the search process. 3) The proposed automatic deep residual CNN generation method improved model accuracy by 4.43% compared with related studies, owing to deeper model construction and improvements in the search process. The proposed search process embeds human knowledge of constructing deep residual networks and provides constraint settings that can be used to limit the depth and width of the proposed models. The ability to constrain a model's depth and width is important because it ensures that the upper bounds of a proposed model will fit within the limits of resource-constrained environments.
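
Why 1x3 and 3x1 convolutions save computation can be checked with simple arithmetic. The sketch below compares a single 3x3 convolution with a 1x3 convolution followed by a 3x1 convolution. It assumes equal input, intermediate, and output channel counts and an unchanged feature-map size; the channel count and resolution are hypothetical, and the actual block structure of the thesis is not described in the abstract, so the resulting figure is not expected to match the reported 39.63%.

```python
def conv_macs(k_h: int, k_w: int, c_in: int, c_out: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulate count of a dense 2-D convolution (bias ignored)."""
    return k_h * k_w * c_in * c_out * out_h * out_w


# Hypothetical layer: 64 channels on a 32x32 feature map.
channels, height, width = 64, 32, 32

full = conv_macs(3, 3, channels, channels, height, width)
factorised = (
    conv_macs(1, 3, channels, channels, height, width)
    + conv_macs(3, 1, channels, channels, height, width)
)
saving = 1 - factorised / full  # fraction of operations removed
```

Under these assumptions the factorised pair needs 6 kernel taps per channel pair instead of 9, so about one third of the multiply-accumulate operations are removed regardless of the chosen channel count or resolution.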