
    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including those based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.
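    The additive and convolutional degradations this overview targets can be simulated directly. The sketch below is a minimal NumPy illustration; the function name `degrade` and the SNR-scaling scheme are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def degrade(clean, noise, rir, snr_db):
    """Apply convolutional (room impulse response) then additive noise
    degradation at a target signal-to-noise ratio."""
    # Convolutional degradation: filter the clean speech with an RIR.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Scale the noise so the mixture reaches the requested SNR in dB.
    sig_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise[: len(reverberant)] ** 2)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + scale * noise[: len(reverberant)]
```

With a single-tap impulse response the convolution is the identity, so the residual after mixing is purely the scaled additive noise.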

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
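    Of the feature representations the review highlights, log-mel spectra are the most widely used. A minimal NumPy sketch of their computation follows; the frame length, hop size, and filter count are illustrative defaults, not values prescribed by the article:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the magnitude STFT, apply a triangular mel
    filterbank, and return log energies (shape: frames x n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2      # power spectrum
    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return np.log(spec @ fbank.T + 1e-10)               # floor avoids log(0)
```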

    Noise Types Adaptation for Speech Enhancement with Recurrent Neural Network

    Speech enhancement is a critical part of automatic speech recognition systems. Recently, with the development of deep learning based techniques, speech enhancement systems trained with neural networks can significantly improve performance. While many of the latest speech enhancement systems excel at maximizing the perceptual quality of noisy signals, they expose drawbacks when the test signals contain noise types that never appeared during training: the systems perform considerably worse on noisy signals with unseen noise than on signals with seen noise. This mismatch between training and testing conditions can cause a serious performance decline in a deep learning task. In this work, a new method is proposed to address the unseen noise types problem. The framework has three parts: an autoencoder, gradient reverse layers, and recurrent neural networks. The proposed framework weakens the influence of noise type when handling arbitrary noisy signals. This work shows that the new method outperforms the baseline models in unseen noise situations.
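    The gradient reverse layer at the heart of this framework is simple to state: it is the identity on the forward pass and flips the sign of gradients on the backward pass. A minimal NumPy sketch, with illustrative class and parameter names:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales gradients by -lambda in the
    backward pass, pushing the encoder toward noise-type-invariant features."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                          # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output    # reversed gradient to the encoder
```

Placed between the encoder and a noise-type classifier, the reversed gradient trains the encoder to produce features from which the noise type cannot be predicted.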

    Robust detection of North Atlantic right whales using deep learning methods

    This thesis begins by assessing the current state of marine mammal detection, specifically investigating currently used detection platforms and approaches to detection. The recent development of autonomous platforms creates a need for automated processing of hydrophone recordings and suitable methods to detect marine mammals from their acoustic vocalisations. Although passive acoustic monitoring is not a novel topic, the detection of marine mammals from their vocalisations using machine learning is still in its infancy. Specifically, detection of the highly endangered North Atlantic right whale (Eubalaena glacialis) is investigated. A large variety of machine learning algorithms are developed and applied to the detection of North Atlantic right whale (NARW) vocalisations, with a comparison of methods presented to discover which provides the highest detection accuracy. Convolutional neural networks are found to outperform other machine learning methods and provide the highest detection accuracy when given spectrograms of acoustic recordings. Next, tests investigate the use of both audio- and image-based enhancement methods for improving detection accuracy in noisy conditions. Log spectrogram features and log histogram equalisation features both achieve comparable detection accuracy when tested in clean (noise-free) and noisy conditions. Further work provides an investigation into deep learning denoising approaches, applying both denoising autoencoders and denoising convolutional neural networks to noisy NARW vocalisations. After initial parameter and architecture testing, a full evaluation is presented to compare the denoising autoencoder and the denoising convolutional neural network. Additional tests also cover a range of simulated real-world noise conditions with a variety of signal-to-noise ratios (SNRs) for evaluating denoising performance in multiple scenarios.
Analysis of results found the denoising autoencoder (DAE) to outperform other methods, with increased accuracy in all conditions when testing on an underlying classifier retrained on the denoised signal. Tests to evaluate the benefit of augmenting training data found that augmentation improved performance and increased detection accuracy for both the denoising autoencoder and the convolutional neural network across a range of noise types. Furthermore, evaluation in a naturally noisy condition saw an increase in detection accuracy when using a denoising autoencoder with augmented training and a convolutional neural network classifier. This configuration was also timed and deemed capable of running multiple times faster than real time, and is therefore likely suitable for deployment on board an autonomous system.
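    A denoising autoencoder of the kind evaluated here learns a mapping from noisy inputs to clean targets. The toy NumPy sketch below trains a one-hidden-layer version on synthetic tones with additive noise; all shapes, learning rates, and the synthetic data are illustrative, and the thesis's actual architectures are larger:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: clean tones standing in for vocalisations, plus additive noise.
t = np.linspace(0.0, 1.0, 64)
clean = np.stack([np.sin(2 * np.pi * f * t) for f in rng.uniform(2.0, 8.0, 256)])
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

# One-hidden-layer denoising autoencoder trained to map noisy -> clean.
d, h, lr = clean.shape[1], 32, 0.05
W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((h, d)); b2 = np.zeros(d)
losses = []
for epoch in range(300):
    z = np.tanh(noisy @ W1 + b1)              # encoder
    out = z @ W2 + b2                         # linear decoder
    err = out - clean
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation of the mean-squared reconstruction error.
    g_out = 2.0 * err / len(err)
    gW2, gb2 = z.T @ g_out, g_out.sum(axis=0)
    g_z = (g_out @ W2.T) * (1.0 - z ** 2)     # tanh derivative
    gW1, gb1 = noisy.T @ g_z, g_z.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

After training, feeding the denoised output to a retrained classifier is what the evaluation above measures.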

    Representation learning for unsupervised speech processing

    Automatic speech recognition for our most widely used languages has recently seen substantial improvements, driven by improved training procedures for deep artificial neural networks, cost-effective availability of computational power at large scale, and, crucially, availability of large quantities of labelled training data. This success cannot be transferred to low- and zero-resource languages where the requisite transcriptions are unavailable. Unsupervised speech processing promises better methods for dealing with under-resourced languages. Here we investigate unsupervised neural network based models for learning frame- and sequence-level representations with the goal of improving zero-resource speech processing. Good representations eliminate differences in accent, gender, channel characteristics, and other factors to model subword or whole-term units for within- and across-speaker speech unit discrimination. We present two contributions focussing on unsupervised learning of frame-level representations: (1) an improved version of the correspondence autoencoder applied to the INTERSPEECH 2015 Zero Resource Challenge, and (2) a proposed model for learning representations that explicitly optimize speech unit discrimination. We also present two contributions focussing on efficiency and scalability of unsupervised speech processing: (1) a proposed model and pilot experiments for learning a linear-time approximation of the quadratic-time dynamic time warping algorithm, and (2) a series of model proposals for learning fixed-size representations of variable-length speech segments, enabling efficient vector space similarity measures.
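    The quadratic-time dynamic time warping algorithm that contribution (1) approximates can be stated compactly. A reference NumPy implementation for 1-D sequences follows, as an illustration of the baseline cost rather than of the thesis's approximation:

```python
import numpy as np

def dtw(a, b):
    """Classic O(n*m) dynamic time warping distance between two 1-D
    sequences, using absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of match, insertion, and deletion predecessors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because each cell of the `D` table is filled once, the cost is quadratic in sequence length, which is exactly what motivates a linear-time approximation at scale.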

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models. (64 pages, 411 references. To appear in Journal of Applied Remote Sensing.)

    Denoising Autoencoders and LSTM-Based Artificial Neural Networks Data Processing for Its Application to Internal Model Control in Industrial Environments-The Wastewater Treatment Plant Control Case

    Other grants: Secretaria d'Universitats i Recerca del Departament d'Empresa i Coneixement de la Generalitat de Catalunya and the European Social Fund (2020 FI_B2 000). The evolution of industry towards the Industry 4.0 paradigm has become a reality where different data-driven methods are adopted to support industrial processes. One of them corresponds to Artificial Neural Networks (ANNs), which are able to model highly complex and non-linear processes. This motivates their adoption as part of new data-driven control strategies. The ANN-based Internal Model Controller (ANN-based IMC) is an example which takes advantage of ANN characteristics by modelling the direct and inverse relationships of the process under control. This approach has been implemented in Wastewater Treatment Plants (WWTPs), where results show a significant improvement in control performance metrics with respect to (w.r.t.) the WWTP default control strategy. However, this structure is very sensitive to undesired effects in the measurements: when a real scenario with noise-corrupted data is considered, the control performance drops. To solve this, a new ANN-based IMC approach is designed with a two-fold objective: improve the control performance and denoise the noise-corrupted measurements to reduce the performance degradation. Results show that the proposed structure improves the control metrics, the Integrated Absolute Error (IAE) and the Integrated Squared Error (ISE), by around 21.25% and 54.64%, respectively.
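    The two control metrics reported, IAE and ISE, are time integrals of the absolute and squared control error. A minimal sketch using a rectangle-rule discretisation over a sampled error signal (function names are illustrative):

```python
import numpy as np

def iae(error, dt):
    """Integrated Absolute Error: integral of |e(t)| dt (rectangle rule)."""
    return np.sum(np.abs(np.asarray(error))) * dt

def ise(error, dt):
    """Integrated Squared Error: integral of e(t)^2 dt (rectangle rule)."""
    return np.sum(np.asarray(error) ** 2) * dt
```

ISE penalises large deviations more heavily than IAE, which is why the two metrics can improve by different amounts for the same controller.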

    Joint 1D and 2D Neural Networks for Automatic Modulation Recognition

    The digital communication and radar community has recently shown increased interest in using data-driven approaches for tasks such as modulation recognition, channel estimation and distortion correction. In this research we seek to apply an object detector for parameter estimation to perform waveform separation in the time and frequency domain prior to classification. This enables the full automation of detecting and classifying simultaneously occurring waveforms. We leverage a 1D ResNet implemented by O'Shea et al. in [1] and the YOLO v3 object detector designed by Redmon et al. in [2]. We conducted an in-depth study of the performance of these architectures and integrated the models to perform joint detection and classification. To our knowledge, the present research is the first to study and successfully combine a 1D ResNet classifier and YOLO v3 object detector to fully automate the process of automatic modulation recognition (AMR) for parameter estimation, pulse extraction and waveform classification in non-cooperative scenarios. The overall performance of the joint detector/classifier is 90% at a 10 dB signal-to-noise ratio for 24 digital and analog modulations.
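    A YOLO-style detector in a pipeline like this matches predicted boxes to targets by intersection-over-union. A minimal sketch for axis-aligned `(x1, y1, x2, y2)` boxes, given as an illustration rather than the implementation in [2]:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, the overlap
    criterion YOLO-style detectors use to match predictions to targets."""
    # Corners of the intersection rectangle (empty if boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

In the time-frequency setting described here, a "box" would bound a waveform's extent in time and its occupancy in frequency before the 1D classifier is applied to the extracted pulse.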