
    Model-Based Speech Enhancement

    A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a harmonic plus noise model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is to estimate the acoustic features (voicing, fundamental frequency, spectral envelope and phase) accurately from noisy speech. This is achieved using maximum a posteriori (MAP) estimation methods that operate on the noisy speech. In each case a prior model of the relationship between the noisy speech features and the estimated acoustic feature is required. These models are approximated using speaker-independent GMMs of the clean speech features, which are adapted to speaker-dependent models using MAP adaptation and to the noise conditions using the unscented transform. Objective results are presented to optimise the proposed system, and a set of subjective tests compares the approach with traditional enhancement methods. Three-way listening tests examining signal quality, background noise intrusiveness and overall quality show the proposed system to be highly robust to noise, performing significantly better than conventional enhancement methods in terms of background noise intrusiveness. However, the proposed method is shown to reduce signal quality, with overall quality measured to be roughly equivalent to that of the Wiener filter.
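
    To illustrate the reconstruction idea described in this abstract, the sketch below synthesizes a single frame from estimated acoustic features with a harmonic plus noise model: a sum of sinusoids at harmonics of the estimated fundamental frequency, with amplitudes drawn from the spectral envelope, mixed with noise according to the voicing decision. This is a minimal sketch of HNM synthesis in general, not the paper's actual system; the `env` callable, the zero-phase assumption and the single-band noise shaping are simplifications introduced here.

```python
import numpy as np

def hnm_synthesize_frame(f0, env, voicing, frame_len=256, fs=8000, rng=None):
    """Illustrative harmonic-plus-noise synthesis of one frame.

    f0      : estimated fundamental frequency in Hz (ignored if unvoiced)
    env     : callable giving the spectral-envelope magnitude at a frequency in Hz
    voicing : degree of voicing in [0, 1]
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(frame_len) / fs

    # Harmonic part: sinusoids at multiples of f0, with amplitudes sampled
    # from the spectral envelope (phases assumed zero in this sketch).
    harmonic = np.zeros(frame_len)
    if voicing > 0 and f0 > 0:
        for k in range(1, int((fs / 2) // f0)):
            harmonic += env(k * f0) * np.cos(2 * np.pi * k * f0 * t)

    # Noise part: white noise crudely scaled by the envelope (single band).
    noise = rng.standard_normal(frame_len) * env(fs / 4)

    return voicing * harmonic + (1.0 - voicing) * noise

# Example: a 200 Hz voiced frame with a flat spectral envelope.
frame = hnm_synthesize_frame(200.0, env=lambda f: 0.1, voicing=0.9)
```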

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.
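
    A common single-channel front-end in the family surveyed here estimates a time-frequency mask from noisy features with a small neural network and applies it to the noisy spectrum before recognition. The snippet below is a minimal sketch of such a masking-based front-end in plain numpy; the layer shapes and the random, untrained parameters are illustrative assumptions, not any specific system from the surveyed literature.

```python
import numpy as np

def mask_based_front_end(noisy_mag, weights, biases):
    """Sketch of a masking-based front-end: a small feed-forward network maps
    each noisy magnitude-spectrum frame to a [0, 1] mask, which is then applied
    to the noisy spectrum. `weights`/`biases` are hypothetical trained
    parameters (one matrix/vector per layer)."""
    x = np.log(noisy_mag + 1e-8)                 # log-magnitude input features
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)           # hidden layers with ReLU
    mask = 1.0 / (1.0 + np.exp(-(x @ weights[-1] + biases[-1])))  # sigmoid mask
    return mask * noisy_mag                      # enhanced magnitude estimate

# Toy usage with random (untrained) parameters: 257-bin frames, one hidden layer.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((257, 128)) * 0.01, rng.standard_normal((128, 257)) * 0.01]
bs = [np.zeros(128), np.zeros(257)]
enhanced = mask_based_front_end(np.abs(rng.standard_normal((10, 257))), Ws, bs)
```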

    Likelihood-Maximizing-Based Multiband Spectral Subtraction for Robust Speech Recognition

    Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed for improving the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques to reduce the mismatch between noisy speech features and the clean-trained acoustic model. Nevertheless, a correlation can be expected between speech quality improvement and an increase in recognition accuracy. This paper proposes a novel approach that treats SS and the speech recognizer not as two independent entities cascaded together, but as two interconnected components of a single system sharing the common goal of improved speech recognition accuracy. This incorporates information from the statistical models of the recognition engine as feedback for tuning the SS parameters. Using this architecture, we overcome the drawbacks of previously proposed methods and achieve better recognition accuracy. Experimental evaluations show that the proposed method achieves significant improvements in recognition rates across a wide range of signal-to-noise ratios.
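
    The front-end operation being tuned in this work is multiband spectral subtraction, in which each frequency band has its own over-subtraction factor. The sketch below shows that operation for one frame; the per-band factors `alphas` are the kind of parameters a likelihood-maximizing scheme would adjust using feedback from the recognizer, but the tuning loop itself is omitted, and the band edges and flooring constant are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def multiband_spectral_subtraction(noisy_power, noise_power, band_edges, alphas, beta=0.01):
    """Multiband spectral subtraction on one power-spectrum frame.

    noisy_power, noise_power : per-bin power spectra (same length)
    band_edges               : bin indices delimiting the frequency bands
    alphas                   : per-band over-subtraction factors (the parameters
                               a likelihood-maximizing scheme would tune)
    beta                     : spectral floor to avoid negative power
    """
    clean = np.empty_like(noisy_power)
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        sub = noisy_power[lo:hi] - alphas[b] * noise_power[lo:hi]
        clean[lo:hi] = np.maximum(sub, beta * noisy_power[lo:hi])
    return clean

# Toy usage: a 257-bin frame split into 4 bands with different over-subtraction.
rng = np.random.default_rng(1)
noisy = rng.random(257) + 0.5
noise = np.full(257, 0.3)
edges = [0, 64, 128, 192, 257]
enhanced = multiband_spectral_subtraction(noisy, noise, edges, alphas=[3.0, 2.5, 2.0, 1.5])
```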

    A study on different linear and non-linear filtering techniques of speech and speech recognition

    In any signal, noise is an undesired quantity; however, most of the time every signal gets mixed with noise at different stages of its processing and application, which distorts the information the signal carries and renders it redundant. A speech signal is particularly affected by acoustic noises such as babble noise, car noise, street noise, etc. To remove these noises, researchers have developed various techniques, collectively called filtering. Not all filtering techniques are suitable for every application; depending on the application, some techniques perform better than others. Broadly, filtering techniques can be classified into two categories: linear filtering and non-linear filtering. This paper presents a study of several filtering techniques based on linear and non-linear approaches. These include adaptive filters based on algorithms such as LMS, NLMS and RLS, the Kalman filter, ARMA and NARMA time-series models applied to filtering, and neural networks combined with fuzzy logic, i.e. ANFIS. The paper also covers the use of various features, i.e. MFCC, LPC, PLP and gamma, for filtering and recognition.
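
    To make the adaptive-filtering family concrete, the sketch below implements a normalized LMS (NLMS) noise canceller: a reference noise signal is filtered to predict the noise in the primary channel, and the prediction error serves as the enhanced speech. This is a generic textbook formulation, not code from the paper; the filter order, step size and the toy signals in the usage example are assumptions made here.

```python
import numpy as np

def nlms_noise_canceller(primary, reference, order=16, mu=0.5, eps=1e-6):
    """Minimal NLMS adaptive noise canceller.

    primary   : speech plus noise picked up by the primary channel
    reference : noise-only reference signal correlated with the noise
    Returns the error signal, which approximates the clean speech.
    """
    w = np.zeros(order)                      # adaptive filter weights
    out = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]     # most recent reference samples
        y = w @ x                            # noise estimate for this sample
        e = primary[n] - y                   # error = enhanced speech sample
        w += mu * e * x / (x @ x + eps)      # normalized LMS weight update
        out[n] = e
    return out

# Toy usage: sinusoidal "speech" corrupted by a filtered version of the reference noise.
rng = np.random.default_rng(2)
ref = rng.standard_normal(4000)
noise = np.convolve(ref, [0.6, -0.3, 0.1], mode="same")
speech = np.sin(2 * np.pi * 440 * np.arange(4000) / 8000)
enhanced = nlms_noise_canceller(speech + noise, ref)
```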

    Noise cancelling in acoustic voice signals with spectral subtraction

    The main purpose of this End of Degree Project is noise removal from speech signals, focusing on several algorithms based on the spectral subtraction method. A Matlab application has been designed and built whose main goal is to remove anything that interferes with perceiving a voice, that is, anything considered noise. Noise removal is the basis for any voice processing the user may want to perform afterwards, such as speech recognition, saving the clean audio, or voice analysis. Four algorithms for performing spectral subtraction were studied: Boll, Berouti, Lockwood & Boudy, and Multiband. This document presents the theoretical study and its implementation. In addition, to provide the user with a usable application, a simple and intuitive interface has been designed. The document shows how the different algorithms behave on several voices and with various types of noise: some noises are ideal, used for their mathematical characteristics, while others are common in daily life, such as the noise of a bus. Applying the spectral subtraction method requires a Voice Activity Detector (VAD) able to recognize at which moments of the audio there is voice. Two detectors were studied and implemented: the first declares voice according to a threshold adapted to the recording, while the second combines the Zero Crossing Rate (ZCR) with the signal energy. Finally, once the application was implemented, its performance was evaluated both objectively and subjectively, asking listeners for their opinions in order to characterize the behaviour of the application across different types of noise, voices, variables, algorithms, etc.
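
    The second voice activity detector described above (energy combined with zero-crossing rate) and the basic spectral subtraction it drives can be sketched as follows. This is a generic reimplementation in Python rather than the project's Matlab application; the thresholds, the over-subtraction factor and the spectral floor are illustrative assumptions.

```python
import numpy as np

def zcr_energy_vad(frame, energy_thresh, zcr_thresh):
    """Frame-level VAD combining short-time energy and zero-crossing rate:
    a frame is declared speech when its energy is high and its ZCR is low
    (thresholds are assumed to be calibrated on the recording)."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy > energy_thresh and zcr < zcr_thresh

def spectral_subtract(frames, energy_thresh=1e-3, zcr_thresh=0.3, alpha=2.0, beta=0.02):
    """Basic (Berouti-style) spectral subtraction driven by the VAD above:
    the noise spectrum is averaged over non-speech frames and over-subtracted
    from every frame, with a spectral floor to limit musical noise."""
    spectra = np.fft.rfft(frames, axis=1)
    power = np.abs(spectra) ** 2
    is_speech = np.array([zcr_energy_vad(f, energy_thresh, zcr_thresh) for f in frames])
    noise_power = power[~is_speech].mean(axis=0) if np.any(~is_speech) else power.mean(axis=0)
    clean_power = np.maximum(power - alpha * noise_power, beta * power)
    clean_spectra = np.sqrt(clean_power) * np.exp(1j * np.angle(spectra))
    return np.fft.irfft(clean_spectra, n=frames.shape[1], axis=1)

# Toy usage: 50 frames of 256 samples of low-level noisy signal.
rng = np.random.default_rng(3)
frames = rng.standard_normal((50, 256)) * 0.05
enhanced = spectral_subtract(frames)
```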