8 research outputs found

    Traitement paramétrique des signaux audio dans le contexte des prothèses auditives

    Moving-average model -- Autoregressive model -- Autoregressive moving-average model -- Remark on the link between AR, MA and ARMA -- Estimating the parameters of an AR(p) process -- Order selection criteria for an AR(p) model -- Notion of spectral envelope -- Methods developed in the frequency domain -- Methods developed in the correlation domain -- Noise reduction in the frequency domain -- A two-microphone algorithm for speech enhancement -- State of the art -- Zelinski's approach in the case of two-microphone arrangement -- Two-microphone speech enhancement system -- Performance evaluation and results -- Noise reduction in the correlation domain -- Noise power estimation -- Compensation of noise effects -- Improvement of the compensation procedure -- Development prospects -- Parametric processing in the presence of noise -- Layout of the combined processing -- Improving the accuracy of the noise variance estimator

    Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

    This paper investigates deep neural network (DNN) based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained on a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraint imposed by dynamic features (i.e., the time derivatives of the speech coefficients) is used to enhance the smoothness of predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients with a least-squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the constraint of dynamic features directly into the DNN training process using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called the cross transform, is used to transform multiple frames of speech coefficients to a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption on distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps to improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation improves the ASR performance significantly for clean-condition trained acoustic models.

    Noise Reduction with Microphone Arrays for Speaker Identification

    The presence of acoustic noise in audio recordings is an ongoing issue that plagues many applications. This ambient background noise is difficult to reduce due to its unpredictable nature. Many single-channel noise reduction techniques exist but are limited in that they may distort the desired speech signal due to overlapping spectral content of the speech and noise. It is therefore of interest to investigate the use of multichannel noise reduction algorithms to further attenuate noise while attempting to preserve the speech signal of interest. Specifically, this thesis investigates the use of microphone arrays in conjunction with multichannel noise reduction algorithms to aid in speaker identification. Recording a speaker in the presence of acoustic background noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the noise environment where the speech sample is taken, noise reduction algorithms must be developed and applied to clean the speech signal in order to give speaker identification software a chance at a positive identification. Due to the limitations of single-channel techniques, it is of interest to see if spatial information provided by microphone arrays can be exploited to aid in speaker identification. This thesis provides an exploration of several time-domain multichannel noise reduction techniques including delay-and-sum beamforming, multichannel Wiener filtering, and Spatial-Temporal Prediction filtering. Each algorithm is prototyped and filter performance is evaluated using various simulations and experiments. A three-dimensional noise model is developed to simulate and compare the performance of the above methods, and experimental results of three data collections are presented and analyzed. The algorithms are compared and recommendations are given for the use of each technique.
Finally, ideas for future work are discussed to improve performance and implementation of these multichannel algorithms. Possible applications for this technology include audio surveillance, identity verification, video chatting, conference calling and sound source localization.
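
As a rough illustration of the simplest technique named in this abstract, the following is a minimal sketch of time-domain delay-and-sum beamforming. It is not the thesis's implementation: the function name `delay_and_sum`, the integer-sample delays, and the assumption that the delays are already known are all hypothetical simplifications made for this example.

```python
def delay_and_sum(channels, delays):
    """Time-align microphone channels and average them.

    channels: list of equal-purpose sample lists, one per microphone.
    delays:   assumed-known arrival delay of the source at each
              microphone, in whole samples (hypothetical input; a real
              system would have to estimate these).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    # Keep only the span where every advanced channel still has samples.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    if n <= 0:
        raise ValueError("delays leave no overlapping samples")
    num_mics = len(channels)
    # Advance each channel by its delay so the source lines up, then average.
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / num_mics
        for t in range(n)
    ]
```

After alignment the source adds coherently across channels, while noise that is uncorrelated between microphones is averaged down, which is the spatial information a single channel cannot exploit.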

    Virtual Instrumentation for Speech Signal Processing in SMART Technology and Industry 4.0

    This thesis deals with automatic speech recognition in the field of Industry 4.0 and SMART technologies, with subsequent testing of selected filtering methods. It first presents a literature review of the application of voice control in the Industry 4.0 concept and SMART technologies. It then covers methods of automatic speech recognition and filtering methods. The work focuses primarily on voice control and interference filtering using the adaptive LMS algorithm and Independent Component Analysis (ICA). A software application was implemented to build a database of interference recordings. Based on these recordings, three visualizations were created to test the selected methods. Recognition success is evaluated as recognized/not recognized, with each command spoken 100 times. 450 - Katedra kybernetiky a biomedicínského inženýrství
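
To make the adaptive LMS filtering mentioned in this abstract concrete, here is a minimal sketch of an LMS noise canceller. It is not the thesis's implementation: the function name `lms_filter`, the tap count, the step size, and the use of a separate interference-only reference signal are assumptions chosen for this example.

```python
def lms_filter(reference, primary, num_taps=4, step_size=0.01):
    """Adaptive interference cancellation with the LMS algorithm.

    reference: samples correlated with the interference only (assumed).
    primary:   samples containing the wanted signal plus interference.
    Returns (errors, weights); the error sequence is the cleaned output.
    """
    weights = [0.0] * num_taps
    errors = []
    for n in range(len(primary)):
        # Most recent reference samples, newest first, zero-padded at start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        # Filter output: current estimate of the interference in primary[n].
        y = sum(w * xi for w, xi in zip(weights, x))
        # Error = primary minus estimated interference.
        e = primary[n] - y
        # LMS update: step along the instantaneous gradient estimate.
        weights = [w + step_size * e * xi for w, xi in zip(weights, x)]
        errors.append(e)
    return errors, weights
```

The step size trades convergence speed against stability; the value used here is only a placeholder and would need tuning for real recordings.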