
    Techniques for the enhancement of linear predictive speech coding in adverse conditions


    Noise-Robust Voice Conversion

    A persistent challenge in speech processing is the presence of noise, which reduces the quality of speech signals. Whether natural speech is used as input or speech is the desired output to be synthesized, noise degrades the performance of these systems and causes the output speech to sound unnatural. Speech enhancement deals with this problem, typically seeking to improve the input speech or to post-process the (re)synthesized speech. An intriguing complement to post-processing speech signals is voice conversion, in which speech by one person (the source speaker) is made to sound as if spoken by a different person (the target speaker). Traditionally, the majority of speech enhancement and voice conversion methods rely on parametric modeling of speech. A promising complement to parametric models is an inventory-based approach, which is the focus of this work. In inventory-based speech systems, an inventory of clean speech signals is recorded as a reference. Noisy speech (in the case of enhancement) or target speech (in the case of conversion) can then be replaced by the best-matching clean speech in the inventory, found via a correlation search. Such an approach has the potential to alleviate the intelligibility and unnaturalness issues often encountered by parametric speech processing systems. This work investigates inventory-based speech enhancement methods and compares them with conventional ones. In addition, the inventory search method is applied to estimate source-speaker characteristics for voice conversion in noisy environments. Two noisy-environment voice conversion systems were constructed for a comparative study: a direct voice conversion system and an inventory-based voice conversion system, both with limited noise filtering at the front end. Results from this work suggest that the inventory method offers encouraging improvements over the direct conversion method.
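
    The correlation search described above can be pictured with a minimal sketch (hypothetical names, not the authors' implementation): a query frame of speech is compared against every clean segment in the inventory, and the most correlated segment is selected as the replacement.

        import numpy as np

        def best_inventory_match(query, inventory):
            """Return the index of the inventory segment most correlated with `query`.

            query:     1-D array holding one frame of noisy (enhancement) or
                       source/target (conversion) speech.
            inventory: 2-D array, one clean reference segment per row, each the
                       same length as `query`.
            """
            q = query - query.mean()
            inv = inventory - inventory.mean(axis=1, keepdims=True)
            # Normalized cross-correlation at zero lag between the query frame
            # and every clean segment in the inventory.
            scores = (inv @ q) / (np.linalg.norm(inv, axis=1) * np.linalg.norm(q) + 1e-12)
            return int(np.argmax(scores))

        # Usage (enhancement case): replace each noisy frame with its best match.
        # enhanced_frame = inventory[best_inventory_match(noisy_frame, inventory)]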

    Theory, design and application of gradient adaptive lattice filters

    SIGLE LD:D48933/84 / BLDSC - British Library Document Supply Centre, United Kingdom (GB)

    Cue estimation for vowel perception prediction in low signal-to-noise ratios

    This study investigates the signal processing required to allow the evaluation of hearing perception prediction models at low signal-to-noise ratios (SNRs). It focusses on speech enhancement and the estimation of the cues from which speech may be recognized, specifically where these cues are estimated from severely degraded speech (SNRs ranging from -10 dB to -3 dB). This research has application in the field of cochlear implants (CIs), where a listener would hear degraded speech due to several distortions introduced by the biophysical interface (e.g. frequency and amplitude discretization). These difficulties can also be interpreted as a loss in signal quality due to a specific type of noise. The ability to investigate perception in low-SNR conditions may have application in the development of CI signal processing algorithms to counter the effects of noise. In the military domain, a speech signal may be degraded intentionally by enemy forces or unintentionally owing to engine noise, for example. The ability to analyse and predict perception can be used for algorithm development to counter the unintentional or intentional interference, or to predict perception degradation if low-SNR conditions cannot be avoided. A previously documented perception model (Svirsky, 2000) is used to illustrate that the proposed signal processing steps can indeed be used to successfully estimate the various cues used by the perception model at SNRs as low as -10 dB. Dissertation (MEng), University of Pretoria, 2009. Electrical, Electronic and Computer Engineering.
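
    To make the stated operating range concrete, the following sketch (not taken from the dissertation; function and variable names are assumptions) scales a noise signal so that it can be added to clean speech at a chosen SNR between -10 dB and -3 dB, which is how severely degraded test material of this kind is typically constructed.

        import numpy as np

        def mix_at_snr(speech, noise, snr_db):
            """Add `noise` to `speech` at the requested signal-to-noise ratio (dB).

            Assumes `noise` is at least as long as `speech`.
            """
            noise = noise[:len(speech)]
            p_speech = np.mean(speech ** 2)
            p_noise = np.mean(noise ** 2)
            # Gain such that 10*log10(p_speech / (gain**2 * p_noise)) == snr_db.
            gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
            return speech + gain * noise

        # Example: degrade clean speech to the lowest SNR studied in this work.
        # noisy = mix_at_snr(clean, babble_noise, snr_db=-10)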

    Voice biometrics under mismatched noise conditions

    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker, which is found to adversely affect the accuracy of speaker recognition. To address this problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under the aforementioned adverse operating conditions. Through experimental investigations based on the two main classes of speaker recognition (verification and open-set identification), it is shown that the proposed approach can significantly improve accuracy under mismatched noise conditions. In order to further improve recognition accuracy in severe mismatch conditions, an enhancement of the above method is proposed: by adjusting the reference speaker models more closely to the noise condition in the test utterance, it considerably increases accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with using the enhanced approach for open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated. The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
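
    For context on the score normalisation mentioned above, a minimal sketch of conventional T-Norm follows; this is the standard formulation, not the new form proposed in the thesis, and the cohort-handling details are assumptions for illustration.

        import numpy as np

        def t_norm(raw_score, cohort_scores):
            """Test-normalise a speaker-verification score.

            raw_score:     score of the test utterance against the claimed speaker model.
            cohort_scores: scores of the same test utterance against a cohort of
                           impostor (non-target) speaker models.
            """
            mu = np.mean(cohort_scores)
            sigma = np.std(cohort_scores) + 1e-12  # guard against a zero-variance cohort
            return (raw_score - mu) / sigma

        # A T-normalised score above a fixed threshold accepts the identity claim;
        # normalising per test utterance helps one threshold remain usable when the
        # noise conditions of test and reference material differ.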

    Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

    Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.

    Speech assessment and characterization for law enforcement applications

    Speech signals acquired, transmitted or stored in non-ideal conditions are often degraded by one or more effects including, for example, additive noise. These degradations alter the signal properties in a manner that deteriorates the intelligibility or quality of the speech signal. In the law enforcement context such degradations are commonplace due to limitations in the audio collection methodology, which is often required to be covert. In severe degradation conditions the acquired signal may become unintelligible, losing its value in an investigation; in less severe conditions a loss in signal quality may be encountered, which can lead to higher transcription time and cost. This thesis proposes a non-intrusive speech assessment framework, from which algorithms for speech quality and intelligibility assessment are derived, to guide the collection and transcription of law enforcement audio. These methods are trained on a large database labelled using intrusive techniques (whose performance is verified with subjective scores) and are shown to perform favorably when compared with existing non-intrusive techniques. Additionally, a non-intrusive CODEC identification and verification algorithm is developed which can identify a CODEC with an accuracy of 96.8% and detect the presence of a CODEC with an accuracy higher than 97% in the presence of additive noise. Finally, a speech description taxonomy framework is developed with the aim of characterizing various aspects of a degraded speech signal: the mechanism that produces a signal with particular characteristics, the vocabulary that can be used to describe those degradations, and the measurable signal properties that can characterize the degradations. The taxonomy is implemented as a relational database that facilitates the modeling of the relationships between various attributes of a signal and promises to be a useful tool for training and guiding audio analysts.

    Robust speech recognition under noisy environments.

    Lee Siu Wa. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 116-121). Abstracts in English and Chinese.

    Contents:
    Abstract
    Chapter 1  Introduction
        1.1  An Overview on Automatic Speech Recognition
        1.2  Thesis Outline
    Chapter 2  Baseline Speech Recognition System
        2.1  Baseline Speech Recognition Framework
        2.2  Acoustic Feature Extraction
            2.2.1  Speech Production and Source-Filter Model
            2.2.2  Review of Feature Representations
            2.2.3  Mel-frequency Cepstral Coefficients
            2.2.4  Energy and Dynamic Features
        2.3  Back-end Decoder
        2.4  English Digit String Corpus - AURORA2
        2.5  Baseline Recognition Experiment
    Chapter 3  A Simple Recognition Framework with Model Selection
        3.1  Mismatch between Training and Testing Conditions
        3.2  Matched Training and Testing Conditions
            3.2.1  Noise Type-Matching
            3.2.2  SNR-Matching
            3.2.3  Noise Type and SNR-Matching
        3.3  Recognition Framework with Model Selection
    Chapter 4  Noise Spectral Estimation
        4.1  Introduction to Statistical Estimation Methods
            4.1.1  Conventional Estimation Methods
            4.1.2  Histogram Technique
        4.2  Quantile-based Noise Estimation (QBNE) (see the sketch after this outline)
            4.2.1  Overview of Quantile-based Noise Estimation (QBNE)
            4.2.2  Time-Frequency Quantile-based Noise Estimation (T-F QBNE)
            4.2.3  Mainlobe-Resilient Time-Frequency Quantile-based Noise Estimation (M-R T-F QBNE)
        4.3  Estimation Performance Analysis
        4.4  Recognition Experiment with Model Selection
    Chapter 5  Feature Compensation: Algorithm and Experiment
        5.1  Feature Deviation from Clean Speech
            5.1.1  Deviation in MFCC Features
            5.1.2  Implications for Feature Compensation
        5.2  Overview of Conventional Compensation Methods
        5.3  Feature Compensation by In-phase Feature Induction
            5.3.1  Motivation
            5.3.2  Methodology
        5.4  Compensation Framework for Magnitude Spectrum and Segmental Energy
        5.5  Recognition Experiments
    Chapter 6  Conclusions
        6.1  Summary and Discussions
        6.2  Future Directions
    Bibliography
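
    Quantile-based noise estimation (QBNE), listed under Chapter 4 above, exploits the observation that even during active speech each frequency bin contains low-energy frames dominated by noise, so a low or middle quantile of the magnitude spectrum over time approximates the noise spectrum. A minimal sketch with an assumed quantile value, not the thesis's exact formulation:

        import numpy as np

        def qbne_noise_spectrum(spectrogram, quantile=0.5):
            """Estimate the noise magnitude spectrum from a (frames x bins) spectrogram.

            For each frequency bin, the chosen quantile of the magnitudes observed
            over time is taken as the noise estimate; q = 0.5 (the median) is a
            common choice in quantile-based noise estimation.
            """
            return np.quantile(np.abs(spectrogram), quantile, axis=0)

        # Usage with a short-time Fourier transform magnitude matrix `stft_mag`
        # of shape (num_frames, num_bins):
        # noise_est = qbne_noise_spectrum(stft_mag, quantile=0.5)
        # The estimate can then drive spectral subtraction or model selection.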