231 research outputs found

    Coding Strategies for Cochlear Implants Under Adverse Environments

    Get PDF
    Cochlear implants are electronic prosthetic devices that restores partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quite listening conditions, there remains limitations on speech perception under adverse environments such as in background noise, reverberation and band-limited channels, and we propose strategies that improve the intelligibility of speech transmitted over the telephone networks, reverberated speech and speech in the presence of background noise. For telephone processed speech, we propose to examine the effects of adding low-frequency and high- frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high frequency information and therefore this study provides support for design of algorithms to extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four types of listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing impaired listeners. Reverberated sounds consists of direct sound, early reflections and late reflections. Late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction to suppress the reverberant energies from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3s and 1.0s) indicated significant improvement when stimuli was processed with SS strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and therefore, can potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulations in cochlear implants. The proposed strategy is based on harmonic modeling and uses synthesis driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work into development of algorithms to regenerate harmonics of voiced segments in the presence of noise

    Linear and nonlinear adaptive filtering and their applications to speech intelligibility enhancement

    Get PDF

    Glottal-synchronous speech processing

    No full text
    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speec

    A Framework for Speech Enhancement with Ad Hoc Microphone Arrays

    Get PDF

    Objective and Subjective Evaluation of Wideband Speech Quality

    Get PDF
    Traditional landline and cellular communications use a bandwidth of 300 - 3400 Hz for transmitting speech. This narrow bandwidth impacts quality, intelligibility and naturalness of transmitted speech. There is an impending change within the telecommunication industry towards using wider bandwidth speech, but the enlarged bandwidth also introduces a few challenges in speech processing. Echo and noise are two challenging issues in wideband telephony, due to increased perceptual sensitivity by users. Subjective and/or objective measurements of speech quality are important in benchmarking speech processing algorithms and evaluating the effect of parameters like noise, echo, and delay in wideband telephony. Subjective measures include ratings of speech quality by listeners, whereas objective measures compute a metric based on the reference and degraded speech samples. While subjective quality ratings are the gold - standard\u27\u27, they are also time- and resource- consuming. An objective metric that correlates highly with subjective data is attractive, as it can act as a substitute for subjective quality scores in gauging the performance of different algorithms and devices. This thesis reports results from a series of experiments on subjective and objective speech quality evaluation for wideband telephony applications. First, a custom wideband noise reduction database was created that contained speech samples corrupted by different background noises at different signal to noise ratios (SNRs) and processed by six different noise reduction algorithms. Comprehensive subjective evaluation of this database revealed an interaction between the algorithm performance, noise type and SNR. Several auditory-based objective metrics such as the Loudness Pattern Distortion (LPD) measure based on the Moore - Glasberg auditory model were evaluated in predicting the subjective scores. In addition, the performance of Bayesian Multivariate Regression Splines(BMLS) was also evaluated in terms of mapping the scores calculated by the objective metrics to the true quality scores. The combination of LPD and BMLS resulted in high correlation with the subjective scores and was used as a substitution for fine - tuning the noise reduction algorithms. Second, the effect of echo and delay on the wideband speech was evaluated in both listening and conversational context, through both subjective and objective measures. A database containing speech samples corrupted by echo with different delay and frequency response characteristics was created, and was later used to collect subjective quality ratings. The LPD - BMLS objective metric was then validated using the subjective scores. Third, to evaluate the effect of echo and delay in conversational context, a realtime simulator was developed. Pairs of subjects conversed over the simulated system and rated the quality of their conversations which were degraded by different amount of echo and delay. The quality scores were analysed and LPD+BMLS combination was found to be effective in predicting subjective impressions of quality for condition-averaged data

    Adaptive post-filtering of speech in mobile communications

    Get PDF
    Puheen ehostusta tarvitaan kohinaisen puheen laadun ja ymmärrettävyyden parantamisessa. Tässä työssä suunniteltiin matkapuhelimiin tarkoitettu jälkisuodatusalgoritmi. Tämän jälkiprosessoinnin tarkoituksena oli korostaa joitakin taajuusalueita puheessa siten, että sen ymmärtäminen olisi edelleen mahdollista hyvin kovassa kohinassa. Jälkiprosessoinnin alussa soinnillisen puhekehyksen formanttitaajuudet haettiin tarkastelemalla sen LP-spektrissä olevia piikkejä. Tämän jälkeen ensimmäistä löydettyä formanttia vaimennettiin ja toista vahvistettiin. Ideana oli siirtää energiaa korkeammille taajuuksille, jossa kohinan energiataso olisi matalampi. Formanttisuotimen kertoimet optimoitiin kuuntelukokeen avulla ja sen mahdollinen kallistus kompensoitiin ensimmäisen asteen alipäästösuotimella. Lopullisen jälkisuotimen suorituskykyä tarkasteltiin sekä tutkimalla sen vaikutusta erilaisiin soinnillisiin äänteisiin että vertailemalla suodinta muihin jälkisuotimiin. Saatujen tulosten perusteella voitiin päätellä, että toteutettu menetelmä toimi halutulla tavalla ja onnistui parantamaan puheen ymmärrettävyyttä. Tarkasteluissa tuli kuitenkin ilmi myös yllättäviä piirteitä, kuten formanttien siirtymisiä, jotka vaativat lisätutkimusta. Verrattuna muihin jälkisuodatussysteemeihin, jotka on suunniteltu toimimaan kovassa kohinassa, työssä kehitetyn algoritmin etuna ovat sen adaptiivisuus ja säädettävyys.Speech enhancement is needed to improve the quality and intelligibility of speech degraded by noise. In this thesis, a post-filtering approach for the mobile communication environment was designed. The purpose of this post-processing scheme was to enhance certain frequency regions of speech, so that when it was degraded with a very high level of noise, the speech could still be understood. The post-processing worked by locating the formants of a voiced speech frame by extracting the peaks of the LP spectrum. After this, the first formant was attenuated and the second one enhanced. The idea was to move energy to higher frequencies where the energy level of the noise was lower. The coefficients of the formant filter were optimized with informal listening tests, and the possible tilt of the filter was compensated with a first order low-pass filter. The performance of the post-processing algorithm was studied by analyzing its effects on different voiced sounds and by comparing the filter to other post-filters. It was concluded that the post-processing worked as intended and improved the intelligibility of speech. Some unexpected behavior, such as shifted formants, was also encountered and needs to be further studied. The advantages of this approach are its more adaptive and tunable structure compared to the other methods used for post-processing in high noise levels

    DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION

    Get PDF
    Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. Performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minima
    • …
    corecore