89 research outputs found

    A Study into Speech Enhancement Techniques in Adverse Environment

    Get PDF
    This dissertation developed speech enhancement techniques that improve the speech quality in applications such as mobile communications, teleconferencing and smart loudspeakers. For these applications it is necessary to suppress noise and reverberation. Thus the contribution in this dissertation is twofold: single channel speech enhancement system which exploits the temporal and spectral diversity of the received microphone signal for noise suppression and multi-channel speech enhancement method with the ability to employ spatial diversity to reduce reverberation

    DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION

    Get PDF
    Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. Performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minima

    Evaluation of the sparse coding shrinkage noise reduction algorithm for the hearing impaired

    No full text
    Although there are numerous single-channel noise reduction strategies to improve speech perception in a noisy environment, most of them can only improve speech quality but not improve speech intelligibility for normal hearing (NH) or hearing impaired (HI) listeners. Exceptions that can improve speech intelligibility currently are only those that require a priori statistics of speech or noise. Most of the noise reduction algorithms in hearing aids are adopted directly from the algorithms for NH listeners without taking into account of the hearing loss factors within HI listeners. HI listeners suffer more in speech intelligibility than NH listeners in the same noisy environment. Further study of monaural noise reduction algorithms for HI listeners is required.The motivation is to adapt a model-based approach in contrast to the conventional Wiener filtering approach. The model-based algorithm called sparse coding shrinkage (SCS) was proposed to extract key speech information from noisy speech. The SCS algorithm was evaluated by comparison with another state-of-the-art Wiener filtering approach through speech intelligibility and quality tests using 9 NH and 9 HI listeners. The SCS algorithm matched the performance of the Wiener filtering algorithm in speech intelligibility and speech quality. Both algorithms showed some intelligibility improvements for HI listeners but not at all for NH listeners. The algorithms improved speech quality for both HI and NH listeners.Additionally, a physiologically-inspired hearing loss simulation (HLS) model was developed to characterize hearing loss factors and simulate hearing loss consequences. A methodology was proposed to evaluate signal processing strategies for HI listeners with the proposed HLS model and NH subjects. The corresponding experiment was performed by asking NH subjects to listen to unprocessed/enhanced speech with the HLS model. Some of the effects of the algorithms seen in HI listeners are reproduced, at least qualitatively, by using the HLS model with NH listeners.Conclusions: The model-based algorithm SCS is promising for improving performance in stationary noise although no clear difference was seen in the performance of SCS and a competitive Wiener filtering algorithm. Fluctuating noise is more difficult to reduce compared to stationary noise. Noise reduction algorithms may perform better at higher input signal-to-noise ratios (SNRs) where HI listeners can get benefit but where NH listeners already reach ceiling performance. The proposed HLS model can save time and cost when evaluating noise reduction algorithms for HI listeners

    The Effect of Coordinated Movement on Infant Vocalizations

    Get PDF

    Learning-Based Reference-Free Speech Quality Assessment for Normal Hearing and Hearing Impaired Applications

    Get PDF
    Accurate speech quality measures are highly attractive and beneficial in the design, fine-tuning, and benchmarking of speech processing algorithms, devices, and communication systems. Switching from narrowband telecommunication to wideband telephony is a change within the telecommunication industry which provides users with better speech quality experience but introduces a number of challenges in speech processing. Noise is the most common distortion on audio signals and as a result there have been a lot of studies on developing high performance noise reduction algorithms. Assistive hearing devices are designed to decrease communication difficulties for people with loss of hearing. As the algorithms within these devices become more advanced, it becomes increasingly crucial to develop accurate and robust quality metrics to assess their performance. Objective speech quality measurements are more attractive compared to subjective assessments as they are cost-effective and subjective variability is eliminated. Although there has been extensive research on objective speech quality evaluation for narrowband speech, those methods are unsuitable for wideband telephony. In the case of hearing-impaired applications, objective quality assessment is challenging as it has to be capable of distinguishing between desired modifications which make signals audible and undesired artifacts. In this thesis a model is proposed that allows extracting two sets of features from the distorted signal only. This approach which is called reference-free (nonintrusive) assessment is attractive as it does not need access to the reference signal. Although this benefit makes nonintrusive assessments suitable for real-time applications, more features need to be extracted and smartly combined to provide comparable accuracy as intrusive metrics. Two feature vectors are proposed to extract information from distorted signals and their performance is examined in three studies. In the first study, both feature vectors are trained on various portions of a noise reduction database for normal hearing applications. In the second study, the same investigation is performed on two sets of databases acquired through several hearing aids. Third study examined the generalizability of the proposed metrics on benchmarking four wireless remote microphones in a variety of environmental conditions. Machine learning techniques are deployed for training the models in the three studies. The studies show that one of the feature sets is robust when trained on different portions of the data from different databases and it also provides good quality prediction accuracy for both normal hearing and hearing-impaired applications

    Speech assessment and characterization for law enforcement applications

    No full text
    Speech signals acquired, transmitted or stored in non-ideal conditions are often degraded by one or more effects including, for example, additive noise. These degradations alter the signal properties in a manner that deteriorates the intelligibility or quality of the speech signal. In the law enforcement context such degradations are commonplace due to the limitations in the audio collection methodology, which is often required to be covert. In severe degradation conditions, the acquired signal may become unintelligible, losing its value in an investigation and in less severe conditions, a loss in signal quality may be encountered, which can lead to higher transcription time and cost. This thesis proposes a non-intrusive speech assessment framework from which algorithms for speech quality and intelligibility assessment are derived, to guide the collection and transcription of law enforcement audio. These methods are trained on a large database labelled using intrusive techniques (whose performance is verified with subjective scores) and shown to perform favorably when compared with existing non-intrusive techniques. Additionally, a non-intrusive CODEC identification and verification algorithm is developed which can identify a CODEC with an accuracy of 96.8 % and detect the presence of a CODEC with an accuracy higher than 97 % in the presence of additive noise. Finally, the speech description taxonomy framework is developed, with the aim of characterizing various aspects of a degraded speech signal, including the mechanism that results in a signal with particular characteristics, the vocabulary that can be used to describe those degradations and the measurable signal properties that can characterize the degradations. The taxonomy is implemented as a relational database that facilitates the modeling of the relationships between various attributes of a signal and promises to be a useful tool for training and guiding audio analysts

    The role of sound offsets in auditory temporal processing and perception

    Get PDF
    Sound-offset responses are distinct to sound onsets in their underlying neural mechanisms, temporal processing pathways and roles in auditory perception following recent neurobiological studies. In this work, I investigate the role of sound offsets and the effect of reduced sensitivity to offsets on auditory perception in humans. The implications of a 'sound-offset deficit' for speech-in-noise perception are investigated, based on a mathematical model with biological significance and independent channels for onset and offset detection. Sound offsets are important in recognising, distinguishing and grouping sounds. They are also likely to play a role in perceiving consonants that lie in the troughs of amplitude fluctuations in speech. The offset influence on the discriminability of model outputs for 48 non-sense vowel-consonant-vowel (VCV) speech stimuli in varying levels of multi-talker babble noise (-12, -6, 0, 6, 12 dB SNR) was assessed, and led to predictions that correspond to known phonetic categories. This work therefore suggests that variability in the offset salience alone can explain the rank order of consonants most affected in noisy situations. A novel psychophysical test battery for offset sensitivity was devised and assessed, followed by a study to find an electrophysiological correlate. The findings suggest that individual differences in sound-offset sensitivity may be a factor contributing to inter-subject variation in speech-in-noise discrimination ability. The promising measures from these results can be used to test between-population differences in offset sensitivity, with more support for objective than psychophysical measures. In the electrophysiological study, offset responses in a duration discrimination paradigm were found to be modulated by attention compared to onset responses. Overall, this thesis shows for the first time that the onset-offset dichotomy in the auditory system, previously explored in physiological studies, is also evident in human studies for both simple and complex speech sounds

    Speech enhancement in binaural hearing protection devices

    Get PDF
    The capability of people to operate safely and effective under extreme noise conditions is dependent on their accesses to adequate voice communication while using hearing protection. This thesis develops speech enhancement algorithms that can be implemented in binaural hearing protection devices to improve communication and situation awareness in the workplace. The developed algorithms which emphasize low computational complexity, come with the capability to suppress noise while enhancing speech
    corecore