
    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for several problems in audio for hands-free telephony. We first investigate MSSS via NMF as an alternative to existing acoustic echo reduction approaches such as Acoustic Echo Cancellation (AEC). To this end, we pose the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture that also contains acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short-Time Fourier Transform (STFT) domain. One of these bases models the spectral energy of the acoustic echo signal and is formed from the incoming far-end user's speech, while the other models the spectral energy of the near-end speaker and is trained on speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes, for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the echo signal estimate produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have detection performance comparable to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room-change insensitivity of speaker extraction, generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One scheme employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other modifies the inverse minimum phase filter so that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
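
The NMF-based speaker extraction summarized above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the KL-divergence multiplicative updates, the Wiener-style masking, and all names and dimensions are assumptions of the sketch.

```python
import numpy as np

def extract_speaker(Y_mag, B_echo, B_spk, n_iter=100, eps=1e-12):
    """Decompose the near-end magnitude STFT Y_mag onto the union of an
    echo basis (formed from far-end speech) and a pre-trained speaker
    basis, then mask Y_mag to recover the near-end speaker."""
    B = np.concatenate([B_echo, B_spk], axis=1)            # F x (Ke + Ks)
    rng = np.random.default_rng(0)
    H = rng.random((B.shape[1], Y_mag.shape[1])) + eps     # activations
    for _ in range(n_iter):                                # KL-NMF updates,
        V = B @ H + eps                                    # bases held fixed
        H *= (B.T @ (Y_mag / V)) / (B.sum(axis=0)[:, None] + eps)
    Ke = B_echo.shape[1]
    V = B @ H + eps
    speaker_mag = Y_mag * (B_spk @ H[Ke:]) / V             # Wiener-style mask
    echo_mag = Y_mag * (B_echo @ H[:Ke]) / V               # echo estimate
    return speaker_mag, echo_mag
```

The echo estimate returned here is the kind of quantity the proposed DTD algorithm would correlate against the microphone signal to form its decision variable.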

    Doctor of Philosophy

    Hearing aids suffer from acoustic feedback that limits the gain they can provide. Moreover, the output sound quality of hearing aids may be compromised in the presence of background acoustic noise. Digital hearing aids use advanced signal processing to reduce acoustic feedback and background noise and thereby improve the output sound quality. However, the output sound quality of digital hearing aids is known to deteriorate as the hearing aid gain is increased. Furthermore, the popular subband or transform-domain digital signal processing in modern hearing aids introduces analysis-synthesis delays in the forward path. Long forward-path delays are undesirable because the processed sound combines with the unprocessed sound that arrives at the cochlea through the vent and changes the sound quality. In this dissertation, we employ a variable, frequency-dependent gain function that is lower at frequencies of the incoming signal where the information is perceptually insignificant. In addition, the method of this dissertation automatically identifies and suppresses residual acoustic feedback components at frequencies that have the potential to drive the system to instability. The suppressed frequency components are monitored, and the suppression is removed when such frequencies no longer threaten to drive the hearing aid system into instability. Together, these measures provide more stable gain than traditional methods by reducing the acoustic coupling between the microphone and the loudspeaker of a hearing aid. In addition, the method performs the necessary hearing aid signal processing with low-delay characteristics. The central idea of the low-delay processing is a spectral gain shaping method (SGSM) that employs parallel parametric equalization (EQ) filters.
Parameters of the parametric EQ filters and the associated gain values are selected using a least-squares approach to obtain the desired spectral response. Finally, the method switches to a least-squares adaptation scheme with linear complexity at the onset of howling; it adapts to the altered feedback path quickly, so the patient does not lose perceivable information. The complexity of the least-squares estimate is reduced by reformulating it as a Toeplitz system and solving that system with a direct Toeplitz solver. The increase in stable gain over traditional methods and the output sound quality were evaluated with psychoacoustic experiments on normal-hearing listeners using speech and music signals. The results indicate that the method provides 8 to 12 dB more hearing aid gain than feedback cancelers with traditional fixed gain functions. Furthermore, experimental results obtained with real-world hearing aid gain profiles indicate that the method introduces less distortion in the output sound than classical feedback cancelers, enabling the use of more comfortable hearing aid styles for patients with moderate to profound hearing loss. Extensive MATLAB simulations and subjective evaluations indicate that the method exhibits much smaller forward-path delays together with superior howling suppression.
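
The least-squares selection of gains for the parallel parametric EQ sections can be sketched as follows; the band shapes, sizes, and function names are illustrative, not taken from the dissertation.

```python
import numpy as np

def fit_eq_gains(section_mags, target_mag):
    """Least-squares fit of per-section gains g minimising
    || section_mags @ g - target_mag ||_2, so that the summed parallel-EQ
    magnitude response approximates the desired spectral gain shape.
    section_mags: (n_freqs, n_sections) fixed section magnitude responses.
    target_mag:   (n_freqs,) desired response on the same frequency grid."""
    g, *_ = np.linalg.lstsq(section_mags, target_mag, rcond=None)
    return g
```

The normal equations of such a problem can be arranged into a Toeplitz system which, as the abstract notes, admits a direct (Levinson-type) Toeplitz solver that is much cheaper than a general linear solve.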

    Adaptive hear-through headphones (Adaptiiviset läpikuuluvuuskuulokkeet)

    Hear-through equalization can be used to make a headset acoustically transparent, i.e. to produce a sound perception similar to perception without the headset. The headset must have microphones outside the earpieces to capture the ambient sound, which is then reproduced through the headset transducers after equalization. The reproduced signal is called the hear-through signal. Equalization is needed because the headset alters the acoustics of the outer ear. In addition to the external microphones, the headset used in this study has internal microphones. Together, these microphones can be used to estimate the attenuation of the headset online and to detect a poor fit. Since a poor fit causes leaks and decreased attenuation, the combined effect of the leaked sound and the hear-through signal changes relative to the properly fitted case. Therefore, the isolation estimate is used to control the hear-through equalization in order to maintain acoustical transparency. Furthermore, the proposed adaptive hear-through algorithm includes manual controls for the equalizers and for the volume of the hear-through signal. The proposed algorithm is found to render the headset acoustically transparent. The equalization controls improve the performance of the headset when the fit is poor or when the volume of the hear-through signal is adjusted, by reducing the comb-filtering effect caused by the summation of the leaked sound and the hear-through signal inside the ear canal. The behavior of the proposed algorithm can be demonstrated with an implemented Matlab simulator.
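
The comb-filtering effect arises when the leaked sound sums with the delayed hear-through signal in the ear canal. A minimal sketch, modelling the leak as a flat gain and the hear-through path as a pure delay (both simplifying assumptions, not the thesis model):

```python
import numpy as np

def comb_response(delay_samples, leak_gain, n_freqs=512):
    """Magnitude response of leaked sound plus delayed hear-through signal:
    |leak_gain + exp(-j w D)| over normalized frequency w in [0, pi)."""
    w = np.linspace(0.0, np.pi, n_freqs, endpoint=False)
    return np.abs(leak_gain + np.exp(-1j * w * delay_samples))
```

With equal levels (leak_gain = 1) the response swings between 2 and 0, i.e. 6 dB peaks and deep notches; attenuating either path flattens the comb, which is what the adaptive equalization and volume controls aim to achieve.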

    Spatial hearing rendering in wireless microphone systems for binaural hearing aids

    In 2015, 360 million people worldwide, including 32 million children, were suffering from hearing impairment, making hearing disability a major global issue. In the US, the prevalence of hearing loss increased by 160% over the past generations, yet 72% of the 34 million hearing-impaired Americans (11% of the population) still have an untreated hearing loss. Among the current solutions alleviating hearing disability, the hearing aid is the only non-invasive and the most widespread medical apparatus. Combined with hearing aids, assistive listening devices are a powerful answer to the degraded speech understanding observed in hearing-impaired subjects, especially in noisy and reverberant environments. Unfortunately, conventional devices do not accurately render the spatial hearing property of the human auditory system, which weakens their benefits. Spatial hearing is an attribute of the auditory system relying on binaural hearing. With two ears, human beings are able to localize sounds in space, to gather information about their acoustic surroundings, and to feel immersed in an environment. Furthermore, spatial hearing strongly contributes to speech intelligibility. It is hypothesized that recreating an artificial spatial perception through the hearing aids of impaired people might allow these subjects to recover part of their hearing performance. This thesis investigates and supports the aforementioned hypothesis with both technological and clinical approaches. It shows how certain well-established signal processing methods, related to sound localization and spatialization, can be integrated into assistive listening devices. Taking into consideration the technical constraints of current hearing aids, as well as the characteristics of the impaired auditory system, the thesis proposes a novel solution to restore a spatial perception for users of certain types of assistive listening devices.
The achieved results demonstrate the feasibility of implementing such functionality on conventional systems. Additionally, this thesis examines the relevance and the efficiency of the proposed spatialization feature for the enhancement of speech perception. In a clinical trial involving a large number of patients, the artificial spatial hearing is shown to be well appreciated by hearing-impaired persons while improving or preserving their current hearing abilities. This can be considered a prominent contribution to the current scientific and technological knowledge in the domain of hearing impairment.

    Investigation into digital audio equaliser systems and the effects of arithmetic and transform errors on performance

    Discrete-time audio equalisers introduce a variety of undesirable artefacts into audio mixing systems, namely distortions caused by finite-wordlength constraints, frequency response distortion due to coefficient calculation, and signal disturbances that arise from real-time coefficient update. An understanding of these artefacts is important in the design of computationally affordable, good-quality equalisers. This thesis describes a detailed investigation into these artefacts using various forms of arithmetic, filter frequency response, input excitation and sampling frequency. Novel coefficient calculation techniques based on the matched z-transform (MZT) were developed to minimise filter response distortion and computation for on-line implementation. It was found that MZT-based filter responses can approximate s-plane filters more closely than BZT-based filters, with an affordable increase in computational load. Frequency response distortions and prewarping/correction schemes at higher sampling frequencies (96 and 192 kHz) were also assessed. An environment for emulating fractional quantisation in fixed- and floating-point arithmetic was developed. Various key filter topologies were emulated in fixed- and floating-point arithmetic using various input stimuli and frequency responses. The work provides detailed objective information and an understanding of the behaviour of key topologies in fixed- and floating-point arithmetic and of the effects of input excitation and sampling frequency. Signal disturbance behaviour in key filter topologies during coefficient update was investigated through the implementation of various coefficient update scenarios. Input stimuli and specific frequency response changes that produce worst-case disturbances were identified, providing an analytical understanding of disturbance behaviour in various topologies.
Existing parameter and coefficient interpolation algorithms were implemented and assessed under finite-wordlength arithmetic. The disturbance behaviour of various topologies at higher sampling frequencies was examined. The work contributes to the understanding of artefacts in audio equaliser implementation. The study of artefacts at sampling frequencies of 48, 96 and 192 kHz has implications for the assessment of equaliser performance at higher sampling frequencies. Allen & Heath Limited
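
The matched z-transform coefficient calculation the work builds on can be sketched as follows; the interface is illustrative, but the mapping itself is the standard MZT, which sends each s-plane singularity s_k to z_k = e^{s_k T}.

```python
import numpy as np

def matched_z(zeros_s, poles_s, fs):
    """Matched z-transform: map s-plane zeros and poles to the z-plane via
    z_k = exp(s_k / fs). Unlike the bilinear z-transform (BZT), the MZT does
    not warp the frequency axis, so pole/zero frequencies are preserved up
    to Nyquist; aliasing of the response is the trade-off."""
    T = 1.0 / fs
    return np.exp(np.asarray(zeros_s) * T), np.exp(np.asarray(poles_s) * T)
```

A gain-matching step at a reference frequency is usually added afterwards, since the MZT fixes only the pole/zero locations, not the overall scale of the response.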