12 research outputs found

    Control of feedback for assistive listening devices

    Acoustic feedback refers to the undesired acoustic coupling between the loudspeaker and the microphone in a hearing aid. This feedback channel limits the normal operation of hearing aids under varying acoustic scenarios. This work contributes to improving the performance of adaptive feedback cancellation techniques and the speech quality of hearing aids. For this purpose, a two-microphone approach is proposed and analysed, and probe signal injection methods are investigated and improved upon.
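The constraint described above, that the loop through the feedback path limits usable amplification, can be illustrated with a minimal sketch. All names and values here are illustrative assumptions: the rule of thumb is that a flat forward gain G risks howling once |G·F(e^jω)| reaches 1 at some frequency, where F is the feedback path's frequency response.

```python
import numpy as np

def max_stable_gain(feedback_ir, n_fft=1024):
    """Largest flat forward gain G for which |G * F(e^jw)| stays below 1.

    Only the magnitude part of the instability condition is checked, so
    this is a conservative illustration, not a full Nyquist analysis."""
    F = np.fft.rfft(feedback_ir, n_fft)
    return 1.0 / np.max(np.abs(F))

# Hypothetical decaying feedback-path impulse response (loudspeaker -> mic).
rng = np.random.default_rng(0)
f = rng.normal(scale=0.01, size=64) * np.exp(-np.arange(64) / 10.0)
G = max_stable_gain(f)
print(f"maximum flat gain before possible howling: {G:.1f}")
```

Feedback cancellation effectively shrinks F and so raises this bound, which is why an accurate adaptive estimate of the feedback path matters.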

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative to existing acoustic echo reduction approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have detection performance comparable to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
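As a rough illustration of the decomposition described in the abstract, the sketch below factors a magnitude spectrogram onto the union of an echo basis and a near-end basis using standard multiplicative NMF updates with both bases held fixed, then applies a Wiener-style mask. The dimensions, random stand-in bases, and choice of Euclidean cost are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T, K = 129, 50, 8   # frequency bins, frames, basis vectors per source

# Stand-in bases: in the thesis the echo basis comes from the far-end
# speech and the near-end basis is trained on speech data a priori.
W_echo = np.abs(rng.normal(size=(F, K)))
W_near = np.abs(rng.normal(size=(F, K)))
W = np.hstack([W_echo, W_near])        # union of the two nonnegative bases
Y = np.abs(rng.normal(size=(F, T)))    # magnitude STFT of the mic signal

# Multiplicative updates (Euclidean cost) for the activations only.
H = np.abs(rng.normal(size=(2 * K, T)))
eps = 1e-12
for _ in range(100):
    H *= (W.T @ Y) / (W.T @ (W @ H) + eps)

# Wiener-style mask retains the near-end speaker's share of each bin.
near = W_near @ H[K:]
echo = W_echo @ H[:K]
mask = near / (near + echo + eps)
S_hat = mask * Y                       # estimated near-end magnitude STFT
```

Because the echo basis is driven by the far-end signal, no double-talk detector is needed: the activations simply redistribute energy between the two bases frame by frame.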

    Adaptive Feedback Cancellation in Hearing Aids

    Acoustic feedback is a well-known phenomenon in hearing aids and public address systems. Under certain conditions it causes the so-called howling effect, which is highly annoying for the hearing aid user and limits the maximum amplification of the hearing aid. The most common choice to prevent howling is the adaptive feedback cancellation algorithm, which is able to completely eliminate the feedback signal. However, standard adaptive feedback cancellation algorithms suffer from a biased adaptation if the input signal is spectrally colored, as it is for speech and music signals. Due to this bias, distortion artifacts (entrainment) are generated and, consequently, the sound quality is significantly reduced. Most of the known methods to reduce the bias have focused on speech signals. However, those methods do not cope with music, since the tonality and correlation are much stronger for such signals; this leads to a higher bias and, consequently, to stronger entrainment for music than for speech. Other methods, which deal with music signals, work satisfactorily only with a very slow adaptation speed. This reduces the ability to react quickly to feedback path changes; hence, howling persists for longer when the feedback path is changing. In this thesis, a new sub-band adaptive feedback cancellation system for hearing aid applications is proposed. It combines decorrelation methods with a new realization of a non-parametric variable step size. The adaptation is realized in sub-bands, which decreases the computational complexity and increases the adaptation performance of the system simultaneously. The applied decorrelation methods, the prediction error filter and the frequency shift, are well-known approaches to reduce the bias; however, the combination of both is first proposed in this thesis.
To apply the proposed step size in the context of adaptive feedback cancellation, a method to estimate the signal power of the desired input signal, i.e., without feedback, also referred to as the source signal power, is necessary. This estimate is theoretically derived, and it is demonstrated to be reliable if the decorrelation methods are additionally applied. In order to further improve the performance of the system, three additional control methods are derived: the first is an impulse detection for wideband impulses, which could otherwise lead to misadaptation; secondly, a modified estimate of the source signal power is proposed to stabilize the system in the case of jarring voices; lastly, a correlation detection is developed to improve the trade-off between adaptation stability and tracking behavior. The complete system is optimized and evaluated for several speech and music signals as well as for different feedback scenarios in simulations with feedback paths measured under realistic conditions. Additionally, the system is tested in real-time simulations with hearing aid dummies and a head-and-torso simulator. For both simulation setups, hearing loss compensation methods as applied in realistic hearing aids are used. The performance is measured in terms of preventing entrainment (adaptation stability) and reacting to feedback path changes (tracking behavior). The complete adaptive feedback cancellation system shows excellent performance. Furthermore, the system relies on only a few parameters and has a low computational complexity, and therefore has strong practical relevance.
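The bias-reduction role of the prediction error filter can be sketched as follows: filtering both the loudspeaker and microphone signals with the same prediction error filter leaves the feedback path unchanged while whitening the excitation, after which a standard NLMS update is applied. The AR(1) test signal, filter lengths, and step size below are illustrative assumptions; the thesis's system additionally uses a frequency shift, sub-band processing, and a variable step size, none of which are shown here.

```python
import numpy as np

def pef_coeffs(x, order=8):
    """Prediction error filter [1, -a_1, ..., -a_p] via the
    autocorrelation method (normal equations, lightly regularized)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-6 * np.eye(order), r[1:])
    return np.concatenate(([1.0], -a))

def afc_nlms_pem(u, y, L=16, mu=0.5, pef_order=8):
    """NLMS feedback-path estimation with prediction-error decorrelation:
    filtering loudspeaker signal u and microphone signal y with the same
    filter leaves the path unchanged while whitening the excitation."""
    p = pef_coeffs(u, pef_order)
    u_w = np.convolve(u, p)[:len(u)]
    y_w = np.convolve(y, p)[:len(y)]
    f_hat, buf = np.zeros(L), np.zeros(L)
    for n in range(len(u)):
        buf = np.roll(buf, 1)
        buf[0] = u_w[n]
        e = y_w[n] - f_hat @ buf
        f_hat += mu * e * buf / (buf @ buf + 1e-8)
    return f_hat

# Noiseless demo with colored (AR(1)) excitation and a short made-up path.
rng = np.random.default_rng(7)
w = rng.normal(size=4000)
u = np.empty_like(w)
u[0] = w[0]
for n in range(1, len(w)):
    u[n] = 0.8 * u[n - 1] + w[n]
f_true = np.array([0.5, -0.2, 0.1])
y = np.convolve(u, f_true)[:len(u)]
f_hat = afc_nlms_pem(u, y)
```

With colored excitation and a near-end signal present, the unwhitened update would be biased toward cancelling the source signal; the shared prediction error filter removes most of that coloring.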

    Two-way acoustic window using wave field synthesis

    In this Master's Thesis, a two-way multichannel audio communication system is introduced. The aim is to create a virtual acoustic window between two rooms, providing correct spatial localization of multiple audio sources on both sides. Extending monophonic communication systems to feature multichannel sound capture and reproduction increases the intelligibility of speech and the accuracy of source localization achieved with the system. Adding multiple channels to the system also increases the complexity of the acoustic echo cancellation; methods known from stereophonic systems extend to multichannel systems.
By using arrays of microphones and loudspeakers, it becomes possible to recreate part of the acoustic wave field existing in the recording space. A method for achieving this is wave field synthesis (WFS). To solve the acoustic feedback problem, a 48-channel acoustic echo canceller was implemented. To maximize the achieved echo attenuation, a combination of adaptive and static filters was used. The implementation provided a stable solution that made normal conversation through the window possible. To verify the quality of the system, a listening test was performed in which WFS was compared against three other recording and reproduction methods on four different attributes of the perceived soundscape. The results show that WFS offers clear potential for use in multichannel communication systems and in the creation of an acoustic opening.
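As a simplified illustration of WFS driving signals, the sketch below computes per-loudspeaker delays and gains for a virtual point source behind a linear array, using only geometric delays and a 1/sqrt(r) amplitude law in place of the full WFS driving function; the array geometry and source position are made-up values.

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def wfs_delays_gains(src, speakers):
    """Per-loudspeaker delay (s) and gain for a virtual point source:
    pure geometric delays plus a simplified 1/sqrt(r) amplitude law."""
    d = np.linalg.norm(speakers - src, axis=1)
    return d / C, 1.0 / np.sqrt(d)

# 8 loudspeakers spaced 0.2 m along the x-axis; virtual source 1 m behind.
speakers = np.stack([np.arange(8) * 0.2, np.zeros(8)], axis=1)
src = np.array([0.65, -1.0])
delays, gains = wfs_delays_gains(src, speakers)
# Each loudspeaker plays the source signal delayed and scaled accordingly,
# so the superposed wavefronts approximate the virtual source's field.
```

A real WFS renderer adds spectral pre-equalization and tapering at the array edges; this sketch only conveys the delay-and-gain geometry.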

    Acoustic sensor network geometry calibration and applications

    In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones, such as smartphones, tablets, laptops, or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens up the possibility of many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied to teleconferencing, camera control, automation, or assisted living. For these kinds of applications, awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization and tracking method. In order to localize with respect to the ASN, the relative arrangement of the sensor nodes has to be known; therefore, different novel geometry calibration methods were developed. Sound classification: The first method addresses the task of identifying auditory objects. A novel application of the bag-of-features (BoF) paradigm to acoustic event classification and detection was introduced. It can be used for event and speech detection as well as for speaker identification. The use of both mel frequency cepstral coefficient (MFCC) and Gammatone frequency cepstral coefficient (GFCC) features improves the classification accuracy. By using soft quantization and introducing supervised training for the BoF model, superior accuracy is achieved. The method generalizes well from limited training data, works online, and can be computed in a fraction of real time. By a dedicated training strategy based on a hierarchy of stationarity, the detection of speech in mixtures with noise was realized.
This makes the method robust against severe noise levels corrupting the speech signal. Thus it is possible to provide control information to a beamformer in order to realize blind speech enhancement; a reliable improvement is achieved in the presence of one or more stationary noise sources. Speaker localization: The localization method enables each node to determine the direction of arrival (DoA) of concurrent sound sources. The author's neuro-biologically inspired speaker localization method for microphone arrays was refined for use in ASNs. By implementing a dedicated cochlear and midbrain model, it is robust against the reverberation found in indoor rooms. In order to better model the unknown number of concurrent speakers, an application of the EM algorithm that realizes probabilistic clustering according to auditory scene analysis (ASA) principles was introduced. Based on this approach, a system for Euclidean tracking in ASNs was designed. Each node applies the node-wise localization method and shares probabilistic DoA estimates, together with an estimate of the spectral distribution, with the network. As this information is relatively sparse, it can be transmitted with low bandwidth, and the system is robust against jitter and transmission errors. The information from all nodes is integrated according to spectral similarity to correctly associate concurrent speakers. By incorporating the intersection angle in the triangulation, the precision of the Euclidean localization is improved. Tracks of concurrent speakers are computed over time, as is shown with recordings in a reverberant room. Geometry calibration: The central task of geometry calibration has been solved with a special focus on sensor nodes equipped with multiple microphones. Novel methods were developed for different scenarios. An audio-visual method was introduced for the calibration of ASNs in video conferencing scenarios.
The DoA estimates are fused with visual speaker tracking in order to provide sensor positions in a common coordinate system. A novel acoustic calibration method determines the relative positioning of the nodes from ambient sounds alone. Unlike previous methods that only infer the positioning of distributed microphones, the DoA is incorporated, so it becomes possible to calibrate the orientation of the nodes with high accuracy. This is very important for all applications using the spatial information, as the triangulation error increases dramatically with bad orientation estimates. As speech events can be used, the calibration becomes possible without the requirement of playing dedicated calibration sounds. Based on this, an online method employing a genetic algorithm with incremental measurements was introduced. By using the robust speech localization method, the calibration is computed in parallel to the tracking. The online method is able to calibrate ASNs in real time, as is shown with recordings of natural speakers in a reverberant room. The informed acoustic sensor network: All new methods are important building blocks for the use of ASNs. The online methods for localization and calibration both make use of the neuro-biologically inspired processing in the nodes, which leads to state-of-the-art results, even in reverberant enclosures. The high robustness and reliability can be improved even further by including the event detection method in order to exclude non-speech events. When all methods are combined, both semantic information on what is happening in the acoustic scene and spatial information on the positioning of the speakers and sensor nodes are automatically acquired in real time. This realizes truly informed audio processing in ASNs. Practical applicability is shown by application to recordings in reverberant rooms.
The contribution of this thesis is thus not only to advance the state of the art in automatically acquiring information on the acoustic scene, but also to push the practical applicability of such methods.
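The intersection-angle idea mentioned for the triangulation step can be sketched as follows: two nodes each contribute a DoA bearing, the bearings are intersected by solving a small linear system, and the intersection angle indicates how well-conditioned the estimate is (bearings that cut at right angles triangulate best). This is a generic two-node illustration, not the thesis's exact estimator.

```python
import numpy as np

def triangulate(p1, theta1, p2, theta2):
    """Intersect two DoA bearings (absolute angles in radians) from nodes
    at p1 and p2. Returns the source estimate and the intersection angle;
    angles near pi/2 give well-conditioned estimates, while angles near 0
    or pi (almost parallel bearings) amplify DoA errors."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t1*d1 = p2 + t2*d2 for the ray parameters t1, t2.
    t = np.linalg.solve(np.stack([d1, -d2], axis=1), p2 - p1)
    src = p1 + t[0] * d1
    cut_angle = np.arccos(np.clip(d1 @ d2, -1.0, 1.0))
    return src, cut_angle

# Nodes at (0,0) and (4,0) both hear a speaker at (2,2).
est, angle = triangulate([0, 0], np.pi / 4, [4, 0], 3 * np.pi / 4)
# est is approximately [2, 2]; angle is approximately pi/2.
```

Weighting pairwise intersections by a function of the cut angle downweights the ill-conditioned, near-parallel pairs when many nodes contribute.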

    Applications of fuzzy counterpropagation neural networks to non-linear function approximation and background noise elimination

    An adaptive filter that can operate in an unknown environment by means of a learning mechanism is well suited to the speech enhancement process. This research develops a novel ANN model which incorporates the fuzzy set approach and which can perform non-linear function approximation; the model is used as the basic structure of an adaptive filter. The learning capability of the ANN is expected to reduce the development time and cost of designing adaptive filters based on the fuzzy set approach. A combination of both techniques may result in a learnable system that can tackle the vagueness problem of the changing environment in which the adaptive filter operates. The proposed model is called the Fuzzy Counterpropagation Network (Fuzzy CPN). It has fast learning capability and a self-growing structure. The model is applied to non-linear function approximation, chaotic time series prediction, and background noise elimination.
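A counterpropagation-style approximator with fuzzy memberships and a self-growing hidden layer might be sketched as below. The Gaussian membership function, the vigilance rule for adding prototypes, and all parameter values are illustrative assumptions rather than the thesis's exact Fuzzy CPN.

```python
import numpy as np

class FuzzyCPN:
    """Sketch of a self-growing, counterpropagation-style approximator:
    prototypes with fuzzy (Gaussian) memberships in the hidden layer,
    stored targets in the output layer, and a vigilance rule that adds a
    new prototype whenever no existing one matches the input well."""

    def __init__(self, sigma=0.1, vigilance=0.1):
        self.sigma, self.vigilance = sigma, vigilance
        self.centers, self.targets = [], []

    def fit(self, X, y):
        for x, t in zip(X, y):
            # Self-growing structure: a single pass, no gradient descent.
            if not self.centers or min(abs(x - c) for c in self.centers) > self.vigilance:
                self.centers.append(x)
                self.targets.append(t)

    def predict(self, X):
        C, T = np.array(self.centers), np.array(self.targets)
        out = []
        for x in X:
            mu = np.exp(-((x - C) ** 2) / (2 * self.sigma ** 2))
            out.append((mu @ T) / (mu.sum() + 1e-12))  # fuzzy interpolation
        return np.array(out)

# One-dimensional function approximation demo.
X = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * X)
net = FuzzyCPN()
net.fit(X, y)
err = np.max(np.abs(net.predict(X) - y))
```

The one-pass vigilance rule gives the fast learning and self-growing behavior the abstract mentions: accuracy is traded against the number of stored prototypes via sigma and the vigilance threshold.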