886 research outputs found

    Instrumental and perceptual evaluation of dereverberation techniques based on robust acoustic multichannel equalization

    Speech signals recorded in an enclosed space by microphones at a distance from the speaker are often corrupted by reverberation, which arises from the superposition of many delayed and attenuated copies of the source signal. Because reverberation degrades the signal, removing it would enhance speech quality. Dereverberation techniques based on acoustic multichannel equalization are known to be sensitive to room impulse response perturbations. To increase robustness, several methods have been proposed, for example using a shorter reshaping filter length, incorporating regularization, or applying a sparsity-promoting penalty function. This paper evaluates the performance of these methods for single-source multi-microphone scenarios, using instrumental performance measures as well as subjective listening tests. By analyzing the correlation between the instrumental and the perceptual results, it is shown that signal-based performance measures are better suited than channel-based performance measures for evaluating the perceptual speech quality of signals that were dereverberated by equalization techniques. Furthermore, this analysis also demonstrates the need to develop more reliable instrumental performance measures.
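The regularization idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a plain regularized least-squares design, g = (HᵀH + δI)⁻¹Hᵀd, with an impulse as the target response. The function names (`design_equalizer`, `equalized_response`), the two toy impulse responses and the value of δ are all hypothetical.

```python
def convolve(x, h):
    """Linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def design_equalizer(rirs, filt_len, reg):
    """Regularized least-squares equalizer, one filter per microphone.

    rirs: equal-length room impulse responses (assumed known here).
    reg:  regularization weight delta; larger values shrink the filters.
    """
    out_len = len(rirs[0]) + filt_len - 1
    # Columns of the multichannel convolution matrix H = [H_1 ... H_M].
    cols = []
    for h in rirs:
        for k in range(filt_len):
            cols.append([0.0] * k + list(h) + [0.0] * (filt_len - 1 - k))
    target = [1.0] + [0.0] * (out_len - 1)  # assumed target: unit impulse
    n = len(cols)
    # Normal equations (H^T H + reg * I) g = H^T d.
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(a * b for a, b in zip(cols[i], target)) for i in range(n)]
    g = solve(A, rhs)
    return [g[m * filt_len:(m + 1) * filt_len] for m in range(len(rirs))]


def equalized_response(rirs, filters):
    """Sum over channels of (RIR convolved with its equalization filter)."""
    total = [0.0] * (len(rirs[0]) + len(filters[0]) - 1)
    for h, g in zip(rirs, filters):
        for i, v in enumerate(convolve(h, g)):
            total[i] += v
    return total


# Toy two-microphone example with made-up impulse responses.
toy_rirs = [[1.0, 0.5], [1.0, -0.4]]
toy_filters = design_equalizer(toy_rirs, filt_len=1, reg=1e-3)
print(equalized_response(toy_rirs, toy_filters))
```

Increasing `reg` trades exact equalization for lower filter energy, which is one common way to reduce sensitivity to errors in the measured impulse responses.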

    Optical and digital signal processing in transmission systems with space-division multiplexing

    The present thesis focuses on the development of optical and digital signal processing techniques for coherent optical transmission systems with space-division multiplexing (SDM). According to the level of spatial crosstalk, these systems can be grouped into those with and those without spatial selectivity, which drastically changes their operating principle. In systems with spatial selectivity, the mode coupling is negligible and therefore an arbitrary spatial channel can be independently routed through the optical network and post-processed at the optical coherent receiver. In systems without spatial selectivity, mode coupling plays a key role, so that spatial channels are jointly transmitted and post-processed at the optical coherent receiver. With this in mind, optical switching techniques are developed for SDM transmission systems with spatial selectivity, whereas digital space-demultiplexing techniques are developed for SDM systems without spatial selectivity. With the purpose of developing switching techniques, the acousto-optic effect is analyzed in few-mode fibers (FMFs) and in multicore fibers (MCFs). In FMFs, signal switching between two arbitrary modes using flexural or longitudinal acoustic waves is numerically and experimentally demonstrated. In MCFs, it is shown that a double resonant coupling, induced by flexural acoustic waves, allows for signal switching between two arbitrary cores. Still in the context of signal switching, signal propagation in the multimodal nonlinear regime is analyzed. The nonlinear Schrödinger equation is deduced in the presence of mode coupling, allowing a detailed analysis of the multimodal four-wave mixing process. Under the right conditions, it is shown that this process allows for signal switching between distinguishable optical modes. The signal representation in higher-order Poincaré spheres is introduced and analyzed in order to develop digital signal processing techniques. In this representation, an arbitrary pair of tributary signals is represented in a Poincaré sphere, where the samples appear symmetrically distributed around a symmetry plane. Based on this property, spatial-demultiplexing and mode-dependent loss compensation techniques are developed, which are independent of the modulation format, are free of training sequences and tend to be robust to frequency offsets and phase fluctuations. These techniques are numerically validated, and their performance is assessed by calculating the remaining signal-to-noise ratio penalty of the post-processed signal. Finally, the complexity of these techniques is analytically described in terms of real multiplications per sample.
    Programa Doutoral em Engenharia Eletrotécnic
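As a rough illustration of representing a pair of tributary signals on a Poincaré sphere, the sketch below maps complex sample pairs to the conventional Stokes parameters. It shows only the classic two-signal mapping, not the higher-order generalization developed in the thesis. The function name `stokes`, the sign convention for S3 and the QPSK test constellation are assumptions.

```python
import itertools
import math


def stokes(a, b):
    """Map one complex sample pair (a, b) to Stokes parameters (S0, S1, S2, S3).

    Assumed convention: S3 = -2 * Im(a * conj(b)); other texts flip this sign.
    """
    power_a = abs(a) ** 2
    power_b = abs(b) ** 2
    cross = a * b.conjugate()
    return (power_a + power_b,   # S0: total power
            power_a - power_b,   # S1: power imbalance between the tributaries
            2.0 * cross.real,    # S2
            -2.0 * cross.imag)   # S3


# Two unmixed, equal-power QPSK tributaries (a hypothetical test case).
qpsk = [complex(re, im) / math.sqrt(2) for re in (-1, 1) for im in (-1, 1)]
points = [stokes(a, b) for a, b in itertools.product(qpsk, repeat=2)]

for s0, s1, s2, s3 in points:
    print(round(s1, 6), round(s2, 6), round(s3, 6))
```

Because both tributaries carry equal power in this test case, every sample has S1 = 0 and therefore lies in the plane S1 = 0. Mixing between the tributaries would rotate that plane, which is the kind of geometric property a blind demultiplexer can exploit.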

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
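The decomposition of a magnitude spectrogram onto the union of two fixed nonnegative bases can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: it assumes a Euclidean cost with standard multiplicative updates for the activations only, and both bases are simply passed in. The names `nmf_activations` and `extract_speaker` and the toy matrices are hypothetical.

```python
def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Fit V ~ W H for a fixed nonnegative basis W (Euclidean cost).

    Only the activations H are updated: H <- H * (W^T V) / (W^T W H).
    """
    n_atoms, n_frames = len(W[0]), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_atoms)]
    Wt = transpose(W)
    WtV = matmul(Wt, V)
    WtW = matmul(Wt, W)
    for _ in range(n_iter):
        WtWH = matmul(WtW, H)
        H = [[H[k][t] * WtV[k][t] / (WtWH[k][t] + eps) for t in range(n_frames)]
             for k in range(n_atoms)]
    return H


def extract_speaker(V, W_echo, W_speech, n_iter=200):
    """Split magnitude spectrogram V into near-end speech and echo estimates.

    W_echo:   basis assumed to be built from the far-end signal.
    W_speech: basis assumed to be trained on speech data beforehand.
    """
    W = [e + s for e, s in zip(W_echo, W_speech)]  # union of the two bases
    H = nmf_activations(V, W, n_iter)
    n_echo = len(W_echo[0])
    echo = matmul(W_echo, H[:n_echo])
    speech = matmul(W_speech, H[n_echo:])
    return speech, echo, H


# Toy example: 3 frequency bins, 2 frames, one atom per basis (made-up numbers).
W_echo = [[1.0], [0.2], [0.0]]
W_speech = [[0.0], [0.3], [1.0]]
V = [[2.0, 0.5], [1.3, 0.4], [3.0, 1.0]]
speech_est, echo_est, H = extract_speaker(V, W_echo, W_speech)
print(H)
```

Since the bases stay fixed, there is no adaptive filter that needs to converge, which is consistent with the abstract's remark that performance is reached immediately upon initiation.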

    Analysis of and techniques for adaptive equalization for underwater acoustic communication

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, September 2011. Underwater wireless communication is quickly becoming a necessity for applications in ocean science, defense, and homeland security. Acoustics remains the only practical means of accomplishing long-range communication in the ocean. The acoustic communication channel is fraught with difficulties including limited available bandwidth, long delay spread, time variability, and Doppler spreading. These difficulties reduce the reliability of the communication system and make high data-rate communication challenging. Adaptive decision feedback equalization is a common method to compensate for distortions introduced by the underwater acoustic channel. Limited work has been done thus far to use the physics of the underwater channel to improve and better understand the operation of a decision feedback equalizer. This thesis examines how to use physical models to improve the reliability and reduce the computational complexity of the decision feedback equalizer. The specific topics covered by this work are: how to handle channel estimation errors for the time-varying channel, how to incorporate angular constraints imposed by the environment into an array receiver, what happens when there is a mismatch between the true channel order and the estimated channel order, and why there is a performance difference between the direct adaptation and channel estimation based methods for computing the equalizer coefficients. For each of these topics, algorithms are provided that help create a more robust equalizer with lower computational complexity for the underwater channel. This work would not have been possible without support from the Office of Naval Research, through a Special Research Award in Acoustics Graduate Fellowship (ONR Grant #N00014-09-1-0540), with additional support from ONR Grant #N00014-05-10085 and ONR Grant #N00014-07-10184.
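The direct adaptation approach to a decision feedback equalizer mentioned in this abstract can be sketched in a few lines. This is a toy illustration only, not one of the thesis's algorithms: it assumes BPSK symbols, a static two-tap channel, no noise, one feedforward tap, one feedback tap, and LMS adaptation. The function name `simulate_dfe` and all parameter values are hypothetical.

```python
import random


def simulate_dfe(n_train=500, n_data=500, mu=0.05, seed=0):
    """Train a 1+1 tap DFE with LMS, then run it decision-directed.

    Returns the feedforward tap, the feedback tap, and the number of
    symbol errors made after the training period.
    """
    rng = random.Random(seed)
    channel = [1.0, 0.5]  # assumed toy channel: direct path plus one echo
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(n_train + n_data)]

    f, b = 0.0, 0.0        # feedforward and feedback coefficients
    prev_tx = 0.0          # previous transmitted symbol (channel memory)
    prev_decision = 0.0    # previous symbol fed back into the equalizer
    errors = 0

    for n, s in enumerate(symbols):
        r = channel[0] * s + channel[1] * prev_tx  # received sample
        y = f * r - b * prev_decision              # equalizer output
        decision = 1.0 if y >= 0 else -1.0

        training = n < n_train
        reference = s if training else decision    # known symbol or own decision
        e = reference - y

        # LMS update on the input vector [r, -prev_decision].
        f += mu * e * r
        b -= mu * e * prev_decision

        if not training and decision != s:
            errors += 1

        prev_tx = s
        prev_decision = s if training else decision

    return f, b, errors


f_tap, b_tap, n_errors = simulate_dfe()
print(f_tap, b_tap, n_errors)
```

For this channel the ideal solution is a feedforward tap of 1 and a feedback tap of 0.5, since the feedback path then cancels the echo of the previous symbol exactly. A real underwater channel is time-varying and noisy, so convergence would be neither this fast nor this clean.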

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.