61 research outputs found

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    Transparent heterogeneous terrestrial optical communication networks with phase modulated signals

    Get PDF
    This thesis presents a large scale numerical investigation of heterogeneous terrestrial optical communications systems and the upgrade of fourth generation terrestrial core to metro legacy interconnects to fifth generation transmission system technologies. Retrofitting (without changing infrastructure) is considered for commercial applications. ROADM are crucial enabling components for future core network developments however their re-routing ability means signals can be switched mid-link onto sub-optimally configured paths which raises new challenges in network management. System performance is determined by a trade-off between nonlinear impairments and noise, where the nonlinear signal distortions depend critically on deployed dispersion maps. This thesis presents a comprehensive numerical investigation into the implementation of phase modulated signals in transparent reconfigurable wavelength division multiplexed fibre optic communication terrestrial heterogeneous networks. A key issue during system upgrades is whether differential phase encoded modulation formats are compatible with the cost optimised dispersion schemes employed in current 10 Gb/s systems. We explore how robust transmission is to inevitable variations in the dispersion mapping and how large the margins are when suboptimal dispersion management is applied. We show that a DPSK transmission system is not drastically affected by reconfiguration from periodic dispersion management to lumped dispersion mapping. A novel DPSK dispersion map optimisation methodology which reduces drastically the optimisation parameter space and the many ways to deploy dispersion maps is also presented. This alleviates strenuous computing requirements in optimisation calculations. This thesis provides a very efficient and robust way to identify high performing lumped dispersion compensating schemes for use in heterogeneous RZ-DPSK terrestrial meshed networks with ROADMs. A modified search algorithm which further reduces this number of configuration combinations is also presented. The results of an investigation of the feasibility of detouring signals locally in multi-path heterogeneous ring networks is also presented

    Transparent heterogeneous terrestrial optical communication networks with phase modulated signals

    Get PDF
    This thesis presents a large scale numerical investigation of heterogeneous terrestrial optical communications systems and the upgrade of fourth generation terrestrial core to metro legacy interconnects to fifth generation transmission system technologies. Retrofitting (without changing infrastructure) is considered for commercial applications. ROADM are crucial enabling components for future core network developments however their re-routing ability means signals can be switched mid-link onto sub-optimally configured paths which raises new challenges in network management. System performance is determined by a trade-off between nonlinear impairments and noise, where the nonlinear signal distortions depend critically on deployed dispersion maps. This thesis presents a comprehensive numerical investigation into the implementation of phase modulated signals in transparent reconfigurable wavelength division multiplexed fibre optic communication terrestrial heterogeneous networks. A key issue during system upgrades is whether differential phase encoded modulation formats are compatible with the cost optimised dispersion schemes employed in current 10 Gb/s systems. We explore how robust transmission is to inevitable variations in the dispersion mapping and how large the margins are when suboptimal dispersion management is applied. We show that a DPSK transmission system is not drastically affected by reconfiguration from periodic dispersion management to lumped dispersion mapping. A novel DPSK dispersion map optimisation methodology which reduces drastically the optimisation parameter space and the many ways to deploy dispersion maps is also presented. This alleviates strenuous computing requirements in optimisation calculations. This thesis provides a very efficient and robust way to identify high performing lumped dispersion compensating schemes for use in heterogeneous RZ-DPSK terrestrial meshed networks with ROADMs. A modified search algorithm which further reduces this number of configuration combinations is also presented. The results of an investigation of the feasibility of detouring signals locally in multi-path heterogeneous ring networks is also presented.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Models and analysis of vocal emissions for biomedical applications

    Get PDF
    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    Aeronautical Engineering: A continuing bibliography with indexes (supplement 177)

    Get PDF
    This bibliography lists 469 reports, articles and other documents introduced into the NASA scientific and technical information system in July 1984

    Improving the conditioning of the optimization criterion in acoustic multi-channel equalization using shorter reshaping filters

    No full text
    Abstract In acoustic multi-channel equalization techniques, such as complete multi-channel equalization based on the multiple-input/output inverse theorem (MINT), relaxed multi-channel least-squares (RMCLS), and partial multi-channel equalization based on MINT (PMINT), the length of the reshaping filters is generally chosen such that perfect dereverberation can be achieved for perfectly estimated room impulse responses (RIRs). However, since in practice the available RIRs typically differ from the true RIRs, this reshaping filter length may not be optimal. This paper provides a mathematical analysis of the robustness increase of equalization techniques against RIR perturbations when using a shorter reshaping filter length than conventionally used. Based on the condition number of the (weighted) convolution matrix of the RIRs, a mathematical relationship between the reshaping filter length and the robustness against RIR perturbations is established. It is shown that shorter reshaping filters than conventionally used yield a smaller condition number, i.e., a higher robustness against RIR perturbations. In addition, we propose an automatic non-intrusive procedure for determining the reshaping filter length based on the L-curve. Simulation results confirm that using a shorter reshaping filter length than conventionally used yields a significant increase in robustness against RIR perturbations for MINT, RMCLS, and PMINT. Furthermore, it is shown that PMINT using an optimal intrusively determined reshaping filter length outperforms all other considered techniques. Finally, it is shown that the automatic non-intrusively determined reshaping filter length in PMINT yields a similar performance as the optimal intrusively determined reshaping filter length

    Core science and technology development plan for indirect-drive ICF ignition. Revision 1

    Full text link
    • …
    corecore