    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for several audio problems in hands-free telephony. We first investigate MSSS via NMF as an alternative to existing acoustic echo reduction approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture that also contains acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short-Time Fourier Transform (STFT) domain. One basis models the spectral energy of the acoustic echo and is formed from the incoming far-end user's speech; the other models the spectral energy of the near-end speaker and is trained a priori on speech data. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and it is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes, at similar computational cost. Speaker extraction is also shown to distort the near-end speech signal during double-talk; this distortion is quantified with a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the echo estimate produced by NMF-based speaker extraction to compute a suitably normalized, correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. 
Using a standard evaluation technique, the proposed algorithm is shown to have detection performance comparable to that of an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction: compared with the existing conventional DTD, it generates minimal false double-talk indications upon initiation and in response to room changes. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe how perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that target precisely this distortion. One scheme employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other modifies the inverse minimum phase filter so that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
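The decomposition step described above can be sketched as semi-supervised NMF in which both bases stay fixed and only the activations are learned. This is a minimal sketch, not the implementation used in the thesis. It assumes Kullback-Leibler multiplicative updates (Lee and Seung), ignores the noise term, and splits the microphone magnitude with a Wiener-style soft mask. The function name `extract_near_end` and its parameters are hypothetical.

```python
import numpy as np

def extract_near_end(V, W_echo, W_speech, n_iter=100, eps=1e-12, seed=0):
    """Hypothetical semi-supervised NMF separation (KL-divergence updates).

    V:        (F, T) nonnegative magnitude STFT of the microphone signal
    W_echo:   (F, K1) fixed echo basis (e.g. far-end magnitude spectra)
    W_speech: (F, K2) fixed near-end speech basis trained beforehand
    Returns (near-end magnitude estimate, echo magnitude estimate).
    """
    W = np.hstack([W_echo, W_speech])  # union of the two bases, never updated
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # nonnegative activations
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Lee-Seung multiplicative update for H under the KL divergence;
        # it keeps H nonnegative because every factor is nonnegative.
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
    k1 = W_echo.shape[1]
    echo_model = W_echo @ H[:k1]      # modelled echo spectral energy
    speech_model = W_speech @ H[k1:]  # modelled near-end spectral energy
    total = echo_model + speech_model + eps
    # Wiener-style soft masks applied to the observed magnitude.
    return speech_model / total * V, echo_model / total * V
```

In a streaming setting `W_echo` would presumably be refreshed from the most recent far-end frames; here it is simply passed in.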

    Non Linear Ultrasound Doppler and the Detection of Targeted Contrast Agents

    One of the main challenges in molecular imaging with targeted contrast agents is detecting attached agents and discriminating them from the signals originating from freely flowing agents and tissue. The aim of this thesis was to develop methods for the detection of targeted microbubbles. One approach consisted of investigating nonlinear Doppler for this purpose. Nonlinear Doppler enables the differentiation of moving from non-moving scatterers and of linear from nonlinear scattering. Because targeted microbubbles are static, nonlinear scatterers, they should be detectable with this technique. A novel nonlinear Doppler technique, pulse subtraction Doppler, was developed and compared to pulse inversion Doppler. Both techniques are shown to yield similar Doppler spectra, and each has benefits depending on the medical application and the equipment limitations. This served as a starting point for the derivation of a generalised nonlinear Doppler technique based on combined linear pulse pair sequences, which was tested in a simulation study. The response from a single microbubble was simulated for different pulse combinations, and the pulse sequences were compared with regard to criteria specific to imaging requirements. It was shown that, depending on the initially set criteria, such as transmitted energy, mechanical index or scanner characteristics, certain pulse combinations offer alternatives to current imaging modalities and allow specific constraints of the targeted application or equipment to be taken into account. Furthermore, the proposed approach is directly applicable to strictly nonlinear imaging, without Doppler processing. An in vitro phantom was designed to assess pulse subtraction Doppler for the detection and discrimination of static nonlinear microbubbles in the presence of free-flowing ones. 
It was shown that pulse subtraction Doppler enables such discrimination, and its practicability for in vivo situations is discussed. The pulse subtraction Doppler sequences were also tested on a phantom containing magnetic bubbles. It was shown that the magnetic bubbles can be immobilised in a specific region of interest by a magnetic field under flow conditions. The bubbles were also shown to be acoustically detectable and to scatter linearly at diagnostic driving pressures. Preliminary work on experimental biotinylated microbubbles and their attachment to streptavidin-coated surfaces is also presented. Researchers have found that, owing to their proximity to a wall, targeted microbubbles exhibit acoustic signatures different from those of free ones, and this knowledge can improve their detection. The behaviour of microbubbles against a membrane of varying stiffness was also studied through high-speed camera observations. Both experimentally and by comparison to theoretical modelling, the change in acoustic behaviour of microbubbles within the stiffness range of human blood vessels was found to be negligible. This thesis has taken two complementary research approaches, which have been shown to advance the detection and discrimination of targeted microbubbles.
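Pulse inversion, the reference technique against which pulse subtraction Doppler is compared, rests on a simple cancellation argument. When the echoes of a pulse and of its polarity-inverted copy are summed, the linear (odd-order) part of the response cancels and the even-order part remains. The sketch below illustrates only that static cancellation, under an assumed memoryless quadratic scatterer model; `scatter`, `a1` and `a2` are hypothetical. It does not model bubble dynamics, scatterer motion, the Doppler processing across pulse pairs, or the pulse subtraction scheme itself.

```python
import numpy as np

def scatter(p, a1=1.0, a2=0.0):
    """Hypothetical memoryless scatterer: linear term plus quadratic term."""
    return a1 * p + a2 * p**2

def pulse_inversion(p, a1=1.0, a2=0.0):
    """Sum the echoes of a pulse and its inverted copy.

    The odd-order (linear) terms cancel; only 2 * a2 * p**2 survives.
    """
    return scatter(p, a1, a2) + scatter(-p, a1, a2)

# A Hann-windowed 2 MHz tone burst sampled over 1 microsecond.
t = np.linspace(0.0, 1e-6, 200)
pulse = np.sin(2 * np.pi * 2e6 * t) * np.hanning(t.size)

tissue_like = pulse_inversion(pulse, a1=1.0, a2=0.0)  # purely linear scatterer
bubble_like = pulse_inversion(pulse, a1=1.0, a2=0.3)  # adds a quadratic term
```

With `a2=0` (a linear, tissue-like scatterer) the summed echo vanishes. With `a2>0` only the quadratic term `2*a2*p**2` remains, which is the nonlinear signature that pulse-inversion-type schemes aim to isolate.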

    Acquisition strategies for fat/water separated MRI

    This thesis focuses on new ways to acquire the signal for fat/water separated MRI, also known as Dixon methods, more efficiently. In Paper I, the concept of dual bandwidths was introduced to improve SNR efficiency by removing dead times in a spin echo PROPELLER sequence. By correcting for the displacement of fat, we were able to improve the motion correction. This required additional considerations during reconstruction to avoid noise amplification, which was solved with a noise-whitening Tikhonov regularization. Paper II explores the combination of fat/water separation in k-space with partially acquired data, i.e. partial Fourier sampling. Reduced sampling coverage makes it possible to increase spatial resolution, which is often limited in fat/water imaging, particularly in gradient echo sequences. A modified POCS routine with real-valued estimates was also developed, exploiting Hermitian symmetry to improve the conditioning of the inverse problem in the fully sampled region. Paper III presents a single-TR dual-bandwidth RARE (fast/turbo spin echo) sequence without dead times, which uses partial Fourier sampling with late and early echoes to improve the chemical shift encoding. The proposed sequence can acquire images with 0.5 mm in-plane resolution without dead times, with image quality exceeding current state-of-the-art techniques. An automated selection of gradient waveforms based on Cramér-Rao bounds was implemented on the scanner. In Paper IV, the dual-bandwidth concept was generalized to continuous bandwidths. Instead of the conventional shift of a trapezoidal readout gradient, we describe a new method of encoding chemical shift using asymmetric readout waveforms. Asymmetric readouts were implemented in a RARE sequence to remove dead times completely from multi-TR acquisitions, with typical scan time reductions of 25%. 
The developed methods enable fat/water imaging with reduced scan times and increased spatial resolution, addressing the long scan times and limited resolution that have previously restricted its use.
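Partial Fourier reconstruction by POCS, which Paper II builds on, alternates between two constraints. One enforces agreement with the acquired k-space samples; the other imposes a smooth image phase estimated from the fully sampled, symmetric central band. The 1D sketch below is the generic textbook scheme, not the modified real-valued routine from the thesis. The centered-FFT convention, the Hamming-windowed phase estimate, the iteration count and all function names are assumptions.

```python
import numpy as np

def to_kspace(img):
    """Centered 1D DFT: the DC sample ends up at index N // 2."""
    return np.fft.fftshift(np.fft.fft(img))

def to_image(k):
    """Inverse of to_kspace."""
    return np.fft.ifft(np.fft.ifftshift(k))

def pocs_partial_fourier(k_meas, n_acq, n_iter=30):
    """Generic POCS partial Fourier reconstruction (hypothetical 1D sketch).

    k_meas: centered k-space of length N; only indices [0, n_acq) are
            acquired, with n_acq > N // 2.
    """
    n = k_meas.size
    acquired = np.zeros(n, dtype=bool)
    acquired[:n_acq] = True
    # Symmetric, fully sampled central band: frequencies -(w-1) .. (w-1).
    w = n_acq - n // 2
    lo, hi = n // 2 - (w - 1), n // 2 + w
    k_low = np.zeros_like(k_meas)
    k_low[lo:hi] = k_meas[lo:hi] * np.hamming(hi - lo)
    phase = np.angle(to_image(k_low))  # low-resolution phase estimate
    k = np.where(acquired, k_meas, 0)  # start from the zero-filled data
    for _ in range(n_iter):
        img = to_image(k)
        # Project onto images of the form real_valued * exp(1j * phase).
        img = np.real(img * np.exp(-1j * phase)) * np.exp(1j * phase)
        k = to_kspace(img)
        k[acquired] = k_meas[acquired]  # enforce data consistency
    return to_image(k)
```

For an object whose phase is well captured by the central band, the error of `pocs_partial_fourier` relative to the fully sampled image is typically much smaller than that of the zero-filled reconstruction `to_image(k_meas)`. The phase constraint effectively fills in the missing part of k-space through Hermitian symmetry.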

    Novel MRI Technologies for Structural and Functional Imaging of Tissues with Ultra-short T₂ Values

    Conventional MRI has several limitations, such as long scan durations, motion artifacts, very loud acoustic noise, signal loss due to short relaxation times, and RF-induced heating of electrically conducting objects. The goals of this work are to evaluate and improve the state-of-the-art methods for MRI of tissue with short T₂, to prove the feasibility of in vivo Concurrent Excitation and Acquisition, and to introduce simultaneous electroglottography measurement during functional lung MRI.

    Advanced detection strategies for ultrasound contrast agents

    Ultrasound contrast agent was discovered serendipitously by Gramiak and Shah in 1968 when they injected indocyanine green dye into the heart and observed increased echogenicity of the blood containing the dye. Small cavitation bubbles formed upon injection of the dye were traced as the source of the enhanced echoes. Nowadays, ultrasound contrast agents still consist of small bubbles that flow freely in the bloodstream. However, because uncontrolled cavitation and violent bubble collapse are considered harmful to cells and tissue, contrast agents are usually prepared under controlled conditions outside the body and injected intravenously, after which they are taken up into the bloodstream and transported to the region under investigation.

    Metabolite Mapping with Extended Brain Coverage Using a Fast Multisection MRSI Pulse Sequence and a Multichannel Coil

    Multisection magnetic resonance spectroscopic imaging (MRSI) is a widely used pulse sequence with distinct advantages over other spectroscopic imaging sequences, such as dynamic shimming, large region-of-interest coverage within slices, and rapid data acquisition. However, it is limited in the number of slices that can be acquired in realistic scan times, and information is lost in the gaps between slices. In this paper, we combine the multisection spectroscopic imaging pulse sequence with multichannel coil technology to overcome these limitations. Together, these techniques eliminate the gaps between slices and allow a larger number of slices to be acquired, realizing whole-brain metabolite mapping without the penalties of longer repetition times (and therefore longer acquisition times) or lower signal-to-noise ratios.