
    Unbiased coherent-to-diffuse ratio estimation for dereverberation

    We investigate the estimation of the time- and frequency-dependent coherent-to-diffuse ratio (CDR) from the measured spatial coherence between two omnidirectional microphones. We illustrate the relationship between several known CDR estimators using a geometric interpretation in the complex plane, discuss the problem of estimator bias, and propose unbiased versions of the estimators. Furthermore, we show that knowledge of either the direction of arrival (DOA) of the target source or the coherence of the noise field is sufficient for an unbiased CDR estimation. Finally, we apply the CDR estimators to the problem of dereverberation, using the automatic speech recognition word error rate as an objective performance measure.
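    As a rough illustration of the idea, the standard two-microphone coherence mixing model can be inverted for the CDR when either the DOA (and hence the direct-path phase) or the noise coherence is known. The sketch below is a minimal, hypothetical implementation of that algebra, not the paper's specific unbiased estimators; the function name and parameterization are assumptions.

```python
import numpy as np

def estimate_cdr(gamma_x, theta, gamma_n):
    """Estimate the coherent-to-diffuse ratio from measured coherence.

    gamma_x : complex measured coherence between the two microphones
    theta   : inter-microphone phase of the direct path (known DOA)
    gamma_n : real-valued coherence of the diffuse noise field,
              e.g. a sinc model for a spherically isotropic field

    Mixing model: gamma_x = (cdr * gamma_s + gamma_n) / (cdr + 1),
    where gamma_s = exp(1j * theta) is the fully coherent component.
    """
    gamma_s = np.exp(1j * theta)
    cdr = (gamma_n - gamma_x) / (gamma_x - gamma_s)
    return max(np.real(cdr), 0.0)  # the CDR is real and non-negative

# Sanity check: synthesize a coherence value for a known CDR and recover it.
cdr_true, theta, gamma_n = 2.0, 0.5, 0.3
gamma_x = (cdr_true * np.exp(1j * theta) + gamma_n) / (cdr_true + 1)
print(round(estimate_cdr(gamma_x, theta, gamma_n), 6))  # 2.0
```

Taking the real part and clipping at zero is one simple way to force a physically valid estimate; the paper's geometric treatment of bias is what distinguishes the estimators it compares.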

    Recurrent neural networks for multi-microphone speech separation

    This thesis takes the classical signal processing problem of separating the speech of a target speaker from a real-world audio recording containing noise, background interference (from competing speech or other non-speech sources), and reverberation, and seeks data-driven solutions based on supervised learning methods, particularly recurrent neural networks (RNNs). Such speech separation methods can inject robustness into automatic speech recognition (ASR) systems and have been an active area of research for the past two decades. We focus in particular on applications where multi-channel recordings are available. Stand-alone beamformers cannot simultaneously suppress diffuse noise and protect the desired signal from distortion. Post-filters complement the beamformers in obtaining the minimum mean squared error (MMSE) estimate of the desired signal. Time-frequency (TF) masking, a method with roots in computational auditory scene analysis (CASA), is a suitable candidate for post-filtering, but the challenge lies in estimating the TF masks. The use of RNNs, in particular the bi-directional long short-term memory (BLSTM) architecture, as a post-filter estimating TF masks for a delay-and-sum beamformer (DSB), using magnitude spectral and phase-based features, is proposed. The data from the CHiME-3 challenge, recorded in four challenging realistic environments, is used. Two different TF masks, the Wiener filter mask and the log-ratio mask, are identified as suitable targets for learning. The separated speech is evaluated with objective speech intelligibility measures: short-term objective intelligibility (STOI) and frequency-weighted segmental SNR (fwSNR). The word error rates (WERs) reported by the previous state-of-the-art ASR back-end on the CHiME-3 test data are interpreted against the objective scores to understand the relationship between the two. Overall, the RNNs bring a consistent improvement in the objective scores compared to feed-forward neural networks and a baseline MVDR beamformer.
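    To make the post-filtering step concrete, the sketch below forms the oracle Wiener-filter TF mask (one of the two learning targets named above) and applies it to a mixture spectrogram. This is a toy illustration with made-up array shapes, not the thesis's BLSTM pipeline; in the actual system a network predicts the mask from beamformer-output features.

```python
import numpy as np

def wiener_mask(speech_mag, noise_mag, eps=1e-8):
    """Oracle Wiener-filter TF mask: |S|^2 / (|S|^2 + |N|^2), one soft
    gain per time-frequency bin. A trained network would predict this."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return s2 / (s2 + n2 + eps)

def apply_mask(mixture_stft, mask):
    """Post-filter: scale each TF bin of the (beamformed) mixture STFT."""
    return mask * mixture_stft

# Toy magnitude spectrograms (frequency bins x frames).
rng = np.random.default_rng(0)
S = rng.rayleigh(1.0, (4, 5))   # speech magnitudes
N = rng.rayleigh(0.5, (4, 5))   # noise magnitudes
M = wiener_mask(S, N)
assert M.min() >= 0.0 and M.max() <= 1.0  # a valid soft mask lies in [0, 1]
```

Because the mask is bounded in [0, 1], it attenuates noise-dominated bins while passing speech-dominated ones, which is what makes it a convenient regression target.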

    Designing the Wiener post-filter for diffuse noise suppression using imaginary parts of inter-channel cross-spectra

    This paper describes a new design of the Wiener post-filter for diffuse noise suppression. The Wiener post-filter is well known as an effective post-processing of the minimum variance distortionless response beamformer, and its output is the optimal estimate of the target signal in the sense of the minimum mean square error. It is essential to accurately estimate the target power spectrum from the observed signals contaminated by noise when designing the Wiener post-filter. In our method, it is estimated from the imaginary parts of the inter-channel observation cross-spectra, under the assumption that the inter-channel noise cross-spectra are real-valued. The post-filter is designed using this estimate, and the design is shown to be effective even for a small-sized array through experiments using simulated and real environmental noise.
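    A minimal sketch of the core estimate, under the paper's stated assumption: if the noise cross-spectrum is real, the imaginary part of the observed cross-spectrum comes from the target alone, so the target power spectrum can be read off once the target's inter-channel phase is known. Function names and the single-pair form are assumptions for illustration; the paper combines multiple channel pairs.

```python
import numpy as np

def target_psd_from_imag(cross_spec, dtheta, eps=1e-8):
    """Estimate the target power spectrum from one inter-channel cross-spectrum.

    Model: cross_spec = phi_ss * exp(1j * dtheta) + phi_n, with phi_n real
    (the paper's assumption), so Im{cross_spec} = phi_ss * sin(dtheta).
    dtheta is the target's inter-channel phase, known from the array geometry.
    """
    return np.imag(cross_spec) / (np.sin(dtheta) + eps)

def wiener_gain(phi_ss, phi_yy, eps=1e-8):
    """Wiener post-filter gain applied to the beamformer-output PSD phi_yy."""
    return np.clip(phi_ss / (phi_yy + eps), 0.0, 1.0)

# Sanity check: a real noise cross-spectrum does not bias the estimate.
phi_ss_true, dtheta, phi_n = 1.5, 0.8, 0.4   # phi_n real by assumption
cross = phi_ss_true * np.exp(1j * dtheta) + phi_n
print(round(float(target_psd_from_imag(cross, dtheta)), 3))  # 1.5
```

The attraction of this design is visible in the sanity check: however large the (real) noise cross-spectrum, it never leaks into the imaginary part, so the target PSD estimate is unaffected by it.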

    Speech enhancement in binaural hearing protection devices

    The capability of people to operate safely and effectively under extreme noise conditions depends on their access to adequate voice communication while using hearing protection. This thesis develops speech enhancement algorithms that can be implemented in binaural hearing protection devices to improve communication and situational awareness in the workplace. The developed algorithms, which emphasize low computational complexity, suppress noise while enhancing speech.

    Treatise on Hearing: The Temporal Auditory Imaging Theory Inspired by Optics and Communication

    A new theory of mammalian hearing is presented, which accounts for the auditory image in the midbrain (inferior colliculus) of objects in the acoustical environment of the listener. It is shown that the ear is a temporal imaging system that comprises three transformations of the envelope functions: cochlear group-delay dispersion, cochlear time lensing, and neural group-delay dispersion. These elements are analogous to the optical transformations in vision of diffraction between the object and the eye, spatial lensing by the lens, and second diffraction between the lens and the retina. Unlike the eye, the human auditory system is shown to be naturally defocused, so that coherent stimuli do not react to the defocus, whereas completely incoherent stimuli are impacted by it and may be blurred by design. It is argued that the auditory system can use this differential focusing to enhance or degrade the images of real-world acoustical objects that are partially coherent. The theory is founded on coherence and temporal imaging theories that were adopted from optics. In addition to the imaging transformations, the corresponding inverse-domain modulation transfer functions are derived and interpreted with consideration to the nonuniform neural sampling operation of the auditory nerve. These ideas are used to rigorously introduce the concepts of sharpness and blur in auditory imaging, auditory aberrations, and auditory depth of field. In parallel, ideas from communication theory are used to show that the organ of Corti functions as a multichannel phase-locked loop (PLL) that constitutes the point of entry for auditory phase locking and hence conserves the signal coherence. It provides an anchor for a dual coherent and noncoherent auditory detection in the auditory brain that culminates in auditory accommodation. Implications for hearing impairments are discussed as well. Comment: 603 pages, 131 figures, 13 tables, 1,570 references.
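    For readers unfamiliar with the PLL mechanism invoked above, the sketch below shows a first-order digital phase-locked loop pulling its local oscillator into phase with an input tone. This is a generic textbook PLL, purely illustrative of what "phase locking" means; it is not the treatise's multichannel organ-of-Corti model, and all names here are made up.

```python
import numpy as np

def pll_phase_errors(x, fs, f0, gain=0.1):
    """First-order digital PLL: a phase detector measures the offset between
    the complex input and a local oscillator; the loop feeds that error back
    into the oscillator phase so the two lock. Returns per-sample errors."""
    phase, inc = 0.0, 2 * np.pi * f0 / fs
    errors = []
    for sample in x:
        err = np.angle(sample * np.exp(-1j * phase))  # phase detector
        phase += inc + gain * err                     # loop update (NCO)
        errors.append(err)
    return np.array(errors)

# A tone with an initial 0.5 rad phase offset: the loop drives the error to 0.
fs, f0, n = 8000.0, 440.0, 300
x = np.exp(1j * (2 * np.pi * f0 / fs * np.arange(n) + 0.5))
errs = pll_phase_errors(x, fs, f0)
assert abs(errs[0] - 0.5) < 1e-9 and abs(errs[-1]) < 1e-3
```

Once locked, the oscillator phase tracks the input's phase, which is the sense in which a PLL "conserves signal coherence".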

    Calibration, foreground subtraction, and signal extraction in hydrogen cosmology

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Physics, 2012, by Adrian Chi-Yan Liu. Includes bibliographical references (pp. 265-271). By using the hyperfine 21 cm transition to map out the distribution of neutral hydrogen at high redshifts, hydrogen cosmology has the potential to place exquisite constraints on fundamental cosmological parameters, as well as to provide direct observations of our Universe prior to the formation of the first luminous objects. However, this theoretical promise has yet to become observational reality. Chief amongst the observational obstacles are a need for extremely well-calibrated instruments and methods for dealing with foreground contaminants such as Galactic synchrotron radiation. In this thesis we explore a number of these challenges by proposing and testing a variety of techniques for calibration, foreground subtraction, and signal extraction in hydrogen cosmology. For tomographic hydrogen cosmology experiments, we explore a calibration algorithm known as redundant baseline calibration, extending treatments found in the existing literature to include rigorous calculations of uncertainties and extensions to not-quite-redundant baselines. We use a principal component analysis to model foregrounds, and take advantage of the resulting sparseness of foreground spectra to propose various foreground subtraction algorithms. These include fitting low-order polynomials to spectra (either in image space or Fourier space) and inverse variance weighting. The latter method is described in a unified mathematical framework that includes power spectrum estimation. Foreground subtraction is also explored in the context of global signal experiments, and data analysis methods that incorporate angular information are presented. Finally, we apply many of the aforementioned methods to data from the Murchison Widefield Array, placing an upper limit on the Epoch of Reionization power spectrum at redshift z = 9.1.
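    The principal-component step mentioned above exploits the fact that foregrounds are spectrally smooth, so they occupy only a few spectral eigenmodes. The sketch below is a generic, minimal version of that idea (project line-of-sight spectra off the top modes); the function name, mode count, and toy power-law foreground are assumptions, not the thesis's actual pipeline.

```python
import numpy as np

def subtract_foregrounds(spectra, n_modes=3):
    """Project out the top principal components of a set of line-of-sight
    spectra (pixels x frequency channels). Smooth-spectrum foregrounds
    concentrate in a few modes, so removing them leaves the much fainter
    signal plus noise."""
    centered = spectra - spectra.mean(axis=0)
    # Dominant spectral shapes via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    return centered - centered @ modes.T @ modes

# Toy example: smooth power-law foregrounds dwarf a weak random "signal".
rng = np.random.default_rng(1)
freqs = np.linspace(100.0, 200.0, 64)                      # MHz
fg = np.outer(rng.uniform(50, 100, 32), (freqs / 150.0) ** -2.5)
signal = 0.01 * rng.standard_normal((32, 64))
cleaned = subtract_foregrounds(fg + signal, n_modes=2)
assert np.abs(cleaned).max() < np.abs(fg).max() * 1e-2     # foregrounds gone
```

The same sparseness is what makes the low-order-polynomial fits in the abstract work: a smooth foreground is well described by a handful of basis functions, whether eigenmodes or polynomials.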

    Using deep learning methods for supervised speech enhancement in noisy and reverberant environments

    In real-world environments, the speech signals received by our ears are usually a combination of different sounds that include not only the target speech but also acoustic interference like music, background noise, and competing speakers. This interference has a negative effect on speech perception and degrades the performance of speech processing applications such as automatic speech recognition (ASR), speaker identification, and hearing aid devices. One way to solve this problem is to use source separation algorithms to separate the desired speech from the interfering sounds. Many source separation algorithms have been proposed to improve the performance of ASR systems and hearing aid devices, but it is still challenging for these systems to work efficiently in noisy and reverberant environments. On the other hand, humans have a remarkable ability to separate desired sounds and listen to a specific talker among noise and other talkers. Inspired by the capabilities of the human auditory system, a popular method known as auditory scene analysis (ASA) was proposed to separate different sources in a two-stage process of segmentation and grouping. The main goal of source separation in ASA is to estimate time-frequency masks that optimally match and separate noise signals from a mixture of speech and noise. In this work, multiple algorithms are proposed to improve upon source separation in noisy and reverberant acoustic environments. First, a simple and novel algorithm is proposed to increase the discriminability between two sound sources by scaling (magnifying) the head-related transfer function of the interfering source. Experimental results from applications of this algorithm show a significant increase in the quality of the recovered target speech. Second, a time-frequency masking-based source separation algorithm is proposed that can separate a male speaker from a female speaker in reverberant conditions by using the spatial cues of the source signals. Furthermore, the proposed algorithm preserves the locations of the sources after separation. Three major aims are then pursued for supervised speech separation based on deep neural networks that estimate either the time-frequency masks or the clean speech spectrum. Firstly, a novel monaural acoustic feature set based on a gammatone filterbank is presented as the input of the deep neural network (DNN) based speech separation model, which shows significant improvement in objective speech intelligibility and speech quality under different testing conditions. Secondly, a complementary binaural feature set is proposed to increase the ability of source separation in adverse environments with non-stationary background noise and high reverberation using two-channel recordings. Experimental results show that combining spatial features with this complementary feature set significantly improves speech intelligibility and speech quality in noisy and reverberant conditions. Thirdly, a novel dilated convolutional neural network is proposed to improve the generalization of the monaural supervised speech enhancement model to untrained speakers, unseen noises, and simulated rooms. This model significantly increases the intelligibility and quality of the recovered speech while being computationally more efficient and requiring less memory than other models. In addition, the proposed model is modified with recurrent layers and dilated causal convolution layers for real-time processing. This causal model is suitable for implementation in hearing aid devices and ASR systems, as it has fewer trainable parameters and uses only information from previous time frames to predict the output. The main goal of the proposed algorithms is to increase the intelligibility and quality of the speech recovered from noisy and reverberant environments, which has the potential to improve both speech processing applications and signal processing strategies for hearing aid and cochlear implant technology.
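    The dilated causal convolution used for real-time processing above can be illustrated in a few lines: the output at frame t depends only on the current and past frames, and stacking layers with doubling dilation grows the receptive field exponentially with depth. This is a generic numpy sketch of the operation, not the thesis's network.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """1-D dilated causal convolution: the output at frame t uses only
    x[t], x[t - d], x[t - 2d], ... (no future frames), which is what
    makes the model usable for frame-by-frame, real-time enhancement."""
    y = np.zeros_like(x)
    for k, wk in enumerate(w):
        shift = k * dilation
        y[shift:] += wk * x[:len(x) - shift]
    return y

# Stacking layers with dilations 1, 2, 4, 8 grows the receptive field
# exponentially with depth while keeping few parameters per layer.
x = np.zeros(32)
x[0] = 1.0                                   # unit impulse
h = x
for d in (1, 2, 4, 8):
    h = dilated_causal_conv(h, np.array([0.5, 0.5]), d)
receptive_field = int(np.flatnonzero(h).max()) + 1
print(receptive_field)  # 16: 1 + (2 - 1) * (1 + 2 + 4 + 8)
```

Four two-tap layers already see 16 frames of context with only eight weights, which is the efficiency argument made in the abstract.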

    Amplitude and phase sonar calibration and the use of target phase for enhanced acoustic target characterisation

    This thesis investigates the incorporation of target phase into sonar signal processing, for enhanced information in the context of acoustical oceanography. A sonar system phase calibration method, which includes both the amplitude and phase response, is proposed. The technique is an extension of the widespread standard-target sonar calibration method, based on the use of metallic spheres as standard targets. Frequency-domain data processing is used, with target phase measured as a phase-angle difference between two frequency components. This approach minimizes the impact of range uncertainties in the calibration process. Calibration accuracy is examined by comparison to theoretical full-wave modal solutions. The complex response of the system is obtained over an operating frequency range of 50 to 150 kHz, and sources of ambiguity are examined. The calibrated broadband sonar system is then used to study the complex scattering of objects important for the modelling of marine organism echoes, such as elastic spheres, fluid-filled shells, cylinders, and prolate spheroids. Underlying echo formation mechanisms and their interaction are explored. Phase-sensitive sonar systems could be important for the acquisition of increased levels of information, crucial for the development of automated species identification. Studies of sonar system phase calibration and complex scattering from fundamental shapes are necessary in order to incorporate this type of fully-coherent processing into scientific acoustic instruments.
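    The two-frequency phase-difference idea can be illustrated briefly: an unknown range delay tau adds a phase of -2*pi*f*tau to every spectral component, so differencing the phases of two components leaves a residual that scales with (f2 - f1) rather than with the absolute frequency. The sketch below is a generic illustration with made-up signals and names, not the thesis's calibration procedure.

```python
import numpy as np

def phase_difference(echo, fs, f1, f2):
    """Phase-angle difference between two frequency components of an echo.
    A range error tau shifts every component's phase by -2*pi*f*tau, so
    differencing two components cancels most of it: the residual scales
    with (f2 - f1) instead of the absolute frequency."""
    spec = np.fft.rfft(echo)
    freqs = np.fft.rfftfreq(len(echo), 1.0 / fs)
    k1 = np.argmin(np.abs(freqs - f1))
    k2 = np.argmin(np.abs(freqs - f2))
    return np.angle(spec[k2] * np.conj(spec[k1]))

# Toy check: a two-tone echo with a known 0.7 rad offset between the tones.
fs, n = 1000.0, 1000
t = np.arange(n) / fs
echo = np.cos(2 * np.pi * 100 * t) + np.cos(2 * np.pi * 120 * t + 0.7)
print(round(float(phase_difference(echo, fs, 100.0, 120.0)), 3))  # 0.7
```

Multiplying one bin by the conjugate of the other is the standard way to form a phase difference without explicit unwrapping, which matches the abstract's description of target phase as a difference between two frequency components.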