
    Advances in DFT-Based Single-Microphone Speech Enhancement

    The interest in the field of speech enhancement emerges from the increased use of digital speech processing applications such as mobile telephony, digital hearing aids and human-machine communication systems in our daily life. The trend to make these applications mobile increases the variety of potential sources of quality degradation. Speech enhancement methods can be used to increase the quality of these speech processing devices and make them more robust under noisy conditions. The name "speech enhancement" refers to a large group of methods that are all meant to improve certain quality aspects of these devices. Examples of speech enhancement algorithms are echo control, bandwidth extension, packet loss concealment and noise reduction. In this thesis we focus on single-microphone additive noise reduction and aim at methods that work in the discrete Fourier transform (DFT) domain. The main objective of the presented research is to improve on existing single-microphone schemes for an extended range of noise types and noise levels, thereby making these methods more suitable for mobile speech communication applications than state-of-the-art algorithms. This thesis addresses three research topics.

    First, we focus on improved estimation of the a priori signal-to-noise ratio (SNR) from the noisy speech, considering two aspects. We present an adaptive time-segmentation algorithm, which we use to reduce the variance of the estimated a priori SNR, and we present an approach to reduce the bias of the estimated a priori SNR that is often present during transitions between speech sounds.

    Second, we investigate the derivation of clean speech estimators under models that take properties of speech into account. This problem is approached from two different angles. On the one hand, we derive clean speech estimators under a combined stochastic/deterministic model for the complex DFT coefficients; the use of a deterministic model is motivated by the fact that certain speech sounds have a more deterministic character. On the other hand, we derive complex-DFT and magnitude-DFT estimators under super-Gaussian densities, motivated by measured histograms of speech DFT coefficients. We present two different types of estimators under super-Gaussian densities. Minimum mean-square error (MMSE) estimators are derived under a generalized Gamma density for the clean speech DFT coefficients and DFT magnitudes. Maximum a posteriori (MAP) estimators are derived under the multivariate normal inverse Gaussian (MNIG) density for the clean speech DFT coefficients. Estimators derived under the MNIG density have some theoretical advantages over estimators derived under the generalized Gamma density. More specifically, under the MNIG density the statistical models in the complex-DFT and the polar domain are consistent, which is not the case for estimators derived under the generalized Gamma density. In addition, the MNIG density can model vector processes, which allows the dependency between the real and imaginary parts of DFT coefficients to be taken into account.

    Finally, we developed a method for tracking the noise power spectral density (PSD). The method is based on the eigenvalue decomposition of correlation matrices that are constructed from time series of noisy DFT coefficients. This approach makes it possible, in contrast to existing methods, to update the noise PSD when speech is continuously present. Furthermore, the tracking delay is considerably reduced compared to state-of-the-art noise tracking algorithms. A comparison is performed between a combination of the individual components presented in this thesis and a state-of-the-art speech enhancement system from the literature. Subjective experiments by means of a listening test show that the system based on the contributions of this thesis improves significantly over the state-of-the-art speech enhancement system.
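
    A priori SNR estimation of the kind studied in the first topic typically builds on the classic decision-directed approach. As a point of reference, here is a minimal numpy sketch of that baseline (not the thesis's adaptive-segmentation variant); the smoothing factor and the toy spectra are illustrative assumptions.

```python
import numpy as np

def decision_directed_snr(noisy_psd, noise_psd, prev_clean_est, alpha=0.98):
    """Classic decision-directed a priori SNR estimate (baseline sketch,
    not the thesis's adaptive-segmentation variant).

    noisy_psd      -- |X(k,l)|^2, periodogram of the current noisy frame
    noise_psd      -- estimate of the noise PSD for this frame
    prev_clean_est -- |S_hat(k,l-1)|^2, clean estimate of the previous frame
    """
    # A posteriori SNR: gamma = |X|^2 / sigma_n^2
    gamma = noisy_psd / np.maximum(noise_psd, 1e-12)
    # Decision-directed recursion: mix the previous frame's clean estimate
    # with the ML estimate max(gamma - 1, 0) of the current frame.
    xi = alpha * prev_clean_est / np.maximum(noise_psd, 1e-12) \
         + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
    return xi

# Toy usage with random spectra (illustrative only).
rng = np.random.default_rng(0)
noisy = rng.chisquare(2, size=257)
noise = np.ones(257)
prev = rng.chisquare(2, size=257)
xi = decision_directed_snr(noisy, noise, prev)
```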

    Noise PSD Insensitive RTF Estimation in a Reverberant and Noisy Environment

    Spatial filtering techniques typically rely on estimates of the target relative transfer function (RTF). However, the target speech signal is typically corrupted by late reverberation and ambient noise, which complicates RTF estimation. Existing methods subtract the noise covariance matrix to obtain the target-plus-late-reverberation covariance matrix, from which the RTF is estimated. However, the noise covariance matrix is typically unknown. More specifically, the noise power spectral density (PSD) is typically unknown, while the spatial coherence matrix can be assumed known, as it may remain time-invariant for a longer time. Using the spatial coherence matrices, we simplify the signal model such that the off-diagonal elements are not affected by the PSDs of the late reverberation and the ambient noise. We then use these elements to estimate the target covariance matrix, from which the RTF can be obtained. Hence, the resulting RTF estimate is insensitive to the noise PSD. Experiments demonstrate the estimation performance of the proposed method.
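
    A toy sketch of the core idea, under the strong simplifying assumption that (after incorporating the known coherence matrices) reverberation and noise only perturb the diagonal of the covariance matrix, so the off-diagonal entries reveal the RTF directly. This illustrates the principle only, not the paper's full estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4                                   # number of microphones
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # acoustic TF
phi_s = 2.0                             # target PSD
R = phi_s * np.outer(a, a.conj())       # rank-1 target covariance
R += np.diag(rng.uniform(0.5, 1.5, M))  # diagonal noise: off-diagonals untouched

h_true = a / a[0]                       # RTF w.r.t. reference mic 0
h_est = np.ones(M, dtype=complex)
for i in range(1, M):
    # any column j not equal to 0 or i gives R[i, j] / R[0, j] = a_i / a_0
    js = [j for j in range(M) if j not in (0, i)]
    h_est[i] = np.mean([R[i, j] / R[0, j] for j in js])

print(np.allclose(h_est, h_true))       # True
```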

    Jointly optimal near-end and far-end multi-microphone speech intelligibility enhancement based on mutual information

    The processing required for the global maximization of the intelligibility of speech acquired by multiple microphones and rendered by a single loudspeaker, is considered in this paper. The intelligibility is quantized, based on the mutual information rate between the message spoken by the talker and the message as interpreted by the listener. We prove that then, in each of a set of narrow-band channels, the processing can be decomposed into a minimum variance distortionless response (MVDR) beamforming operation that reduces the noise in the talker environment, followed by a gain operation that, given the far-end noise and beamforming operation, accounts for the noise at the listener end. Our experiments confirm that both processing steps are necessary for the effective conveyance ofa message and, importantly, that the second step must be aware of the first step.Accepted Author ManuscriptCircuits and System
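
    A minimal sketch of this two-stage structure for one narrow-band channel. The MVDR step is standard; the listener-side gain below is a plain Wiener gain used purely as a stand-in for the paper's mutual-information-optimal gain, and all signal statistics are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # steering vector
N = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
Rn = N @ N.conj().T / 200               # near-end noise covariance (one band)

# Step 1: MVDR beamformer against the near-end (talker-side) noise.
w = np.linalg.solve(Rn, a)
w /= (a.conj() @ w)                      # distortionless: w^H a = 1

# Step 2: per-band scalar gain for the listener side. The paper derives the
# mutual-information-optimal gain; a plain Wiener gain against the far-end
# noise PSD is used here purely as an illustrative stand-in.
sigma_s = 1.0                            # beamformer-output speech PSD (assumed)
sigma_far = 0.5                          # far-end noise PSD (assumed)
g = sigma_s / (sigma_s + sigma_far)

x = a * 1.0 + N[:, 0]                    # one noisy snapshot
out = g * (w.conj() @ x)                 # beamform, then gain
```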

    On the Estimation of Complex Speech DFT Coefficients Without Assuming Independent Real and Imaginary Parts

    This letter considers the estimation of speech signals contaminated by additive noise in the discrete Fourier transform (DFT) domain. Existing complex-DFT estimators assume independence of the real and imaginary parts of the speech DFT coefficients, although this is not in line with measurements. In this letter, we derive some general results on these estimators under more realistic assumptions. Assuming that speech and noise are independent, that speech DFT coefficients have uniform phase, and that noise DFT coefficients have a Gaussian density, we show theoretically that the spectral gain function for speech DFT estimation is real and upper-bounded by the corresponding gain function for spectral magnitude estimation. We also show that the minimum mean-square error (MMSE) estimator of the speech phase equals the noisy phase. No assumptions are made about the distribution of the speech spectral magnitudes. Recently, speech spectral amplitude estimators have been derived under a generalized-Gamma amplitude distribution. As an example, we derive the corresponding complex-DFT estimators without making the independence assumption.
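
    The practical consequence of the real-gain result is that enhancement reduces to scaling the noisy coefficient while keeping its phase. A tiny sketch, with a Wiener gain as an illustrative (not the letter's) choice of real gain:

```python
import numpy as np

def apply_real_gain(noisy_dft, gain):
    """Apply a real-valued spectral gain while keeping the noisy phase,
    consistent with the letter's result that the MMSE phase estimate
    equals the noisy phase. The Wiener gain used below is an illustrative
    choice, not the letter's generalized-Gamma estimator."""
    return gain * noisy_dft             # real gain leaves the phase untouched

xi = np.array([0.1, 1.0, 10.0])         # assumed a priori SNR per bin
wiener_gain = xi / (1.0 + xi)           # real and in [0, 1)
X = np.array([1 + 1j, -2 + 0.5j, 0.3 - 1j])
S_hat = apply_real_gain(X, wiener_gain)
print(np.allclose(np.angle(S_hat), np.angle(X)))   # True: phase preserved
```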

    Relative Acoustic Transfer Function Estimation in Wireless Acoustic Sensor Networks

    In this paper, we present an algorithm to estimate the relative acoustic transfer function (RTF) of a target source in wireless acoustic sensor networks (WASNs). Two well-known methods to estimate the RTF are the covariance subtraction (CS) method and the covariance whitening (CW) approach, the latter based on the generalized eigenvalue decomposition. Both methods depend on the noisy correlation matrix, which, in practice, has to be estimated from limited and (in WASNs) quantized data. The bit rate and the limited data records therefore directly affect the accuracy of the estimated RTFs. We first theoretically analyze the estimation performance of the two approaches as a function of bit rate. Second, we propose a rate-distribution method that minimizes the power usage while constraining the expected estimation error for both RTF estimators. The optimal rate distributions are found using convex optimization techniques. These model-based methods, however, are impractical due to their dependence on the true RTFs. We therefore further develop two greedy rate-distribution methods for both approaches. Finally, numerical simulations on synthetic data and real audio recordings show the superiority of the proposed approaches in power usage compared to uniform rate allocation. We find that, in order to satisfy the same RTF estimation accuracy, the rate-distributed CW methods consume much less transmission energy than the CS-based methods.
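
    For reference, minimal numpy sketches of the two estimators being compared, applied here to exact covariance matrices; the paper's analysis concerns what happens when these matrices are estimated from limited, quantized data.

```python
import numpy as np

def rtf_cs(Rx, Rn, ref=0):
    """Covariance subtraction: take a column of Rx - Rn and normalize."""
    col = (Rx - Rn)[:, ref]
    return col / col[ref]

def rtf_cw(Rx, Rn, ref=0):
    """Covariance whitening: principal generalized eigenvector of (Rx, Rn)."""
    L = np.linalg.cholesky(Rn)           # Rn = L L^H
    Li = np.linalg.inv(L)
    Rw = Li @ Rx @ Li.conj().T           # whitened noisy covariance
    vals, vecs = np.linalg.eigh(Rw)
    v = L @ vecs[:, -1]                  # de-whiten the principal eigenvector
    return v / v[ref]

# Toy model: rank-1 target plus full-rank noise.
rng = np.random.default_rng(3)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
Rs = 2.0 * np.outer(a, a.conj())
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rn = B @ B.conj().T / M + np.eye(M)
Rx = Rs + Rn

print(np.allclose(rtf_cs(Rx, Rn), a / a[0]))   # True with exact covariances
print(np.allclose(rtf_cw(Rx, Rn), a / a[0]))   # True with exact covariances
```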

    Greedy Gossip Algorithm with Synchronous Communication for Wireless Sensor Networks

    Randomized gossip (RG) based distributed averaging is popular for wireless sensor networks (WSNs) in multiple application areas. With RG, two adjacent nodes are randomly selected to communicate and exchange information iteratively until consensus is reached. One way to improve the convergence speed of RG is to use greedy gossip with eavesdropping (GGE). Instead of selecting two nodes randomly, GGE selects, in each iteration, the two nodes with the maximum difference between their values. To further increase the convergence speed in terms of transmissions, we present in this paper a synchronous version of the GGE algorithm, called greedy gossip with synchronous communication (GGwSC). The presented algorithm allows multiple node pairs to exchange their values synchronously. Because of the selection criterion of the maximum difference between the values at the nodes, there is at least one node pair with different information, so that the relative error must be reduced after each iteration. The convergence rate in terms of the number of transmissions is demonstrated to improve upon GGE. Experimental results validate that the proposed GGwSC is effective for the random geometric graph (RGG) as well as for several other special network topologies.
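
    A simplified sketch of one synchronous round, assuming that disjoint node pairs are greedily matched in order of decreasing value difference and each matched pair averages; the paper's exact scheduling and eavesdropping mechanism are not reproduced here.

```python
import numpy as np

def ggwsc_round(x, edges):
    """One synchronous round: greedily pick disjoint node pairs in order of
    decreasing value difference, then let every selected pair average.
    A simplified sketch of the idea, not the paper's exact scheduling."""
    order = sorted(edges, key=lambda e: abs(x[e[0]] - x[e[1]]), reverse=True)
    busy = set()
    for u, v in order:
        if u in busy or v in busy:
            continue                      # nodes may join one pair per round
        busy.update((u, v))
        x[u] = x[v] = 0.5 * (x[u] + x[v])  # pairwise average preserves the mean
    return x

# Toy network: ring of 8 nodes.
rng = np.random.default_rng(4)
x = rng.standard_normal(8)
edges = [(i, (i + 1) % 8) for i in range(8)]
mean = x.mean()
for _ in range(500):
    x = ggwsc_round(x, edges)
print(np.allclose(x, mean))              # consensus on the average
```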

    Clock-Offset and Microphone Gain Mismatch Invariant Beamforming

    The use of wireless acoustic sensor networks (WASNs) has received increased attention over the last decade. The advantages of WASNs over stand-alone multi-microphone devices are that the microphone array is no longer limited by the dimensions of a single device, and that microphones can be placed at arbitrary locations. One of the disadvantages, however, is that for many applications, like beamforming, the clocks of all devices in the network need to be synchronised and the microphone gains need to be equalised. In this paper we prove that a specific class of beamformers is clock-offset and gain-mismatch invariant. The parameters for these beamformers (acoustic transfer functions and power spectral density matrices) can be estimated directly from the uncalibrated microphone signals, instead of first synchronising the clocks, equalising the gains and then estimating them. The resulting beamformers are applied to the uncalibrated microphone signals. We substantiate, by means of computer simulations, that the proposed approach gives results identical to the setup where the microphone signals are first calibrated, so that clock-offset compensation and microphone gain equalisation become unnecessary.
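
    The invariance can be checked numerically for the MVDR beamformer: if every microphone signal is multiplied by an unknown complex gain (the magnitude modelling gain mismatch, the phase modelling the per-frequency effect of a clock offset), and both the covariance matrix and the transfer function are taken as seen by the uncalibrated array itself, the beamformer output is unchanged. A small sketch under these assumptions (not necessarily the paper's exact beamformer class):

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 4, 500
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)      # true ATF
s = rng.standard_normal(N)                                    # source signal
noise = 0.3 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
x = np.outer(a, s) + noise                                    # calibrated signals

# Unknown per-device complex gains: magnitude models the gain mismatch,
# phase models the (per-frequency) effect of a clock offset.
g = rng.uniform(0.5, 2.0, M) * np.exp(1j * rng.uniform(0, 2 * np.pi, M))
y = g[:, None] * x                                            # uncalibrated signals

def mvdr_out(z, h):
    """MVDR using the covariance and ATF as seen by the (possibly
    uncalibrated) array itself."""
    R = z @ z.conj().T / z.shape[1]
    w = np.linalg.solve(R, h)
    w /= (h.conj() @ w)                   # distortionless: w^H h = 1
    return w.conj() @ z

out_cal = mvdr_out(x, a)          # calibrated parameters and signals
out_uncal = mvdr_out(y, g * a)    # uncalibrated parameters and signals
print(np.allclose(out_cal, out_uncal))                        # True
```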

    Structured Total Least Squares Based Internal Delay Estimation For Distributed Microphone Auto-Localization

    Auto-localization in wireless acoustic sensor networks (WASNs) can be achieved by time-of-arrival (TOA) measurements between sensors and sources. Most existing approaches are centralized and require a fusion center to communicate with the other nodes. In practice, WASN topologies are time-varying, with nodes joining or leaving the network, which poses scalability issues for such algorithms. In particular, for an increasing number of nodes, the total transmission power required to reach the fusion center increases. Therefore, in order to facilitate scalability, we present a structured total least squares (STLS) based internal-delay estimation method for distributed microphone localization, where the internal delay refers to the time between a source signal reaching a sensor and its registration as received by the capture device. Each node only needs to communicate with its neighbors instead of with a remote host, and runs an STLS algorithm locally to estimate the local internal delays and positions (i.e., its own and those of its neighbors), such that the original centralized computation is divided into many subproblems. Experiments demonstrate that the decentralized internal-delay estimation converges to the centralized results with increasing signal-to-noise ratio (SNR). More importantly, less computational complexity and transmission power are required to obtain comparable localization accuracy.
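
    A toy illustration of the internal-delay measurement model, with sensor and source positions assumed known and simple averaging standing in for the STLS machinery that solves the much harder joint problem (positions and delays):

```python
import numpy as np

rng = np.random.default_rng(6)
c = 343.0                                 # speed of sound (m/s)
mics = rng.uniform(0, 5, (4, 2))          # sensor positions (toy, assumed known)
srcs = rng.uniform(0, 5, (6, 2))          # source event positions
delta = rng.uniform(0, 1e-2, 4)           # unknown internal delays (s)

# TOA model: t[i, j] = ||mic_i - src_j|| / c + delta_i  (+ noise)
dist = np.linalg.norm(mics[:, None, :] - srcs[None, :, :], axis=2)
toa = dist / c + delta[:, None] + 1e-6 * rng.standard_normal((4, 6))

# With positions known, each sensor's internal delay follows by averaging
# the residual TOA; the paper instead estimates positions AND delays
# jointly and distributedly via structured total least squares.
delta_hat = np.mean(toa - dist / c, axis=1)
print(np.max(np.abs(delta_hat - delta)))  # ~1e-6: noise-level error
```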

    Rate-Distributed Spatial Filtering Based Noise Reduction in Wireless Acoustic Sensor Networks

    In wireless acoustic sensor networks (WASNs), sensors typically have a limited energy budget, as they are often battery driven. Energy efficiency is therefore essential to the design of algorithms in WASNs. One way to reduce energy costs is to select only the most informative sensors, a problem known as sensor selection. In this way, only sensors that significantly contribute to the task at hand are involved. In this work, we consider a more general approach based on rate-distributed spatial filtering. Together with the distance over which transmission takes place, the bit rate directly influences the energy consumption. We minimize the battery usage due to transmission, while constraining the noise reduction performance. This results in an efficient rate allocation strategy, which depends on the underlying signal statistics as well as the distances from the sensors to a fusion center (FC). Using a linearly constrained minimum variance (LCMV) beamformer, the problem is formulated as a semi-definite program. Furthermore, we show that rate allocation is more general than sensor selection: sensor selection can be seen as a special case of the presented rate-allocation solution, e.g., the best microphone subset can be determined by thresholding the rates. Finally, numerical simulations for the application of estimating several target sources in a WASN demonstrate that the proposed method outperforms microphone-subset selection based approaches in terms of energy usage, and we find that sensors close to the FC and close to point sources are allocated higher rates.
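
    A minimal sketch of the underlying trade-off, using the standard high-rate model in which quantizing at b bits adds noise with variance proportional to 2^(-2b). The paper optimizes the allocation with a semi-definite program; this sketch merely evaluates two fixed allocations of the same total budget, and all constants are illustrative.

```python
import numpy as np

def output_noise_power(Rn, A, f, rates, sig_pow):
    """LCMV output noise power when each sensor is quantized at the given
    rate. Quantization is modeled as additive noise with variance
    proportional to 2^(-2b) (standard high-rate model; constants here
    are illustrative)."""
    Rq = np.diag(sig_pow * 2.0 ** (-2.0 * np.asarray(rates)))
    R = Rn + Rq
    # LCMV weights: w = R^{-1} A (A^H R^{-1} A)^{-1} f
    RiA = np.linalg.solve(R, A)
    w = RiA @ np.linalg.solve(A.conj().T @ RiA, f)
    return np.real(w.conj() @ R @ w)

rng = np.random.default_rng(7)
M = 5
Rn = np.eye(M) * 0.1                      # sensor noise covariance
A = rng.standard_normal((M, 2)) + 1j * rng.standard_normal((M, 2))  # 2 sources
f = np.array([1.0, 0.0])                  # keep source 1, null source 2
sig_pow = np.ones(M)                      # per-sensor signal power (assumed)

uniform = output_noise_power(Rn, A, f, [4, 4, 4, 4, 4], sig_pow)
skewed = output_noise_power(Rn, A, f, [8, 6, 4, 1, 1], sig_pow)
print(uniform, skewed)                    # noise reduction per 20-bit budget
```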

    Joint Maximum Likelihood Estimation of Microphone Array Parameters for a Reverberant Single Source Scenario

    Estimation of acoustic-scene related parameters such as the relative transfer functions (RTFs) from source to microphones, the source power spectral densities (PSDs) and the PSDs of the late reverberation is essential and also challenging. Existing maximum likelihood estimators typically consider only subsets of these parameters and use each time frame separately. In this paper we explicitly focus on the single-source scenario and first propose a joint maximum likelihood estimator (MLE) that estimates all parameters jointly using a single time frame. Since the RTFs are typically invariant over a number of consecutive time frames, we also propose a joint MLE using multiple time frames, which has estimation performance similar to a recently proposed reference algorithm called simultaneous confirmatory factor analysis (SCFA), but at a much lower complexity. Moreover, we present experimental results which demonstrate that the estimation accuracy of our proposed joint MLE, together with the resulting noise reduction, speech quality and speech intelligibility, outperforms that of existing MLE-based approaches that use only a single time frame.
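
    A sketch of the single-source per-frame signal model, with a simple least-squares (moment-matching) fit of the frame-dependent PSDs standing in for the paper's joint MLE; all parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
M, L = 4, 10
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # RTF (frame-invariant)
Gamma = np.eye(M) + 0.3 * np.ones((M, M))                   # late-reverb coherence
Rv = 0.05 * np.eye(M)                                       # ambient noise covariance
phi_s = rng.uniform(0.5, 2.0, L)                            # speech PSD per frame
phi_r = rng.uniform(0.1, 0.5, L)                            # reverb PSD per frame

# Per-frame model: R_x(l) = phi_s(l) a a^H + phi_r(l) Gamma + R_v
basis = np.stack([np.outer(a, a.conj()).ravel(), Gamma.ravel()], axis=1)
phi_hat = np.empty((L, 2))
for l in range(L):
    Rx = phi_s[l] * np.outer(a, a.conj()) + phi_r[l] * Gamma + Rv
    # Least-squares fit of (phi_s, phi_r) given a, Gamma, Rv -- a simple
    # moment-matching stand-in for the paper's joint MLE.
    sol, *_ = np.linalg.lstsq(basis, (Rx - Rv).ravel(), rcond=None)
    phi_hat[l] = sol.real
print(np.allclose(phi_hat[:, 0], phi_s), np.allclose(phi_hat[:, 1], phi_r))
```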