64 research outputs found

    Speech Enhancement with Adaptive Thresholding and Kalman Filtering

    Speech enhancement has been studied extensively for many years, and various speech enhancement methods have been developed during the past decades. One of the objectives of speech enhancement is to provide high-quality speech communication in the presence of background noise and concurrent interference signals. In the process of speech communication, the clean speech signal is inevitably corrupted by acoustic noise from the surrounding environment, transmission media, communication equipment, electrical noise, other speakers, and other sources of interference. These disturbances can significantly degrade the quality and intelligibility of the received speech signal. Therefore, it is of great interest to develop efficient speech enhancement techniques to recover the original speech from the noisy observation. In recent years, various techniques have been developed to tackle this problem, which can be classified into single channel and multi-channel enhancement approaches. Since single channel enhancement is easy to implement, it has been a significant field of research and various approaches have been developed. For example, spectral subtraction and Wiener filtering are among the earliest single channel methods, which are based on estimation of the power spectrum of stationary noise. However, when the noise is non-stationary, or when music noise and ambient speech noise are present, the enhancement performance degrades considerably. To overcome this disadvantage, this thesis focuses on single channel speech enhancement in adverse noise environments, especially non-stationary ones. Recently, wavelet transform based methods have been widely used to reduce the undesired background noise. On the other hand, Kalman filter (KF) methods offer competitive denoising results, especially in non-stationary environments, and have been a popular and powerful tool for speech enhancement during the past decades.
In this regard, a single channel wavelet thresholding based Kalman filter (KF) algorithm is proposed for speech enhancement in this thesis. The wavelet packet (WP) transform is first applied to the noise-corrupted speech on a frame-by-frame basis, which decomposes each frame into a number of subbands. A voice activity detector (VAD) is then designed to detect the voiced/unvoiced frames of the subband speech. Based on the VAD result, an adaptive thresholding scheme is applied to each subband speech, followed by the WP based reconstruction to obtain the pre-enhanced speech. To achieve a further level of enhancement, an iterative Kalman filter (IKF) is used to process the pre-enhanced speech. The proposed adaptive thresholding iterative Kalman filtering (AT-IKF) method is evaluated and compared with some existing methods under various noise conditions in terms of two well-known performance indexes: segmental SNR and perceptual evaluation of speech quality (PESQ). Firstly, we compare the proposed adaptive thresholding (AT) scheme with three other thresholding schemes: the non-linear universal thresholding (U-T), the non-linear wavelet packet transform thresholding (WPT-T) and the non-linear SURE thresholding (SURE-T). The experimental results show that the proposed AT scheme can significantly improve the segmental SNR and PESQ for all input SNRs compared with the other existing thresholding schemes. Secondly, extensive computer simulations are conducted to evaluate the proposed AT-IKF against the AT and the IKF as standalone speech enhancement methods. It is shown that the AT-IKF method still performs the best. Lastly, the proposed AT-IKF method is compared with three representative and popular methods: the improved spectral subtraction based speech enhancement algorithm (ISS), the improved Wiener filter based method (IWF) and the representative subband Kalman filter based algorithm (SIKF).
Experimental results demonstrate the effectiveness of the proposed method compared with some previous works in terms of both segmental SNR and PESQ.
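As an illustration of the thresholding baseline named in the abstract above, the following is a minimal sketch of universal soft thresholding (U-T) on one frame. It is not the thesis's adaptive scheme: it assumes a single-level Haar transform instead of a wavelet packet decomposition, estimates the noise level from the median absolute detail coefficient, and uses no voice activity detector. All function names are hypothetical.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t; zero it if |c| <= t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def universal_threshold(detail, n):
    """sigma * sqrt(2 ln n), with sigma estimated as median(|detail|) / 0.6745."""
    mags = sorted(abs(d) for d in detail)
    m = len(mags)
    median = mags[m // 2] if m % 2 else 0.5 * (mags[m // 2 - 1] + mags[m // 2])
    sigma = median / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))

def denoise_frame(frame):
    """Threshold the detail band of one frame and reconstruct it."""
    approx, detail = haar_forward(frame)
    t = universal_threshold(detail, len(frame))
    detail = [soft_threshold(d, t) for d in detail]
    return haar_inverse(approx, detail)
```

Because only the detail band is thresholded, this sketch removes at most the noise that falls in that band; a deeper decomposition, as in the thesis, would be needed for stronger suppression.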

    End-to-End Probabilistic Inference for Nonstationary Audio Analysis

    Accepted to the Thirty-sixth International Conference on Machine Learning (ICML) 2019. A typical audio signal processing pipeline includes multiple disjoint analysis stages, including calculation of a time-frequency representation followed by spectrogram-based feature analysis. We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters. Further, we formulate this nonlinear model's state space representation, making it amenable to infinite-horizon Gaussian process regression with approximate inference via expectation propagation, which scales linearly in the number of time steps and quadratically in the state dimensionality. By doing so, we are able to process audio signals with hundreds of thousands of data points. We demonstrate, on various tasks with empirical data, how this inference scheme outperforms more standard techniques that rely on extended Kalman filtering.

    Single Channel Speech Enhancement using Kalman Filter

    The quality and intelligibility of speech conversation are generally degraded by surrounding noise. The main objective of speech enhancement (SE) is to eliminate or reduce such disturbing noise from the degraded speech. Various SE methods have been proposed in the literature. Among them, the Kalman filter (KF) is known to be an efficient SE method based on the minimum mean square error (MMSE) criterion. However, most conventional KF based speech enhancement methods need access to clean speech and additive noise information to obtain the state-space model parameters, namely, the linear prediction coefficients (LPCs) and the additive noise variance. This is impractical because, in practice, only the noisy speech is accessible. Moreover, it is quite difficult to estimate these model parameters efficiently in the presence of adverse environmental noise. Therefore, the main focus of this thesis is to develop single channel speech enhancement algorithms using the Kalman filter, where the model parameters are estimated in noisy conditions. Depending on the parameter estimation technique, the proposed SE methods are classified into three approaches based on the non-iterative, iterative, and sub-band iterative KF. In the first approach, a non-iterative Kalman filter based speech enhancement algorithm is presented, which operates on a frame-by-frame basis. In this proposed method, the state-space model parameters, namely, the LPCs and noise variance, are estimated first in noisy conditions. For LPC estimation, a combined speech smoothing and autocorrelation method is employed. For the noise variance estimation, a new method is introduced that is based on a lower-order truncated Taylor series approximation of the noisy speech along with a difference operation serving as high-pass filtering. The non-iterative Kalman filter is then implemented with these estimated parameters.
In order to enhance the SE performance as well as the parameter estimation accuracy in noisy conditions, an iterative Kalman filter based single channel SE method is proposed as the second approach, which also operates on a frame-by-frame basis. For each frame, the state-space model parameters of the KF are estimated through an iterative procedure. The Kalman filtering iteration is first applied to each noisy speech frame, reducing the noise component to a certain degree. At the end of this first iteration, the LPCs and other state-space model parameters are re-estimated using the processed speech frame and the Kalman filtering is repeated for the same processed frame. This iteration continues until the KF converges or a maximum number of iterations is reached, yielding a further enhanced speech frame. The same procedure is repeated for the following frames until the last noisy speech frame has been processed. To further improve the speech enhancement performance, a sub-band iterative Kalman filter based SE method is also proposed as the third approach. A wavelet filter-bank is first used to decompose the noisy speech into a number of sub-bands. To achieve the best trade-off among noise reduction, speech intelligibility and computational complexity, a partial reconstruction scheme based on consecutive mean squared error (CMSE) is proposed to synthesize the low-frequency (LF) and high-frequency (HF) sub-bands such that the iterative KF is applied only to the partially reconstructed HF sub-band speech. Finally, the enhanced HF sub-band speech is combined with the partially reconstructed LF sub-band speech to reconstruct the full-band enhanced speech. Experimental results have shown that the proposed KF based SE methods are capable of reducing adverse environmental noise for a wide range of input SNRs, and that the overall performance of the proposed methods in terms of different evaluation metrics is superior to some existing state-of-the-art SE methods.
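The estimate, filter, re-estimate loop described in the abstract above can be sketched for a single frame as follows. This is a deliberately reduced illustration under several assumptions that are not taken from the thesis: the speech model is a first-order autoregressive process rather than a higher-order LPC model, the noise variance `r` is taken as known, and each pass re-filters the noisy frame with the updated parameters (a common variant), whereas the abstract describes repeating the filtering on the processed frame. All names are hypothetical.

```python
def estimate_ar1(frame):
    """Autocorrelation-method estimate of the AR(1) coefficient and excitation variance."""
    n = len(frame)
    r0 = sum(s * s for s in frame) / n
    r1 = sum(frame[k] * frame[k - 1] for k in range(1, n)) / n
    a = r1 / r0 if r0 > 0.0 else 0.0
    q = max(r0 * (1.0 - a * a), 1e-12)  # prediction-error variance, kept positive
    return a, q

def kalman_ar1(noisy, a, q, r):
    """Scalar Kalman filter for x[k] = a*x[k-1] + w[k], y[k] = x[k] + v[k]; r must be > 0."""
    x_est, p = 0.0, q
    out = []
    for y in noisy:
        x_pred = a * x_est                 # state prediction
        p_pred = a * a * p + q             # predicted error variance
        gain = p_pred / (p_pred + r)       # Kalman gain
        x_est = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        out.append(x_est)
    return out

def iterative_kalman(noisy, r, iterations=3):
    """Alternate parameter estimation and filtering for a fixed number of passes."""
    estimate = list(noisy)                 # first pass estimates parameters from the noisy frame
    for _ in range(iterations):
        a, q = estimate_ar1(estimate)      # re-estimate from the current enhanced frame
        estimate = kalman_ar1(noisy, a, q, r)
    return estimate
```

A fixed iteration count stands in for the convergence test mentioned in the abstract; too many passes can over-smooth the frame, so the count is a tuning parameter in this sketch.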

    Gaussian Process Modelling for Audio Signals

    PhD thesis. Audio signals are characterised and perceived based on how their spectral make-up changes with time. Uncovering the behaviour of latent spectral components is at the heart of many real-world applications involving sound, but is a highly ill-posed task given the infinite number of ways any signal can be decomposed. This motivates the use of prior knowledge and a probabilistic modelling paradigm that can characterise uncertainty. This thesis studies the application of Gaussian processes to audio, which offer a principled non-parametric way to specify probability distributions over functions whilst also encoding prior knowledge. Along the way we consider what prior knowledge we have about sound, the way it behaves, and the way it is perceived, and write down these assumptions in the form of probabilistic models. We show how Bayesian time-frequency analysis can be reformulated as a spectral mixture Gaussian process, and utilise modern-day inference methods to carry out joint time-frequency analysis and nonnegative matrix factorisation. Our reformulation results in increased modelling flexibility, allowing more sophisticated prior knowledge to be encoded, which improves performance on a missing data synthesis task. We demonstrate the generality of this paradigm by showing how the joint model can additionally be applied to both denoising and source separation tasks without modification. We propose a hybrid statistical-physical model for audio spectrograms based on observations about the way amplitude envelopes decay over time, as well as a nonlinear model based on deep Gaussian processes. We examine the benefits of these methods, all of which are generative in the sense that novel signals can be sampled from the underlying models, allowing us to consider the extent to which they encode the important perceptual characteristics of sound.

    Time and frequency domain algorithms for speech coding

    The promise of digital hardware economies (due to recent advances in VLSI technology) has focussed much attention on more complex and sophisticated speech coding algorithms which offer improved quality at relatively low bit rates. This thesis describes the results (obtained from computer simulations) of research into various efficient (time and frequency domain) speech encoders operating at a transmission bit rate of 16 Kbps. In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM) systems employing both forward and backward adaptive prediction were examined. A number of algorithms were proposed and evaluated, including several variants of the Stochastic Approximation Predictor (SAP). A Backward Block Adaptive (BBA) predictor was also developed and found to outperform the conventional stochastic methods, even though its complexity in terms of signal processing requirements is lower. A simplified Adaptive Predictive Coder (APC) employing a single-tap pitch predictor, considered next, provided a slight improvement in performance over ADPCM, but with rather greater complexity. The ultimate test of any speech coding system is the perceptual performance of the received speech. Recent research has indicated that this may be enhanced by suitable control of the noise spectrum according to the theory of auditory masking. Various noise shaping ADPCM configurations were examined, and it was demonstrated that a proposed pre-/post-filtering arrangement which exploits the predictor-quantizer interaction to advantage leads to the best subjective performance in both forward and backward prediction systems. Adaptive quantization is instrumental to the performance of ADPCM systems. Both the forward adaptive quantizer (AQF) and the backward one-word memory adaptation (AQJ) were examined.
In addition, a novel method of decreasing quantization noise in ADPCM-AQJ coders, which involves the application of correction to the decoded speech samples, provided reduced output noise across the spectrum, with considerable high-frequency noise suppression. More powerful (and inevitably more complex) frequency domain speech coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder (SBC) offer good quality speech at 16 Kbps. To reduce complexity and coding delay, whilst retaining the advantage of sub-band coding, a novel transform based split-band coder (TSBC) was developed and found to compare closely in performance with the SBC. To prevent the heavy side information requirement associated with a large number of bands in split-band coding schemes from impairing coding accuracy, without forgoing the efficiency provided by adaptive bit allocation, a method employing AQJs to code the sub-band signals together with vector quantization of the bit allocation patterns was also proposed. Finally, 'pipeline' methods of bit allocation and step size estimation (using the Fast Fourier Transform (FFT) on the input signal) were examined. Such methods, although less accurate, are nevertheless useful in limiting the coding delay associated with SBC schemes employing Quadrature Mirror Filters (QMF).

    Spectrogram inversion and potential applications for hearing research


    Intercarrier Interference Suppression for the OFDM Systems in Time-Varying Multipath Fading Channels

    Due to its spectral efficiency and robustness over multipath channels, orthogonal frequency division multiplexing (OFDM) has served as one of the major modulation schemes for modern communication systems. In the future, wireless OFDM systems are expected to operate at high carrier frequencies and to support high-speed, high-throughput mobile reception, where fast time-varying fading channels are encountered. The channel variation destroys the orthogonality among the subcarriers and leads to intercarrier interference (ICI). ICI poses a significant limitation to wireless OFDM systems. The aim of this dissertation is to find an efficient method of providing reliable communication using OFDM in fast time-varying fading channel scenarios. First, we investigate the OFDM performance in time-varying mobile channels in the presence of multiple Doppler frequency shifts. A new mathematical framework of the ICI effect is derived. The simulation results show that ICI induces an irreducible error probability floor, which is proportional to the Doppler frequency shifts. Furthermore, it is observed that the ICI power arises from a few adjacent subcarriers. This observation motivates us to design low-complexity Q-tap equalizers, namely, a Minimum Mean Square Error (MMSE) linear equalizer and a Decision Feedback (DF) non-linear equalizer, to mitigate the ICI. Simulation results show that both Q-tap equalizers can improve the system performance in terms of symbol error rate (SER). To employ these equalizers, the channel state information is also required. In this dissertation, we also design a pilot-aided channel estimator based on Wiener filtering for a time-varying Wide-sense Stationary Uncorrelated Scatterers (WSSUS) channel model. The channel estimator exploits the statistical properties of the channel.
Our proposed low-complexity ICI suppression scheme, which combines the Q-tap equalizer with our proposed channel estimator, can significantly improve the performance of OFDM systems in fast time-varying fading channels. In the last part of the dissertation, an alternative ICI mitigation approach, which is based on ICI self-cancellation coding, is also discussed. The EM-based approach, which solves the phase and amplitude ambiguities associated with this approach, is also introduced.
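The observation that ICI power comes mainly from a few adjacent subcarriers can be illustrated with a much simpler model than the dissertation's framework. The sketch below assumes a single fixed normalized frequency offset `eps` (in units of the subcarrier spacing) as a stand-in for a Doppler shift, and computes the standard leakage coefficient from a subcarrier onto the one `d` bins away. The function names are hypothetical, and the multi-Doppler, time-varying case treated in the dissertation is not modelled.

```python
import cmath
import math

def ici_coefficients(n_sub, eps):
    """Leakage c[d] onto the subcarrier d bins away (mod n_sub) for offset eps."""
    coeffs = []
    for d in range(n_sub):
        acc = sum(cmath.exp(2j * math.pi * n * (d + eps) / n_sub)
                  for n in range(n_sub))
        coeffs.append(acc / n_sub)
    return coeffs

def neighbour_ici_fraction(n_sub, eps, q):
    """Fraction of the total ICI power carried by the q nearest bins on each side."""
    power = [abs(c) ** 2 for c in ici_coefficients(n_sub, eps)]
    ici_total = sum(power[1:])             # everything except the wanted bin d = 0
    near = sum(power[d] + power[n_sub - d] for d in range(1, q + 1))
    return near / ici_total
```

For example, `neighbour_ici_fraction(64, 0.1, 2)` measures how much of the interference a receiver would capture by considering only the two nearest subcarriers on each side, which is the intuition behind equalizers with a small number of taps.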

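    Estudo de formas de onda e conceção de algoritmos para operação conjunta de sistemas de comunicação e radar (Study of waveforms and design of algorithms for the joint operation of communication and radar systems)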

    The focus of this thesis is the processing of signals and design of algorithms that can be used to enable radar functions in communications systems. Orthogonal frequency division multiplexing (OFDM) is a popular multicarrier modulation waveform in communication systems. As a wideband signal, OFDM improves resolution and enables spectral efficiency in radar systems, while also improving detection performance thanks to its inherent frequency diversity. This thesis aims to use multicarrier waveforms for radar systems, to enable the simultaneous operation of radar and communication functions on the same device. The thesis is divided into two parts. The first part studies the adaptation and application of other multicarrier waveforms to radar functions. Many studies have been carried out on the joint use of the OFDM signal for communication and radar functions, but other waveforms have been shown to be possible candidates for communication applications. Therefore, studies evaluating the application of these same signals to radar functions are necessary. In this thesis, to demonstrate that other multicarrier waveforms can outperform the OFDM waveform in radar/communication (RadCom) systems, we propose the adaptation of the filter bank multicarrier (FBMC), generalized frequency division multiplexing (GFDM) and universal filtering multicarrier (UFMC) waveforms for radar functions. These alternative waveforms were compared in terms of achievable target parameter estimation performance, amount of residual background noise in the radar image, impact of intersystem interference and flexibility of parameterization. In the second part of the thesis, signal processing techniques are explored to solve some of the limitations of the use of multicarrier waveforms for RadCom systems. Radar systems based on OFDM are promising candidates for future intelligent transport networks, because they combine target estimation functions with communication network functions in a single system.
Exploring the dual functionality enabled by OFDM, we present cooperative methods for high-resolution delay-Doppler and direction-of-arrival estimation. High-resolution parameter estimation is an important requirement for automotive radar systems, especially in multi-target scenarios that require reliable target separation performance. By exploring the cooperation between vehicles, the studies presented in this thesis also enable the distributed tracking of targets. The result is highly accurate multi-target tracking across the entire cooperative vehicle network, leading to improvements in transport reliability and safety. Doctoral Programme in Telecommunications.

    Low-Complexity Algorithms for Channel Estimation in Optimised Pilot-Assisted Wireless OFDM Systems

    Orthogonal frequency division multiplexing (OFDM) has recently become a dominant transmission technology considered for the next generation of fixed and mobile broadband wireless communication systems. OFDM has the advantage of lessening the severe effects of frequency-selective (multipath) fading by splitting the band into relatively flat fading subchannels, and allows for low-complexity transceiver implementation based on fast Fourier transform algorithms. Combining OFDM modulation with multilevel frequency-domain symbol mapping (e.g., QAM) and spatial multiplexing (SM) over multiple-input multiple-output (MIMO) channels can theoretically achieve near-Shannon capacity of the communication link. However, a high-rate and spectrum-efficient system implementation requires coherent detection at the receiving end, which is possible only when accurate channel state information (CSI) is available. Since, in practice, the response of the wireless channel is unknown and is subject to random variation with time, the receiver typically employs a channel estimator for CSI acquisition. The channel response information retrieved by the estimator is then used by the data detector and can also be fed back to the transmitter by means of in-band or out-of-band signalling, so that the transmitter can adapt power loading, modulation and coding parameters according to the channel conditions. Thus, the design of an accurate and robust channel estimator is a crucial requirement for reliable communication through a channel that is selective in time and frequency. In a MIMO configuration, a separate channel estimator has to be associated with each transmit/receive antenna pair, making the estimation algorithm complexity a primary concern. Pilot-assisted methods, relying on the insertion of reference symbols at certain frequencies and time slots, have been found attractive for identification of doubly-selective radio channels from both the complexity and performance standpoints.
In this dissertation, a family of reduced-complexity estimators for single- and multiple-antenna OFDM systems is developed. The estimators are based on transform-domain processing and have the same order of computational complexity, irrespective of the number of pilot subcarriers and their positioning. The common estimator structure is a cascade of successive small-dimension filtering modules. The number of modules, as well as their order inside the cascade, is determined by the class of the estimator (one- or two-dimensional) and the availability of the channel statistics (correlation and signal-to-noise power ratio). For fine precision estimation in multipath channels with statistics not known a priori, we propose a recursive design of the filtering modules. Simulation results show that in the steady state, the performance of the recursive estimators approaches that of their theoretical counterparts, which are optimal in the minimum mean square error (MMSE) sense. In contrast to the majority of the channel estimators developed so far, our modular-type architectures are suitable for reconfigurable OFDM transceivers, where the actual channel conditions influence the decision of which class of filtering algorithm to use and how to allot pilot subcarrier positions in the band. In pilot-assisted transmissions, channel estimation and detection are performed separately from each other over distinct subcarrier sets. The estimator output is used only to construct the detector transform, but not as the detector input. Since the performance of both channel estimation and detection depends on the signal-to-noise power ratio (SNR) at the corresponding subcarriers, there is a dilemma of optimal power allocation between the data and the pilot symbols, as these are conflicting requirements under the total transmit power constraint. The problem is exacerbated by the variety of channel estimators.
Each kind of estimation algorithm is characterised by its own SNR gain, which in general can vary depending on the channel correlation. In this dissertation, we optimise the pilot-data power allocation for the developed low-complexity one- and two-dimensional MMSE channel estimators. The resultant contribution takes the form of closed-form analytical expressions for the upper bound (suboptimal approximate value) on the optimal pilot-to-data power ratio (PDR) as a function of a number of design parameters (number of subcarriers, number of pilots, number of transmit antennas, effective order of the channel model, maximum Doppler shift, SNR, etc.). The resultant PDR equations can be applied to MIMO-OFDM systems with an arbitrary arrangement of the pilot subcarriers, operating in an arbitrary multipath fading channel. These properties, together with the relatively simple functional form of the derived analytical PDR expressions, are intended to ease the challenging task of on-the-fly optimisation of an adaptive SM-MIMO-OFDM system that is capable of adjusting the transmit signal configuration (e.g., block length, number of pilot subcarriers or antennas) according to the established channel conditions.
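For readers unfamiliar with pilot-assisted estimation, the following is a minimal sketch of the simplest baseline: a least-squares estimate at the pilot subcarriers followed by linear interpolation across the data subcarriers. It is not one of the dissertation's transform-domain MMSE estimators and uses no channel statistics. The function names, the dictionary-based pilot layout, and the choice to hold the edge values outside the pilot span are all assumptions made for illustration.

```python
def ls_pilot_estimate(rx, pilots):
    """Least-squares estimate H[k] = Y[k] / X[k] at each pilot index k.

    rx     -- list of received frequency-domain samples, one per subcarrier
    pilots -- dict mapping pilot subcarrier index to the known pilot symbol
    """
    return {k: rx[k] / symbol for k, symbol in pilots.items()}

def interpolate_channel(pilot_est, n_sub):
    """Linearly interpolate pilot estimates over all n_sub subcarriers."""
    idx = sorted(pilot_est)
    h = [0j] * n_sub
    for k in range(n_sub):
        if k <= idx[0]:
            h[k] = pilot_est[idx[0]]       # hold the first pilot value
        elif k >= idx[-1]:
            h[k] = pilot_est[idx[-1]]      # hold the last pilot value
        else:
            hi = next(i for i in idx if i >= k)
            lo = idx[idx.index(hi) - 1]
            w = (k - lo) / (hi - lo)       # interpolation weight in (0, 1]
            h[k] = (1.0 - w) * pilot_est[lo] + w * pilot_est[hi]
    return h
```

In a noisy channel the pilot estimates inherit the noise at the pilot subcarriers, which is why the share of transmit power given to pilots, the trade-off analysed in the abstract, affects both estimation and detection quality.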