64 research outputs found
Speech Enhancement with Adaptive Thresholding and Kalman Filtering
Speech enhancement has been extensively studied for many years and various speech enhancement methods have been developed during the past decades. One of the objectives of speech enhancement is to provide high-quality speech communication in the presence of background noise and concurrent interference signals. In the process of speech communication, the clean speech signal is inevitably corrupted by acoustic noise from the surrounding environment, transmission media, communication equipment, electrical noise, other speakers, and other sources of interference. These disturbances can significantly degrade the quality and intelligibility of the received speech signal. Therefore, it is of great interest to develop efficient speech enhancement techniques to recover the original speech from the noisy observation. In recent years, various techniques have been developed to tackle this problem, which can be classified into single channel and multi-channel enhancement approaches. Since single channel enhancement is easy to implement, it has been a significant field of research and various approaches have been developed. For example, spectral subtraction and Wiener filtering are among the earliest single channel methods, and both are based on estimating the power spectrum of stationary noise. However, when the noise is non-stationary, or when music noise and ambient speech noise are present, the enhancement performance degrades considerably. To overcome this disadvantage, this thesis focuses on single channel speech enhancement in adverse noise environments, especially non-stationary ones.
Recently, wavelet transform based methods have been widely used to reduce undesired background noise. On the other hand, Kalman filter (KF) methods offer competitive denoising results, especially in non-stationary environments, and have been a popular and powerful tool for speech enhancement during the past decades. In this regard, a single channel wavelet thresholding based Kalman filter algorithm is proposed for speech enhancement in this thesis. The wavelet packet (WP) transform is first applied to the noise corrupted speech on a frame-by-frame basis, which decomposes each frame into a number of subbands. A voice activity detector (VAD) is then designed to detect the voiced/unvoiced frames of the subband speech. Based on the VAD result, an adaptive thresholding scheme is applied to each subband speech, followed by WP based reconstruction to obtain the pre-enhanced speech. To achieve a further level of enhancement, an iterative Kalman filter (IKF) is used to process the pre-enhanced speech.
The proposed adaptive thresholding iterative Kalman filtering (AT-IKF) method is evaluated and compared with some existing methods under various noise conditions in terms of segmental SNR and perceptual evaluation of speech quality (PESQ), two well-known performance indexes. Firstly, we compare the proposed adaptive thresholding (AT) scheme with three other thresholding schemes: the non-linear universal thresholding (U-T), the non-linear wavelet packet transform thresholding (WPT-T) and the non-linear SURE thresholding (SURE-T). The experimental results show that the proposed AT scheme can significantly improve the segmental SNR and PESQ for all input SNRs compared with the other existing thresholding schemes. Secondly, extensive computer simulations are conducted to evaluate the proposed AT-IKF against the AT and the IKF as standalone speech enhancement methods. It is shown that the AT-IKF method still performs the best. Lastly, the proposed AT-IKF method is compared with three representative and popular methods: the improved spectral subtraction based speech enhancement algorithm (ISS), the improved Wiener filter based method (IWF) and the representative subband Kalman filter based algorithm (SIKF). Experimental results demonstrate the effectiveness of the proposed method as compared to some previous works, both in terms of segmental SNR and PESQ.
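The universal thresholding baseline (U-T) named above can be illustrated with a minimal sketch. It is not the thesis's adaptive scheme: it assumes a single-level Haar transform in place of the wavelet packet decomposition, a median-based noise estimate, and soft thresholding, and all function names are hypothetical.

```python
import numpy as np

def haar_analysis(x):
    """One-level orthonormal Haar transform of an even-length frame."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of haar_analysis (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def universal_threshold(detail):
    """sigma * sqrt(2 ln N), with sigma estimated from the median
    absolute detail coefficient (assumption: noise dominates the details)."""
    sigma = np.median(np.abs(detail)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(len(detail)))

def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero by thr; zero those below thr."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)

def denoise_frame(frame):
    """Threshold the detail band of one frame and reconstruct it."""
    approx, detail = haar_analysis(frame)
    thr = universal_threshold(detail)
    return haar_synthesis(approx, soft_threshold(detail, thr))
```

An adaptive scheme such as the one described would replace `universal_threshold` with a per-subband rule driven by the VAD decision; that rule is not given in the abstract and is not reproduced here.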
End-to-End Probabilistic Inference for Nonstationary Audio Analysis
Accepted to the Thirty-sixth International Conference on Machine Learning (ICML) 2019. A typical audio signal processing pipeline includes multiple disjoint analysis stages, including calculation of a time-frequency representation followed by spectrogram-based feature analysis. We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters. Further, we formulate this nonlinear model's state space representation, making it amenable to infinite-horizon Gaussian process regression with approximate inference via expectation propagation, which scales linearly in the number of time steps and quadratically in the state dimensionality. By doing so, we are able to process audio signals with hundreds of thousands of data points. We demonstrate, on various tasks with empirical data, how this inference scheme outperforms more standard techniques that rely on extended Kalman filtering.
Single Channel Speech Enhancement using Kalman Filter
The quality and intelligibility of speech conversation are generally degraded by the
surrounding noises. The main objective of speech enhancement (SE) is to eliminate
or reduce such disturbing noises from the degraded speech. Various SE methods have
been proposed in literature. Among them, the Kalman filter (KF) is known to be an
efficient SE method that uses the minimum mean square error (MMSE) criterion. However,
most of the conventional KF based speech enhancement methods need access to clean
speech and additive noise information for the state-space model parameters, namely,
the linear prediction coefficients (LPCs) and the additive noise variance estimation,
which is impractical in the sense that in practice, we can access only the noisy speech.
Moreover, it is quite difficult to estimate these model parameters efficiently in the
presence of adverse environmental noises. Therefore, the main focus of this thesis is to
develop single channel speech enhancement algorithms using Kalman filter, where the
model parameters are estimated in noisy conditions. Depending on these parameter
estimation techniques, the proposed SE methods are classified into three approaches
based on non-iterative, iterative, and sub-band iterative KF.
In the first approach, a non-iterative Kalman filter based speech enhancement
algorithm is presented, which operates on a frame-by-frame basis. In this proposed
method, the state-space model parameters, namely, the LPCs and noise variance, are
estimated first in noisy conditions. For LPC estimation, a combined speech smoothing
and autocorrelation method is employed. A new method based on a lower-order
truncated Taylor series approximation of the noisy speech along with a difference
operation serving as high-pass filtering is introduced for the noise variance estimation.
The non-iterative Kalman filter is then implemented with these estimated parameters
effectively.
In order to enhance the SE performance as well as parameter estimation accuracy
in noisy conditions, an iterative Kalman filter based single channel SE method is
proposed as the second approach, which also operates on a frame-by-frame basis.
For each frame, the state-space model parameters of the KF are estimated through
an iterative procedure. The Kalman filtering iteration is first applied to each noisy
speech frame, reducing the noise component to a certain degree. At the end of this
first iteration, the LPCs and other state-space model parameters are re-estimated
using the processed speech frame and the Kalman filtering is repeated for the same
processed frame. This iteration continues till the KF converges or a maximum number
of iterations is reached, giving a further enhanced speech frame. The same procedure
is repeated for the following frames until the last noisy speech frame has been processed.
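The iterative procedure can be sketched as follows. This is a minimal illustration, not the thesis's algorithm. It assumes an autoregressive speech model whose LPCs come from the autocorrelation (Yule-Walker) method, and an additive noise variance that is known and passed in. Each iteration re-filters the noisy frame with parameters re-estimated from the latest estimate; the thesis describes re-filtering the processed frame instead, so this sketch uses a different, common variant. Function names are hypothetical.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Yule-Walker estimate: x[k] ~ sum_i a[i] * x[k-1-i] + excitation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[:n - lag], x[lag:]) / n for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * (r[0] + 1e-12) * np.eye(order)  # small ridge for stability
    a = np.linalg.solve(R, r[1:])
    exc_var = max(r[0] - np.dot(a, r[1:]), 1e-12)
    return a, exc_var

def kalman_filter_ar(y, a, exc_var, noise_var):
    """Kalman filter for y[k] = s[k] + v[k], with s an AR(p) process."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)   # shift the delay line
    F[-1, :] = a[::-1]           # newest sample from the AR recursion
    Q = np.zeros((p, p))
    Q[-1, -1] = exc_var
    x = np.zeros(p)
    P = exc_var * np.eye(p)
    out = np.empty(len(y))
    for k, yk in enumerate(y):
        x = F @ x                               # predict
        P = F @ P @ F.T + Q
        gain = P[:, -1] / (P[-1, -1] + noise_var)
        x = x + gain * (yk - x[-1])             # update (H selects s[k])
        P = P - np.outer(gain, P[-1, :])
        out[k] = x[-1]
    return out

def iterative_kalman(y, order=10, noise_var=1.0, n_iter=3):
    """Alternate LPC re-estimation and Kalman filtering of the noisy frame."""
    y = np.asarray(y, dtype=float)
    est = y.copy()
    for _ in range(n_iter):
        a, exc_var = lpc_autocorr(est, order)
        est = kalman_filter_ar(y, a, exc_var, noise_var)
    return est
```

A fixed iteration count stands in for the convergence test mentioned in the abstract, which does not specify the stopping criterion.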
For further improving the speech enhancement performance, a sub-band iterative
Kalman filter based SE method is also proposed as the third approach. A wavelet
filter-bank is first used to decompose the noisy speech into a number of sub-bands.
To achieve the best trade-off among the noise reduction, speech intelligibility and
computational complexity, a partial reconstruction scheme based on consecutive mean
squared error (CMSE) is proposed to synthesize the low-frequency (LF) and high-frequency (HF) sub-bands such that the iterative KF is applied only to the partially
reconstructed HF sub-band speech. Finally, the enhanced HF sub-band speech is
combined with the partially reconstructed LF sub-band speech to reconstruct the
full-band enhanced speech.
Experimental results have shown that the proposed KF based SE methods are
capable of reducing adverse environmental noises for a wide range of input SNRs,
and the overall performance of the proposed methods in terms of different evaluation
metrics is superior to some existing state-of-the-art SE methods.
Gaussian Process Modelling for Audio Signals
PhD thesis. Audio signals are characterised and perceived based on how their spectral make-up changes with time. Uncovering the behaviour of latent spectral components is at the heart of many real-world applications involving sound, but is a highly ill-posed task given the infinite number of ways any signal can be decomposed. This motivates the use of prior knowledge and a probabilistic modelling paradigm that can characterise uncertainty. This thesis studies the application of Gaussian processes to audio, which offer a principled non-parametric way to specify probability distributions over functions whilst also encoding prior knowledge. Along the way we consider what prior knowledge we have about sound, the way it behaves, and the way it is perceived, and write down these assumptions in the form of probabilistic models. We show how Bayesian time-frequency analysis can be reformulated as a spectral mixture Gaussian process, and utilise modern day inference methods to carry out joint time-frequency analysis and nonnegative matrix factorisation. Our reformulation results in increased modelling flexibility, allowing more sophisticated prior knowledge to be encoded, which improves performance on a missing data synthesis task. We demonstrate the generality of this paradigm by showing how the joint model can additionally be applied to both denoising and source separation tasks without modification. We propose a hybrid statistical-physical model for audio spectrograms based on observations about the way amplitude envelopes decay over time, as well as a nonlinear model based on deep Gaussian processes. We examine the benefits of these methods, all of which are generative in the sense that novel signals can be sampled from the underlying models, allowing us to consider the extent to which they encode the important perceptual characteristics of sound.
Time and frequency domain algorithms for speech coding
The promise of digital hardware economies (due to recent advances in
VLSI technology) has focussed much attention on more complex and sophisticated
speech coding algorithms which offer improved quality at relatively
low bit rates.
This thesis describes the results (obtained from computer simulations)
of research into various efficient (time and frequency domain) speech
encoders operating at a transmission bit rate of 16 Kbps.
In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM)
systems employing both forward and backward adaptive prediction were
examined. A number of algorithms were proposed and evaluated, including
several variants of the Stochastic Approximation Predictor (SAP). A
Backward Block Adaptive (BBA) predictor was also developed and found to
outperform the conventional stochastic methods, even though its complexity
in terms of signal processing requirements is lower. A simplified
Adaptive Predictive Coder (APC) employing a single tap pitch predictor
considered next provided a slight improvement in performance over ADPCM,
but with rather greater complexity.
The ultimate test of any speech coding system is the perceptual performance
of the received speech. Recent research has indicated that this
may be enhanced by suitable control of the noise spectrum according to
the theory of auditory masking. Various noise shaping ADPCM
configurations were examined, and it was demonstrated that a proposed
pre-/post-filtering arrangement which exploits advantageously the
predictor-quantizer interaction, leads to the best subjective
performance in both forward and backward prediction systems.
Adaptive quantization is instrumental to the performance of ADPCM systems.
Both the forward adaptive quantizer (AQF) and the backward one-word
memory adaptation (AQJ) were examined. In addition, a novel method
of decreasing quantization noise in ADPCM-AQJ coders, which involves the
application of correction to the decoded speech samples, provided
reduced output noise across the spectrum, with considerable high frequency
noise suppression.
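A backward one-word-memory quantizer inside an ADPCM loop can be sketched as below. This is a minimal illustration under stated assumptions, not any of the thesis's coders. It uses 2 bits per sample (which would give 16 kbps at an assumed 8 kHz sampling rate), a fixed first-order predictor, and illustrative step multipliers of 0.8 and 1.6. All names are hypothetical.

```python
import numpy as np

MULTIPLIERS = (0.8, 1.6)       # inner level shrinks the step, outer level grows it
STEP_MIN, STEP_MAX = 1e-3, 1e3

def _adapt(step, mag_index):
    """One-word memory: the next step depends only on the last code word."""
    return float(np.clip(step * MULTIPLIERS[mag_index], STEP_MIN, STEP_MAX))

def _dequantize(code, step):
    """Map a 2-bit code (sign bit, magnitude bit) to a mid-rise level."""
    sign = -1.0 if code & 2 else 1.0
    mag_index = code & 1
    return sign * (mag_index + 0.5) * step, mag_index

def adpcm_encode(x, pred_coeff=0.9, step0=0.1):
    codes = np.empty(len(x), dtype=np.uint8)
    step, recon_prev = step0, 0.0
    for n, sample in enumerate(x):
        pred = pred_coeff * recon_prev           # predict from decoded past
        err = sample - pred
        mag_index = 1 if abs(err) >= step else 0
        code = (2 if err < 0 else 0) | mag_index
        codes[n] = code
        dq, _ = _dequantize(code, step)
        recon_prev = pred + dq                   # track the decoder's state
        step = _adapt(step, mag_index)
    return codes

def adpcm_decode(codes, pred_coeff=0.9, step0=0.1):
    out = np.empty(len(codes))
    step, recon_prev = step0, 0.0
    for n, code in enumerate(codes):
        dq, mag_index = _dequantize(int(code), step)
        recon_prev = pred_coeff * recon_prev + dq
        out[n] = recon_prev
        step = _adapt(step, mag_index)
    return out
```

Because the step size is derived from previously transmitted code words, the decoder reproduces it without side information, which is the property that distinguishes backward from forward adaptation.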
More powerful (and inevitably more complex) frequency domain speech
coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder
(SBC) offer good quality speech at 16 Kbps. To reduce complexity and
coding delay, whilst retaining the advantage of sub-band coding, a novel
transform based split-band coder (TSBC) was developed and found to compare
closely in performance with the SBC.
To prevent the heavy side information requirement associated with a
large number of bands in split-band coding schemes from impairing coding
accuracy, without forgoing the efficiency provided by adaptive bit
allocation, a method employing AQJs to code the sub-band signals together
with vector quantization of the bit allocation patterns was also
proposed.
Finally, 'pipeline' methods of bit allocation and step size estimation
(using the Fast Fourier Transform (FFT) on the input signal) were examined.
Such methods, although less accurate, are nevertheless useful in
limiting coding delay associated with SBC schemes employing Quadrature
Mirror Filters (QMF).
Intercarrier Interference Suppression for the OFDM Systems in Time-Varying Multipath Fading Channels
Due to its spectral efficiency and robustness over multipath channels, orthogonal frequency division multiplexing (OFDM) has served as one of the major modulation schemes for modern communication systems. In the future, wireless OFDM systems are expected to operate at high carrier frequencies and to support high-speed, high-throughput mobile reception, where fast time-varying fading channels are encountered. The channel variation destroys the orthogonality among the subcarriers and leads to intercarrier interference (ICI). ICI poses a significant limitation to wireless OFDM systems. The aim of this dissertation is to find an efficient method of providing reliable communication using OFDM in fast time-varying fading channel scenarios. First, we investigate the OFDM performance in time-varying mobile channels in the presence of multiple Doppler frequency shifts. A new mathematical framework of the ICI effect is derived. The simulation results show that ICI induces an irreducible error probability floor, which is proportional to the Doppler frequency shifts. Furthermore, it is observed that the ICI power arises mainly from a few adjacent subcarriers. This observation motivates us to design low-complexity Q-tap equalizers, namely a Minimum Mean Square Error (MMSE) linear equalizer and a Decision Feedback (DF) non-linear equalizer, to mitigate the ICI. Simulation results show that both Q-tap equalizers can improve the system performance in terms of symbol error rate (SER). To employ these equalizers, channel state information is also required. In this dissertation, we therefore also design a pilot-aided channel estimator based on Wiener filtering for a time-varying Wide-Sense Stationary Uncorrelated Scattering (WSSUS) channel model. The channel estimator exploits the statistical properties of the channel.
Our proposed low-complexity ICI suppression scheme, which incorporates the Q-tap equalizer with our proposed channel estimator, can significantly improve the performance of OFDM systems in fast time-varying fading channels. In the last part of the dissertation, an alternative ICI mitigation approach, which is based on ICI self-cancellation coding, is also discussed. An EM-based approach, which solves the phase and amplitude ambiguities associated with this approach, is also introduced.
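The observation that ICI power is concentrated in a few adjacent subcarriers can be checked numerically with a small sketch. It does not reproduce the dissertation's framework: it assumes a flat (single-tap) channel whose gain varies within one OFDM symbol, modelled here by a single normalised Doppler shift `eps`, and both function names are hypothetical.

```python
import numpy as np

def ici_matrix(h):
    """Frequency-domain channel matrix G = F diag(h) F^H for a flat channel
    with per-sample gain h[n]; off-diagonal entries are the ICI coefficients."""
    h = np.asarray(h, dtype=complex)
    n = len(h)
    dft = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix
    return dft @ np.diag(h) @ dft.conj().T

def ici_fraction_within(G, q):
    """Share of total ICI power coming from the q nearest subcarriers on
    each side (subcarrier distance measured circularly)."""
    n = G.shape[0]
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(dist, n - dist)
    power = np.abs(G) ** 2
    ici_total = power[dist > 0].sum()
    if ici_total < 1e-20:                      # no ICI: time-invariant channel
        return 0.0
    return power[(dist > 0) & (dist <= q)].sum() / ici_total

n_sub = 64
eps = 0.2                                      # Doppler shift / subcarrier spacing
h = np.exp(2j * np.pi * eps * np.arange(n_sub) / n_sub)
G = ici_matrix(h)
print(ici_fraction_within(G, 1), ici_fraction_within(G, 3))
```

When most of the ICI power lies within a small distance `q`, an equalizer that only uses the `q` nearest subcarriers discards little interference information, which is the rationale for a Q-tap design.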
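Estudo de formas de onda e conceção de algoritmos para operação conjunta de sistemas de comunicação e radar [Study of waveforms and design of algorithms for the joint operation of communication and radar systems]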
The focus of this thesis is the processing of signals and design of algorithms
that can be used to enable radar functions in communications systems.
Orthogonal frequency division multiplexing (OFDM) is a popular multicarrier
modulation waveform in communication systems. As a wideband
signal, OFDM improves resolution and enables spectral efficiency in radar
systems, while also improving detection performance thanks to its inherent
frequency diversity. This thesis aims to use multicarrier waveforms for radar
systems, to enable the simultaneous operation of radar and communication
functions on the same device. The thesis is divided into two parts. The first
part studies the adaptation and application of other multicarrier waveforms
to radar functions. At the present time many studies have been carried out
to jointly use the OFDM signal for communication and radar functions, but
other waveforms have shown to be possible candidates for communication
applications. Therefore, studies on the evaluation of the application of these
same signals to radar functions are necessary. In this thesis, to demonstrate
that other multicarrier waveforms can outperform the OFDM waveform
in radar/communication (RadCom) systems, we propose the adaptation of
the filter bank multicarrier (FBMC), generalized frequency division multiplexing
(GFDM) and universal filtering multicarrier (UFMC) waveforms for radar
functions. These alternative waveforms were compared performance-wise
regarding achievable target parameter estimation performance, amount of
residual background noise in the radar image, impact of intersystem interference
and flexibility of parameterization. In the second part of the thesis,
signal processing techniques are explored to solve some of the limitations
of the use of multicarrier waveforms for RadCom systems. Radar systems
based on OFDM are promising candidates for future intelligent transport networks.
Exploring the dual functionality enabled by OFDM, we present cooperative
methods for high-resolution delay-Doppler and direction-of-arrival
estimation. High-resolution parameter estimation is an important requirement
for automotive radar systems, especially in multi-target scenarios that
require reliable target separation performance. By exploring the cooperation
between vehicles, the studies presented in this thesis also enable the distributed
tracking of targets. The result is highly accurate multi-target tracking
across the entire cooperative vehicle network, leading to improvements
in transport reliability and safety.
The focus of this thesis is the processing of signals and the development of
algorithms that can be used to enable the radar function in communication
systems. OFDM (Orthogonal Frequency Division Multiplexing) is a multicarrier
modulation waveform that is popular in communication systems. For radar
systems, OFDM improves resolution and provides spectral efficiency; in
addition, its frequency diversity improves radar detection performance. This
thesis aims to use multicarrier waveforms for radar systems, enabling the
simultaneous operation of radar and communication functions on the same
device. The thesis is divided into two parts. The first part studies the
adaptability of other multicarrier waveforms to radar functions. At present,
many studies on the use of the OFDM signal for communication and radar
functions are being carried out; however, other waveforms have emerged as
possible candidates for applications in communication systems, so their
evaluation for radar functions becomes necessary. In this thesis, with the
intention of demonstrating that alternative multicarrier waveforms can
outperform OFDM in radar/communication (RadCom) systems, we propose the
adaptation of the following waveforms to radar functions: FBMC (Filter Bank
Multicarrier), GFDM (Generalized Frequency Division Multiplexing) and UFMC
(Universal Filtering Multicarrier). We also analyse the performance of these
waveforms with respect to target parameter estimation, background noise,
intersystem interference and system parameterization. In the second part of
the thesis, signal processing techniques are explored in order to solve some
of the limitations of the use of multicarrier waveforms for RadCom systems.
Radar systems based on OFDM are promising candidates for future intelligent
transport networks because they combine target estimation functions with
communication network functions in a single system. Exploring the dual
functionality enabled by OFDM, in this thesis we present high-resolution
cooperative methods for estimating the position, velocity and direction of
targets. High-resolution parameter estimation is an important requirement for
automotive radar systems, especially in multi-target scenarios that require
better target separation performance. By exploring cooperation between
vehicles, the studies presented in this thesis also enable the distributed
tracking of targets. The result is highly accurate multi-target tracking
across the entire cooperative vehicle network, leading to improvements in
transport reliability and safety.
Doctoral Programme in Telecommunications
Low-Complexity Algorithms for Channel Estimation in Optimised Pilot-Assisted Wireless OFDM Systems
Orthogonal frequency division multiplexing (OFDM) has recently become a dominant transmission technology considered for the next generation of fixed and mobile broadband wireless communication systems. OFDM has the advantage of lessening the severe effects of frequency-selective (multipath) fading by splitting the band into relatively flat fading subchannels, and it allows for low-complexity transceiver implementation based on fast Fourier transform algorithms. Combining OFDM modulation with multilevel frequency-domain symbol mapping (e.g., QAM) and spatial multiplexing (SM) over multiple-input multiple-output (MIMO) channels can theoretically achieve near Shannon capacity of the communication link. However, a high-rate and spectrum-efficient system implementation requires coherent detection at the receiving end, which is possible only when accurate channel state information (CSI) is available. Since in practice the response of the wireless channel is unknown and is subject to random variation with time, the receiver typically employs a channel estimator for CSI acquisition. The channel response information retrieved by the estimator is then used by the data detector and can also be fed back to the transmitter by means of in-band or out-of-band signalling, so that the latter can adapt power loading, modulation and coding parameters according to the channel conditions. Thus, the design of an accurate and robust channel estimator is a crucial requirement for reliable communication through a channel that is selective in time and frequency. In a MIMO configuration, a separate channel estimator has to be associated with each transmit/receive antenna pair, making the complexity of the estimation algorithm a primary concern. Pilot-assisted methods, relying on the insertion of reference symbols at certain frequencies and time slots, have been found attractive for identification of doubly-selective radio channels from both the complexity and performance standpoints.
In this dissertation, a family of reduced-complexity estimators for single and multiple-antenna OFDM systems is developed. The estimators are based on transform-domain processing and have the same order of computational complexity, irrespective of the number of pilot subcarriers and their positioning. The common estimator structure is a cascade of successive small-dimension filtering modules. The number of modules, as well as their order inside the cascade, is determined by the class of the estimator (one or two-dimensional) and the availability of the channel statistics (correlation and signal-to-noise power ratio). For fine precision estimation in multipath channels with statistics not known a priori, we propose a recursive design of the filtering modules. Simulation results show that in the steady state, the performance of the recursive estimators approaches that of their theoretical counterparts, which are optimal in the minimum mean square error (MMSE) sense. In contrast to the majority of the channel estimators developed so far, our modular-type architectures are suitable for reconfigurable OFDM transceivers where the actual channel conditions influence the decision of what class of filtering algorithm to use, and how to allot pilot subcarrier positions in the band. In pilot-assisted transmissions, channel estimation and detection are performed separately from each other over distinct subcarrier sets. The estimator output is used only to construct the detector transform, but not as the detector input. Since the performance of both channel estimation and detection depends on the signal-to-noise power ratio (SNR) at the corresponding subcarriers, there is a dilemma of optimal power allocation between the data and the pilot symbols, as these are conflicting requirements under the total transmit power constraint. The problem is exacerbated by the variety of channel estimators.
Each kind of estimation algorithm is characterised by its own SNR gain, which in general can vary depending on the channel correlation. In this dissertation, we optimise pilot-data power allocation for the developed low-complexity one and two-dimensional MMSE channel estimators. The resulting contribution takes the form of closed-form analytical expressions for the upper bound (suboptimal approximate value) on the optimal pilot-to-data power ratio (PDR) as a function of a number of design parameters (number of subcarriers, number of pilots, number of transmit antennas, effective order of the channel model, maximum Doppler shift, SNR, etc.). The resulting PDR equations can be applied to MIMO-OFDM systems with an arbitrary arrangement of the pilot subcarriers, operating in an arbitrary multipath fading channel. These properties, together with the relatively simple functional form of the derived analytical PDR expressions, are intended to ease the challenging task of on-the-fly optimisation of an adaptive SM-MIMO-OFDM system that is capable of adjusting the transmit signal configuration (e.g., block length, number of pilot subcarriers or antennas) according to the established channel conditions.
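For orientation, the basic pilot-assisted principle can be sketched with a simple baseline that is not one of the dissertation's transform-domain estimators: a least-squares estimate at the pilot subcarriers followed by linear interpolation across the band. The pilot layout, the single-antenna setting and the function names are all assumptions made for illustration.

```python
import numpy as np

def ls_pilot_estimate(rx, pilot_idx, pilot_symbols):
    """Least-squares channel estimate at the pilot subcarriers:
    received pilot divided by the known transmitted pilot."""
    return rx[pilot_idx] / pilot_symbols

def interpolate_channel(h_pilots, pilot_idx, n_sub):
    """Linear interpolation of the pilot estimates over all subcarriers
    (real and imaginary parts handled separately)."""
    k = np.arange(n_sub)
    re = np.interp(k, pilot_idx, h_pilots.real)
    im = np.interp(k, pilot_idx, h_pilots.imag)
    return re + 1j * im

# Noise-free usage example with a hypothetical two-tap channel.
n_sub = 64
taps = np.array([1.0, 0.5])
H_true = np.fft.fft(taps, n_sub)                      # true frequency response
pilot_idx = np.append(np.arange(0, n_sub, 4), n_sub - 1)
pilot_symbols = np.ones(len(pilot_idx), dtype=complex)
rx = np.zeros(n_sub, dtype=complex)
rx[pilot_idx] = H_true[pilot_idx] * pilot_symbols     # received pilots
H_est = interpolate_channel(
    ls_pilot_estimate(rx, pilot_idx, pilot_symbols), pilot_idx, n_sub)
```

An MMSE estimator would replace the interpolation step with a filter built from the channel correlation and the SNR, which is where the pilot-to-data power trade-off discussed above enters.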
- …