127 research outputs found

    Colour image coding with wavelets and matching pursuit

    This thesis considers sparse approximation of still images as the basis of a lossy compression system. The Matching Pursuit (MP) algorithm is presented as a method particularly suited to lossy scalable image coding. Its multichannel extension, capable of exploiting inter-channel correlations, is found to be an efficient way to represent colour data in RGB colour space. Known problems with MP, namely the high computational complexity of encoding and dictionary design, are tackled by finding an appropriate partitioning of an image. The idea of performing MP in the spatio-frequency domain after a transform such as the Discrete Wavelet Transform (DWT) is explored. The main challenge, though, is to encode the image representation obtained after MP into a bit-stream. Novel approaches for encoding the atomic decomposition of a signal and for quantising colour amplitudes are proposed and evaluated. The image codec that has been built is capable of competing with scalable coders such as JPEG 2000 and SPIHT in terms of compression ratio.
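
The greedy atom selection at the heart of MP can be illustrated with a minimal one-dimensional sketch. It is not the thesis codec: the random dictionary, the function name `matching_pursuit` and the toy signal are assumptions made purely for illustration, and the multichannel extension, image partitioning and bit-stream encoding described above are not covered.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Plain MP; the columns of `dictionary` are assumed to be unit-norm atoms."""
    residual = np.asarray(signal, dtype=float).copy()
    atoms = []  # list of (atom index, coefficient) pairs
    for _ in range(n_atoms):
        # Correlate the residual with every atom and keep the best match.
        correlations = dictionary.T @ residual
        k = int(np.argmax(np.abs(correlations)))
        coeff = float(correlations[k])
        # Subtract the selected atom's contribution from the residual.
        residual -= coeff * dictionary[:, k]
        atoms.append((k, coeff))
    return atoms, residual

# Toy data (hypothetical): a random overcomplete dictionary and a 2-atom signal.
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((32, 64))
dictionary /= np.linalg.norm(dictionary, axis=0)
signal = 3.0 * dictionary[:, 5] - 2.0 * dictionary[:, 17]

atoms, residual = matching_pursuit(signal, dictionary, n_atoms=10)
approximation = sum(c * dictionary[:, k] for k, c in atoms)
```

The list of (index, coefficient) pairs is the atomic decomposition that a codec would then have to quantise and encode; truncating the list earlier gives a coarser approximation, which is what makes the representation naturally scalable.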

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
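
The decomposition onto a union of two nonnegative bases can be sketched as follows. This is only an illustration under stated assumptions, not the thesis algorithm: both bases are held fixed, only the activations are updated with the standard Euclidean multiplicative rule, the near-end estimate is formed with a ratio mask, and the random matrices stand in for real magnitude STFT data. The name `separate_with_fixed_bases` and all its parameters are hypothetical.

```python
import numpy as np

def separate_with_fixed_bases(V, W_echo, W_speech, n_iter=200, eps=1e-12, seed=0):
    """Fit V ~= [W_echo, W_speech] @ H with fixed bases, then mask out the echo.

    V is a nonnegative magnitude spectrogram (bins x frames). Only the
    activations H are learned (Euclidean multiplicative updates).
    """
    W = np.hstack([W_echo, W_speech])
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update keeps H nonnegative at every iteration.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    k = W_echo.shape[1]
    echo_part = W_echo @ H[:k]
    speech_part = W_speech @ H[k:]
    # Ratio mask in [0, 1]: share of each bin attributed to the near-end basis.
    mask = speech_part / (echo_part + speech_part + eps)
    return mask * V, H

# Toy data (hypothetical): random bases and activations instead of real STFTs.
rng = np.random.default_rng(1)
n_bins, n_frames = 20, 30
W_echo = rng.random((n_bins, 4))
W_speech = rng.random((n_bins, 4))
V = W_echo @ rng.random((4, n_frames)) + W_speech @ rng.random((4, n_frames))

speech_est, H = separate_with_fixed_bases(V, W_echo, W_speech)
```

Because the bases are available before any adaptation takes place, a scheme of this kind has no convergence phase of its own, which is consistent with the abstract's remark that maximal echo mitigation is attained immediately upon initiation.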

    Broadband adaptive beamforming with low complexity and frequency invariant response

    No full text
    This thesis proposes different methods to reduce the computational complexity and increase the adaptation rate of adaptive broadband beamformers. This is demonstrated using the generalised sidelobe canceller (GSC) structure as an example. The GSC is an alternative implementation of the linearly constrained minimum variance beamformer, which can utilise well-known adaptive filtering algorithms, such as the least mean square (LMS) or the recursive least squares (RLS) algorithm, to perform unconstrained adaptive optimisation. A direct DFT implementation, by which broadband signals are decomposed into frequency bins and processed by independent narrowband beamforming algorithms, is thought to be computationally optimal. However, this setup fails to converge to the time domain minimum mean square error (MMSE) if signal components are not aligned to frequency bins, resulting in a large worst case error. To mitigate this problem of the so-called independent frequency bin (IFB) processor, overlap-save based GSC beamforming structures have been explored. These structures address the minimisation of the time domain MMSE with a significant reduction in computational complexity compared to time-domain implementations, and show better convergence behaviour than the IFB beamformer. By studying the effects that the blocking matrix has on the adaptive process for the overlap-save beamformer, several modifications are carried out to enhance both the simplicity of the algorithm and its convergence speed. These modifications result in a GSC beamformer with significantly lower computational complexity than the time domain approach while offering similar convergence characteristics. In certain applications, especially in the area of acoustics, there is a need to maintain constant resolution across a wide operating spectrum that may extend across several octaves. Attaining constant beamwidth is difficult, particularly if uniformly spaced linear sensor arrays are employed for beamforming, since the beamwidth is inversely proportional to both the array aperture and the frequency. A scaled aperture arrangement is introduced for the subband based GSC beamformer to achieve near uniform resolution across a wide spectrum, whereby an octave-invariant design is achieved. This structure can also be operated in conjunction with adaptive beamforming algorithms. Frequency dependent tapering of the sensor signals is proposed in combination with the overlap-save GSC structure in order to achieve an overall frequency-invariant characteristic. An adaptive version of the frequency-invariant overlap-save GSC beamformer is also proposed. Broadband adaptive beamforming algorithms based on the family of least mean squares (LMS) algorithms are known to exhibit slow convergence if the input signal is correlated. To improve the convergence of the GSC when based on LMS-type algorithms, we propose the use of a broadband eigenvalue decomposition (BEVD) to decorrelate the input of the adaptive algorithm in the spatial dimension, for which an increase in convergence speed can be demonstrated over other decorrelating measures, such as the Karhunen-Loeve transform. In order to address the remaining temporal correlation after BEVD processing, this approach is combined with subband decomposition through the use of oversampled filter banks. The resulting spatially and temporally decorrelated GSC beamformer provides further enhanced convergence speed over spatial or temporal decorrelation methods on their own.
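
The dependence of beamwidth on aperture and frequency mentioned above can be made concrete with a standard textbook approximation. The assumptions are not taken from the thesis: a uniformly weighted linear array of $N$ sensors with spacing $d$, steered to broadside, with $N$ large, $\lambda$ the wavelength, $f$ the frequency and $c$ the propagation speed.

```latex
\theta_{3\,\mathrm{dB}} \;\approx\; 0.886\,\frac{\lambda}{N d}
\;=\; 0.886\,\frac{c}{f\,N d}
\qquad \text{(radians)}
```

Under this approximation, doubling $f$ with a fixed aperture $N d$ halves the beamwidth, whereas keeping the product $f \cdot N d$ constant from one octave to the next keeps it unchanged, which is the intuition behind a scaled aperture arrangement.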

    Sparse representation based hyperspectral image compression and classification

    This thesis presents research on applying sparse representation to lossy hyperspectral image compression and hyperspectral image classification. The proposed lossy hyperspectral image compression framework introduces two types of dictionaries, termed the sparse representation spectral dictionary (SRSD) and the multi-scale spectral dictionary (MSSD). The former is learnt in the spectral domain to exploit the spectral correlations, and the latter in the wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in hyperspectral images. To alleviate the computational demand of dictionary learning, either a base dictionary trained offline or an update of the base dictionary is employed in the compression framework. The proposed compression method is evaluated in terms of different objective metrics, and compared to selected state-of-the-art hyperspectral image compression schemes, including JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of both the SRSD and MSSD approaches. For the proposed hyperspectral image classification method, we utilize the sparse coefficients for training support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular, the discriminative character of the sparse coefficients is enhanced by incorporating contextual information using local mean filters. The classification performance is evaluated and compared to a number of similar or representative methods. The results show that our approach could outperform other approaches based on SVM or sparse representation. This thesis makes the following contributions. It provides a relatively thorough investigation of applying sparse representation to lossy hyperspectral image compression. Specifically, it reveals the effectiveness of sparse representation for the exploitation of spectral correlations in hyperspectral images. In addition, we have shown that the discriminative character of sparse coefficients can lead to superior performance in hyperspectral image classification.

    White Noise Reduction for Wideband Sensor Array Signal Processing

    The performance of wideband array signal processing algorithms is dependent on the noise level in the system. In this thesis, a method is proposed for reducing the level of white noise in wideband arrays via a judiciously designed spatial transformation followed by a bank of high-pass filters. The method is initially introduced for uniform linear arrays (ULAs) and analysed in detail. The spectrum of the signal and noise after being processed by the proposed noise reduction method is analysed, and the correlation matrix of the processed noise is derived. The reduced noise level leads to a higher signal-to-noise ratio (SNR) for the system, which can significantly improve the performance of various beamforming methods and other array signal processing applications such as direction of arrival (DOA) estimation. The performance of two well-known beamformers, the reference signal based (RSB) beamformer and the linearly constrained minimum variance (LCMV) beamformer, is reviewed. Then, the theoretical effect of applying the proposed noise reduction method as a pre-processing step on the performance of RSB and LCMV beamformers is studied. The theoretical results are then confirmed by simulation. As a representative example of a wideband DOA estimation application, a compressive sensing-based DOA estimation method is employed to demonstrate the improved estimation obtained by applying the noise reduction method as a pre-processing step, which is confirmed by simulation. Next, the idea is extended to wideband non-uniform linear arrays (NLAs). Since an NLA does not have uniform spacing, the beam response of the row vectors of the transformation is distorted. Therefore, the transformation is re-designed using the least squares method to satisfy its band-pass requirements. Simulation results show a satisfactory improvement in the performance of RSB and LCMV beamformers for the NLA structure. The idea is further extended to uniform rectangular arrays (URAs) and uniform circular arrays (UCAs), as two major types of planar array. Two methods are proposed for reducing the effect of white noise in wideband URAs, and a different transformation is designed for each. The first is based on a two-dimensional (2D) transformation and the second is an adaptation of the method developed for the ULA case. The method developed for the UCA structure is based on a one-dimensional (1D) transformation, with the modulation modified so that the transformation satisfies the required band-pass characteristics. As with the linear array structures, the RSB and LCMV beamformers are used to demonstrate the performance enhancement of the method for planar arrays.

    Wideband data-independent beamforming for subarrays

    The desire to operate large antenna arrays, e.g. for RADAR applications, over a wider frequency range is currently limited by the hardware, which due to weight, cost and size only permits complex multipliers behind each element. In contrast, wideband processing would have to rely on tap delay lines enabling digital filters for every element. As an intermediate step, in this thesis we consider a design where elements are grouped into subarrays, within which elements are still individually controlled by narrowband complex weights, but where each subarray output is given a tap delay line or finite impulse response digital filter for further wideband processing. Firstly, this thesis explores how a tap delay line attached to every subarray can be designed as a delay-and-sum beamformer. This filter is set to realise a fractional delay design based on a windowed sinc function. At the element level, we show that designing a narrowband beam w.r.t. a centre frequency of wideband operation is suboptimal, and suggest an optimisation technique that can yield sufficiently accurate gain over a frequency band of interest for an arbitrary look direction, which however comes at the cost of reduced aperture efficiency, as well as significantly increased sidelobes. We also suggest an adaptive method to enhance the frequency characteristic of a partial wideband array design, by utilising subarrays pointing in different directions in different frequency bands - resolved by means of a filter bank - to adaptively suppress undesired components in the beam patterns of the subarrays. Finally, the thesis proposes a novel array design approach obtained by rotational tiling of subarrays such that the overall array aperture is densely constructed from the same geometric subarray by rotation and translation only. Since the grating lobes of differently oriented subarrays do not necessarily align, an effective grating lobe attenuation w.r.t. the main beam is achieved. Based on a review of findings from geometry, a number of designs are highlighted and transformed into numerical examples, and the theoretically expected grating lobe suppression is compared to uniformly spaced arrays. Supported by a number of models and simulations, the thesis thus suggests various numerical and hardware design techniques, mainly the addition of a tap delay line per subarray and some added processing overhead, that can help to construct a large partial wideband array close in wideband performance to currently existing hardware.
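
A fractional delay filter based on a windowed sinc function, as mentioned in the abstract, can be sketched in a few lines. The Hann window, the filter length, the unity DC gain normalisation and the name `fractional_delay_fir` are illustrative assumptions rather than the thesis design.

```python
import numpy as np

def fractional_delay_fir(frac, n_taps=31):
    """Hann-windowed sinc FIR delaying by (n_taps - 1) / 2 + frac samples.

    `frac` is the fractional part of the delay, assumed to satisfy |frac| < 1.
    """
    centre = (n_taps - 1) / 2
    # Time axis measured from the desired (non-integer) delay.
    t = np.arange(n_taps) - centre - frac
    # Hann window centred on the delay so the design stays symmetric about it.
    window = 0.5 * (1.0 + np.cos(2.0 * np.pi * t / (n_taps + 1)))
    h = np.sinc(t) * window
    return h / h.sum()  # normalise to unity gain at DC

# Usage: delay a low-frequency sinusoid by 15.5 samples.
n = np.arange(200)
freq = 0.05  # cycles per sample
h = fractional_delay_fir(0.5)
delayed = np.convolve(np.sin(2 * np.pi * freq * n), h)[: len(n)]
expected = np.sin(2 * np.pi * freq * (n - 15.5))
max_error = np.max(np.abs(delayed[40:] - expected[40:]))
```

In a delay-and-sum arrangement, one such filter per subarray output would compensate the inter-subarray propagation delay for the chosen look direction; the integer part of the delay can be absorbed into the filter's bulk delay.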

    Efficient Synthesis of Room Acoustics via Scattering Delay Networks

    An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of the Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, the reverberator's computational complexity is one to two orders of magnitude lower than that of IM, comparable to the computational complexity of a feedback delay network (FDN), and its memory requirements are negligible.
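
The Sabine and Eyring predictions against which the decay rate is checked are simple closed-form expressions. The sketch below evaluates both for a hypothetical 5 m x 4 m x 3 m room with a uniform absorption coefficient of 0.2; the room, the coefficient and the function names are assumptions for illustration and do not come from the paper.

```python
import math

def rt60_sabine(volume, surfaces, c=343.0):
    """Sabine reverberation time in seconds.

    `surfaces` is a list of (area in m^2, absorption coefficient) pairs.
    """
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    return 24.0 * math.log(10.0) * volume / (c * absorption_area)

def rt60_eyring(volume, surfaces, c=343.0):
    """Eyring reverberation time in seconds, using the mean absorption."""
    total_area = sum(area for area, _ in surfaces)
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return 24.0 * math.log(10.0) * volume / (-c * total_area * math.log(1.0 - mean_alpha))

# Hypothetical shoebox room: 5 m x 4 m x 3 m, absorption 0.2 on every surface.
lx, ly, lz = 5.0, 4.0, 3.0
volume = lx * ly * lz
surfaces = [
    (2 * lx * lz + 2 * ly * lz, 0.2),  # four walls
    (lx * ly, 0.2),                    # floor
    (lx * ly, 0.2),                    # ceiling
]
sabine = rt60_sabine(volume, surfaces)
eyring = rt60_eyring(volume, surfaces)
```

For any mean absorption strictly between 0 and 1 the Eyring value is shorter than the Sabine value, and the two converge as the absorption becomes small.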

    MVDR broadband beamforming using polynomial matrix techniques

    This thesis addresses the formulation of and solution to broadband minimum variance distortionless response (MVDR) beamforming. Two approaches to this problem are considered, namely, generalised sidelobe canceller (GSC) and Capon beamformers. These are examined based on a novel technique which relies on polynomial matrix formulations. The new scheme is based on the second order statistics of the array sensor measurements in order to estimate a space-time covariance matrix. The beamforming problem can be formulated based on this space-time covariance matrix. Akin to the narrowband problem, where an optimum solution can be derived from the eigenvalue decomposition (EVD) of a constant covariance matrix, this utility is here extended to the broadband case. The decoupling of the space-time covariance matrix in this case is provided by means of a polynomial matrix EVD. The proposed approach is initially exploited to design a GSC beamformer for a uniform linear array, and then extended to the constrained MVDR, or Capon, beamformer and also the GSC with an arbitrary array structure. The uniqueness of the designed GSC comes from utilising the polynomial matrix technique, and its ability to steer the array beam towards an off-broadside direction without the pre-steering stage that is associated with conventional approaches to broadband beamformers. To solve the broadband beamforming problem, this thesis addresses a number of additional tools. The first is the accurate construction of the steering vectors based on fractional delay filters, which are required both for the broadband constraint formulation of a beamformer and for the construction of the quiescent beamformer. In the GSC case, we also discuss how a blocking matrix can be obtained, and introduce a novel paraunitary matrix completion algorithm. For the Capon beamformer, the polynomial extension requires the inversion of a polynomial matrix, for which a residue-based method is proposed that offers better accuracy compared to previously utilised approaches. These proposed polynomial matrix techniques are evaluated in a number of simulations. The results show that the polynomial broadband beamformer (PBBF) steers the main beam towards the direction of the signal of interest (SoI) and protects the signal over the specified bandwidth, and at the same time suppresses unwanted signals by placing nulls in their directions. In addition, the PBBF is compared to the standard time domain broadband beamformer in terms of mean square error performance, beam pattern, and computational complexity. This comparison shows that the PBBF can offer a significant reduction in computational complexity compared to its standard counterpart. Overall, the main benefits of this approach include beam steering towards an arbitrary look direction with no need for a pre-steering step, and a potentially significant reduction in computational complexity due to the decoupling of dependencies of the quiescent beamformer, blocking matrix, and the adaptive filter compared to a standard broadband beamformer implementation.
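
The narrowband problem that the thesis takes as its starting point has the well-known closed-form MVDR (Capon) solution w = R^{-1} a / (a^H R^{-1} a), where R is the covariance matrix and a the steering vector of the look direction. The sketch below evaluates it for a toy scenario and is not the polynomial matrix method of the thesis: the eight-sensor half-wavelength array, the single interferer at 20 degrees, the analytically constructed covariance matrix and the function names are all assumptions for illustration.

```python
import numpy as np

def steering_vector(n_sensors, angle_rad, spacing_wavelengths=0.5):
    """Narrowband steering vector of a uniform linear array."""
    n = np.arange(n_sensors)
    return np.exp(-2j * np.pi * spacing_wavelengths * n * np.sin(angle_rad))

def mvdr_weights(R, a):
    """Narrowband MVDR weights w = R^{-1} a / (a^H R^{-1} a)."""
    Ra = np.linalg.solve(R, a)   # R^{-1} a without forming the inverse
    return Ra / (a.conj() @ Ra)

# Toy scenario (hypothetical): look direction at broadside, interferer at 20 degrees.
n_sensors = 8
a_look = steering_vector(n_sensors, 0.0)
a_int = steering_vector(n_sensors, np.deg2rad(20.0))

# Interference-plus-noise covariance: interferer power 100, unit-power white noise.
R = 100.0 * np.outer(a_int, a_int.conj()) + np.eye(n_sensors)

w = mvdr_weights(R, a_look)
gain_look = np.vdot(w, a_look)        # w^H a_look
gain_int = abs(np.vdot(w, a_int))     # |w^H a_int|
gain_int_conventional = abs(np.vdot(a_look / n_sensors, a_int))
```

The distortionless constraint fixes the look-direction gain at one, while the minimisation of output power drives the response towards the interferer far below that of the conventional delay-and-sum weights `a_look / n_sensors`.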