13 research outputs found

    System Identification with Applications in Speech Enhancement

    With the increasing popularity of hands-free telephony on mobile portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating distortions introduced to speech signals during transmission, and hence for enhancing speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications, including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the selective-tap updating scheme for the normalized least mean squares algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence with low computational complexity. By demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros so as to facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate this effect, two algorithms are developed: a two-stage algorithm based on channel decomposition that identifies common and non-common zeros sequentially, and a forced spectral diversity approach that combines spectral shaping filters with channel undermodelling to derive a modified system with improved dereverberation performance. Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of these algorithms. A discussion of possible directions for future research on system identification techniques concludes this thesis.
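The MMax tap-selection idea mentioned in the abstract above can be illustrated with a minimal time-domain sketch. This is not the thesis's frequency-domain, multidelay algorithm: it is a plain MMax-NLMS loop in which only the taps aligned with the largest-magnitude input samples are updated at each step. The function name `mmax_nlms` and the parameters `num_update`, `mu` and `eps` are assumptions made for illustration.

```python
def mmax_nlms(x, d, num_taps, num_update, mu=0.5, eps=1e-8):
    """Identify an FIR system, updating only `num_update` taps per sample.

    x: input (far-end) samples, d: desired (echo) samples.
    Returns the estimated taps and the a priori error sequence.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # buf[i] holds x[n - i], newest sample first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))
        e = d_n - y
        errors.append(e)
        # MMax selection: taps whose input samples have the largest magnitude
        selected = sorted(range(num_taps), key=lambda i: abs(buf[i]),
                          reverse=True)[:num_update]
        norm = sum(xi * xi for xi in buf) + eps  # full-vector normalisation
        for i in selected:
            w[i] += mu * e * buf[i] / norm
    return w, errors
```

With `num_update` equal to `num_taps` the loop reduces to ordinary NLMS; smaller values trade convergence speed for fewer coefficient updates per sample.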

    Design of a reusable distributed arithmetic filter and its application to the affine projection algorithm

    Digital signal processing (DSP) is used in applications ranging from audio processing to image and video processing to radar and sonar processing. At the core of digital signal processing applications is the digital filter, which is implemented in one of two ways: as a finite impulse response (FIR) filter or as an infinite impulse response (IIR) filter. The primary difference is that the output of an FIR filter depends only on the inputs, while the output of an IIR filter depends on the inputs and the previous outputs. FIR filters also do not suffer from the stability issues, stemming from the feedback of the output to the input, that affect IIR filters. In this thesis, an architecture for FIR filtering based on distributed arithmetic is presented. The proposed architecture can implement large FIR filters using minimal hardware while completing the FIR filtering operation in less time and with less delay than typical FIR filter implementations. The proposed architecture is then used to implement the fast affine projection adaptive algorithm, an algorithm that is typically used with large filter sizes. The fast affine projection algorithm has a high computational burden that limits the throughput, which in turn restricts the number of applications. However, using the proposed FIR filtering architecture, the limitations on throughput are removed. The implementation of the fast affine projection adaptive algorithm using distributed arithmetic is unique to this thesis. The constructed adaptive filter shares all the benefits of the proposed FIR filter: low hardware requirements, high speed, and minimal delay. Ph.D. Committee Chair: Anderson, Dr. David V.; Committee Member: Hasler, Dr. Paul E.; Committee Member: Mooney, Dr. Vincent J.; Committee Member: Taylor, Dr. David G.; Committee Member: Vuduc, Dr. Richar
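As a rough illustration of what distributed arithmetic means for an FIR filter, the sketch below replaces the multiply-accumulate with a lookup table indexed by one bit from each input sample, processed bit-serially. It is a textbook-style software model, not the architecture proposed in the thesis; the function names, the integer coefficients and the 8-bit two's complement word length are all assumptions.

```python
def build_lut(coeffs):
    """Precompute the coefficient sum for every combination of one bit per tap."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(lut, samples, bits):
    """Bit-serial inner product; samples must fit in `bits`-bit two's complement."""
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's complement encoding
    acc = 0
    for j in range(bits):
        addr = 0
        for i, word in enumerate(words):
            addr |= ((word >> j) & 1) << i  # gather bit j of every sample
        term = lut[addr] << j
        acc += -term if j == bits - 1 else term  # sign bit has negative weight
    return acc


def da_fir(coeffs, x, bits=8):
    """FIR filter y[n] = sum_i coeffs[i] * x[n - i] without any multiplications."""
    lut = build_lut(coeffs)
    window = [0] * len(coeffs)  # window[i] holds x[n - i]
    out = []
    for sample in x:
        window = [sample] + window[:-1]
        out.append(da_inner_product(lut, window, bits))
    return out
```

The table grows as 2^K for K taps, which is why practical designs split long filters into several smaller tables; that partitioning is omitted here.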

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. Using a standard evaluation technique, the proposed algorithm is shown to have detection performance comparable to that of an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
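The decomposition onto a union of two fixed nonnegative bases, described in the abstract above, can be sketched as follows. The abstract does not state which cost function or update rule the thesis uses, so this sketch assumes the Euclidean cost with the standard multiplicative update for the activations only; the names `fit_activations` and `separate` and the list-of-lists matrix layout (frequency bins by frames) are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def fit_activations(v, w, iters=200, eps=1e-12):
    """Find nonnegative H with V ~= W H while keeping the basis W fixed."""
    num_bases, num_frames = len(w[0]), len(v[0])
    h = [[1.0] * num_frames for _ in range(num_bases)]
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    for _ in range(iters):
        wtwh = matmul(wtw, h)
        # Multiplicative update for the Euclidean cost (keeps H nonnegative)
        h = [[h[a][t] * wtv[a][t] / (wtwh[a][t] + eps)
              for t in range(num_frames)] for a in range(num_bases)]
    return h


def separate(v, w_echo, w_near, iters=200):
    """Split magnitude spectrogram V into echo and near-end estimates."""
    w = [row_e + row_n for row_e, row_n in zip(w_echo, w_near)]  # union of bases
    h = fit_activations(v, w, iters)
    num_echo = len(w_echo[0])
    echo = matmul(w_echo, h[:num_echo])
    near = matmul(w_near, h[num_echo:])
    return echo, near, h
```

In the setting the abstract describes, `w_echo` would be built from the far-end speech and `w_near` trained beforehand on speech data; here both are simply passed in as given matrices.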

    A system for video-based analysis of face motion during speech

    During face-to-face interaction, facial motion conveys information at various levels. These include a person's emotional condition, position in a discourse, and, while speaking, phonetic details about the speech sounds being produced. Trivially, the measurement of face motion is a prerequisite for any further analysis of its functional characteristics or information content. It is possible to make precise measurements of locations on the face using systems that track the motion by means of active or passive markers placed directly on the face. Such systems, however, have the disadvantages of requiring specialised equipment, which restricts their use outside the lab, and of being invasive in the sense that the markers have to be attached to the subject's face. To overcome these limitations, we developed a video-based system that measures face motion from standard video recordings by deforming the surface of an ellipsoidal mesh fit to the face. The mesh is initialised manually for a reference frame and then projected onto subsequent video frames. Location changes between successive frames are determined adaptively for each mesh node within a well-defined area around that node, using a two-dimensional cross-correlation analysis on a two-dimensional wavelet transform of the frames. Position parameters are propagated in three steps from a coarser mesh and a correspondingly higher scale of the wavelet transform to the final fine mesh and lower scale of the wavelet transform. The sequential changes in position of the mesh nodes represent the facial motion. The method takes advantage of inherent constraints of the facial surface, which distinguishes it from more general image motion estimation methods, and it returns measurement points globally distributed over the facial surface, in contrast to feature-based methods.
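The per-node displacement search described above can be sketched as a brute-force normalised cross-correlation over a small search window. This simplification works directly on pixel intensities and leaves out the wavelet transform, the coarse-to-fine propagation and the mesh constraints; the function names and the `half` and `search` parameters are assumptions, and the caller is assumed to keep the search window inside the image.

```python
def patch(img, r, c, half):
    """Square patch of side 2*half + 1 centred on (r, c)."""
    return [row[c - half:c + half + 1] for row in img[r - half:r + half + 1]]


def ncc(a, b):
    """Normalised cross-correlation of two equally sized patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma = sum(fa) / len(fa)
    mb = sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = (sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb)) ** 0.5
    return num / den if den > 0 else 0.0


def track_node(prev, curr, r, c, half=2, search=3):
    """Return the (row, column) displacement of the node at (r, c) between frames."""
    ref = patch(prev, r, c, half)
    best_score, best_dr, best_dc = -2.0, 0, 0
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            score = ncc(ref, patch(curr, r + dr, c + dc, half))
            if score > best_score:
                best_score, best_dr, best_dc = score, dr, dc
    return best_dr, best_dc
```

Applying `track_node` to every mesh node for each pair of successive frames gives one displacement per node, which corresponds to the location changes the abstract refers to.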

    Adaptive Conjoint Wavelet-Support Vector Classifiers

    Combined wavelet and large margin classifiers succeed in solving difficult signal classification problems in cases where a large margin classifier alone, such as the Support Vector Machine, may fail. This thesis investigates the problem of conjointly designing both classifier stages to achieve a most effective classifier architecture. In particular, the wavelet features should be adapted to the Support Vector classifier and the specific classification problem. Three different approaches to achieve this goal are considered. The classifier performance is seriously affected by the wavelet or filter used for feature extraction. To choose this wavelet optimally with respect to the subsequent Support Vector classification, appropriate criteria may be used. The radius-margin Support Vector Machine error bound is proven to be computable by two standard Support Vector problems. Criteria that are computationally still more efficient may be sufficient for filter adaptation. For classification by a Support Vector Machine, several criteria are examined that rate feature sets obtained from various orthogonal filter banks. An adaptive search algorithm is devised that, once the criterion is fixed, efficiently finds the optimal wavelet filter. To extract shift-invariant wavelet features, Kingsbury's dual-tree complex wavelet transform is examined. The dual-tree filter bank construction leads to wavelets with vanishing negative frequency parts. An enhanced transform is established in the frequency domain for standard wavelet filters without special filter design. The translation and rotational invariance is improved compared with the common wavelet transform, as shown for various standard wavelet filters. The framework is therefore well suited to adapted signal classification. Wavelet adaptation for signal classification is a special case of feature selection. Feature selection is an important combinatorial optimisation problem in the context of supervised pattern classification. Four novel continuous feature selection approaches directly minimising the classifier performance are presented. In particular, they include linear and nonlinear Support Vector classifiers. The key ideas of the approaches are additional regularisation and embedded nonlinear feature selection. To solve the optimisation problems, difference of convex functions programming, a general framework for non-convex continuous optimisation, is applied. This optimisation framework may also be of interest for other applications and succeeds in robustly solving the problems, and hence in building more powerful feature selection methods.

    Fast imaging in non-standard X-ray computed tomography geometries


    Perspectives on panoramic photography

    Digital imaging brings a new set of possibilities to photography. For example, small pictures can be assembled to form a large panorama, and digital cameras are trying to mimic the human visual system to produce better pictures. This manuscript aims at developing the algorithms required to stitch a set of pictures together to obtain a bigger and better image. This thesis explores three important topics of panoramic photography: the alignment of images, the matching of the colours, and the rendering of the resulting panorama. In addition, one chapter is devoted to 3D and constrained estimation. Aligning pictures can be difficult when the scene changes while the photographs are being taken. A method is proposed to model these changes (or outliers) that appear in image pairs, by computing the outlier distribution from the image histograms and handling the image-to-image correspondence problem as a mixture of inliers versus outliers. Compared to the standard methods, this approach makes better use of the information contained in the image and leads to a more reliable result. Digital cameras aim at reproducing the adaptation capabilities of the human eye in capturing the colours of a scene. As a consequence, there is often a large colour mismatch between two pictures. This work presents a novel way of correcting colour mismatches by modelling the transformation introduced by the camera and reversing it to obtain consistent colours. Finally, this manuscript proposes a method to render high dynamic range images that contain very bright as well as very dark regions. To reproduce such pictures, the contrast has to be reduced in order to match the maximum contrast displayable on a screen or on paper. This last method, which is based on a complex model of the human visual system, reduces the contrast of the image while keeping the small details in the scene visible.

    Elevation and Deformation Extraction from TomoSAR

    3D SAR tomography (TomoSAR) and 4D SAR differential tomography (Diff-TomoSAR) exploit multi-baseline SAR data stacks to provide an essential innovation of SAR interferometry for many applications, sensing complex scenes with multiple scatterers mapped into the same SAR pixel cell. However, these techniques are still influenced by DEM uncertainty, temporal decorrelation, orbital, tropospheric and ionospheric phase distortion, and height blurring. In this thesis, these techniques are explored. As part of this exploration, systematic procedures for DEM generation, DEM quality assessment, DEM quality improvement and DEM applications are first studied. In addition, this thesis covers the whole cycle of systematic methods for 3D and 4D TomoSAR imaging for height and deformation retrieval, from problem formulation, through the development of methods, to testing on real SAR data. After an introduction to DEM generation from spaceborne bistatic InSAR (TanDEM-X) and airborne photogrammetry (Bluesky), a new DEM co-registration method with line feature validation (river network lines, ridgelines, valley lines, crater boundary features and so on) is developed and demonstrated to assist the study of wide-area DEM data quality. This DEM co-registration method aligns two DEMs irrespective of the linear distortion model, which significantly improves the accuracy of DEM vertical comparison and is suitable and helpful for DEM quality assessment. A systematic TomoSAR algorithm and method have been established, tested, analysed and demonstrated for various applications (urban buildings, bridges, dams) to achieve better 3D and 4D tomographic SAR imaging results. These include applying Cosmo-Skymed X band single-polarisation data over the Zipingpu dam, Dujiangyan, Sichuan, China, to map topography, and using ALOS L band data in the San Francisco Bay region to map urban buildings and bridges. A new ionospheric correction method, based on the tile method and employing IGS TEC data, a split-spectrum method and an ionospheric model fitted via least squares, is developed to correct ionospheric distortion and improve the accuracy of 3D and 4D tomographic SAR imaging. Meanwhile, a pixel-by-pixel orbit baseline estimation method is developed to address the research gaps in baseline estimation for 3D and 4D spaceborne SAR tomography imaging. Moreover, a SAR tomography imaging algorithm and a differential tomography four-dimensional SAR imaging algorithm based on compressive sensing (CS), InSAR phase calibration referenced to a DEM with DEM error correction, and a new phase error calibration and compensation algorithm based on PS, SVD, PGA, weighted least squares and minimum entropy are developed to obtain accurate 3D and 4D tomographic SAR imaging results. The new baseline estimation method and the subsequent TomoSAR processing results show that accurate baseline estimation is essential for building the TomoSAR model. After baseline estimation, phase calibration experiments (via the FFT and Capon methods) indicate that a phase calibration step is indispensable for TomoSAR imaging and ultimately influences the inversion results. A CS-based super-resolution reconstruction study demonstrates that X band data with the CS method are not suited to forest reconstruction but work for the reconstruction of large civil engineering structures such as dams and urban buildings. Meanwhile, the L band data with the FFT, Capon and CS methods are shown to work for the reconstruction of large man-made structures (such as bridges) and urban buildings.