103 research outputs found

    Efficient Multiband Algorithms for Blind Source Separation

    Get PDF
    The problem of blind separation refers to recovering original signals, called source signals, from the mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate mixed signals to obtain the original signals without degradation and without prior information of the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower rate, have less computational cost and a quicker convergence than the conventional scheme. Our motivation is the competitive results of unequal-passbands scheme applications, in terms of the convergence speed. The objective of this research is to improve unequal-passbands schemes by improving the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme.This scheme uses multiple bands that make it work at a reduced sampling rate, and low computational cost. An adaptation approach is derived with an adaptation step that improved the convergence speed. The performance of the proposed scheme was measured in different ways. First, the mean square errors of various bands are measured and the results are compared to a maximally decimated equal-passbands scheme, which is currently the best performing method. The results show that the proposed scheme has a faster convergence rate than the maximally decimated equal-passbands scheme. Second, when the scheme is tested for white and coloured inputs using a low number of bands, it does not yield good results; but when the number of bands is increased, the speed of convergence is enhanced. Third, the scheme is tested for quick changes. It is shown that the performance of the proposed scheme is similar to that of the equal-passbands scheme. Fourth, the scheme is also tested in a stationary state. The experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater number of bands, the more efficient the separation. The results are compared to the currently best performing method. Second, an experimental comparison is made between the proposed multiband scheme and the conventional scheme. The results show that the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than that of the conventional scheme, and the computation cost is lower than that of the conventional scheme

    Compressive and Noncompressive Power Spectral Density Estimation from Periodic Nonuniform Samples

    Get PDF
    This paper presents a novel power spectral density estimation technique for band-limited, wide-sense stationary signals from sub-Nyquist sampled data. The technique employs multi-coset sampling and incorporates the advantages of compressed sensing (CS) when the power spectrum is sparse, but applies to sparse and nonsparse power spectra alike. The estimates are consistent piecewise constant approximations whose resolutions (width of the piecewise constant segments) are controlled by the periodicity of the multi-coset sampling. We show that compressive estimates exhibit better tradeoffs among the estimator's resolution, system complexity, and average sampling rate compared to their noncompressive counterparts. For suitable sampling patterns, noncompressive estimates are obtained as least squares solutions. Because of the non-negativity of power spectra, compressive estimates can be computed by seeking non-negative least squares solutions (provided appropriate sampling patterns exist) instead of using standard CS recovery algorithms. This flexibility suggests a reduction in computational overhead for systems estimating both sparse and nonsparse power spectra because one algorithm can be used to compute both compressive and noncompressive estimates.Comment: 26 pages, single spaced, 9 figure

    Variable Block Size Motion Compensation In The Redundant Wavelet Domain

    Get PDF
    Video is one of the most powerful forms of multimedia because of the extensive information it delivers. Video sequences are highly correlated both temporally and spatially, a fact which makes the compression of video possible. Modern video systems employ motion estimation and motion compensation (ME/MC) to de-correlate a video sequence temporally. ME/MC forms a prediction of the current frame using the frames which have been already encoded. Consequently, one needs to transmit the corresponding residual image instead of the original frame, as well as a set of motion vectors which describe the scene motion as observed at the encoder. The redundant wavelet transform (RDWT) provides several advantages over the conventional wavelet transform (DWT). The RDWT overcomes the shift invariant problem in DWT. Moreover, RDWT retains all the phase information of wavelet coefficients and provides multiple prediction possibilities for ME/MC in wavelet domain. The general idea of variable size block motion compensation (VSBMC) technique is to partition a frame in such a way that regions with uniform translational motions are divided into larger blocks while those containing complicated motions into smaller blocks, leading to an adaptive distribution of motion vectors (MV) across the frame. The research proposed new adaptive partitioning schemes and decision criteria in RDWT that utilize more effectively the motion content of a frame in terms of various block sizes. The research also proposed a selective subpixel accuracy algorithm for the motion vector using a multiband approach. The selective subpixel accuracy reduces the computations produced by the conventional subpixel algorithm while maintaining the same accuracy. In addition, the method of overlapped block motion compensation (OBMC) is used to reduce blocking artifacts. Finally, the research extends the applications of the proposed VSBMC to the 3D video sequences. The experimental results obtained here have shown that VSBMC in the RDWT domain can be a powerful tool for video compression

    Infrared and Visible Image Fusion Based on Oversampled Graph Filter Banks

    Get PDF
    The infrared image (RI) and visible image (VI) fusion method merges complementary information from the infrared and visible imaging sensors to provide an effective way for understanding the scene. The graph filter bank-based graph wavelet transform possesses the advantages of the classic wavelet filter bank and graph representation of a signal. Therefore, we propose an RI and VI fusion method based on oversampled graph filter banks. Specifically, we consider the source images as signals on the regular graph and decompose them into the multiscale representations with M-channel oversampled graph filter banks. Then, the fusion rule for the low-frequency subband is constructed using the modified local coefficient of variation and the bilateral filter. The fusion maps of detail subbands are formed using the standard deviation-based local properties. Finally, the fusion image is obtained by applying the inverse transform on the fusion subband coefficients. The experimental results on benchmark images show the potential of the proposed method in the image fusion applications

    Disentangling Multispectral Functional Connectivity With Wavelets

    Get PDF
    The field of brain connectomics develops our understanding of the brain's intrinsic organization by characterizing trends in spontaneous brain activity. Linear correlations in spontaneous blood-oxygen level dependent functional magnetic resonance imaging (BOLD-fMRI) fluctuations are often used as measures of functional connectivity (FC), that is, as a quantity describing how similarly two brain regions behave over time. Given the natural spectral scaling of BOLD-fMRI signals, it may be useful to represent BOLD-fMRI as multiple processes occurring over multiple scales. The wavelet domain presents a transform space well suited to the examination of multiscale systems as the wavelet basis set is constructed from a self-similar rescaling of a time and frequency delimited kernel. In the present study, we utilize wavelet transforms to examine fluctuations in whole-brain BOLD-fMRI connectivity as a function of wavelet spectral scale in a sample (N = 31) of resting healthy human volunteers. Information theoretic criteria measure relatedness between spectrally-delimited FC graphs. Voxelwise comparisons of between-spectra graph structures illustrate the development of preferential functional networks across spectral bands

    Estimation and Modeling Problems in Parametric Audio Coding

    Get PDF

    Computationally efficient locally adaptive demosaicing of color filter array images using the dual-tree complex wavelet packet transform

    Get PDF
    Most digital cameras use an array of alternating color filters to capture the varied colors in a scene with a single sensor chip. Reconstruction of a full color image from such a color mosaic is what constitutes demosaicing. In this paper, a technique is proposed that performs this demosaicing in a way that incurs a very low computational cost. This is done through a (dual-tree complex) wavelet interpretation of the demosaicing problem. By using a novel locally adaptive approach for demosaicing (complex) wavelet coefficients, we show that many of the common demosaicing artifacts can be avoided in an efficient way. Results demonstrate that the proposed method is competitive with respect to the current state of the art, but incurs a lower computational cost. The wavelet approach also allows for computationally effective denoising or deblurring approaches

    DNN-Assisted Speech Enhancement Approaches Incorporating Phase Information

    Get PDF
    Speech enhancement is a widely adopted technique that removes the interferences in a corrupted speech to improve the speech quality and intelligibility. Speech enhancement methods can be implemented in either time domain or time-frequency (T-F) domain. Among various proposed methods, the time-frequency domain methods, which synthesize the enhanced speech with the estimated magnitude spectrogram and the noisy phase spectrogram, gain the most popularity in the past few decades. However, this kind of techniques tend to ignore the importance of phase processing. To overcome this problem, the thesis aims to jointly enhance the magnitude and phase spectra by means of the most recent deep neural networks (DNNs). More specifically, three major contributions are presented in this thesis. First, we present new schemes based on the basic Kalman filter (KF) to remove the background noise in the noisy speech in time domain, where the KF acts as joint estimator for both the magnitude and phase spectra of speech. A DNN-augmented basic KF is first proposed, where DNN is applied for estimating key parameters in the KF, namely the linear prediction coefficients (LPCs). By training the DNN with a large database and making use of the powerful learning ability of DNN, the proposed algorithm is able to estimate LPCs from noisy speech more accurately and robustly, leading to an improved performance as compared to traditional KF based approaches in speech enhancement. We further present a high-frequency (HF) component restoration algorithm to extenuate the degradation in the HF regions of the Kalman-filtered speech, in which the DNN-based bandwidth extension is applied to estimate the magnitude of HF component from the low-frequency (LF) counterpart. By incorporating the restoration algorithm, the enhanced speech suffers less distortion in the HF component. Moreover, we propose a hybrid speech enhancement system that exploits DNN for speech reconstruction and Kalman filtering for further denoising. Two separate networks are adopted in the estimation of magnitude spectrogram and LPCs of the clean speech, respectively. The estimated clean magnitude spectrogram is combined with the phase of the noisy speech to reconstruct the estimated clean speech. A KF with the estimated parameters is then utilized to remove the residual noise in the reconstructed speech. The proposed hybrid system takes advantages of both the DNN-based reconstruction and traditional Kalman filtering, and can work reliably in either matched or unmatched acoustic environments. Next, we incorporate the DNN-based parameter estimation scheme in two advanced KFs: subband KF and colored-noise KF. The DNN-augmented subband KF method decomposes the noisy speech into several subbands, and performs Kalman filtering to each subband speech, where the parameters of the KF are estimated by the trained DNN. The final enhanced speech is then obtained by synthesizing the enhanced subband speeches. In the DNN-augmented colored-noise KF system, both clean speech and noise are modelled as autoregressive (AR) processes, whose parameters comprise the LPCs and the driving noise variances. The LPCs are obtained through training a multi-objective DNN, while the driving noise variances are obtained by solving an optimization problem aiming to minimize the difference between the modelled and observed AR spectra of the noisy speech. The colored-noise Kalman filter with DNN-estimated parameters is then applied to the noisy speech for denoising. A post-subtraction technique is adopted to further remove the residual noise in the Kalman-filtered speech. Extensive computer simulations show that the two proposed advanced KF systems achieve significant performance gains when compared to conventional Kalman filter based algorithms as well as recent DNN-based methods under both seen and unseen noise conditions. Finally, we focus on the T-F domain speech enhancement with masking technique, which aims to retain the speech dominant components and suppress the noise dominant parts of the noisy speech. We first derive a new type of mask, namely constrained ratio mask (CRM), to better control the trade-off between speech distortion and residual noise in the enhanced speech. The CRM is estimated with a trained DNN based on the input noisy feature set and is applied to the noisy magnitude spectrogram for denoising. We further extend the CRM to the complex spectrogram estimation, where the enhanced magnitude spectrogram is obtained with the CRM, while the estimated phase spectrogram is reconstructed with the noisy phase spectrogram and the phase derivatives. Performance evaluation reveals our proposed CRM outperforms several traditional masks in terms of objective metrics. Moreover, the enhanced speech resulting from the CRM based complex spectrogram estimation has a better speech quality than that obtained without using phase reconstruction
    corecore