    Speech enhancement with frequency domain auto-regressive modeling

    Speech applications in far-field, real-world settings often deal with signals that are corrupted by reverberation. Dereverberation is an important step for improving audible quality and for reducing error rates in applications such as automatic speech recognition (ASR). We propose a unified framework for speech dereverberation that improves both speech quality and ASR performance, using the envelope-carrier decomposition provided by an autoregressive (AR) model. The AR model is applied in the frequency domain of the sub-band speech signals to separate the envelope and carrier parts. A novel neural architecture based on a dual-path long short-term memory (DPLSTM) model is proposed, which jointly enhances the sub-band envelope and carrier components. The dereverberated envelope and carrier signals are modulated, and the sub-band signals are synthesized to reconstruct the audio signal. The DPLSTM dereverberation model also allows its network weights to be learned jointly with the downstream ASR task. In ASR experiments on the REVERB challenge dataset and on the VOiCES dataset, we show that jointly learning the dereverberation network and the end-to-end (E2E) ASR model yields significant improvements over a baseline ASR system trained on log-mel spectrograms, as well as over other dereverberation benchmarks (average relative improvements of 10-24% over the baseline system). Subjective listening tests further confirm the improved quality of the reconstructed audio. Comment: 10 pages
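The envelope-carrier split described above can be illustrated with frequency domain linear prediction (FDLP): an AR model fitted to the discrete cosine transform (DCT) of a sub-band signal yields an all-pole curve that follows the signal's temporal envelope. The sketch below is a minimal, hypothetical illustration for one real-valued sub-band signal. The function name `fdlp_envelope_carrier`, the model order of 20, and taking the square root of the AR model as the envelope are assumptions, not the paper's settings; the sub-band analysis and the DPLSTM network are not shown.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope_carrier(x, order=20, floor=1e-8):
    """Split one real sub-band signal into an all-pole (AR) envelope and a carrier.

    Hypothetical sketch of FDLP: linear prediction on the DCT coefficients gives an
    all-pole model whose "spectrum" approximates the squared temporal envelope.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct(x, type=2, norm="ortho")  # frequency-domain sequence of the signal
    # Biased autocorrelation of the DCT coefficients for lags 0..order.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])
    # Yule-Walker normal equations (symmetric Toeplitz system) for the AR coefficients.
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    gain = r[0] - np.dot(a, r[1 : order + 1])  # prediction error power
    # Evaluate gain / |A(e^{jw})|^2 with A(z) = 1 - sum_k a_k z^{-k};
    # w in [0, pi) is mapped (approximately) onto time samples 0..n-1.
    w = np.pi * np.arange(n) / n
    A = 1.0 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a
    envelope = np.sqrt(gain / np.abs(A) ** 2)
    carrier = x / np.maximum(envelope, floor)  # so that x = envelope * carrier
    return envelope, carrier
```

Because the carrier is defined as the signal divided by the envelope, multiplying the two (the modulation step in the abstract) returns the original sub-band signal wherever the envelope is above the floor.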

    Single-carrier frequency domain equalization using subband decomposition for optical wireless communications

    Optical wireless communication is intended for indoor and line-of-sight high-rate data transmission, complementing radio frequency communications. Optical wireless communications face specific problems and requirements compared with radio frequency communications. Single-carrier frequency domain equalization (SCFDE) is a transmission scheme that has been considered as an alternative to orthogonal frequency-division multiplexing (OFDM), because it retains most of the advantages of OFDM while avoiding several of its problems, such as a high peak-to-average power ratio. Subband decomposition is a multi-resolution signal analysis and synthesis method that uses a quadrature mirror filter (QMF) bank to convert a time-domain signal into frequency-domain subbands. As a time-to-frequency transform, subband decomposition can be employed for frequency domain equalization. Compared with the commonly used discrete Fourier transform (DFT), the subband transform is more flexible and more efficient at compensating the fading of frequency-selective channels. This thesis covers the theory of SCFDE, subband decomposition, subband equalization, and an optical wireless transmission scheme. Simulations are also given to demonstrate the signal processing and the implementation of the equalizers. The results show that the transmitted signal can be effectively equalized to compensate for channel fading, and that the transmission scheme is appropriate for optical wireless communications.
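For reference, the conventional DFT-based SCFDE receiver that the subband approach is compared against fits in a few lines: a cyclic prefix makes the multipath channel act as a circular convolution, so a single complex division per DFT bin equalizes it. This is a minimal zero-forcing sketch assuming perfect channel knowledge and no noise; `scfde_zf` is a hypothetical name, the QMF-based subband equalizer studied in the thesis is not implemented, and the real, non-negative signal constraint of intensity-modulated optical links is ignored.

```python
import numpy as np


def scfde_zf(symbols, h, cp_len):
    """Single-carrier block with cyclic prefix, equalized by one-tap zero forcing.

    Uses the DFT (not a subband transform) for the frequency-domain step.
    """
    symbols = np.asarray(symbols, dtype=complex)
    h = np.asarray(h, dtype=complex)
    n = len(symbols)
    if cp_len < len(h) - 1:
        raise ValueError("cyclic prefix must cover the channel memory")
    # Transmitter: time-domain symbols plus a cyclic prefix (no IFFT, unlike OFDM).
    tx = np.concatenate([symbols[n - cp_len:], symbols])
    # Multipath channel: linear convolution, keeping one block's worth of samples.
    rx = np.convolve(tx, h)[: n + cp_len]
    # Receiver: dropping the prefix turns the channel into a circular convolution,
    y = rx[cp_len:]
    # so one complex division per DFT bin undoes it (zero forcing).
    Y = np.fft.fft(y)
    H = np.fft.fft(h, n)
    return np.fft.ifft(Y / H)
```

With a three-tap channel such as `[1.0, 0.5, 0.2]` and a prefix of at least two samples, the returned block matches the transmitted symbols up to floating-point error.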

    Wavelet-based multi-carrier code division multiple access systems

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    Efficient Multiband Algorithms for Blind Source Separation

    The problem of blind source separation is to recover the original signals, called source signals, from mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate the mixed signals to obtain the original signals without degradation and without prior information about the features of the sources. The strategy used to achieve this objective is to use multiple bands that operate at a lower rate, have a lower computational cost, and converge more quickly than the conventional scheme. Our motivation is the competitive convergence speed achieved by applications of unequal-passbands schemes. The objective of this research is to improve unequal-passbands schemes by increasing their convergence speed and reducing their computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme. This scheme uses multiple bands, which allow it to operate at a reduced sampling rate and a low computational cost. An adaptation approach is derived with an adaptation step that improves the convergence speed. The performance of the proposed scheme was evaluated in four ways. First, the mean square errors of the various bands are measured and compared with those of a maximally decimated equal-passbands scheme, which is currently the best-performing method. The results show that the proposed scheme converges faster than the maximally decimated equal-passbands scheme. Second, when the scheme is tested with white and coloured inputs using a low number of bands, it does not yield good results; when the number of bands is increased, however, the convergence speed improves. Third, the scheme is tested under quick changes, where its performance is shown to be similar to that of the equal-passbands scheme. Fourth, the scheme is also tested in a stationary state.
    The experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater the number of bands, the more efficient the separation. Its results are compared with the currently best-performing method, and an experimental comparison is also made between the proposed multiband scheme and the conventional scheme. The results show that the proposed scheme achieves a higher convergence speed and a higher signal-to-interference ratio than the conventional scheme, at a lower computational cost.
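The maximally decimated, unequal-passband idea can be illustrated with a dyadic tree of two-band Haar QMF banks: repeatedly splitting the low band yields bands of different widths whose total sample count equals the input length, so every band runs at a reduced rate. This is a minimal sketch of the filter-bank structure only, assuming Haar filters and hypothetical function names; the thesis's own filters, its over-sampled variant, and the separation and adaptation algorithm are not shown.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_split(x):
    """Two-band maximally decimated Haar QMF analysis: each band keeps half the samples."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2  # (low band, high band)


def haar_merge(low, high):
    """Synthesis bank matching haar_split (perfect reconstruction)."""
    out = np.empty(2 * len(low))
    out[0::2] = (low + high) / SQRT2
    out[1::2] = (low - high) / SQRT2
    return out


def unequal_bands(x, levels=2):
    """Dyadic tree: keep splitting the low band, which gives unequal passbands.

    With levels=2 the nominal bands are [pi/2, pi] at rate 1/2, then [pi/4, pi/2]
    and [0, pi/4] at rate 1/4. len(x) must be divisible by 2**levels.
    """
    bands, low = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands


def reconstruct(bands):
    """Invert unequal_bands by merging from the lowest band upward."""
    low = bands[-1]
    for high in reversed(bands[:-1]):
        low = haar_merge(low, high)
    return low
```

Per-band adaptive filters would then operate on these shorter sequences. Haar filters were chosen only because their perfect reconstruction is easy to verify; their passbands overlap considerably.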

    Orthogonal transmultiplexers in communication: a review

    Wavelet Filter Banks in Perceptual Audio Coding

    This thesis studies the application of the wavelet filter bank (WFB) in perceptual audio coding, providing brief overviews of perceptual coding, psychoacoustics, wavelet theory, and existing wavelet coding algorithms. It also describes the poor frequency localization of the WFB and explores one filter design method in particular for improving channel separation between the wavelet bands. The author has also developed a wavelet audio coder to test the new filters. Preliminary tests indicate that the new filters provide some improvement over other wavelet filters when coding stationary-like audio signals that contain only a few harmonic components, and similar results for other types of audio signals that contain many spectral and temporal components. The WFB was found to provide a flexible decomposition scheme through the choice of tree structure and basis filter, but at the cost of poor localization properties. This flexibility can benefit audio coding, whereas the poor localization properties are a drawback. Determining how to fully exploit this flexibility while minimizing the effects of poor time-frequency localization remains very much an open research problem.
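The poor frequency localization mentioned above can be quantified for the shortest wavelet filter, the Haar (db1) lowpass, by evaluating its magnitude response. This is a minimal sketch; the Haar filter and the helper name `magnitude_response` are assumptions for illustration, not the filters designed in the thesis.

```python
import numpy as np


def magnitude_response(h, w):
    """|H(e^{jw})| of an FIR filter with taps h at angular frequencies w (rad/sample)."""
    h = np.asarray(h, dtype=float)
    return np.abs(np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h)


# Haar (db1) lowpass, scaled to unit DC gain so that |H(e^{jw})| = |cos(w / 2)|.
h_low = np.array([1.0, 1.0]) / 2.0
w = np.array([0.0, np.pi / 2, 3 * np.pi / 4])
mag_db = 20 * np.log10(magnitude_response(h_low, w))  # roughly [0.0, -3.0, -8.3] dB
```

At 3π/4, well inside the nominal stopband of the lowpass channel, the attenuation is only about 8.3 dB, so substantial energy leaks between adjacent wavelet bands. Longer wavelet filters generally improve this separation at the expense of time localization.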

    Discrete multitone modulation with principal component filter banks

    Discrete multitone (DMT) modulation is an attractive method for communication over a nonflat channel with possibly colored noise. The uniform discrete Fourier transform (DFT) filter bank and the cosine-modulated filter bank have been used in this system in the past because of their low complexity. We show in this paper that principal component filter banks (PCFBs), which are known to be optimal for data compression and denoising applications, are also optimal for a number of criteria in DMT communication. For example, the PCFB of the effective channel noise power spectrum (the noise power spectral density weighted by the inverse of the channel gain) is optimal for DMT modulation in the sense of maximizing the bit rate for fixed power and error probabilities. We also establish an optimality property of the PCFB when scalar prefilters and postfilters are used around the channel. The difference between the PCFB and a traditional filter bank, such as the brickwall or DFT filter bank, is significant for effective power spectra that depart considerably from monotonicity. The twisted-pair channel, with its bridged taps, near-end and far-end crosstalk (NEXT and FEXT) noise, and AM interference, therefore appears to be a good candidate for the application of a PCFB. This is demonstrated with numerical results for the ADSL channel.
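The bit-rate criterion above depends on the effective noise power on each subchannel. A standard way to turn those noise levels into a power and bit allocation under a fixed total power is water-filling with an SNR gap; the sketch below shows that allocation step only, assuming the subchannels have already been separated by some filter bank (PCFB, DFT, or otherwise). The function name, the `gap` parameter, and counting 0.5·log2 bits per real dimension are assumptions; the PCFB construction from the paper is not implemented.

```python
import numpy as np


def waterfill_bits(eff_noise, total_power, gap=1.0):
    """Water-filling power allocation over DMT subchannels and the resulting bits.

    eff_noise[k]: noise power on subchannel k divided by that subchannel's power gain.
    gap: SNR gap (linear) for the target error probability; 1.0 corresponds to capacity.
    """
    n = np.asarray(eff_noise, dtype=float) * gap
    ns = np.sort(n)
    # Largest m such that the m quietest subchannels all lie below the water level.
    for m in range(len(ns), 0, -1):
        level = (total_power + ns[:m].sum()) / m
        if level > ns[m - 1]:
            break
    power = np.maximum(level - n, 0.0)  # non-negative, sums to total_power
    bits = 0.5 * np.log2(1.0 + power / n)  # bits per real dimension
    return power, bits
```

With effective noise `[0.1, 0.2, 1.0, 5.0]` and a total power of 1, only the two quietest subchannels receive power (0.55 and 0.45). Changing the filter bank changes the per-subchannel effective noise, and therefore the achievable bit rate.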

    Blind fault detection using spectral signatures

    This work studies a blind fault detection method that analyses only a system's output signal, looking for any change in its characteristics from pre-fault to post-fault, to identify the occurrence of faults. The fault used to develop the procedure is a change in the time constant of an aircraft's aileron-actuator system and of its simplified version, a position servo system. The method is studied as an alternative to conventional fault detection and identification methods. The output signal is passed through a filter bank to enhance the effect of a fault. The short-time Fourier transform is applied to the enhanced pre-fault and post-fault signal components to obtain indicators. Fault detection is approached as a clustering problem based on distances to fault signatures. This work presents two techniques for creating signatures from the indicators. In the first method, the signature is the mean of the indicators. Tests on a position servo system show that the method correctly classifies more than 85% of the indicators and can be used for online classification. The second method uses principal component analysis and defines vector subspace signatures. For the position servo system, the pre-fault indicators produced 14% false alarms, and the post-fault indicators missed 17% of the faults. This second method was also applied to a one-axis model of an F-14 aircraft's aileron-actuator system, where around 80% of the pre-fault and post-fault indicators were correctly identified. The blind fault detection method studied here has potential but needs to be understood further by applying it to more varied faults and systems.
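The first signature method (the mean of STFT-based indicators, followed by nearest-signature classification) can be sketched with SciPy. Everything specific here is a hypothetical assumption: a first-order lag whose time constant changes stands in for the servo system, the filter-bank enhancement stage is omitted, each indicator is one unit-norm STFT magnitude frame, and Euclidean distance is used.

```python
import numpy as np
from scipy.signal import lfilter, stft


def first_order_response(u, tau, dt=1.0):
    """Hypothetical first-order lag with time constant tau, standing in for the servo."""
    a = np.exp(-dt / tau)
    return lfilter([1.0 - a], [1.0, -a], u)


def indicators(y, nperseg=64):
    """One indicator per STFT frame: the magnitude spectrum scaled to unit norm."""
    _, _, Z = stft(y, nperseg=nperseg)
    mag = np.abs(Z).T  # shape (frames, frequency bins)
    return mag / (np.linalg.norm(mag, axis=1, keepdims=True) + 1e-12)


def classify(ind, signatures):
    """Label each indicator with the index of the nearest signature (Euclidean)."""
    d = np.linalg.norm(ind[:, None, :] - signatures[None, :, :], axis=2)
    return d.argmin(axis=1)
```

Signatures come from training runs (for example, the pre-fault and post-fault indicator means stacked into one array); `classify` then labels new indicators, and could be applied frame by frame for online use.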

    Perceptual models in speech quality assessment and coding

    The ever-increasing demand for good communications/toll quality speech has created renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work: speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed that simulates the processing stages of the peripheral auditory system. At the output of the model, a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound, such as speech. Auditory spectra from coded speech segments serve as inputs to a second model, which simulates the information centre in the brain that performs the speech quality assessment. [Continues.]
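A very crude stand-in for a "running" auditory spectrum is a frame-by-frame power spectrum pooled into critical bands (Bark) and log-compressed; comparing such spectra for original and coded speech gives a simple distortion measure. This sketch assumes the Zwicker and Terhardt Bark approximation, 8 kHz sampling (hence 18 bands), 256-sample frames, and an RMS distance in dB. All function names are hypothetical, and the sketch does not model the peripheral auditory processing stages or the second (assessment) model described in the thesis.

```python
import numpy as np


def hz_to_bark(f):
    """Critical-band rate in Bark (Zwicker and Terhardt, 1980 approximation)."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def auditory_spectrum(frame, fs, n_bands=18):
    """Hann-windowed power spectrum pooled into 1-Bark bands, in dB.

    n_bands=18 covers 0-4 kHz (fs = 8 kHz); bins above the last band are ignored.
    """
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    band = np.floor(hz_to_bark(np.fft.rfftfreq(len(frame), d=1.0 / fs))).astype(int)
    spec = np.array([power[band == b].sum() for b in range(n_bands)])
    return 10.0 * np.log10(spec + 1e-12)


def running_auditory_spectrum(x, fs, frame_len=256, hop=128):
    """One auditory spectrum per frame: a crude "running" auditory spectrum."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([auditory_spectrum(x[s:s + frame_len], fs) for s in starts])


def auditory_distance(reference, coded, fs):
    """RMS difference (dB) between the running auditory spectra of two aligned signals."""
    a = running_auditory_spectrum(reference, fs)
    b = running_auditory_spectrum(coded, fs)
    return float(np.sqrt(np.mean((a - b) ** 2)))
```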