
    Improving subband spectral estimation using modified AR model

    It has already been shown that spectral estimation can be improved when applied to subband outputs of an adapted filterbank rather than to the original fullband signal. In the present paper, this procedure is applied jointly with a novel predictive autoregressive (AR) model. The model exploits time-shifting and is therefore referred to as the time-shift AR (TS-AR) model. Estimators are proposed for the unknown TS-AR parameters and the spectrum of the observed signal. The TS-AR model yields improved spectrum estimation by taking advantage of the correlation between the subseries that result after decimation. Simulation results on signals with continuous and line spectra demonstrate the performance of the proposed method.
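For orientation, here is a minimal sketch of conventional fullband AR spectral estimation, the baseline that subband methods aim to improve on. It is not the TS-AR estimator from the abstract: it uses the standard Yule-Walker equations solved by the Levinson-Durbin recursion, and all function names (`simulate_ar1`, `autocorr`, `levinson`, `ar_psd`) are hypothetical helpers chosen for illustration.

```python
import cmath
import math
import random


def simulate_ar1(coef, n, seed=0):
    """Generate n samples of x[t] = coef * x[t-1] + e[t], with e ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(coef * x[-1] + rng.gauss(0.0, 1.0))
    return x


def autocorr(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag], assuming a zero-mean input."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Solve the Yule-Walker equations for x[t] ~ sum_k a[k] * x[t-k].

    Returns (a, err): the AR coefficients a[0..order-1] (lags 1..order)
    and the estimated innovation variance.
    """
    a = [0.0] * (order + 1)  # a[0] is unused padding so a[k] matches lag k
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


def ar_psd(a, noise_var, freq):
    """AR power spectral density at normalized frequency freq (cycles/sample)."""
    denom = 1.0 - sum(ak * cmath.exp(-2j * math.pi * freq * (k + 1))
                      for k, ak in enumerate(a))
    return noise_var / abs(denom) ** 2
```

A typical use would be `a, var = levinson(autocorr(x, p), p)` followed by evaluating `ar_psd(a, var, f)` on a grid of frequencies `f` in [0, 0.5]. In a subband variant, the same estimator would be run on each decimated filterbank output instead of on `x` directly.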

    Theory of optimal orthonormal subband coders

    The theory of the orthogonal transform coder and methods for its optimal design have been known for a long time. We derive a set of necessary and sufficient conditions for the coding-gain optimality of an orthonormal subband coder for given input statistics. We also show how these conditions can be satisfied by the construction of a sequence of optimal compaction filters one at a time. Several theoretical properties of optimal compaction filters and optimal subband coders are then derived, especially pertaining to behavior as the number of subbands increases. Significant theoretical differences between optimum subband coders, transform coders, and predictive coders are summarized. Finally, conditions are presented under which optimal orthonormal subband coders yield as much coding gain as biorthogonal ones for a fixed number of subbands.
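As a reminder of the quantity being optimized, the coding gain of an orthonormal transform or subband coder is, under the usual high-rate assumptions with optimal bit allocation and equal-rate subbands, the ratio of the arithmetic mean to the geometric mean of the subband variances. The sketch below computes only that ratio; the function name `coding_gain` is a hypothetical helper, and the abstract does not give this formula explicitly.

```python
import math


def coding_gain(variances):
    """Arithmetic mean over geometric mean of strictly positive subband variances."""
    m = len(variances)
    arithmetic = sum(variances) / m
    # Geometric mean computed in the log domain for numerical stability.
    geometric = math.exp(sum(math.log(v) for v in variances) / m)
    return arithmetic / geometric
```

Equal variances give a gain of 1 (no benefit over direct quantization), while an uneven split of the same total energy gives a gain above 1, which is why energy compaction matters.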

    A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

    This paper introduces a deep neural network model for a subband-based speech synthesizer. The model benefits from the narrow bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employ multi-level wavelet analysis/synthesis to decompose the signal into subbands and reconstruct it in the time domain. Inspired by WaveNet, a convolutional neural network (CNN) model predicts subband speech signals fully in the time domain. Due to the narrow bandwidth of the subbands, a simple network architecture is enough to learn the simple patterns of the subbands accurately. In the ground truth experiments with teacher-forcing, the subband synthesizer outperforms the fullband model significantly in terms of both subjective and objective measures. In addition, by conditioning the model on the phoneme sequence using a pronunciation dictionary, we obtain a fully time-domain neural model for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end. The generated speech of the subband TTS shows quality comparable to the fullband one with a lighter network architecture for each subband. Comment: 5 pages, 3 figur
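The multi-level wavelet analysis/synthesis step can be illustrated with a small sketch. The abstract does not name the wavelet, so the Haar wavelet is assumed here purely for simplicity, and the function names are hypothetical. The input length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_analysis(x):
    """One level: split x (even length) into half-rate low and high bands."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) * _S for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) * _S for i in range(half)]
    return low, high


def haar_synthesis(low, high):
    """Invert haar_analysis, interleaving the reconstructed samples."""
    x = []
    for lo, hi in zip(low, high):
        x.append((lo + hi) * _S)
        x.append((lo - hi) * _S)
    return x


def multilevel_analysis(x, levels):
    """Return [detail_1, ..., detail_L, approx_L], finest detail first."""
    bands = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def multilevel_synthesis(bands):
    """Rebuild the fullband signal from the output of multilevel_analysis."""
    approx = bands[-1]
    for detail in reversed(bands[:-1]):
        approx = haar_synthesis(approx, detail)
    return approx
```

Each subband is a shorter, lower-rate sequence than the input, which is the property a per-subband generator can exploit; in a synthesizer along these lines, the predicted subbands would be passed through `multilevel_synthesis` to obtain the waveform.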

    Adaptive filtering techniques for gravitational wave interferometric data: Removing long-term sinusoidal disturbances and oscillatory transients

    Experience gained from gravitational wave detector prototypes shows that the interferometric output signal will be corrupted by a significant amount of non-Gaussian noise, a large part of which is essentially composed of long-term sinusoids with slowly varying envelope (such as violin resonances in the suspensions, or main power harmonics) and short-term ringdown noise (which may emanate from servo control systems, electronics in a non-linear state, etc.). Since non-Gaussian noise components make the detection and estimation of the gravitational wave signature more difficult, a denoising algorithm based on adaptive filtering techniques (LMS methods) is proposed to separate and extract them from the stationary and Gaussian background noise. The strength of the method is that it does not require any precise model of the observed data: the signals are distinguished on the basis of their autocorrelation time. We believe that the robustness and simplicity of this method make it useful for data preparation and for the understanding of the first interferometric data. We present the detailed structure of the algorithm and its application to both simulated data and real data from the LIGO 40-meter prototype. Comment: 16 pages, 9 figures, submitted to Phys. Rev.
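The idea of separating components by autocorrelation time can be sketched with a textbook LMS adaptive line enhancer. This is an assumption about the general technique, not the paper's actual algorithm: an LMS filter predicts the current sample from delayed past samples, so only components that remain correlated beyond the delay (such as long-lived sinusoids) are predictable. The names `noisy_sine` and `lms_line_enhancer` and all parameter values are hypothetical.

```python
import math
import random


def noisy_sine(n, freq, noise_std, seed=0):
    """Return (clean, noisy): a unit sinusoid and the same plus white Gaussian noise."""
    rng = random.Random(seed)
    clean = [math.sin(2.0 * math.pi * freq * t) for t in range(n)]
    noisy = [c + rng.gauss(0.0, noise_std) for c in clean]
    return clean, noisy


def lms_line_enhancer(x, num_taps=16, delay=1, mu=0.005):
    """Split x into a predictable (narrowband) part and a residual (broadband) part.

    The filter input is x delayed by `delay` samples; white noise decorrelates
    over that delay, so the LMS prediction tracks only the long-lived components.
    """
    w = [0.0] * num_taps
    narrowband = [0.0] * len(x)
    broadband = list(x)  # samples before the filter has full history pass through
    for n in range(delay + num_taps - 1, len(x)):
        u = [x[n - delay - k] for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))   # prediction of x[n]
        e = x[n] - y                               # unpredictable residual
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]  # LMS weight update
        narrowband[n] = y
        broadband[n] = e
    return narrowband, broadband
```

After convergence, `narrowband` approximates the sinusoidal disturbance and `broadband` the background from which it has been removed. The step size `mu` trades convergence speed against steady-state error and must stay well below 2 divided by the total input power across the taps.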

    Online Monaural Speech Enhancement Using Delayed Subband LSTM

    This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The proposed method is developed in the short-time Fourier transform (STFT) domain. Online processing requires frame-by-frame signal reception and processing. A paramount feature of the proposed method is that the same LSTM is used across frequencies, which drastically reduces the number of network parameters, the amount of training data and the computational burden. Training is performed in a subband manner: the input consists of one frequency, together with a few context frequencies. The network learns a speech-to-noise discriminative function relying on the signal stationarity and on the local spectral pattern, based on which it predicts a clean-speech mask at each frequency. To exploit future information, i.e. look-ahead, we propose an output-delayed subband architecture, which allows the unidirectional forward network to process a few future frames in addition to the current frame. We use the proposed method to participate in the DNS real-time speech enhancement challenge. Experiments with the DNS dataset show that the proposed method achieves better performance-measuring scores than the DNS baseline method, which learns the full-band spectra using a gated recurrent unit network. Comment: Paper submitted to Interspeech 202

    Applications of wavelet-based compression to multidimensional Earth science data

    A data compression algorithm involving vector quantization (VQ) and the discrete wavelet transform (DWT) is applied to two different types of multidimensional digital earth-science data. The algorithm (WVQ) is optimized for each particular application through a procedure that assigns VQ parameters to the wavelet transform subbands subject to constraints on compression ratio and encoding complexity. Preliminary results of compressing global ocean model data generated on a Thinking Machines CM-200 supercomputer are presented. The WVQ scheme is used in both a predictive and nonpredictive mode. Parameters generated by the optimization algorithm are reported, as are signal-to-noise (SNR) measurements of actual quantized data. The problem of extrapolating hydrodynamic variables across the continental landmasses in order to compute the DWT on a rectangular grid is discussed. Results are also presented for compressing Landsat TM 7-band data using the WVQ scheme. The formulation of the optimization problem is presented along with SNR measurements of actual quantized data. Postprocessing applications are considered in which the seven spectral bands are clustered into 256 clusters using a k-means algorithm and analyzed using the Los Alamos multispectral data analysis program, SPECTRUM, both before and after being compressed using the WVQ program.

    A Subband-Based SVM Front-End for Robust ASR

    This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband-based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering for signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front-ends across the full range of noise levels.

    MDL Denoising Revisited

    We refine and extend an earlier MDL denoising criterion for wavelet-based denoising. We start by showing that the denoising problem can be reformulated as a clustering problem, where the goal is to obtain separate clusters for informative and non-informative wavelet coefficients, respectively. This suggests two refinements: adding a code-length for the model index, and extending the model in order to account for subband-dependent coefficient distributions. A third refinement is the derivation of soft thresholding inspired by predictive universal coding with weighted mixtures. We propose a practical method incorporating all three refinements, which is shown to achieve good performance and robustness in denoising both artificial and natural signals. Comment: Submitted to IEEE Transactions on Information Theory, June 200
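For readers unfamiliar with the term, the generic soft-thresholding operator shrinks every wavelet coefficient toward zero by a fixed amount and zeroes those below the threshold. The sketch below shows only this standard operator with a user-supplied threshold; it is not the MDL-derived rule from the abstract, and the function name is a hypothetical helper.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; zero the small ones."""
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0.0:
            out.append(0.0)          # treated as noise: discarded
        elif c > 0.0:
            out.append(magnitude)    # positive coefficient, reduced in size
        else:
            out.append(-magnitude)   # negative coefficient, reduced in size
    return out
```

In a subband-dependent scheme of the kind the abstract describes, one would presumably apply a different threshold to each wavelet subband rather than a single global value.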

    Feature Extracting in the Presence of Environmental Noise, using Subband Adaptive Filtering

    In this work, a new feature extraction method for noisy environments is proposed. The approach is based on subband decomposition of speech signals followed by adaptive filtering in the noisiest subbands of speech. The speech decomposition is obtained using a low-complexity octave filter bank, while adaptive filtering is performed using the normalized least mean square algorithm. The performance of the new feature was evaluated for isolated word speech recognition in the presence of car noise. The proposed method showed higher recognition accuracy than conventional methods in noisy environments.
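The normalized least mean square (NLMS) update mentioned above can be sketched as follows. The example is a plain system-identification setup rather than the paper's per-subband noise filtering, which the abstract does not detail; the function names, the step size `mu`, and the regularizer `eps` are all assumptions made for illustration.

```python
import random


def white_noise(n, seed=0):
    """Return n samples of zero-mean, unit-variance Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def fir_filter(h, x):
    """Causal FIR filtering of x by h, with zero initial conditions."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights w so that filtering x by w approximates d.

    Unlike plain LMS, the step is divided by the instantaneous input energy,
    which makes convergence insensitive to the input signal level.
    """
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        u = [x[n - k] for k in range(num_taps)]
        e = d[n] - sum(wk * uk for wk, uk in zip(w, u))
        energy = eps + sum(uk * uk for uk in u)  # eps guards against division by zero
        w = [wk + mu * e * uk / energy for wk, uk in zip(w, u)]
    return w
```

For instance, with `x = white_noise(2000)` and `d = fir_filter([0.5, -0.3, 0.2], x)`, calling `nlms_identify(x, d, 3)` should return weights close to the original filter. The energy normalization is useful for speech, whose level varies widely from frame to frame.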