3,352 research outputs found
Improving subband spectral estimation using modified AR model
It has already been shown that spectral estimation can be improved when applied to subband outputs of an adapted filterbank rather than to the original fullband signal. In the present paper, this procedure is applied jointly to a novel predictive autoregressive (AR) model. The model exploits time-shifting and is therefore referred to as the time-shift AR (TS-AR) model. Estimators are proposed for the unknown TS-AR parameters and the spectrum of the observed signal. The TS-AR model yields improved spectrum estimation by taking advantage of the correlation between the subseries obtained after decimation. Simulation results on signals with continuous and line spectra demonstrate the performance of the proposed method.
Theory of optimal orthonormal subband coders
The theory of the orthogonal transform coder and methods for its optimal design have been known for a long time. We derive a set of necessary and sufficient conditions for the coding-gain optimality of an orthonormal subband coder for given input statistics. We also show how these conditions can be satisfied by the construction of a sequence of optimal compaction filters one at a time. Several theoretical properties of optimal compaction filters and optimal subband coders are then derived, especially pertaining to behavior as the number of subbands increases. Significant theoretical differences between optimum subband coders, transform coders, and predictive coders are summarized. Finally, conditions are presented under which optimal orthonormal subband coders yield as much coding gain as biorthogonal ones for a fixed number of subbands.
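For orientation, the coding gain referred to above is commonly defined, for a maximally decimated uniform orthonormal filter bank under high-resolution quantization with optimal bit allocation, as the ratio of the arithmetic to the geometric mean of the subband variances. The symbols $M$ (number of subbands) and $\sigma_k^2$ (variance of subband $k$) are notation assumed here, not taken from the abstract.

```latex
G_{\mathrm{SBC}}(M) \;=\;
\frac{\dfrac{1}{M}\sum_{k=0}^{M-1}\sigma_k^{2}}
     {\left(\prod_{k=0}^{M-1}\sigma_k^{2}\right)^{1/M}}
```

By the arithmetic-geometric mean inequality this ratio is at least 1, with equality only when all subband variances are equal, which is why energy compaction into a few subbands raises the gain.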
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer
This paper introduces a deep neural network model for a subband-based speech
synthesizer. The model benefits from the narrow bandwidth of the subband signals
to reduce the complexity of the time-domain speech generator. We employ
multi-level wavelet analysis/synthesis to decompose/reconstruct the signal into
subbands in the time domain. Inspired by WaveNet, a convolutional neural
network (CNN) model predicts subband speech signals fully in the time domain. Due
to the narrow bandwidth of the subbands, a simple network architecture is enough
to learn the simple patterns of the subbands accurately. In the ground truth
experiments with teacher-forcing, the subband synthesizer outperforms the
fullband model significantly in terms of both subjective and objective
measures. In addition, by conditioning the model on the phoneme sequence using
a pronunciation dictionary, we obtain a fully time-domain neural model
for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end.
The speech generated by the subband TTS is comparable in quality to that of the
fullband one, with a lighter network architecture for each subband.
Comment: 5 pages, 3 figur
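The multi-level wavelet analysis/synthesis step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which wavelet is used, so the orthonormal Haar wavelet below is an assumption chosen for brevity, and the function names `haar_analysis` and `haar_synthesis` are hypothetical.

```python
import numpy as np

def haar_analysis(x, levels):
    """Split x into [approx, detail_L, ..., detail_1] with the orthonormal Haar wavelet.

    Assumes len(x) is divisible by 2**levels.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass subband, decimated by 2
        approx = (even + odd) / np.sqrt(2.0)         # low-pass subband, decimated by 2
    # Coarsest subbands first, finest detail last.
    return [approx] + details[::-1]

def haar_synthesis(subbands):
    """Invert haar_analysis, rebuilding the signal level by level."""
    approx = subbands[0]
    for detail in subbands[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

# Usage: a 3-level decomposition of a 64-sample signal and its reconstruction.
signal = np.random.default_rng(0).standard_normal(64)
subbands = haar_analysis(signal, levels=3)
reconstructed = haar_synthesis(subbands)
```

Each subband is decimated, so a generator that models one subband works at a lower sample rate than a fullband model, which is the complexity reduction the abstract points to.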
Adaptive filtering techniques for gravitational wave interferometric data: Removing long-term sinusoidal disturbances and oscillatory transients
Experience gained from gravitational wave detector
prototypes shows that the interferometric output signal will be corrupted by a
significant amount of non-Gaussian noise, a large part of it being essentially
composed of long-term sinusoids with slowly varying envelope (such as violin
resonances in the suspensions, or main power harmonics) and short-term ringdown
noise (which may emanate from servo control systems, electronics in a
non-linear state, etc.). Since non-Gaussian noise components make the detection
and estimation of the gravitational wave signature more difficult, a denoising
algorithm based on adaptive filtering techniques (LMS methods) is proposed to
separate and extract them from the stationary and Gaussian background noise.
The strength of the method is that it does not require any precise model of the
observed data: the signals are distinguished on the basis of their
autocorrelation time. We believe that the robustness and simplicity of this
method make it useful for data preparation and for the understanding of the
first interferometric data. We present the detailed structure of the algorithm
and its application to both simulated data and real data from the LIGO 40-meter
prototype.
Comment: 16 pages, 9 figures, submitted to Phys. Rev.
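The idea of separating components by autocorrelation time with LMS can be sketched as an adaptive line enhancer: a linear predictor fed with delayed samples can track a long-lived sinusoid but not white noise. This is a generic textbook sketch, not the algorithm of the paper; the function name `lms_line_enhancer`, the tap count, the delay and the step size `mu` are all assumptions for illustration.

```python
import numpy as np

def lms_line_enhancer(x, n_taps=32, delay=1, mu=0.002):
    """Predict x[n] from delayed past samples with an LMS-adapted FIR filter.

    Returns (predicted, residual): the predictable (long-autocorrelation) part
    and the remainder x - predicted.
    """
    x = np.asarray(x, dtype=float)
    w = np.zeros(n_taps)
    predicted = np.zeros_like(x)
    for n in range(delay + n_taps - 1, x.size):
        # Past samples x[n-delay], x[n-delay-1], ..., newest first.
        past = x[n - delay - n_taps + 1 : n - delay + 1][::-1]
        predicted[n] = w @ past
        err = x[n] - predicted[n]
        w += mu * err * past  # LMS weight update
    return predicted, x - predicted

# Usage: a sinusoid buried in white Gaussian noise (synthetic data).
rng = np.random.default_rng(0)
t = np.arange(4000)
clean = np.sin(2 * np.pi * 0.05 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)
line, background = lms_line_enhancer(noisy)
```

After the weights settle, `line` follows the sinusoid and `background` holds mostly the broadband noise; the step size must stay small relative to the input power for the update to remain stable.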
Online Monaural Speech Enhancement Using Delayed Subband LSTM
This paper proposes a delayed subband LSTM network for online monaural
(single-channel) speech enhancement. The proposed method is developed in the
short-time Fourier transform (STFT) domain. Online processing requires
frame-by-frame signal reception and processing. A paramount feature of the
proposed method is that the same LSTM is used across frequencies, which
drastically reduces the number of network parameters, the amount of training
data and the computational burden. Training is performed in a subband manner:
the input consists of one frequency, together with a few context frequencies.
The network learns a speech-to-noise discriminative function relying on the
signal stationarity and on the local spectral pattern, based on which it
predicts a clean-speech mask at each frequency. To exploit future information,
i.e. look-ahead, we propose an output-delayed subband architecture, which
allows the unidirectional forward network to process a few future frames in
addition to the current frame. We use the proposed method to participate
in the DNS real-time speech enhancement challenge. Experiments with the DNS
dataset show that the proposed method achieves better performance
scores than the DNS baseline method, which learns the full-band spectra using a
gated recurrent unit network.
Comment: Paper submitted to Interspeech 202
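The subband input layout and the output delay described in this abstract can be sketched on an STFT magnitude array. Only the data arrangement is shown, not the LSTM. The function names, the edge padding at the lowest and highest frequencies, and the zero fill for the first delayed frames are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def subband_inputs(mag, context=2):
    """mag: (n_freq, n_frames) magnitudes -> (n_freq, n_frames, 2*context + 1).

    For each frequency, stack the bin itself with `context` neighbours on each side.
    """
    n_freq, _ = mag.shape
    # Assumption: border bins are repeated so edge frequencies get a full window.
    padded = np.pad(mag, ((context, context), (0, 0)), mode="edge")
    windows = [padded[k : k + n_freq] for k in range(2 * context + 1)]
    return np.stack(windows, axis=-1)  # centre bin sits at index `context`

def delay_targets(targets, delay=2):
    """Shift targets so that the output at frame t describes frame t - delay."""
    delayed = np.zeros_like(targets)
    delayed[:, delay:] = targets[:, : targets.shape[1] - delay]
    return delayed

# Usage with a small synthetic magnitude array (4 frequencies, 6 frames).
mag = np.abs(np.random.default_rng(0).standard_normal((4, 6)))
features = subband_inputs(mag, context=1)   # one sequence per frequency
aligned = delay_targets(mag, delay=2)       # look-ahead of two frames
```

Because every frequency yields a sequence of the same shape, one network can be applied to all of them, which is how the parameter sharing across frequencies becomes possible.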
Applications of wavelet-based compression to multidimensional Earth science data
A data compression algorithm involving vector quantization (VQ) and the discrete wavelet transform (DWT) is applied to two different types of multidimensional digital earth-science data. The algorithm (WVQ) is optimized for each particular application through an optimization procedure that assigns VQ parameters to the wavelet transform subbands subject to constraints on compression ratio and encoding complexity. Preliminary results of compressing global ocean model data generated on a Thinking Machines CM-200 supercomputer are presented. The WVQ scheme is used in both a predictive and nonpredictive mode. Parameters generated by the optimization algorithm are reported, as are signal-to-noise ratio (SNR) measurements of actual quantized data. The problem of extrapolating hydrodynamic variables across the continental landmasses in order to compute the DWT on a rectangular grid is discussed. Results are also presented for compressing Landsat TM 7-band data using the WVQ scheme. The formulation of the optimization problem is presented along with SNR measurements of actual quantized data. Postprocessing applications are considered in which the seven spectral bands are clustered into 256 clusters using a k-means algorithm and analyzed using the Los Alamos multispectral data analysis program, SPECTRUM, both before and after being compressed using the WVQ program.
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratio (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
MDL Denoising Revisited
We refine and extend an earlier MDL denoising criterion for wavelet-based
denoising. We start by showing that the denoising problem can be reformulated
as a clustering problem, where the goal is to obtain separate clusters for
informative and non-informative wavelet coefficients, respectively. This
suggests two refinements, adding a code-length for the model index, and
extending the model in order to account for subband-dependent coefficient
distributions. A third refinement is the derivation of soft thresholding inspired
by predictive universal coding with weighted mixtures. We propose a practical
method incorporating all three refinements, which is shown to achieve good
performance and robustness in denoising both artificial and natural signals.
Comment: Submitted to IEEE Transactions on Information Theory, June 200
Feature Extracting in the Presence of Environmental Noise, using Subband Adaptive Filtering
In this work, a new feature extraction method for noisy environments is proposed. The approach is based on subband decomposition of speech signals followed by adaptive filtering in the noisiest subbands of speech. The speech decomposition is obtained using a low-complexity octave filter bank, while adaptive filtering is performed using the normalized least mean square algorithm. The performance of the new feature was evaluated for isolated word speech recognition in the presence of car noise. The proposed method showed higher recognition accuracy than conventional methods in noisy environments.