
    Adaptive interpolation of discrete-time signals that can be modeled as autoregressive processes

    This paper presents an adaptive algorithm for the restoration of lost sample values in discrete-time signals that can locally be described by means of autoregressive processes. The only restrictions are that the positions of the unknown samples must be known and that the unknown samples must be embedded in a sufficiently large neighborhood of known samples. The estimates of the unknown samples are obtained by minimizing the sum of squares of the residual errors, which involve estimates of the autoregressive parameters. A statistical analysis shows that, for a burst of lost samples, the expected quadratic interpolation error per sample converges to the signal variance as the burst length tends to infinity. The method is in fact the first step of an iterative algorithm in which, at each iteration, the current estimates of the missing samples are used to compute new estimates. Furthermore, the feasibility of a hardware implementation for real-time use is established. The method has been tested on artificially generated autoregressive processes as well as on digitized music and speech signals.
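    As a rough illustration of this kind of scheme (not the authors' exact recursions), the numpy sketch below alternates a least-squares AR fit with a closed-form solve for the missing samples; the function name, fixed model order, and iteration count are illustrative assumptions, and the missing samples are assumed to lie well inside the record, as the abstract requires.

```python
import numpy as np

def ar_interpolate(x, missing, order=10, n_iter=5):
    # Iterative AR-based restoration of missing samples (sketch).
    # Assumes every missing index lies at least `order` samples away
    # from the signal boundaries.
    x = np.asarray(x, dtype=float).copy()
    x[missing] = 0.0                       # crude initial estimate
    N = len(x)
    for _ in range(n_iter):
        # (1) least-squares fit of the AR parameters to the current signal
        X = np.column_stack([x[order - k - 1:N - k - 1] for k in range(order)])
        a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
        b = np.concatenate(([1.0], -a))    # prediction-error filter
        r = np.correlate(b, b, mode='full')[order:]   # its autocorrelation, lags 0..order
        # (2) minimize the summed squared residuals over the missing
        # samples: setting the gradient to zero gives a linear system
        xk = x.copy()
        xk[missing] = 0.0                  # separate known from unknown samples
        A = np.zeros((len(missing), len(missing)))
        rhs = np.zeros(len(missing))
        for i, m in enumerate(missing):
            for j, mm in enumerate(missing):
                if abs(m - mm) <= order:
                    A[i, j] = r[abs(m - mm)]
            for n in range(m - order, m + order + 1):
                rhs[i] -= r[abs(n - m)] * xk[n]
        x[missing] = np.linalg.solve(A, rhs)
    return x
```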

    Generalized Perceptual Linear Prediction (gPLP) Features for Animal Vocalization Analysis

    A new feature extraction model, generalized perceptual linear prediction (gPLP), is developed to calculate a set of perceptually relevant features for digital signal analysis of animal vocalizations. The gPLP model is a generalized adaptation of the perceptual linear prediction (PLP) model, popular in human speech processing, which incorporates perceptual information such as frequency warping and equal loudness normalization into the feature extraction process. Since such perceptual information is available for a number of animal species, this new approach integrates that information into a generalized model to extract perceptually relevant features for a particular species. To illustrate, qualitative and quantitative comparisons are made between the species-specific gPLP model and the original PLP model using a set of vocalizations collected from captive African elephants (Loxodonta africana) and wild beluga whales (Delphinapterus leucas). The models that incorporate perceptual information outperform the original human-based models in both visualization and classification tasks.
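    As a loose illustration of a PLP-style pipeline with pluggable perceptual curves (not the authors' implementation), the sketch below uses a mel-like warp and a flat loudness curve as placeholder assumptions; in the paper, both curves are species-specific, and the final cepstral conversion and Durbin recursion are omitted here for brevity.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def gplp_features(frame, sr, n_filters=20, order=12,
                  warp=lambda f: 2595 * np.log10(1 + f / 700),  # placeholder: mel warp
                  loudness=lambda f: np.ones_like(f)):          # placeholder: flat curve
    # 1. short-time power spectrum
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # 2. triangular filterbank spaced uniformly on the warped frequency axis
    edges = np.interp(np.linspace(warp(freqs[0]), warp(freqs[-1]), n_filters + 2),
                      warp(freqs), freqs)
    bands = np.empty(n_filters)
    for i in range(n_filters):
        lo, c, hi = edges[i], edges[i + 1], edges[i + 2]
        w = np.clip(np.minimum((freqs - lo) / (c - lo),
                               (hi - freqs) / (hi - c)), 0.0, None)
        bands[i] = w @ spec
    # 3. equal-loudness weighting and cube-root intensity compression
    bands = (bands * loudness(edges[1:-1])) ** (1.0 / 3.0)
    # 4. all-pole model of the auditory spectrum (Yule-Walker equations)
    r = np.fft.irfft(bands)[:order + 1]    # autocorrelation via inverse DFT
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return a                               # LP coefficients (cepstra omitted)
```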

    Regularized adaptive long autoregressive spectral analysis

    This paper is devoted to adaptive long autoregressive spectral analysis when (i) very few data are available and (ii) prior information exists concerning the spectral smoothness and time continuity of the analyzed signals. The contribution is founded on two papers by Kitagawa and Gersch. The first deals with spectral smoothness, in the regularization framework, while the second is devoted to time continuity, in the Kalman formalism. The present paper proposes an original synthesis of the two contributions: a new regularized criterion is introduced that takes both kinds of information into account. The criterion is efficiently optimized by a Kalman smoother. One of the major features of the method is that it is entirely unsupervised: the problem of automatically adjusting the hyperparameters that balance data-based versus prior-based information is solved by maximum likelihood. The improvement is quantified in the field of meteorological radar.
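    A stripped-down sketch of the Kalman half of this construction is shown below: the AR coefficient vector is the state and follows a random walk, which encodes time continuity. The fixed q and r stand in for the hyperparameters the paper adjusts automatically by maximum likelihood, and the spectral-smoothness regularization and the backward smoothing pass are omitted.

```python
import numpy as np

def tvar_kalman_filter(x, order=8, q=1e-4, r=1.0):
    # Time-varying AR estimation in the Kalman formalism (sketch).
    # State: coefficient vector a_n with random-walk transition a_n = a_{n-1} + w_n.
    # Observation: x_n = phi_n' a_n + e_n with phi_n = (x_{n-1}, ..., x_{n-p}).
    x = np.asarray(x, dtype=float)
    N = len(x)
    a = np.zeros(order)              # state estimate
    P = np.eye(order)                # state covariance
    Q = q * np.eye(order)            # process noise: controls time continuity
    A = np.zeros((N, order))
    for n in range(order, N):
        phi = x[n - order:n][::-1]           # p most recent samples
        P = P + Q                            # predict (random walk)
        k = P @ phi / (phi @ P @ phi + r)    # Kalman gain
        a = a + k * (x[n] - phi @ a)         # update with the prediction error
        P = P - np.outer(k, phi) @ P
        A[n] = a
    return A                                 # coefficient trajectory over time
```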

    Estimation of Autoregressive Parameters from Noisy Observations Using Iterated Covariance Updates

    Estimating the parameters of an autoregressive (AR) random process is a well-studied problem. In many applications, however, only noisy measurements of the AR process are available. The effect of the additive noise is that the system can be modeled as an AR model with colored noise, even when the measurement noise is white, and the resulting correlation matrix depends on the AR parameters. Because of this correlation, it is expedient to compute using multiple stacked observations. Performing a weighted least-squares estimation of the AR parameters with inverse covariance weighting can provide significantly better parameter estimates, with the improvement increasing with the stack depth. The estimation algorithm is essentially a vector RLS adaptive filter with a time-varying covariance matrix. Different ways of estimating the unknown covariance are presented, as well as a method to estimate the variances of the AR and observation noise. The formulation is extended to vector autoregressive (VAR) processes. Simulation results demonstrate performance improvements in coefficient error and in spectrum estimation.
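    The sketch below conveys the flavor of such an iterated, covariance-weighted fit. It is a plain iterated generalized-least-squares loop, not the paper's RLS recursion; stacked observations, the noise-variance estimators, and the VAR extension are all omitted.

```python
import numpy as np

def ar_gls(y, order=4, n_iter=10):
    # Iterated covariance-weighted AR fit from noisy observations (sketch).
    # Builds the full residual covariance explicitly, so it is only
    # practical for short records.
    y = np.asarray(y, dtype=float)
    N = len(y)
    X = np.column_stack([y[order - k - 1:N - k - 1] for k in range(order)])
    t = y[order:]
    n = len(t)
    W = np.eye(n)                        # start from ordinary least squares
    for _ in range(n_iter):
        # (1) weighted least squares: a = (X' W X)^{-1} X' W t
        a = np.linalg.solve(X.T @ W @ X, X.T @ W @ t)
        # (2) the residuals are colored by the measurement noise; estimate
        # their banded covariance from sample autocovariances
        e = t - X @ a
        c = [e[:n - k] @ e[k:] / n for k in range(order + 1)]
        C = c[0] * np.eye(n)
        for k in range(1, order + 1):
            C += c[k] * (np.eye(n, k=k) + np.eye(n, k=-k))
        W = np.linalg.inv(C + 1e-9 * np.eye(n))   # inverse-covariance weights
    return a
```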

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, and more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, and generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
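    As a token of the dominant recipe the review describes (log-mel input into a convolutional model), here is a minimal PyTorch classifier; the layer sizes, class count, and input shape are arbitrary illustrations, not taken from any specific paper.

```python
import torch
import torch.nn as nn

class LogMelCNN(nn.Module):
    # Small CNN over log-mel spectrogram patches, shaped (batch, 1, mels, frames).
    def __init__(self, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # pool out time and frequency
            nn.Linear(32, n_classes),
        )

    def forward(self, logmel):
        return self.net(logmel)

# usage: logits = LogMelCNN()(torch.randn(8, 1, 64, 101))
```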

    Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

    In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, and a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed, and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. To select the optimal pitch combination for each time frame, a score function is proposed that combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classical and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, with encouraging results.
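    To make the salience idea concrete, the toy function below scores each candidate F0 by a decayed sum of its partials' magnitudes; the 1/h partial weighting and bin tolerance are illustrative stand-ins, and the paper's tuning/inharmonicity estimation and log-frequency envelope treatment are not reproduced.

```python
import numpy as np

def pitch_salience(mag, freqs, f0_grid, n_harmonics=8, tol=0.03):
    # Harmonic salience (sketch): mag/freqs describe one spectral frame,
    # f0_grid lists the candidate fundamentals to score.
    salience = np.zeros(len(f0_grid))
    for i, f0 in enumerate(f0_grid):
        for h in range(1, n_harmonics + 1):
            target = h * f0
            if target > freqs[-1]:
                break
            # nearest spectral bin within a relative tolerance of the partial
            j = np.argmin(np.abs(freqs - target))
            if abs(freqs[j] - target) <= tol * target:
                salience[i] += mag[j] / h    # decaying partial weighting
    return salience
```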

    Application of the SABER method for improved spectral analysis of noisy speech

    A stand-alone noise suppression algorithm is described for reducing the spectral effects of acoustically added noise in speech. A fundamental result is developed which shows that the spectral magnitude of speech plus noise can be effectively approximated as the sum of the magnitudes of speech and noise. Using this simple phase-independent additive model, the noise bias present in the short-time spectrum is reduced by subtracting off the expected noise spectrum calculated during non-speech activity. After bias removal, the time waveform is recalculated from the modified magnitude and the saved phase. This Spectral Averaging for Bias Estimation and Removal (SABER) method requires only one FFT per time window for analysis and synthesis.
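    A minimal overlap-add sketch of this magnitude-subtraction idea is given below; noise_mag would be the expected noise magnitude spectrum measured during non-speech activity, and the frame length, window, and half-wave rectification floor are illustrative choices rather than the report's exact settings.

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, frame=256, hop=128):
    # Magnitude-domain noise-bias removal (sketch): subtract the expected
    # noise magnitude from each frame, keep the noisy phase, resynthesize
    # by weighted overlap-add. noise_mag has length frame // 2 + 1.
    noisy = np.asarray(noisy, dtype=float)
    win = np.hanning(frame)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame, hop):
        seg = noisy[start:start + frame] * win
        spec = np.fft.rfft(seg)
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - noise_mag, 0.0)   # half-wave rectify
        seg_out = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame)
        out[start:start + frame] += seg_out * win
        norm[start:start + frame] += win ** 2
    return out / np.maximum(norm, 1e-12)
```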