Frequency-warped autoregressive modeling and filtering
This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles.
Frequency-warping techniques, or simply warping techniques, modify a conventional signal processing system so that its inherent frequency representation is changed. It is demonstrated that this can be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications.
The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction, in one of the articles, of a class of new implementation techniques for recursive filters. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article that introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
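As an illustration of the warping idea described above, the following minimal Python sketch computes a warped autocorrelation sequence by replacing each unit delay with a first-order allpass section, then solves the resulting normal equations with the Levinson-Durbin recursion. It is a sketch under stated assumptions, not the implementation used in the thesis: the function names, the parameter name `lam` (the warping coefficient), the sign convention of the allpass section, and the truncation of the allpass outputs to the input length are all choices made here for illustration.

```python
def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam * z^-1).

    Assumed sign convention; with lam = 0 this reduces to a unit delay.
    """
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for xn in x:
        # Difference equation: y[n] = -lam*x[n] + x[n-1] + lam*y[n-1]
        yn = -lam * xn + x_prev + lam * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def warped_autocorr(x, order, lam):
    """Warped autocorrelation r[0..order].

    r[k] is the inner product of x with x passed through k allpass
    sections. Outputs are truncated to len(x) (an assumption).
    """
    r = []
    xk = list(x)
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(x, xk)))
        xk = allpass(xk, lam)
    return r


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err), where a = [1, a1, ..., a_order] are the
    prediction-error filter coefficients and err is the residual power.
    """
    a = [1.0]
    err = r[0]
    for m in range(1, order + 1):
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err  # reflection coefficient
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
        err *= 1.0 - k * k
    return a, err


# Hypothetical usage: a short test frame and an illustrative lam value.
frame = [0.0, 1.0, 0.5, -0.3, -0.8, -0.2, 0.4, 0.6]
r_warped = warped_autocorr(frame, order=2, lam=0.5)
coeffs, residual_power = levinson(r_warped, order=2)
```

With `lam = 0` the sketch reduces to conventional linear prediction, which makes it easy to check against a standard implementation. A positive `lam` increases the frequency resolution at low frequencies; values of roughly 0.75 are often quoted for approximating an auditory frequency scale at a 44.1 kHz sampling rate. The coefficients obtained this way describe a filter in the warped domain, so using them for synthesis leads to the delay-free loops in recursive filters that the abstract mentions.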
Improving subband spectral estimation using modified AR model
It has already been shown that spectral estimation can be improved when applied to the subband outputs of an adapted filterbank rather than to the original fullband signal. In the present paper, this procedure is applied jointly with a novel predictive autoregressive (AR) model. The model exploits time-shifting and is therefore referred to as the time-shift AR (TS-AR) model. Estimators are proposed for the unknown TS-AR parameters and the spectrum of the observed signal. The TS-AR model yields improved spectrum estimation by taking advantage of the correlation between the subseries obtained after decimation. Simulation results on signals with continuous and line spectra demonstrate the performance of the proposed method.
Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach
Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.
New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition
This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise that remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method by reweighting the subband power spectrum values according to the subband signal-to-noise ratios (SNRs), which adapts the constraint to the proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates the noisy and undesired components more effectively. The modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. When evaluated on the Aurora 2 recognition task, the proposed method outperformed the Mel frequency cepstral coefficient (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC) features, and MVDR-based features in several different noisy conditions.
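For readers unfamiliar with the MVDR spectrum that this abstract builds on, the sketch below evaluates the textbook (unweighted) MVDR estimate P(w) = 1 / (v(w)^H R^{-1} v(w)) from a short autocorrelation sequence, where R is the Toeplitz autocorrelation matrix and v(w) is the steering vector at frequency w. It assumes NumPy is available. It does not reproduce the paper's perceptual warping, its temporal filtering of the autocorrelation, or its SNR-based reweighting of the distortionless constraint; the function name and parameters are hypothetical.

```python
import numpy as np


def mvdr_spectrum(r, n_freqs=257):
    """Plain MVDR spectrum from autocorrelation lags r[0..M].

    Returns (omegas, power), with omegas spanning [0, pi].
    Scaling conventions differ between references; this is the
    unnormalised form 1 / (v^H R^{-1} v).
    """
    r = np.asarray(r, dtype=float)
    order = len(r) - 1
    lags = np.arange(order + 1)
    # Symmetric Toeplitz autocorrelation matrix: R[i, j] = r[|i - j|]
    R = r[np.abs(np.subtract.outer(lags, lags))]
    R_inv = np.linalg.inv(R)
    omegas = np.linspace(0.0, np.pi, n_freqs)
    # Each column of V is the steering vector for one frequency
    V = np.exp(1j * np.outer(lags, omegas))
    # Quadratic form v^H R^{-1} v, evaluated for every frequency at once
    denom = np.real(np.einsum("if,ij,jf->f", V.conj(), R_inv, V))
    return omegas, 1.0 / denom


# Hypothetical usage: autocorrelation of a unit-amplitude cosine at pi/2
# (lags cos(pi*k/2)) plus white noise of variance 0.1 at lag zero.
omegas, power = mvdr_spectrum([1.1, 0.0, -1.0, 0.0, 1.0])
peak_frequency = omegas[np.argmax(power)]
```

In a feature extraction front-end of the kind the abstract describes, a spectrum such as this would typically replace the periodogram before cepstral coefficients are computed; that downstream step is outside the scope of this sketch.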
Model-based analysis of noisy musical recordings with application to audio restoration
This thesis proposes digital signal processing algorithms for noise reduction and enhancement of audio signals. Approximately half of the work concerns signal modeling techniques for suppression of localized disturbances in audio signals, such as impulsive noise and low-frequency pulses. In this regard, novel algorithms and modifications to previous propositions are introduced with the aim of achieving a better balance between computational complexity and qualitative performance, in comparison with other schemes presented in the literature. The main contributions related to this set of articles are: an efficient algorithm for suppression of low-frequency pulses in audio signals; a scheme for impulsive noise detection that uses frequency-warped linear prediction; and two methods for reconstruction of audio signals within long gaps of missing samples.
The remaining part of the work discusses applications of sound source modeling (SSM) techniques to audio restoration. It comprises application examples, such as a method for bandwidth extension of guitar tones, and discusses the challenge of model calibration based on noisy recorded sources. Regarding this matter, a frequency-selective spectral analysis technique called frequency-zooming ARMA (FZ-ARMA) modeling is proposed as an effective way to estimate the frequency and decay time of resonance modes associated with the partials of a given tone, despite the presence of corrupting noise in the observable signal.
Computer Models for Musical Instrument Identification
A particular aspect of the perception of sound concerns what is commonly
termed texture or timbre. From a perceptual perspective, timbre is what allows us
to distinguish sounds that have similar pitch and loudness. Indeed, most people are
able to tell a piano tone from a violin tone, or to distinguish different voices
or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of the Line Spectrum Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.
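The Line Spectrum Frequencies mentioned above can be obtained from linear prediction coefficients with the standard sum and difference polynomial construction. The sketch below assumes NumPy is available and assumes a minimum-phase prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p. The function name is hypothetical, and generic polynomial root finding is used for brevity; this is not necessarily the procedure used in the thesis.

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] to line spectrum frequencies.

    Returns the LSFs in radians, sorted, strictly between 0 and pi.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    # P(z) = A(z) + z^-(p+1) A(1/z)  (symmetric coefficients)
    p_poly = a_ext + a_ext[::-1]
    # Q(z) = A(z) - z^-(p+1) A(1/z)  (antisymmetric coefficients)
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair; drop trivial roots at z = 1, z = -1
    eps = 1e-8
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


# Hypothetical usage: a first-order predictor with a pole at z = 0.9
lsf = lpc_to_lsf([1.0, -0.9])
```

For a minimum-phase A(z), the roots of P(z) and Q(z) lie on the unit circle and interlace, and closely spaced pairs of LSFs tend to indicate spectral peaks. This property is one reason LSFs are commonly used as spectral envelope and formant descriptors.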
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll
quality speech has created renewed interest in the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.]