Adaptive Hidden Markov Noise Modelling for Speech Enhancement
A robust and reliable noise estimation algorithm is required in many speech enhancement
systems. The aim of this thesis is to propose and evaluate a robust noise estimation
algorithm for highly non-stationary noisy environments. In this work, we model the
non-stationary noise using a set of discrete states with each state representing a distinct
noise power spectrum. In this approach, the state sequence over time is conveniently
represented by a Hidden Markov Model (HMM).
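As a minimal sketch of how a state sequence over distinct noise spectra can be tracked, the Python below runs a normalised HMM forward recursion over a fixed set of noise states. It assumes each periodogram bin is exponentially distributed with mean equal to the state's noise power spectrum (a common complex-Gaussian noise assumption). The function names, the example spectra and the transition probabilities are hypothetical, and the thesis's online re-estimation, state creation and state merging are not shown.

```python
import math


def frame_loglik(power, noise_psd):
    """Log-likelihood of one periodogram frame under one noise state.

    Assumes each bin |Y(k)|^2 is exponentially distributed with mean noise_psd[k].
    """
    return sum(-math.log(lam) - p / lam for p, lam in zip(power, noise_psd))


def forward_filter(frames, state_psds, trans, prior):
    """Normalised forward recursion: P(state at frame t | frames up to t) for each t."""
    n = len(state_psds)
    alpha = list(prior)
    posteriors = []
    for t, power in enumerate(frames):
        if t > 0:
            # Predict: push the previous posterior through the transition matrix.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update: weight by each state's likelihood (max subtracted for stability).
        logs = [frame_loglik(power, psd) for psd in state_psds]
        top = max(logs)
        alpha = [a * math.exp(ll - top) for a, ll in zip(alpha, logs)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
        posteriors.append(alpha)
    return posteriors


# Hypothetical two-state example: a quiet and a loud noise spectrum over 3 bins.
states = [[1.0, 1.0, 1.0], [10.0, 10.0, 10.0]]
transitions = [[0.9, 0.1], [0.1, 0.9]]
frames = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [10.0, 10.0, 10.0]]
post = forward_filter(frames, states, transitions, [0.5, 0.5])
```

The posterior favours the quiet state for the first two frames and switches to the loud state on the third; a noise estimate for enhancement could then be formed by weighting the state spectra with these posteriors.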
In this thesis, we first present an online HMM re-estimation framework that models
time-varying noise using a Hidden Markov Model and follows changes in noise characteristics
through a sequential model update procedure applied during the absence of speech.
In addition, the algorithm will, when necessary, create new
model states to represent novel noise spectra and will merge existing states that have similar
characteristics. We then extend our work to robust noise estimation during speech
activity by incorporating a speech model into our existing noise model. The noise characteristics
within each state are updated based on a speech presence probability, which
is derived from a modified Minima Controlled Recursive Averaging (MCRA) method.
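The speech-presence-weighted update can be illustrated with a simplified, per-bin version of standard MCRA (not the thesis's modified variant, and without the frequency smoothing of the original method). A smoothed power S is compared with its tracked minimum, the binary decision is smoothed into a speech presence probability p, and p raises the effective smoothing factor, $\tilde\alpha_d = \alpha_d + (1-\alpha_d)\,p$, so that $\lambda \leftarrow \tilde\alpha_d\lambda + (1-\tilde\alpha_d)|Y|^2$ barely moves while speech is likely present. The function name and parameter defaults are assumptions.

```python
def mcra_noise_track(frames, alpha_s=0.8, alpha_p=0.2, alpha_d=0.95, delta=5.0, L=125):
    """Simplified per-bin MCRA noise tracking over a list of power-spectrum frames.

    Returns the noise estimate after every frame and the final speech presence probabilities.
    """
    n_bins = len(frames[0])
    S = list(frames[0])        # recursively smoothed power
    S_min = list(frames[0])    # running minimum of S
    S_tmp = list(frames[0])    # temporary minimum, restarted every L frames
    p = [0.0] * n_bins         # speech presence probability per bin
    noise = list(frames[0])
    history = []
    for l, power in enumerate(frames):
        for k in range(n_bins):
            S[k] = alpha_s * S[k] + (1 - alpha_s) * power[k]
            if l > 0 and l % L == 0:
                S_min[k] = min(S_tmp[k], S[k])
                S_tmp[k] = S[k]
            else:
                S_min[k] = min(S_min[k], S[k])
                S_tmp[k] = min(S_tmp[k], S[k])
            # Speech is declared present when S exceeds delta times its minimum.
            present = 1.0 if S[k] > delta * S_min[k] else 0.0
            p[k] = alpha_p * p[k] + (1 - alpha_p) * present
            # Effective smoothing factor rises towards 1 as p rises, freezing the estimate.
            a = alpha_d + (1 - alpha_d) * p[k]
            noise[k] = a * noise[k] + (1 - a) * power[k]
        history.append(list(noise))
    return history, p
```

For example, a 5-frame burst of power 100 in a bin whose noise floor is 1 drives p close to 1, and the noise estimate in that bin stays far below the burst level instead of following it.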
We have demonstrated the effectiveness of our noise HMM in tracking both stationary
and highly non-stationary noise, and shown that it gives improved performance over
other conventional noise estimation methods when it is incorporated into a standard
speech enhancement algorithm.
Time and frequency domain algorithms for speech coding
The promise of digital hardware economies (due to recent advances in
VLSI technology) has focussed much attention on more complex and sophisticated
speech coding algorithms which offer improved quality at relatively
low bit rates.
This thesis describes the results (obtained from computer simulations)
of research into various efficient (time and frequency domain) speech
encoders operating at a transmission bit rate of 16 kbps.
In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM)
systems employing both forward and backward adaptive prediction were
examined. A number of algorithms were proposed and evaluated, including
several variants of the Stochastic Approximation Predictor (SAP). A
Backward Block Adaptive (BBA) predictor was also developed and found to
outperform the conventional stochastic methods, even though its complexity
in terms of signal processing requirements is lower. A simplified
Adaptive Predictive Coder (APC) employing a single-tap pitch predictor,
considered next, provided a slight improvement in performance over ADPCM,
but with rather greater complexity.
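To make the backward-adaptive structure concrete, here is a minimal ADPCM sketch. It substitutes a normalised LMS coefficient update (one simple stochastic-gradient predictor) for the SAP and BBA predictors studied in the thesis, and uses a fixed 4-level uniform quantiser, i.e. 2 bits per sample, which would give 16 kbps at an assumed 8 kHz sampling rate. The class and function names and all parameter values are hypothetical. Because the predictor adapts only on quantised, decoded quantities, the decoder reproduces the encoder's reconstruction without side information.

```python
import math


def quantize(e, step, levels):
    """Uniform mid-rise quantiser: map residual e to a code in [-levels/2, levels/2 - 1]."""
    half = levels // 2
    return max(-half, min(half - 1, math.floor(e / step)))


def dequantize(code, step):
    """Reconstruction level at the centre of the code's interval."""
    return (code + 0.5) * step


class BackwardNLMSPredictor:
    """Linear predictor adapted only from decoded data (backward adaptation)."""

    def __init__(self, order, mu):
        self.coeffs = [0.0] * order
        self.history = [0.0] * order  # past reconstructed samples, most recent first
        self.mu = mu

    def predict(self):
        return sum(a * h for a, h in zip(self.coeffs, self.history))

    def update(self, residual_q, recon):
        # Normalised LMS step driven by the quantised residual.
        energy = 1e-6 + sum(h * h for h in self.history)
        self.coeffs = [a + self.mu * residual_q * h / energy
                       for a, h in zip(self.coeffs, self.history)]
        self.history = [recon] + self.history[:-1]


def adpcm_encode(x, step=0.05, levels=4, order=2, mu=0.1):
    pred = BackwardNLMSPredictor(order, mu)
    codes, recon = [], []
    for s in x:
        p = pred.predict()
        code = quantize(s - p, step, levels)
        eq = dequantize(code, step)
        r = p + eq                      # encoder-side reconstruction
        pred.update(eq, r)
        codes.append(code)
        recon.append(r)
    return codes, recon


def adpcm_decode(codes, step=0.05, order=2, mu=0.1):
    # Mirrors the encoder loop using only the received codes.
    pred = BackwardNLMSPredictor(order, mu)
    out = []
    for code in codes:
        p = pred.predict()
        eq = dequantize(code, step)
        r = p + eq
        pred.update(eq, r)
        out.append(r)
    return out
```

A forward-adaptive variant would instead compute coefficients from the unquantised input block and transmit them as side information.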
The ultimate test of any speech coding system is the perceptual performance
of the received speech. Recent research has indicated that this
may be enhanced by suitable control of the noise spectrum according to
the theory of auditory masking. Various noise shaping ADPCM
configurations were examined, and it was demonstrated that a proposed
pre-/post-filtering arrangement, which advantageously exploits the
predictor-quantizer interaction, leads to the best subjective
performance in both forward and backward prediction systems.
Adaptive quantization is instrumental to the performance of ADPCM systems.
Both the forward adaptive quantizer (AQF) and the backward one-word
memory adaptive quantizer (AQJ) were examined. In addition, a novel method
of decreasing quantization noise in ADPCM-AQJ coders, which involves
applying a correction to the decoded speech samples, provided
reduced output noise across the spectrum, with considerable high frequency
noise suppression.
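One-word memory adaptation can be sketched as follows: after each sample the step size is multiplied by a factor selected by the magnitude of the code just sent, so encoder and decoder keep identical step sizes without side information. The 3-bit mid-rise quantiser, the Jayant-style multiplier table, the step-size limits and the function names are illustrative assumptions rather than values from the thesis, and the thesis's correction of decoded samples is not reproduced.

```python
import math

# Illustrative step-size multipliers for a 3-bit quantiser, indexed by code magnitude
# (innermost level first); assumed values, not taken from the thesis.
MULTIPLIERS = [0.9, 0.9, 1.25, 1.75]


def _next_step(step, code, multipliers, step_min, step_max):
    # Map mid-rise code c in [-4, 3] to magnitude index 0..3.
    mag = code if code >= 0 else -code - 1
    return min(step_max, max(step_min, step * multipliers[mag]))


def aqj_encode(x, step0=0.1, step_min=1e-3, step_max=10.0, multipliers=MULTIPLIERS):
    half = len(multipliers)
    step = step0
    codes, recon, steps = [], [], []
    for s in x:
        code = max(-half, min(half - 1, math.floor(s / step)))
        codes.append(code)
        recon.append((code + 0.5) * step)
        steps.append(step)
        step = _next_step(step, code, multipliers, step_min, step_max)
    return codes, recon, steps


def aqj_decode(codes, step0=0.1, step_min=1e-3, step_max=10.0, multipliers=MULTIPLIERS):
    # The decoder repeats the step-size recursion from the received codes alone.
    step = step0
    out = []
    for code in codes:
        out.append((code + 0.5) * step)
        step = _next_step(step, code, multipliers, step_min, step_max)
    return out
```

Outer codes (a signal too large for the current step) expand the step; inner codes shrink it, so the quantiser follows the speech envelope.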
More powerful (and inevitably more complex) frequency domain speech
coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder
(SBC) offer good quality speech at 16 kbps. To reduce complexity and
coding delay, whilst retaining the advantage of sub-band coding, a novel
transform based split-band coder (TSBC) was developed and found to compare
closely in performance with the SBC.
To prevent the heavy side information requirement associated with a
large number of bands in split-band coding schemes from impairing coding
accuracy, without forgoing the efficiency provided by adaptive bit
allocation, a method employing AQJs to code the sub-band signals together
with vector quantization of the bit allocation patterns was also
proposed.
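The side-information saving from vector quantising bit-allocation patterns can be illustrated with a nearest-neighbour search over a small codebook, so that only the index of the chosen pattern is transmitted. The codebook below is a hypothetical four-band example in which every pattern spends 8 bits; a practical codebook would be trained on allocation patterns from speech, and the squared-error distortion measure is an assumption.

```python
import math


def nearest_pattern(target, codebook):
    """Return (index, pattern) of the codeword closest to `target` in squared error."""
    def distortion(pattern):
        return sum((t - b) ** 2 for t, b in zip(target, pattern))
    index = min(range(len(codebook)), key=lambda i: distortion(codebook[i]))
    return index, codebook[index]


def index_bits(codebook):
    """Side information per frame when only the codebook index is sent."""
    return math.ceil(math.log2(len(codebook)))


# Hypothetical codebook: 4 bands, each pattern allocates 8 bits in total.
CODEBOOK = [
    [2, 2, 2, 2],
    [4, 2, 1, 1],
    [3, 3, 1, 1],
    [1, 1, 3, 3],
]

ideal = [3.6, 2.2, 1.1, 1.1]  # e.g. an unquantised allocation derived from band energies
idx, pattern = nearest_pattern(ideal, CODEBOOK)
```

Sending each of the four band allocations separately, with 3 bits to cover values 0 to 4, would cost 12 bits per frame, against 2 bits for the index in this toy codebook.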
Finally, 'pipeline' methods of bit allocation and step size estimation
(using the Fast Fourier Transform (FFT) on the input signal) were examined.
Such methods, although less accurate, are nevertheless useful in
limiting the coding delay associated with SBC schemes employing Quadrature
Mirror Filters (QMF).
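A sketch of allocation driven by a transform of the input signal, assuming equal-width bands and a greedy rule that gives each successive bit to the band with the largest estimated quantisation noise (band energy divided by 4 to the power of the bits already assigned, i.e. roughly 6 dB per bit). A plain DFT stands in for the FFT to keep the example dependency-free; the function names, band layout and bit budget are hypothetical, and this is not claimed to be the thesis's pipeline scheme.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy in equal-width bands of the DFT of one input frame (DFT stands in for an FFT)."""
    n = len(frame)
    spectrum = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
                for k in range(n // 2)]
    width = len(spectrum) // n_bands
    # A tiny floor keeps silent bands comparable without division by zero.
    return [sum(spectrum[b * width:(b + 1) * width]) + 1e-12 for b in range(n_bands)]


def greedy_bit_allocation(energies, total_bits, max_bits=None):
    """Give each bit, one at a time, to the band with the largest estimated noise."""
    bits = [0] * len(energies)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if max_bits is None or bits[i] < max_bits]
        if not candidates:
            break
        # Each extra bit is assumed to cut quantisation noise power by a factor of 4 (~6 dB).
        best = max(candidates, key=lambda i: energies[i] / 4 ** bits[i])
        bits[best] += 1
    return bits
```

In a sketch like this the energies come from the input frame rather than from the QMF outputs, so the allocation need not wait for the filter bank; the price is that the transform band energies only approximate the true sub-band energies.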
Analytical methods and experimental approaches for electrophysiological studies of brain oscillations
Brain oscillations are increasingly the subject of electrophysiological studies probing their role in the functioning and dysfunction of the human brain. In recent years this research area has seen rapid and significant changes in the experimental approaches and analysis methods. This article reviews these developments and provides a structured overview of experimental approaches, spectral analysis techniques and methods to establish relationships between brain oscillations and behaviour.
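As an illustration of one basic spectral analysis technique, the sketch below estimates relative alpha-band (8 to 12 Hz) power from a Hann-windowed periodogram. The sampling rate, the synthetic 10 Hz test signal, the band edges and the function names are assumptions; a plain DFT avoids dependencies, and in practice Welch averaging over segments or multitaper estimation would reduce the variance of a single periodogram.

```python
import cmath
import math


def periodogram(x, fs):
    """One-sided power spectrum of x with a periodic Hann window (plain DFT, no FFT library)."""
    n = len(x)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    xw = [xi * wi for xi, wi in zip(x, window)]
    scale = fs * sum(w * w for w in window)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        X = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        psd.append(abs(X) ** 2 / scale)  # the factor of 2 for a one-sided PSD is omitted
    return freqs, psd


def band_power(freqs, psd, lo, hi):
    """Integrate the PSD over lo <= f <= hi (rectangle rule)."""
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if lo <= f <= hi) * df


# Hypothetical 1 s recording at 128 Hz containing a pure 10 Hz (alpha-band) rhythm.
fs = 128
x = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
freqs, psd = periodogram(x, fs)
relative_alpha = band_power(freqs, psd, 8, 12) / band_power(freqs, psd, 0, fs / 2)
```

Relative band power is a ratio, so the omitted one-sided scaling factor cancels; absolute power comparisons across studies would need it.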
Automatic Transcription of Polyphonic Vocal Music
This paper presents a method for automatic music transcription applied to audio recordings of a cappella performances with multiple singers. We propose a system for multi-pitch detection and voice assignment that integrates an acoustic and a music language model. The acoustic model performs spectrogram decomposition, extending probabilistic latent component analysis (PLCA) using a six-dimensional dictionary with pre-extracted log-spectral templates. The music language model performs voice separation and assignment using hidden Markov models that apply musicological assumptions. By integrating the two models, the system is able to detect multiple concurrent pitches in polyphonic vocal music and assign each detected pitch to a specific voice type such as soprano, alto, tenor or bass (SATB). We compare our system against multiple baselines, achieving state-of-the-art results for both multi-pitch detection and voice assignment on a dataset of Bach chorales and another of barbershop quartets. We also present an additional evaluation of our system using varied pitch tolerance levels to investigate its performance at 20-cent pitch resolution.
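Template-based decomposition can be illustrated with a much simplified, two-dimensional PLCA: the spectrum V(f) of one frame is modelled as a mixture of fixed, pre-extracted templates P(f|z), and EM estimates the weights P(z). The E-step posterior is $P(z\mid f) = P(z)P(f\mid z) / \sum_{z'} P(z')P(f\mid z')$ and the M-step sets $P(z) \propto \sum_f V(f)\,P(z\mid f)$. The paper's six-dimensional dictionary and its music language model are not reproduced; the toy templates, iteration count and function name are assumptions.

```python
def plca_weights(frame, templates, n_iter=200):
    """EM estimate of mixture weights P(z) for one spectral frame V(f).

    templates[z][f] is a fixed template P(f|z); each template sums to 1.
    """
    n_z, n_f = len(templates), len(frame)
    weights = [1.0 / n_z] * n_z
    for _ in range(n_iter):
        # Model reconstruction sum_z P(z) P(f|z) for every frequency bin.
        model = [sum(weights[z] * templates[z][f] for z in range(n_z)) for f in range(n_f)]
        # E-step posterior P(z|f) folded directly into the M-step sum over f.
        new = [sum(frame[f] * weights[z] * templates[z][f] / model[f]
                   for f in range(n_f) if model[f] > 0)
               for z in range(n_z)]
        total = sum(new)
        weights = [w / total for w in new]
    return weights


# Two hypothetical overlapping templates over three frequency bins.
templates = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
w = plca_weights([0.3, 0.5, 0.2], templates)
```

Reading each weight as the activation of the pitch its template represents gives a per-frame multi-pitch estimate in this simplified setting.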