471 research outputs found

Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement

Author: Ben Milner
Boll
Chen
Deller
Ephraim
Ephraim
Ephraim
Ephraim
Esfandiar Zavarehei
Friedman
Griffin
Hansen
Ioannis Andrianakis
Jonathan Darch
Kalman
Lim
Lim
Paul White
Qin Yan
Rentzos
Saeed Vaseghi
Sameti
Secrest
Seltzer
Stylianou
Stylianou
Tucker
Turunen
Vaseghi
Weber
Yan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

This paper presents a speech enhancement method based on the tracking and denoising of the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and a mitigation of the processing artefact known as ‘musical noise’ or ‘musical tones’. The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on a spectral amplitude estimation, (b) formant tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames. The HNM parameters for the excitation signal comprise a voiced/unvoiced decision, the fundamental frequency, the harmonics’ amplitudes and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame, several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modelled and denoised using Kalman filters. The proposed method is used to deconstruct noisy speech, denoise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages.
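
As a rough illustration of stage (c) in the abstract above, the sketch below runs a scalar Kalman filter over a noisy per-frame formant track. It is not the paper's estimator: the random-walk state model, the function name `kalman_smooth_track`, and the variances `process_var` and `obs_var` are all assumptions chosen for the example.

```python
def kalman_smooth_track(observations, process_var=25.0, obs_var=400.0):
    """Filter a noisy per-frame frequency track (Hz).

    Assumed model (illustrative only): the true frequency follows a
    random walk with variance process_var per frame, and each observation
    adds independent noise with variance obs_var.
    """
    estimate = observations[0]   # initialise from the first frame
    error_var = obs_var          # uncertainty of that initial estimate
    filtered = [estimate]
    for z in observations[1:]:
        # Predict: the random walk keeps the mean and inflates the variance.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + obs_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


if __name__ == "__main__":
    # Hypothetical first-formant track near 500 Hz with alternating +/-40 Hz errors.
    noisy = [500.0 + (-1) ** i * 40.0 for i in range(50)]
    smooth = kalman_smooth_track(noisy)
    print(round(max(abs(f - 500.0) for f in smooth[1:]), 1))
```

A larger `obs_var` relative to `process_var` makes the filter trust the previous frames more, which is the sense in which tracking exploits the temporal structure of successive frames.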

Crossref

Southampton (e-Prints Soton)

University of East Anglia digital repository

Model-based speech enhancement for hearing aids

Author: Kavalekalam Mathew Shaji
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2018

VBN

Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease

Author: Shi Liming
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2019

VBN

Reconstruction-based speech enhancement from robust acoustic features

Author: Ahmadi
Ben Milner
Boll
Cappe
Carmona
Chen
Cohen
Darch
de Cheveigné
Ephraim
Ephraim
Gales
Gauvain
Gerkmann
Gonzalez
Hu
Hu
Hu
Jensen
Kawahara
Leggetter
Loizou
Makhoul
Martin
Martin
McAulay
Milner
Milner
Mohammadiha
Oppenheim
Paliwal
Philip Harding
Rangachari
Reynolds
Stylianou
Syrdal
Varga
Xiao
Yan
Zen
Publication venue: 'Elsevier BV'
Publication date: 17/10/2015

This paper proposes a method of speech enhancement where a clean speech signal is reconstructed from a sinusoidal model of speech production and a set of acoustic speech features. The acoustic features are estimated from noisy speech and comprise, for each frame, a voicing classification (voiced, unvoiced or non-speech), fundamental frequency (for voiced frames) and spectral envelope. Rather than using different algorithms to estimate each parameter, a single statistical model is developed. This comprises a set of acoustic models and is similar to the acoustic modelling used in speech recognition. This allows noise and speaker adaptation to be applied to acoustic feature estimation to improve robustness. Objective and subjective tests compare reconstruction-based enhancement with other methods of enhancement and show the proposed method to be highly effective at removing noise.

Crossref

University of East Anglia digital repository

Non-Gaussian, Non-stationary and Nonlinear Signal Processing Methods - with Applications to Speech Processing and Channel Estimation

Author: Li Chunjian
Publication venue: Institut for Elektroniske Systemer, Aalborg Universitet
Publication date: 01/01/2007

VBN

‘Did the speaker change?’: Temporal tracking for overlapping speaker segmentation in multi-speaker scenarios

Author: Hogg Aidan
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/12/2022

Diarization systems are an essential part of many speech processing applications, such as speaker indexing, improving automatic speech recognition (ASR) performance and making single-speaker algorithms available for use in multi-speaker domains. This thesis focuses on the first task of the diarization process, speaker segmentation, which can be thought of as trying to answer the question ‘Did the speaker change?’ in an audio recording. The thesis starts by showing that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. It is then highlighted that an individual’s pitch varies smoothly and can therefore be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this approach of pitch prediction for speaker change detection. The thesis then goes on to demonstrate how voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing the onsets and end-points of a speaker’s utterance to be determined in the presence of an additional active speaker. The thesis then extends this work to explore a new multimodal approach for overlapping speaker segmentation that tracks both the fundamental frequency (F0) and direction of arrival (DoA) of each speaker simultaneously. The proposed multiple hypothesis tracking system, which tracks both features simultaneously, shows an improvement in segmentation performance when compared to tracking these features separately. Lastly, the thesis focuses on the DoA estimation part of the newly proposed multimodal approach. It does this by exploring a polynomial extension to the multiple signal classification (MUSIC) algorithm, spatio-spectral polynomial (SSP)-MUSIC, and evaluating its performance when using speech sound sources.
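
The idea in the abstract above, that an unpredictable pitch suggests a speaker change, can be sketched with a one-dimensional Kalman predictor and a gate on the prediction error. This is only a minimal sketch, not the thesis's system: the random-walk pitch model, the function name `detect_pitch_jumps`, and the values of `process_var`, `obs_var` and `gate` are assumptions for illustration.

```python
def detect_pitch_jumps(f0_track, process_var=4.0, obs_var=9.0, gate=3.0):
    """Return frame indices where the pitch (Hz) is poorly predicted.

    Assumed model (illustrative only): within one speaker the pitch is a
    random walk; a frame is flagged when the prediction error exceeds
    `gate` standard deviations of the expected innovation.
    """
    estimate = f0_track[0]
    error_var = obs_var
    change_frames = []
    for frame, z in enumerate(f0_track[1:], start=1):
        pred_var = error_var + process_var          # predict step
        innovation = z - estimate                   # prediction error
        innovation_std = (pred_var + obs_var) ** 0.5
        if abs(innovation) > gate * innovation_std:
            # Pitch was not predictable: flag the frame and restart the tracker.
            change_frames.append(frame)
            estimate, error_var = z, obs_var
            continue
        gain = pred_var / (pred_var + obs_var)      # update step
        estimate += gain * innovation
        error_var = (1.0 - gain) * pred_var
    return change_frames


if __name__ == "__main__":
    # Hypothetical track: one talker drifting up from 120 Hz, then another near 210 Hz.
    track = [120.0 + 0.5 * i for i in range(20)] + [210.0 - 0.5 * i for i in range(20)]
    print(detect_pitch_jumps(track))
```

Restarting the tracker at a flagged frame is a design choice made here so that a single jump is reported once rather than on every following frame.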

Spiral - Imperial College Digital Repository

Automatic Quality Control and Enhancement for Voice-Based Remote Parkinson's Disease Detection

Author: Christensen Mads Græsbøll
Jensen Jesper Rindom
Kavalekalam Mathew Shaji
Little Max A.
Poorjam Amir Hossein
Raykov Jordan P.
Shi Liming
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

VBN

Tracking Rhythmicity in Biomedical Signals using Sequential Monte Carlo methods

Author: Kim Sungan
Publication venue: PDXScholar
Publication date: 28/09/2009

Cyclical patterns are common in signals that originate from natural systems such as the human body and man-made machinery. Often these cyclical patterns are not perfectly periodic. In that case, the signals are called pseudo-periodic or quasi-periodic and can be modeled as a sum of time-varying sinusoids, whose frequencies, phases, and amplitudes change slowly over time. Each time-varying sinusoid represents an individual rhythmical component, called a partial, that can be characterized by three parameters: frequency, phase, and amplitude. Quasi-periodic signals often contain multiple partials that are harmonically related. In that case, the frequencies of the other partials are exact integer multiples of that of the slowest partial. These signals are referred to as multi-harmonic signals. Examples of such signals are the electrocardiogram (ECG), arterial blood pressure (ABP), and the human voice. A Markov process is a mathematical model for a random system whose future and past states are independent conditional on the present state. Multi-harmonic signals can be modeled as a stochastic process with the Markov property. The Markovian representation of multi-harmonic signals enables us to use state-space tracking methods to continuously estimate the frequencies, phases, and amplitudes of the partials. Several research groups have proposed various signal analysis methods such as hidden Markov models (HMMs), the short-time Fourier transform (STFT), and the Wigner-Ville distribution to solve this problem. Recently, a few groups of researchers have proposed Monte Carlo methods which sequentially estimate the posterior distribution of the fundamental frequency in multi-harmonic signals. However, multi-harmonic tracking is more challenging than single-frequency tracking, though the reason for this has not been well understood. The main objectives of this dissertation are to elucidate the fundamental obstacles to multi-harmonic tracking and to develop a reliable multi-harmonic tracker that can track cyclical patterns in multi-harmonic signals.
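
To make the multi-harmonic signal model in the abstract above concrete, the sketch below synthesises a signal whose partials sit at integer multiples of a slowly varying fundamental. It only illustrates the signal model, not the dissertation's tracker; the function name `synth_multiharmonic`, its parameters, and the zero initial phase shared by all partials are assumptions.

```python
import math


def synth_multiharmonic(f0_per_sample, amplitudes, sample_rate):
    """Synthesise a sum of harmonically related sinusoids.

    f0_per_sample: fundamental frequency (Hz) at each sample.
    amplitudes:    amplitude of partial k (k = 1 is the fundamental);
                   held constant here for simplicity.
    """
    phase = 0.0
    signal = []
    for f0 in f0_per_sample:
        # Accumulate phase so the fundamental can change smoothly over time.
        phase += 2.0 * math.pi * f0 / sample_rate
        # Partial k runs at exactly k times the fundamental's phase.
        sample = sum(a * math.sin(k * phase)
                     for k, a in enumerate(amplitudes, start=1))
        signal.append(sample)
    return signal


if __name__ == "__main__":
    # Hypothetical heart-rate-like fundamental drifting from 1.0 Hz to 1.2 Hz.
    n, fs = 500, 100.0
    f0 = [1.0 + 0.2 * i / (n - 1) for i in range(n)]
    x = synth_multiharmonic(f0, amplitudes=[1.0, 0.5, 0.25], sample_rate=fs)
    print(len(x))
```

Accumulating phase rather than computing `sin(2*pi*f0*t)` directly keeps the waveform continuous when the fundamental changes, which is what makes the signal quasi-periodic rather than a sequence of spliced tones.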

PDXScholar (Portland State University)

Non-Intrusive Speech Intelligibility Prediction

Author: Sørensen Charlotte
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2019

VBN