6 research outputs found

    The application of non-linear partial differential equations for the removal of noise in audio signal processing

    A dissertation submitted in fulfilment for the degree of Master of Science in the Faculty of Science, School of Computer Science and Applied Mathematics, October 2017. This work explores a new method of applying partial differential equations to audio signal processing, particularly that of noise removal. Two methods are explored and compared to the method of noise removal used in the free software Audacity®. The first of these methods uses a non-linear variation of the diffusion equation in two dimensions, coupled with a non-linear sink/source term, in order to filter the imaginary and real components of an array of overlapping windows of the signal's Fourier transform. The second model is that of a non-linear diffusion function applied to the magnitude of the Fourier transform in order to estimate the noise power spectrum to be used in a spectral subtraction noise removal technique. The technique in this work features finite difference methods to approximate the solutions of each of the models.
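
    As a hedged illustration of the first model's general form (the dissertation's exact diffusivity and sink/source term are not given here), a Perona-Malik-style non-linear diffusion equation with a generic sink/source term s(u) can be written as

        \frac{\partial u}{\partial t} = \nabla \cdot \big( g(|\nabla u|)\, \nabla u \big) + s(u),
        \qquad g(r) = \frac{1}{1 + (r/\kappa)^2},

    where u stands for the real or imaginary component of one overlapping window of the Fourier transform and \kappa is an assumed gradient threshold. An explicit finite-difference step of the kind the work describes would then read

        u^{n+1}_{i,j} = u^{n}_{i,j} + \Delta t \left[ \nabla \cdot \big( g(|\nabla u^{n}|)\, \nabla u^{n} \big)_{i,j} + s\big(u^{n}_{i,j}\big) \right],

    with \Delta t chosen small enough to keep the explicit scheme stable.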

    Analysis of very low quality speech for mask-based enhancement

    The complexity of the speech enhancement problem has motivated many different solutions. However, most techniques address situations in which the target speech is fully intelligible and the background noise energy is low in comparison with that of the speech. Thus while current enhancement algorithms can improve the perceived quality, the intelligibility of the speech is not increased significantly and may even be reduced. Recent research shows that intelligibility of very noisy speech can be improved by the use of a binary mask, in which a binary weight is applied to each time-frequency bin of the input spectrogram. There are several alternative goals for the binary mask estimator, based either on the Signal-to-Noise Ratio (SNR) of each time-frequency bin or on the speech signal characteristics alone. Our approach to the binary mask estimation problem aims to preserve the important speech cues independently of the noise present by identifying time-frequency regions that contain significant speech energy. The speech power spectrum varies greatly for different types of speech sound. The energy of voiced speech sounds is concentrated in the harmonics of the fundamental frequency while that of unvoiced sounds is, in contrast, distributed across a broad range of frequencies. To identify the presence of speech energy in a noisy speech signal we have therefore developed two detection algorithms. The first is a robust algorithm that identifies voiced speech segments and estimates their fundamental frequency. The second detects the presence of sibilants and estimates their energy distribution. In addition, we have developed a robust algorithm to estimate the active level of the speech. The outputs of these algorithms are combined with other features estimated from the noisy speech to form the input to a classifier which estimates a mask that accurately reflects the time-frequency distribution of speech energy even at low SNR levels. We evaluate a mask-based speech enhancer on a range of speech and noise signals and demonstrate a consistent increase in an objective intelligibility measure with respect to noisy speech. Open Access.
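
    A minimal sketch of the mask-and-resynthesize step itself (not the thesis's classifier-based mask estimator): it assumes a known noise power spectrum noise_psd and a simple per-bin SNR threshold, both of which are illustrative stand-ins, as are all names below.

        import numpy as np
        from scipy.signal import stft, istft

        def binary_mask_enhance(noisy, noise_psd, fs, thresh_db=0.0):
            # STFT of the noisy signal: X has shape (freq_bins, frames);
            # noise_psd must hold one value per frequency bin (nperseg // 2 + 1).
            f, t, X = stft(noisy, fs=fs, nperseg=512)
            # Per-bin SNR against the assumed noise spectrum.
            snr_db = 10.0 * np.log10(np.abs(X) ** 2 / (noise_psd[:, None] + 1e-12) + 1e-12)
            # Binary weight per time-frequency bin: keep bins above the threshold.
            mask = snr_db > thresh_db
            _, enhanced = istft(X * mask, fs=fs, nperseg=512)
            return enhanced, mask

    In the thesis the mask is instead estimated by a classifier fed with features such as the voiced-speech and sibilance detections, so that speech-dominated bins can be retained even when the noise spectrum is unknown.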

    System approach to robust acoustic echo cancellation through semi-blind source separation based on independent component analysis

    We live in a dynamic world full of noise and interference. The conventional acoustic echo cancellation (AEC) framework based on the least mean square (LMS) algorithm by itself lacks the ability to handle many secondary signals that interfere with the adaptive filtering process, e.g., local speech and background noise. In this dissertation, we build a foundation for what we refer to as the system approach to signal enhancement as we focus on the AEC problem. We first propose the residual echo enhancement (REE) technique that utilizes the error recovery nonlinearity (ERN) to "enhance" the filter estimation error prior to the filter adaptation. The single-channel AEC problem can be viewed as a special case of semi-blind source separation (SBSS) where one of the source signals is partially known, i.e., the far-end microphone signal that generates the near-end acoustic echo. SBSS optimized via independent component analysis (ICA) leads to the system combination of the LMS algorithm with the ERN that allows for continuous and stable adaptation even during double talk. Second, we extend the system perspective to the decorrelation problem for AEC, where we show that the REE procedure can be applied effectively in a multi-channel AEC (MCAEC) setting to indirectly assist the recovery of AEC performance lost to inter-channel correlation, known generally as the "non-uniqueness" problem. We develop a novel, computationally efficient technique of frequency-domain resampling (FDR) that alleviates the non-uniqueness problem directly while introducing minimal distortion to signal quality and statistics. We also apply the system approach to the multi-delay filter (MDF), which suffers from the inter-block correlation problem. Finally, we generalize the MCAEC problem in the SBSS framework and discuss many issues related to the implementation of an SBSS system. We propose a constrained batch-online implementation of SBSS that stabilizes the convergence behavior even in the worst-case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. The proposed techniques are developed from a pragmatic standpoint, motivated by real-world problems in acoustic and audio signal processing. Generalization of the orthogonality principle to the system level of an AEC problem allows us to relate AEC to source separation, which seeks to maximize the independence, and hence implicitly the orthogonality, not only between the error signal and the far-end signal but among all signals involved. The system approach, of which the REE paradigm is just one realization, encompasses many traditional signal enhancement techniques in an analytically consistent yet practically effective manner for solving the enhancement problem in a very noisy and disruptive acoustic mixing environment. Ph.D. Committee Chair: Biing-Hwang Juang; Committee Member: Brani Vidakovic; Committee Member: David V. Anderson; Committee Member: Jeff S. Shamma; Committee Member: Xiaoli M
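
    A minimal single-channel sketch of the REE idea, with an important caveat: the tanh soft limiter below is only an illustrative stand-in for the ERN, whose actual form the dissertation derives from the ICA/SBSS framework. All names and step sizes are assumptions.

        import numpy as np

        def nlms_with_ern(far, mic, L=256, mu=0.5, eps=1e-6):
            """far: far-end reference signal; mic: near-end microphone signal."""
            w = np.zeros(L)                  # adaptive estimate of the echo path
            e = np.zeros(len(mic))           # residual (echo-cancelled) output
            for n in range(L, len(mic)):
                x = far[n - L:n][::-1]       # most recent L far-end samples
                e[n] = mic[n] - w @ x        # subtract the estimated echo
                g = np.tanh(e[n])            # "enhance" the error before adaptation:
                                             # bounding it limits misadjustment caused
                                             # by near-end speech and background noise
                w += mu * g * x / (x @ x + eps)   # normalized LMS update
            return e

    The point of the structure is that the nonlinearity acts on the error that drives adaptation, not on the output signal, which is what permits continuous adaptation during double talk.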

    Multimodal assessment of emotional responses by physiological monitoring: novel auditory and visual elicitation strategies in traditional and virtual reality environments

    This doctoral thesis explores novel strategies to quantify emotions and listening effort through monitoring of physiological signals. Emotions are a complex aspect of the human experience, playing a crucial role in our survival and adaptation to the environment. The study of emotions fosters important applications, such as Human-Computer and Human-Robot Interaction, or the clinical assessment and treatment of mental health conditions such as depression, anxiety, stress, chronic anger, and mood disorders. Listening effort is also an important area of study, as it provides insight into listeners' challenges that are usually not identified by traditional audiometric measures. The research is divided into three lines of work, each with a unique emphasis on the methods of emotion elicitation and the stimuli that are most effective in producing emotional responses, with a specific focus on auditory stimuli. The research led to the creation of three experimental protocols, as well as the use of an available online protocol, for studying emotional responses through monitoring of both peripheral and central physiological signals, such as skin conductance, respiration, pupil dilation, electrocardiogram, blood volume pulse, and electroencephalography. A protocol was created for the study of listening effort using a speech-in-noise test designed to be short and not to induce fatigue. The results revealed that listening effort is a complex phenomenon that cannot be studied with a univariate approach, necessitating multiple physiological markers to capture different physiological dimensions. Specifically, the findings demonstrate a strong association between the level of auditory exertion and the amount of attention and involvement directed towards stimuli that are readily comprehensible, compared to those that demand greater exertion. Continuing with the auditory domain, peripheral physiological signals were studied in order to discriminate four emotions elicited in a subject who listened to music for 21 days, using a previously designed and publicly available protocol. Surprisingly, the processed physiological signals were able to clearly separate the four emotions at the physiological level, demonstrating that music, which has not been studied extensively in the literature, can be an effective stimulus for eliciting emotions. Following these results, a flat-screen protocol was created to compare physiological responses to purely visual, purely auditory, and combined audiovisual emotional stimuli. The results show that auditory stimuli are more effective in separating emotions at the physiological level, and the subjects were found to be much more attentive during the audio-only phase. In order to overcome the limitations of emotional protocols carried out in a laboratory environment, which may elicit fewer emotions due to being an unnatural setting for the subjects under study, a final emotional elicitation protocol was created using virtual reality. Scenes similar to reality were created to elicit four distinct emotions, and at the physiological level this environment proved more effective in eliciting them. To our knowledge, this is the first protocol specifically designed for virtual reality that elicits diverse emotions. Furthermore, even in terms of classification, the use of virtual reality has been shown to be superior to traditional flat-screen protocols, opening the door to virtual reality for the study of conditions related to emotional control.

    Adaptive Hidden Markov Noise Modelling for Speech Enhancement

    A robust and reliable noise estimation algorithm is required in many speech enhancement systems. The aim of this thesis is to propose and evaluate a robust noise estimation algorithm for highly non-stationary noisy environments. In this work, we model the non-stationary noise using a set of discrete states, with each state representing a distinct noise power spectrum; the state sequence over time is then conveniently represented by a Hidden Markov Model (HMM). We first present an online HMM re-estimation framework that models time-varying noise and tracks changes in its characteristics by a sequential model update procedure applied during the absence of speech. In addition, the algorithm will, when necessary, create new model states to represent novel noise spectra and will merge existing states that have similar characteristics. We then extend this work to robust noise estimation during speech activity by incorporating a speech model into the existing noise model. The noise characteristics within each state are updated based on a speech presence probability derived from a modified minima-controlled recursive averaging method. We demonstrate the effectiveness of our noise HMM in tracking both stationary and highly non-stationary noise, and show that it gives improved performance over other conventional noise estimation methods when incorporated into a standard speech enhancement algorithm.
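
    A minimal sketch of the forward-recursion-plus-update idea, assuming a fixed number of states, exponentially distributed periodogram bins, and a known transition matrix; the state creation/merging and speech-presence weighting described above are omitted, and all names are illustrative.

        import numpy as np

        def hmm_noise_estimate(Y, states, A, lam=0.95):
            """Y: (T, F) noisy power spectra; states: (K, F) per-state noise PSDs;
            A: (K, K) state transition matrix. Returns (T, F) noise estimates."""
            K = states.shape[0]
            alpha = np.full(K, 1.0 / K)          # forward state probabilities
            est = np.zeros_like(Y)
            for t, y in enumerate(Y):
                # Log-likelihood of frame y under each state's noise spectrum,
                # assuming exponentially distributed periodogram bins.
                ll = -np.sum(np.log(states) + y / states, axis=1)
                alpha = (A.T @ alpha) * np.exp(ll - ll.max())
                alpha /= alpha.sum()
                est[t] = alpha @ states          # probability-weighted noise estimate
                k = int(alpha.argmax())          # sequential update: nudge the most
                states[k] = lam * states[k] + (1.0 - lam) * y   # likely state toward y
            return est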

    Classification and Separation Techniques based on Fundamental Frequency for Speech Enhancement

    This thesis is focused on the development of new classification and speech enhancement algorithms based, explicitly or implicitly, on the fundamental frequency (F0). The F0 of speech has a number of properties that enable speech discrimination from the remaining signals in the acoustic scene, either by defining F0-based signal features (for classification) or F0-based signal models (for separation). Three main contributions are included in this work: 1) an acoustic environment classification algorithm for hearing aids, based on F0, that classifies the input signal into speech and non-speech classes; 2) a frame-by-frame voiced speech detection algorithm based on an aperiodicity measure, able to work under non-stationary noise and applicable to speech enhancement; 3) a speech denoising algorithm based on a regularized NMF decomposition, in which the background noise is described in a generic way with mathematical constraints. Thesis, Univ. Jaén, Departamento de Ingeniería de Telecomunicación. Defended 11 January 201
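
    A minimal sketch of the general shape of contribution 3): KL-divergence NMF with a pre-trained, fixed speech dictionary and free noise bases, followed by a Wiener-style mask. The thesis's specific regularization constraints on the noise model are not reproduced here, and all names and parameter values are illustrative.

        import numpy as np

        def nmf_denoise(V, W_speech, n_noise=8, iters=100, eps=1e-9, seed=0):
            """V: (F, T) noisy magnitude spectrogram; W_speech: (F, Ks) fixed speech bases."""
            rng = np.random.default_rng(seed)
            F, T = V.shape
            Ks = W_speech.shape[1]
            W = np.hstack([W_speech, rng.random((F, n_noise)) + eps])
            H = rng.random((Ks + n_noise, T)) + eps
            for _ in range(iters):
                WH = W @ H + eps
                # Multiplicative KL-divergence update for all activations.
                H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
                WH = W @ H + eps
                # Update only the noise bases; the speech dictionary stays fixed.
                W[:, Ks:] *= ((V / WH) @ H[Ks:].T) / (H[Ks:].sum(axis=1) + eps)
            WH = W @ H + eps
            return V * (W[:, :Ks] @ H[:Ks]) / WH   # Wiener-style masked speech magnitude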