Online source separation in reverberant environments exploiting known speaker locations
This thesis concerns blind source separation techniques using second order statistics and higher order statistics for reverberant environments. A focus of the thesis is algorithmic simplicity with a view to the algorithms being implemented in their online forms. The main challenge of blind source separation applications is to handle reverberant acoustic environments; a further complication is changes in the acoustic environment such as when human speakers physically move.
A novel time-domain method which utilises a pair of finite impulse response filters is proposed. The method of principal angles is defined, which exploits a singular value decomposition for the filters' design. The pair of filters is implemented within a generalised sidelobe canceller structure; thus the method can be considered a beamforming method which cancels one source. An adaptive filtering stage is then employed to recover the remaining source, exploiting the output of the beamforming stage as a noise reference.
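The adaptive filtering stage of such a generalised sidelobe canceller can be sketched as a standard LMS noise canceller, where the main branch carries the target plus interference and the blocking branch supplies an interference-only reference. The toy signals, filter length and step size below are illustrative assumptions, not the thesis's actual design:

```python
import numpy as np

def lms_noise_canceller(d, u, num_taps=16, mu=0.01):
    """Adaptive stage of a generalised sidelobe canceller: filter the
    noise reference u (target already cancelled by the beamformer) and
    subtract it from the main branch d, adapting by LMS."""
    w = np.zeros(num_taps)
    e = np.zeros(len(d))
    for n in range(num_taps - 1, len(d)):
        x = u[n - num_taps + 1:n + 1][::-1]   # u[n], u[n-1], ...
        y = w @ x                             # predicted interference in d[n]
        e[n] = d[n] - y                       # enhanced output sample
        w += mu * e[n] * x                    # LMS weight update
    return e

# Toy scenario: the main branch carries target s plus a filtered copy of
# the interferer v; the blocking branch supplies v itself as reference.
rng = np.random.default_rng(0)
N = 4000
s = np.sin(2 * np.pi * 0.01 * np.arange(N))
v = rng.standard_normal(N)
v_delayed = np.concatenate(([0.0], v[:-1]))
d = s + 0.5 * v + 0.3 * v_delayed
e = lms_noise_canceller(d, v)
```

After convergence the residual interference in `e` is far below that in `d`, leaving an estimate of the remaining source.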
A common approach to blind source separation is to use methods based on higher order statistics, such as independent component analysis. When dealing with realistic convolutive audio and speech mixtures, processing in the frequency domain at each frequency bin is required. This introduces the permutation problem, inherent in independent component analysis, across the frequency bins. Independent vector analysis directly addresses this issue by modelling the dependencies between frequency bins, namely by making use of a source vector prior. An alternative source prior for real-time (online) natural gradient independent vector analysis is proposed. A Student's t probability density function is known to be better suited to speech sources, due to its heavier tails, and is incorporated into a real-time version of natural gradient independent vector analysis. The final algorithm is realised as a real-time embedded application on a floating point Texas Instruments digital signal processor platform.
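As a rough illustration of how a Student's t source prior enters the natural gradient update, the sketch below uses the score function of a real spherical Student's t density over F frequency bins, (nu + F) y / (nu + ||y||^2). The dimensions, step size and degrees of freedom are illustrative assumptions, and the complex-valued details of the actual algorithm are glossed over:

```python
import numpy as np

def student_t_score(y, nu=5.0):
    """Score function (negative gradient of the log prior) of a spherical
    Student's t density over the F-dimensional bin vector y: heavier tails
    than a Laplacian prior, so the score shrinks for loud sources."""
    return (nu + len(y)) * y / (nu + np.sum(np.abs(y) ** 2))

def natural_gradient_iva_step(W, Y, nu=5.0, eta=0.05):
    """One natural-gradient update of the demixing matrices.
    W: (F, K, K) per-bin demixing matrices; Y: (F, K) source estimates."""
    F, K = Y.shape
    for f in range(F):
        # The nonlinearity couples all bins of each source vector Y[:, k],
        # which is what resolves the per-bin permutation ambiguity.
        phi = np.array([student_t_score(Y[:, k], nu)[f] for k in range(K)])
        W[f] += eta * (np.eye(K) - np.outer(phi, np.conj(Y[f]))) @ W[f]
    return W

F, K = 4, 2
rng = np.random.default_rng(1)
Y = rng.standard_normal((F, K)) + 1j * rng.standard_normal((F, K))
W = np.stack([np.eye(K, dtype=complex) for _ in range(F)])
W = natural_gradient_iva_step(W, Y)
```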
Moving sources, along with reverberant environments, cause significant problems in realistic source separation systems, as the mixing filters become time variant. A method is proposed which employs the pair of cancellation filters to cancel one source, coupled with an online natural gradient independent vector analysis technique to improve average separation performance in the context of step-wise moving sources. This addresses 'dips' in performance when sources move. Results show that the average convergence time of the performance parameters is improved.
The online methods introduced in this thesis are tested using impulse responses measured in reverberant environments, demonstrating their robustness; they are shown to perform better than established methods in a variety of situations.
Implementation and evaluation of a low complexity microphone array for speaker recognition
Includes bibliographical references (leaves 83-86). This thesis discusses the application of a microphone array employing a noise-cancelling beamforming technique for improving the robustness of speaker recognition systems in a diffuse noise field.
Incorporation of three-dimensional audio into virtual reality scenes for an improved immersive experience
Audio is a crucial aspect to bear in mind when designing virtual reality applications, as it can
add a whole new level of immersion to this kind of experience if properly used. In order to
create realistic sound, it is essential to take audio spatialization into consideration, providing
the information necessary for an individual to estimate the position of sound sources and the
characteristics of surrounding spaces. This project proposes implementing spatial audio in
virtual reality scenes created with a game engine, as well as providing all of the theoretical
bases that explain how this can be ultimately achieved.
It first touches upon how the human auditory system is able to estimate the direction and
distance to an audio source by interpreting cues such as time and level differences between
ears, pinnae reflections, reverberation and general variations in loudness.
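The interaural time difference cue mentioned above is often approximated with Woodworth's spherical-head formula; the head radius used below is an assumed average value, not a measured one:

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Interaural time difference in seconds from Woodworth's
    spherical-head approximation: ITD = (a / c) * (theta + sin(theta)).
    Azimuth 0 is straight ahead, 90 is fully lateral."""
    theta = np.radians(azimuth_deg)
    return head_radius / c * (theta + np.sin(theta))

itd_front = woodworth_itd(0.0)    # no difference for a frontal source
itd_side = woodworth_itd(90.0)    # maximum, roughly 0.66 ms
```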
Next, the limited spatial properties present in the most common audio reproduction systems
are discussed, arguing why they are insufficient for virtual reality applications. Two spatial
audio recording and reproduction techniques for headphones and loudspeakers are
presented as alternatives for virtual reality scenarios in which the user remains static.
As a means of acquiring the knowledge necessary to understand more advanced spatial
audio systems, the concept known as Head Related Transfer Function or HRTF is
introduced in great detail. It is explained how HRTFs encompass all physical cues that
condition sound localization, as well as how the frequency responses that characterize them
can be experimentally measured and used for artificial spatialization of virtual sources.
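Once a pair of head-related impulse responses has been measured for a direction, spatialising a mono source reduces to a pair of convolutions. A minimal sketch, with made-up four-tap "HRIRs" standing in for measured data:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono signal binaurally by convolving it with the left-
    and right-ear head-related impulse responses for one direction."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)])

# Made-up HRIRs: the right ear receives a delayed, attenuated copy,
# crudely mimicking a source on the listener's left.
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.6, 0.0])
sig = np.sin(2 * np.pi * 440 * np.arange(1024) / 44100)
stereo = binaural_render(sig, hrir_l, hrir_r)
```

Real systems interpolate between measured HRIR pairs as the virtual source or the listener's head moves.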
Several HRTF-based spatial audio systems are presented, differentiating between those that
apply HRTFs as mathematical models and those that make use of experimental impulse
response data sets. These advanced models are required if spatial audio is to be
applied to virtual reality experiences that involve user motion, as they are capable of
continuously adapting to the user’s position and orientation relative to the present virtual sources.
The rest of the project focuses on how some of the mentioned HRTF-based spatial audio
systems can be implemented in the Unity game engine. The engine’s limited built-in
spatialization options can be complemented and greatly improved with the use of
audio plugins that perform HRTF filtering and introduce features such as sound occlusion,
room simulation models and sound directivity patterns.
Three demos with different levels of complexity are finally carried out in Unity in order to
showcase the virtues of spatial audio in virtual reality applications.
Extraordinary acoustic transmission via supercoupling and self-interference cancellation
Supercoupling is a widely researched topic in wave engineering, which has been used to build coupling channels that can, in principle, support total transmission and complete phase uniformity, independent of the length of the channel. This has generally been accomplished by employing dispersion in media that display a near-zero index. In the field of acoustics, prior works have required the presence of periodic embedded resonators, such as membranes or Helmholtz resonators, in order to observe near-zero properties. Here it is shown, theoretically and experimentally, that supercoupling can occur in an acoustic channel without the presence of embedded resonators. A compressibility-near-zero (CNZ) acoustic channel was observed to show remarkable properties analogous to those found in electromagnetics. Furthermore, these principles are employed to develop an acoustic power divider, which takes advantage of the CNZ properties of the channel to also exhibit phase invariance at the output. In the next section, another extraordinary acoustic transmission phenomenon is explored, regarding the potential for sending and receiving from a single acoustic transducer at the same time and at the same frequency. This is made possible through an electrical circuit that is designed to cancel self-interfering signals in acoustic measurement systems. Systems that employ self-interference cancellation (SIC) are often referred to as simultaneous transmit and receive (STAR) or in-band full duplex (IBFD) systems, which have recently enabled sending and receiving of Radio Frequency (RF) signals at the same time and at the same frequency. This has led to commercialization efforts with the promise of doubling the throughput of traditional radio systems including Wi-Fi and 5G cellular communications. Prior to these advances, researchers in vibration control explored self-sensing actuator systems, also referred to as sensoriactuators or sensorless control systems. 
Inspired by these developments, these approaches are combined and extended to explore STAR functionality in an acoustic measurement system. First, SIC is applied to time-domain measurements to demonstrate the potential for a practical, single-transducer ultrasonic nondestructive evaluation (NDE) system to measure echo returns while it is actively transmitting at the same frequency. Theoretical models and experimental results are presented and discussed.
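One simple way to realise self-interference cancellation in a measurement system is to estimate the linear leakage path from the known transmit signal and subtract its predicted contribution from the received signal. This least-squares sketch, with invented filter taps and echo delay, illustrates the signal-processing idea rather than the circuit-based canceller described above:

```python
import numpy as np

def cancel_self_interference(tx, rx, num_taps=8):
    """Estimate the linear leakage path from the known transmit signal
    tx to the received signal rx by least squares, then subtract the
    predicted leakage so that weaker echo returns become visible."""
    n = len(rx)
    # Columns are tx delayed by 0 .. num_taps-1 samples.
    X = np.column_stack([np.concatenate((np.zeros(k), tx[:n - k]))
                         for k in range(num_taps)])
    h, *_ = np.linalg.lstsq(X, rx, rcond=None)
    return rx - X @ h

rng = np.random.default_rng(2)
tx = rng.standard_normal(2000)
leak = np.convolve(tx, [0.9, -0.4, 0.1])[:2000]           # strong direct leakage
echo = 0.05 * np.concatenate((np.zeros(300), tx[:1700]))  # weak delayed echo
rx = leak + echo
residual = cancel_self_interference(tx, rx)
```

The residual is dominated by the weak echo, which was buried under the direct transmit leakage before cancellation.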
Offline and real time noise reduction in speech signals using the discrete wavelet packet decomposition
This thesis describes the development of an offline and real-time wavelet-based speech enhancement system to process speech corrupted with various amounts of white Gaussian noise and other noise types.
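A heavily simplified stand-in for such a system: a multilevel Haar decomposition (rather than the full wavelet packet decomposition) with soft thresholding of the detail coefficients, using the universal threshold. The signal, noise level and decomposition depth are illustrative choices:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform (even-length input)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail
    return a, d

def haar_idwt(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, levels=3):
    """Soft-threshold the detail coefficients of a multilevel Haar
    decomposition, with the universal threshold estimated from the
    finest-scale details."""
    a, details = x, []
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    sigma = np.median(np.abs(details[0])) / 0.6745   # robust noise estimate
    thresh = sigma * np.sqrt(2 * np.log(len(x)))
    details = [np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0) for d in details]
    for d in reversed(details):
        a = haar_idwt(a, d)
    return a

rng = np.random.default_rng(3)
clean = np.sin(2 * np.pi * 5 * np.arange(1024) / 1024)
noisy = clean + 0.3 * rng.standard_normal(1024)
denoised = denoise(noisy)
```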
Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation
This paper investigates deep neural network (DNN) based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained from a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraint imposed by dynamic features (i.e., the time derivatives of the speech coefficients) is used to enhance the smoothness of predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients with a least squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the constraint of dynamic features directly into the DNN training process using a sequential cost function.
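The first of these two ways, a least-squares estimate from predicted static and delta coefficients, can be sketched for a single coefficient trajectory. The delta definition d[n] = (c[n+1] - c[n-1]) / 2 and the toy signals are assumptions for illustration:

```python
import numpy as np

def smooth_with_deltas(c, d):
    """Least-squares trajectory from predicted static coefficients c and
    delta coefficients d, assuming d[n] = (c[n+1] - c[n-1]) / 2:
    minimise ||x - c||^2 + ||D x - d||^2, i.e. solve (I + D'D) x = c + D'd."""
    n = len(c)
    D = np.zeros((n, n))
    for i in range(1, n - 1):
        D[i, i - 1], D[i, i + 1] = -0.5, 0.5
    return np.linalg.solve(np.eye(n) + D.T @ D, c + D.T @ d)

# Toy example: noisy static predictions of a smooth trajectory, with
# accurate delta predictions pulling the estimate back towards smoothness.
rng = np.random.default_rng(4)
true = np.sin(np.linspace(0, np.pi, 50))
c_pred = true + 0.2 * rng.standard_normal(50)
d_pred = np.gradient(true)   # central differences in the interior
x = smooth_with_deltas(c_pred, d_pred)
```

The delta constraint penalises jumps between frames, so the solved trajectory is closer to the underlying smooth one than the raw static predictions are.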
In the linear feature adaptation approach, a sparse linear transform, called cross transform, is used to transform multiple frames of speech coefficients to a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption on distortion types is made.
The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps to improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics, while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation significantly improves ASR performance for clean-condition trained acoustic models.
Application of sound source separation methods to advanced spatial audio systems
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development
and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by
means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel
stereo format, special up-converters are required to use advanced spatial audio reproduction formats,
such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to
accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is
required.
Source separation problems in digital signal processing are those in which several signals have been mixed
together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied
to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately,
most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This
condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to
the sparsity of the sources under some signal transformation.
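The sparsity assumption is what makes time-frequency masking work: if at most one source dominates each time-frequency cell, a stereo mixture can be separated by classifying per-cell spatial cues. A minimal two-source sketch using only the inter-channel level ratio (the actual multi-level thresholding method described below is more elaborate):

```python
import numpy as np

def binary_mask_separation(left, right, n_fft=256):
    """Assign each time-frequency cell of the left-channel STFT to one of
    two amplitude-panned sources by the inter-channel level ratio."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    frames = range(0, len(left) - n_fft, hop)
    L = np.array([np.fft.rfft(win * left[i:i + n_fft]) for i in frames])
    R = np.array([np.fft.rfft(win * right[i:i + n_fft]) for i in frames])
    mask = np.abs(L) > np.abs(R)     # cells dominated by the left-panned source
    return L * mask, L * ~mask       # masked spectra of the two sources

# Two sinusoidal sources panned to opposite sides of a stereo mixture.
t = np.arange(8192)
s1 = np.sin(2 * np.pi * 10 / 256 * t)    # lands in FFT bin 10
s2 = np.sin(2 * np.pi * 60 / 256 * t)    # lands in FFT bin 60
left = 1.0 * s1 + 0.2 * s2
right = 0.2 * s1 + 1.0 * s2
est1, est2 = binary_mask_separation(left, right)
```

Each estimate keeps only the cells attributed to its source; an inverse STFT (omitted here) would resynthesise the separated signals.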
This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result,
its contributions can be categorized within these two areas. First, two underdetermined SSS methods are
proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a
multi-level thresholding segmentation approach, which enables fast and unsupervised separation of
sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the
features considered by each of them are related to different localization cues that enable separation
of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at
improving the isolation of the separated sources are proposed. The performance achieved by
several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of
listening tests, paying special attention to the change observed in the perceived spatial attributes.
Although the estimated sources are distorted versions of the original ones, the masking effects
involved in their spatial remixing make artifacts less perceptible, which improves the overall
assessed quality. Finally, some novel developments related to the application of time-frequency
processing to source localization and enhanced sound reproduction are presented.
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969