
    Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

    Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing. Existing deep-learning-based enhancement methods often struggle to effectively remove background noise and reverberation in real-world scenarios, hampering the listening experience. To address these challenges, we propose a novel approach that uses pre-trained generative methods to resynthesize clean, anechoic speech from degraded inputs. This study leverages pre-trained vocoder or codec models to synthesize high-quality speech while enhancing robustness in challenging scenarios. Generative methods effectively handle information loss in speech signals, resulting in regenerated speech with improved fidelity and reduced artifacts. By harnessing the capabilities of pre-trained models, we achieve faithful reproduction of the original speech in adverse conditions. Experimental evaluations on both simulated datasets and realistic samples demonstrate the effectiveness and robustness of our proposed methods. In particular, the codec-based approach achieves superior subjective scores for both simulated and realistic recordings, and the generated speech exhibits improved audio quality with reduced background noise and reverberation. Our findings highlight the potential of pre-trained generative techniques in speech processing, particularly in scenarios where traditional methods falter. Demos are available at https://whmrtm.github.io/SoundResynthesis.
    Comment: Paper in submission
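
    To make the resynthesis idea concrete, here is a minimal sketch with toy PyTorch modules standing in for the actual models (the abstract does not describe an interface, so every class and name below is illustrative): a frozen decoder plays the role of the pretrained vocoder/codec, and an enhancement encoder maps the degraded waveform to an estimate of the clean latent representation, which the decoder then resynthesizes.

import torch
import torch.nn as nn

# Toy stand-ins, NOT the authors' models: a frozen "pretrained" decoder that
# maps a compact latent sequence back to a waveform, and an enhancement
# encoder trained to predict clean latents from degraded input.
class ToyCodecDecoder(nn.Module):
    def __init__(self, latent_dim=64, hop=256):
        super().__init__()
        self.net = nn.ConvTranspose1d(latent_dim, 1, kernel_size=hop * 2,
                                      stride=hop, padding=hop // 2)

    def forward(self, latents):          # (B, latent_dim, frames) -> (B, 1, samples)
        return torch.tanh(self.net(latents))

class ToyEnhancementEncoder(nn.Module):
    def __init__(self, latent_dim=64, hop=256):
        super().__init__()
        self.net = nn.Conv1d(1, latent_dim, kernel_size=hop * 2,
                             stride=hop, padding=hop // 2)

    def forward(self, degraded):         # (B, 1, samples) -> (B, latent_dim, frames)
        return self.net(degraded)

decoder = ToyCodecDecoder().eval()       # would be the frozen pretrained codec/vocoder
encoder = ToyEnhancementEncoder()        # would be trained on (degraded, clean) pairs

degraded = torch.randn(1, 1, 24_000)     # 1 s of "noisy, reverberant" audio at 24 kHz
with torch.no_grad():
    clean_latents = encoder(degraded)    # estimate the clean representation
    enhanced = decoder(clean_latents)    # resynthesize speech from it
print(enhanced.shape)

    The design point the abstract emphasizes is that the decoder is pretrained and kept fixed, so the regenerated speech inherits the generative model's fidelity even when the input is heavily degraded.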

    Flexible binaural resynthesis of room impulse responses for augmented reality research

    A basic building block in audio for Augmented Reality (AR) is the use of virtual sound sources layered on top of any real sources present in an environment. In order to perceive these virtual sources as belonging to the natural scene, it is important to match their acoustic parameters to those of a real source with the same characteristics, i.e. radiation properties, sound propagation, and head-related impulse response (HRIR). However, it is still unclear to what extent these parameters need to be matched in order to generate plausible scenes in which virtual sound sources blend seamlessly with real sound sources. This contribution presents an auralization framework that allows prototyping of augmented reality scenarios from measured multichannel room impulse responses, to get a better understanding of the relevance of individual acoustic parameters.

    A well-established approach for binaural measurement and reproduction of sound scenes is based on capturing binaural room impulse responses (BRIRs) using a head and torso simulator (HATS) and convolving these BRIRs dynamically with audio content according to the listener's head orientation. However, such measurements are laborious and time consuming, requiring measuring the scene with the HATS in multiple orientations. Additionally, the HATS HRIR is inherently encoded in the BRIRs, making them unsuitable for personalization to different listeners.

    The approach presented here consists of the resynthesis and dynamic binaural reproduction of multichannel room impulse responses (RIRs) using an arbitrary HRIR dataset. Using a compact microphone array, we obtained a pressure RIR and a set of auxiliary RIRs, and we applied the Spatial Decomposition Method (SDM) to estimate the direction of arrival (DOA) of the different sound events in the RIR. The DOA information was used to map sound pressure to different locations by means of an HRIR dataset, generating a binaural room impulse response (BRIR) for a specific orientation. By either rotating the DOA or the HRIR dataset, BRIRs for any direction may be obtained.

    Auralizations using SDM are known to whiten the spectrum of late reverberation. Available alternatives such as time-frequency equalization were not feasible in this case, as a different time-frequency filter would be necessary for each direction, resulting in a non-homogeneous equalization of the BRIRs. Instead, the resynthesized BRIRs were decomposed into sub-bands and the decay slope of each sub-band was modified independently to match the reverberation time of the original pressure RIR (sketched below, after the listening-test summary). In this way we could apply the same reverberation correction factor to all BRIRs. In addition, we used a direction-independent equalization to correct for timbral effects of equipment, HRIR, and signal processing.

    Real-time reproduction was achieved by means of a custom Max/MSP patch, in which the direct sound, early reflections, and late reverberation were convolved separately to allow real-time changes in the time-energy properties of the BRIRs. The mixing time of the reproduced BRIRs is configurable, and a single direction-independent reverberation tail is used.

    To evaluate the quality of the resynthesis method in a real room, we conducted both objective and perceptual comparisons for a variety of source positions. The objective analysis was performed by comparing real measurements of a KEMAR mannequin with the resynthesis at the same receiver location using a simulated KEMAR HRIR.
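
    A minimal sketch of the SDM-based mapping stage described above, under simplifying assumptions: the per-sample DOA estimates (azimuth/elevation, from the SDM analysis of the auxiliary RIRs) and an HRIR dataset indexed by measurement direction are taken as given, and the nearest-direction lookup ignores angular wrap-around. The function names and the placeholder data are illustrative, not part of the published framework.

import numpy as np

def nearest_hrir(hrir_dirs, hrir_set, doa):
    """Return the HRIR pair measured closest to the requested direction."""
    # Simplification: Euclidean distance over (azimuth, elevation) in degrees.
    idx = np.argmin(np.linalg.norm(hrir_dirs - doa, axis=1))
    return hrir_set[idx]                                    # shape (2, L)

def sdm_to_brir(pressure_rir, doa_per_sample, hrir_dirs, hrir_set, head_azimuth_deg=0.0):
    """Accumulate a two-channel BRIR by placing each pressure sample at its estimated DOA."""
    L = hrir_set.shape[-1]
    brir = np.zeros((2, len(pressure_rir) + L - 1))
    for n, (p, doa) in enumerate(zip(pressure_rir, doa_per_sample)):
        # Rotating the DOAs (or, equivalently, the HRIR grid) yields BRIRs for
        # any head orientation, enabling dynamic binaural reproduction.
        rotated = doa - np.array([head_azimuth_deg, 0.0])
        brir[:, n:n + L] += p * nearest_hrir(hrir_dirs, hrir_set, rotated)
    return brir

# Tiny usage example with random placeholder data.
rng = np.random.default_rng(0)
hrir_dirs = np.stack(np.meshgrid(np.arange(-180, 180, 30),
                                 np.arange(-40, 50, 30)), -1).reshape(-1, 2)
hrir_set = rng.standard_normal((len(hrir_dirs), 2, 128))
rir = rng.standard_normal(4800) * np.exp(-np.arange(4800) / 800.0)
doas = rng.uniform([-180.0, -40.0], [180.0, 40.0], size=(4800, 2))
brir = sdm_to_brir(rir, doas, hrir_dirs, hrir_set, head_azimuth_deg=30.0)
print(brir.shape)                                           # (2, 4927)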
    Typical room acoustic parameters of both the real and the resynthesized acoustics were found to be in good agreement. The perceptual validation consisted of a comparison of a loudspeaker and its resynthesized counterpart. Non-occluding headphones with individual equalization were used to ensure that listeners were able to simultaneously listen to the real and the virtual samples. Subjects were allowed to listen to the sounds for as long as they desired and to freely switch between the real and virtual stimuli in real time. The integration of an OptiTrack motion tracking system allowed us to present world-locked audio, accounting for head rotations.

    We present here the results of this listening test (N = 14) in three sections: discrimination, identification, and qualitative ratings. Preliminary analysis revealed that, in these conditions, listeners were generally able to discriminate between real and virtual sources and were able to consistently identify which of the presented sources was real and which was virtual. The qualitative analysis revealed that timbral differences are the most prominent cues for discrimination and identification, while spatial cues are well preserved. All the listeners reported good externalization of the binaural audio.

    Future work includes extending the presented validation to more environments, as well as implementing tools to arbitrarily modify BRIRs in the spatial, temporal, and frequency domains in order to study the perceptual requirements of room acoustics reproduction in AR.
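
    Returning to the sub-band decay correction mentioned above: a minimal sketch, assuming the per-band reverberation times of the resynthesized BRIR (t60_synth) and of the original pressure RIR (t60_target) have already been estimated (e.g. from Schroeder backward integration, not shown), and using a Butterworth band-pass bank as a stand-in for the authors' sub-band decomposition.

import numpy as np
from scipy.signal import butter, sosfilt

def correct_decay(brir, fs, band_edges, t60_synth, t60_target):
    """Re-shape each sub-band's decay slope so its T60 matches the original pressure RIR."""
    corrected = np.zeros_like(brir)
    t = np.arange(brir.shape[-1]) / fs
    for (lo, hi), ts, tt in zip(band_edges, t60_synth, t60_target):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, brir, axis=-1)
        # Amplitude envelope ~ 10^(-3 t / T60), so this gain converts a decay
        # with T60 = ts into one with T60 = tt.
        gain = 10.0 ** (-3.0 * t * (1.0 / tt - 1.0 / ts))
        corrected += band * gain
    return corrected

    Because the gain depends only on time and frequency band, the same correction can be applied to every BRIR and to both ears, which is the direction-independence property the abstract highlights.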

    Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

    Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework aiming to address transient generation and percussive synthesis within a DDSP framework. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak-picking algorithm to generate time-varying non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. Reconstruction metrics computed on a large dataset of acoustic and electronic percussion samples show that our method leads to improved onset signal reconstruction for membranophone percussion instruments.
    Comment: To be published in The Proceedings of Forum Acusticum, Sep 2023, Turin, Italy
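
    As a rough illustration of the spectral branch of such a model, the sketch below renders time-varying non-harmonic sinusoids plus a broadband noise component in a fully differentiable way; in practice the per-frame frequencies, amplitudes, and noise gain would be predicted by the trained encoders, and the transient branch (the modulated temporal convolutional network) is omitted entirely. All names here are illustrative, not taken from the paper.

import math
import torch

def sinusoidal_plus_noise(freqs_hz, amps, noise_gain, sr=44_100, hop=256):
    """freqs_hz, amps: (batch, n_partials, n_frames); noise_gain: (batch, n_frames)."""
    # Upsample frame-rate controls to audio rate (simple hold; smoother
    # interpolation is typically used in DDSP models).
    freqs = torch.repeat_interleave(freqs_hz, hop, dim=-1)
    amps = torch.repeat_interleave(amps, hop, dim=-1)
    gain = torch.repeat_interleave(noise_gain, hop, dim=-1)
    # Integrate instantaneous frequency to obtain phase; everything stays differentiable.
    phase = 2.0 * math.pi * torch.cumsum(freqs / sr, dim=-1)
    sinusoids = (amps * torch.sin(phase)).sum(dim=1)        # sum over non-harmonic partials
    noise = gain * (2.0 * torch.rand_like(gain) - 1.0)      # broadband noise component
    return sinusoids + noise

# Example: two inharmonic partials over 100 frames; gradients flow to the amplitudes.
f = torch.tensor([[[180.0] * 100, [311.0] * 100]])          # (1, 2, 100)
a = torch.full((1, 2, 100), 0.3, requires_grad=True)
g = torch.full((1, 100), 0.05)
y = sinusoidal_plus_noise(f, a, g)
print(y.shape)                                              # torch.Size([1, 25600])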
