342 research outputs found

Score-Informed Source Separation for Musical Audio Recordings [An overview]

Authors: Ewert S; Mueller M; Pardo B; Plumbley MD
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

(c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

CiteSeerX

Crossref

Queen Mary Research Online

Surrey Research Insight

The impact of exploiting spectro-temporal context in computational speech segregation

Authors: Bentsen Thomas; Dau Torsten; Kressner Abigail Anne; May Tobias
Publication date: 01/01/2018

The experimental data from the study: https://asa.scitation.org/doi/10.1121/1.5020273
Group 1 contains results, masks and audio from the models of the 16 GMM component segregation system.
Group 2 contains results, masks and audio from the models of the 64 GMM component segregation system.
There are three folders:
Audio: The CLUE sentences that were used for the listener study. IBM = Ideal Binary Mask, UP = UnProcessed, EBM = Estimated Binary Mask. The IBM and UP are stored in one of the configuration folders (Front-end), that is:
Audio\Group1\Front-end\icra_01_10sec_matched\UP
Audio\Group1\Front-end\icra_01_10sec_matched\IBM
Audio\Group1\Front-end\icra_01_10sec_matched\EBM
Results: The computed metrics for group 1 & 2 as well as Word Recognition Scores (WRSs) from the listener study.
BinaryMasks: a priori SNR masks, IBMs and EBMs from group 1 and 2.
Developed with Matlab R2016a.
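For readers unfamiliar with the term, an ideal binary mask is commonly defined by comparing the local signal-to-noise ratio of each time-frequency unit against a threshold. The following is a minimal Python sketch of that common definition, not the study's Matlab code. The function name `ideal_binary_mask`, the `lc_db` threshold parameter, and the toy power values are assumptions made for illustration.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR (dB) exceeds lc_db.

    speech_power and noise_power are equally shaped nested lists of
    non-negative power values, one row per frequency channel and one
    column per time frame. lc_db is a hypothetical threshold name.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                row.append(0)          # no speech energy: unit is noise-dominated
            elif n <= 0.0:
                row.append(1)          # no noise energy: unit is speech-dominated
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy example with made-up power values (2 channels x 2 frames).
speech = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
mask = ideal_binary_mask(speech, noise)
```

The mask is called "ideal" because it needs the clean speech and the noise separately, which are only available in controlled experiments. An estimated binary mask (EBM) has to be predicted from the noisy mixture alone.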

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Online Research Database In Technology

Spectro-temporal analysis of complex sounds in the human auditory system

Author: Piechowiak Tobias
Publication date: 01/11/2009

Online Research Database In Technology

Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

Authors: Asaei Afsaneh; Bourlard Hervé; Cevher Volkan; Golbabaee Mohammad
Publication date: 01/01/2012

We tackle the multi-party speech recovery problem through modeling the acoustic of the reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustic from the unknown competing speech sources relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting joint sparsity model formulated upon spatio-spectral sparsity of concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition. Comment: 31 page

arXiv.org e-Print Archive

Edinburgh Research Explorer

Influence of binary mask estimation errors on robust speaker identification

Author: May Tobias
Publication date: 01/01/2017

Missing-data strategies have been developed to improve the noise-robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows the extraction of conventionally used mel-frequency cepstral coefficients (MFCCs). Instead of attenuating unreliable components in the feature extraction front-end, full marginalization (FM) discards unreliable feature components in the classification back-end. Finally, bounded marginalization (BM) can be used to combine the evidence from both reliable and unreliable feature components during classification. Since each of these approaches utilizes the knowledge about reliable and unreliable feature components in a different way, they will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy to exploit knowledge about reliable and unreliable feature components in the context of automatic speaker identification (SID). A systematic evaluation under ideal and non-ideal conditions demonstrated that the robustness to errors in the binary mask varied substantially across the different missing-data strategies. Moreover, full and bounded marginalization showed complementary performances in stationary and non-stationary background noises and were subsequently combined using a simple score fusion. This approach consistently outperformed individual SID systems in all considered experimental conditions.
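The difference between direct masking and full marginalization can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the system evaluated in the study: the function names, the `floor_gain` attenuation factor, and the use of a single diagonal Gaussian (rather than a full speaker model) are all hypothetical choices made to keep the example short.

```python
import math


def direct_masking(power_spec, mask, floor_gain=0.1):
    """Front-end strategy: attenuate T-F units marked unreliable (mask == 0).

    floor_gain is an assumed attenuation factor. Reliable units pass
    through unchanged, so ordinary features can be extracted afterwards.
    """
    return [
        [p if m == 1 else p * floor_gain for p, m in zip(p_row, m_row)]
        for p_row, m_row in zip(power_spec, mask)
    ]


def marginalized_log_likelihood(features, reliable, means, variances):
    """Back-end strategy: score one frame with a diagonal Gaussian,
    summing log-density terms over reliable dimensions only.

    Unreliable dimensions are skipped, which for a diagonal Gaussian
    is the same as integrating them out.
    """
    total = 0.0
    for x, r, mu, var in zip(features, reliable, means, variances):
        if r:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Toy example with made-up values: the second component is unreliable.
attenuated = direct_masking([[2.0, 4.0]], [[1, 0]], floor_gain=0.5)
score = marginalized_log_likelihood(
    features=[0.0, 100.0],
    reliable=[1, 0],
    means=[0.0, 0.0],
    variances=[1.0, 1.0],
)
```

In the toy example the outlier value 100.0 has no effect on `score` because its dimension is flagged unreliable. With direct masking the same unit would still enter the features, only attenuated, which is one way to see why the two strategies can react differently to mask errors.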

Crossref

Online Research Database In Technology

Multi-channel dereverberation for speech intelligibility improvement in hearing aid applications

Author: Kuklasinski Adam
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2016

VBN

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Authors: Finkelstein Adam; Jin Zeyu; Su Jiaqi
Publication date: 01/01/2020

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments. Comment: Accepted by INTERSPEECH 202

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref