Search CORE

380 research outputs found

Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

Author: Asaei Afsaneh
Bourlard Hervé
Cevher Volkan
Golbabaee Mohammad
Publication venue
Publication date: 01/01/2012
Field of study

We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources, relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting a joint sparsity model formulated upon the spatio-spectral sparsity of concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition. Comment: 31 pages
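The link between early image sources and room geometry described above can be illustrated with the classical image-source model. This is a minimal sketch under stated assumptions, not the authors' method: it assumes an empty shoebox (rectangular) room with walls at 0 and L on each axis, a speed of sound of 343 m/s, and first-order reflections only. The function name `first_order_arrivals` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_order_arrivals(room, src, mic, c=SPEED_OF_SOUND):
    """Arrival times (s) of the direct path and the six first-order images.

    room: (Lx, Ly, Lz) of a shoebox room with walls at 0 and L on each axis.
    src, mic: (x, y, z) positions inside the room.
    Returns (time, label) pairs sorted by arrival time, i.e. the early
    support of the room impulse response under this simplified model.
    """
    sources = [(tuple(src), "direct")]
    for axis, length in enumerate(room):
        name = "xyz"[axis]
        # Mirror the source across the wall at 0 and across the wall at L.
        low = list(src)
        low[axis] = -src[axis]
        high = list(src)
        high[axis] = 2 * length - src[axis]
        sources.append((tuple(low), f"{name}=0 wall"))
        sources.append((tuple(high), f"{name}=L wall"))
    arrivals = [(math.dist(pos, mic) / c, label) for pos, label in sources]
    return sorted(arrivals)
```

For example, `first_order_arrivals((4.0, 5.0, 3.0), (1.0, 2.0, 1.5), (3.0, 2.0, 1.5))` lists the direct path first, followed by the floor and ceiling images. Inverting this mapping, from observed image positions back to wall positions, is the kind of geometry inference the abstract refers to.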

arXiv.org e-Print Archive

Edinburgh Research Explorer

Joint Spatio-Temporal Filtering Methods for DOA and Fundamental Frequency Estimation

Author: Benesty Jacob
Christensen Mads Græsbøll
Jensen Jesper Rindom
Jensen Søren Holdt
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

VBN

Estimation of acoustic echoes using expectation-maximization methods

Author: Gannot Sharon
Jensen Jesper Rindom
Saqib Usama
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/08/2020
Field of study

VBN

Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses

Author: Ahrens Jens
Calamia Paul
Deppisch Thomas
Garí Sebastià V. Amengual
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2023
Field of study

Psychoacoustic experiments have shown that directional properties of the direct sound, salient reflections, and the late reverberation of an acoustic room response can have a distinct influence on the auditory perception of a given room. Spatial room impulse responses (SRIRs) capture those properties and thus are used for direction-dependent room acoustic analysis and virtual acoustic rendering. This work proposes a subspace method that decomposes SRIRs into a direct part, which comprises the direct sound and the salient reflections, and a residual, to facilitate enhanced analysis and rendering methods by providing individual access to these components. The proposed method is based on the generalized singular value decomposition and interprets the residual as noise that is to be separated from the other components of the reverberation. Large generalized singular values are attributed to the direct part, which is then obtained as a low-rank approximation of the SRIR. By advancing from the end of the SRIR toward the beginning while iteratively updating the residual estimate, the method adapts to spatio-temporal variations of the residual. The method is evaluated using a spatio-spectral error measure and simulated SRIRs of different rooms, microphone arrays, and ratios of direct sound to residual energy. The proposed method yields lower errors than existing approaches in all tested scenarios, including a scenario with two simultaneous reflections. A case study with measured SRIRs shows the applicability of the method under real-world acoustic conditions. A reference implementation is provided.
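The idea of taking the direct part as a low-rank approximation of the SRIR can be sketched in a simplified setting. This is not the GSVD-based method of the paper: it assumes the residual is spatially white (in which case the generalized SVD reduces to the ordinary SVD), works on a single time window whose rows are time samples and columns are microphone channels, and keeps only the largest singular component. The helper `split_direct_residual` is hypothetical and uses plain power iteration to avoid external dependencies.

```python
import math


def split_direct_residual(X, iters=100):
    """Split a window X (list of rows: samples x channels) into a rank-1
    'direct' part and a residual, via power iteration on X^T X.

    Assumes a spatially white residual, so the ordinary SVD stands in for
    the generalized SVD used in the paper.
    """
    m, n = len(X), len(X[0])
    v = [1.0] * n  # initial guess for the dominant right singular vector
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]  # X v
        v = [sum(X[i][j] * u[i] for i in range(m)) for j in range(n)]  # X^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # X v vanished; the direct part below becomes zero
        v = [x / norm for x in v]
    u = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]  # sigma * left vector
    direct = [[u[i] * v[j] for j in range(n)] for i in range(m)]
    residual = [[X[i][j] - direct[i][j] for j in range(n)] for i in range(m)]
    return direct, residual
```

A single plane-wave reflection seen by the array is a time envelope times fixed channel gains, i.e. a rank-1 window, so it lands in `direct`, while small uncorrelated perturbations largely stay in `residual`.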

Chalmers Research

Enhancement of Periodic Signals: with Application to Speech Signals

Author: Jensen Jesper Rindom
Publication venue
Publication date: 01/01/2012
Field of study

VBN

Paraunitary oversampled filter bank design for channel coding

Author: A Papoulis
C Liu
F Labeau
F Labeau
F Labeau
F Labeau
F Lorenzelli
H Bölcskei
H Bölcskei
J Kliewer
JG McWhirter
M Harteneck
PP Vaidyanathan
PP Vaidyanathan
S Redif
S Weiss
S Weiss
S Weiss
T Esmailian
T Tanaka
W Kellermann
WH Neo
Z Cvetković
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2006
Field of study

Oversampled filter banks (OSFBs) have been considered for channel coding, since their redundancy can be utilised to permit the detection and correction of channel errors. In this paper, we propose an OSFB-based channel coder for a correlated additive Gaussian noise channel whose noise covariance matrix is assumed to be known. Based on a suitable factorisation of this matrix, we develop a design for the decoder's synthesis filter bank that minimises the noise power in the decoded signal, subject to admitting perfect reconstruction through paraunitarity of the filter bank. We demonstrate that this approach can lead to a significant reduction of the noise interference by exploiting both the correlation of the channel and the redundancy of the filter banks. Simulation results provide some insight into these mechanisms.
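Why a paraunitary OSFB admits perfect reconstruction while its redundancy suppresses channel noise can be seen in the simplest possible case. This sketch assumes length-1 filters, so the "filter bank" collapses to a tight frame of three unit vectors 120 degrees apart in R^2, and it assumes white rather than correlated channel noise. It does not implement the paper's covariance-based synthesis design; the names `FRAME`, `encode`, `decode` and `decoded_noise_gain` are hypothetical.

```python
import math

# Three unit vectors 120 degrees apart: a tight frame for R^2 with F^T F = (3/2) I,
# the memoryless analogue of a paraunitary oversampled analysis bank.
FRAME = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]
N, K = len(FRAME), 2  # N coded coefficients per K input samples (redundancy N/K)


def encode(x):
    """Analysis: project the 2-sample input block onto each frame vector."""
    return [f[0] * x[0] + f[1] * x[1] for f in FRAME]


def decode(y):
    """Synthesis with the scaled adjoint (K/N) F^T, giving perfect reconstruction."""
    return [K / N * sum(f[d] * yk for f, yk in zip(FRAME, y)) for d in range(K)]


def decoded_noise_gain():
    """Decoded noise power per sample for unit-variance white channel noise.

    With noise covariance I, the decoded covariance is (K/N)^2 F^T F = (K/N) I.
    """
    gram = [[sum(f[r] * f[c] for f in FRAME) for c in range(K)] for r in range(K)]
    return (K / N) ** 2 * gram[0][0]
```

Here `decode(encode(x))` returns `x` up to rounding, and `decoded_noise_gain()` evaluates to 2/3: the redundancy alone lowers white-noise power by the factor K/N. The paper goes further by shaping the synthesis bank to the known noise covariance, which this white-noise sketch does not attempt.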

Crossref

University of Strathclyde Institutional Repository

Online Research @ Cardiff

Springer - Publisher Connector

Directory of Open Access Journals

Model-based Sparse Component Analysis for Reverberant Speech Localization

Author: Asaei Afsaneh
Bourlard Hervé
Cevher Volkan
Taghizadeh Mohammadjavad
Publication venue
Publication date: 01/01/2014
Field of study

In this paper, the problem of multiple speaker localization via speech separation based on model-based sparse recovery is studied. We compare and contrast computational sparse optimization methods incorporating harmonicity and block structures, as well as the autoregressive dependencies underlying the spectrographic representation of speech signals. The results demonstrate the effectiveness of a block sparse Bayesian learning framework incorporating autoregressive correlations in achieving highly accurate localization performance. Furthermore, a significant improvement is obtained by using ad-hoc microphones for the data acquisition set-up compared to a compact microphone array.
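Block structure in sparse recovery is often enforced with a group-sparsity penalty, whose proximal operator shrinks whole blocks of coefficients together. The sketch below shows that block soft-thresholding step as it appears inside proximal-gradient solvers. It is a swapped-in illustration of block sparsity, not the block sparse Bayesian learning framework the abstract favours. The block layout (for example, one block per candidate speaker location spanning several frequency bins) and the function name `block_soft_threshold` are assumptions.

```python
import math


def block_soft_threshold(x, blocks, lam):
    """Proximal operator of lam * sum_b ||x_b||_2 (the group-lasso penalty).

    x: list of real coefficients; blocks: list of (start, end) index ranges
    that partition x; lam: threshold. Each block is scaled by
    max(0, 1 - lam / ||x_b||), so weak blocks are zeroed as a whole.
    """
    out = list(x)
    for start, end in blocks:
        norm = math.sqrt(sum(v * v for v in x[start:end]))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in range(start, end):
            out[i] = scale * x[i]
    return out
```

For instance, `block_soft_threshold([3.0, 4.0, 0.1, 0.1], [(0, 2), (2, 4)], 1.0)` returns approximately `[2.4, 3.2, 0.0, 0.0]`: the strong block (a hypothetical active location) survives with shrinkage, and the weak one is removed entirely rather than coefficient by coefficient.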

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

Joint Pitch and DOA Estimation Using the ESPRIT method

Author: Amir Leshem
Jensen Jesper Rindom
Liao Guisheng
Wu Yuntao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

VBN

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

Author: Geiger Jürgen
Jin Wenyu
Mousa Amr El-Desoky
Pohjalainen Jouni
Schuller Björn
Zhang Zixing
Publication venue
Publication date: 01/01/2018
Field of study

Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.

arXiv.org e-Print Archive

OPUS Augsburg

Independent Component Analysis Enhancements for Source Separation in Immersive Audio Environments

Author: Zhao Yue
Publication venue: UKnowledge
Publication date: 01/01/2013
Field of study

In immersive audio environments with distributed microphones, Independent Component Analysis (ICA) can be applied to uncover signals from a mixture of other signals and noise, such as in a cocktail party recording. ICA algorithms have been developed for instantaneous source mixtures and convolutive source mixtures. While ICA for instantaneous mixtures works when no delays exist between the signals in each mixture, distributed microphone recordings typically result in various delays of the signals over the recorded channels. The convolutive ICA algorithm should account for delays; however, it requires many parameters to be set and often has stability issues. This thesis introduces the Channel Aligned FastICA (CAICA), which requires knowledge of the source distance to each microphone but does not require knowledge of noise sources. Furthermore, CAICA is combined with Time Frequency Masking (TFM), yielding further improved extraction of the signal of interest (SOI), even in low-SNR environments. Simulations were conducted for ranking experiments that tested the performance of three algorithms: Weighted Beamforming (WB), CAICA, and CAICA with TFM. The Closest Microphone (CM) recording is used as a reference for all three. Statistical analyses of the results demonstrated superior performance for CAICA with TFM. The algorithms were applied to experimental recordings to support the conclusions of the simulations. These techniques can be deployed in mobile platforms, used in surveillance for capturing human speech, and potentially adapted to biomedical fields.
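The channel-alignment idea behind CAICA, which uses known source-to-microphone distances, can be sketched as delay compensation applied before an instantaneous-mixture ICA. This sketch shows only that alignment step. It assumes integer-sample delays, a speed of sound of 343 m/s, and zero-filling at the end of each shifted channel. The function name `align_channels` is hypothetical, and the FastICA and time-frequency masking stages are not shown.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def align_channels(channels, distances, fs, c=SPEED_OF_SOUND):
    """Advance each channel so the source of interest lines up across mics.

    channels: list of equal-length sample lists, one per microphone.
    distances: source-to-microphone distances in metres (assumed known).
    fs: sampling rate in Hz. Delays are rounded to whole samples and taken
    relative to the closest microphone; vacated samples are zero-filled.
    """
    delays = [round(d / c * fs) for d in distances]
    ref = min(delays)
    aligned = []
    for x, delay in zip(channels, delays):
        shift = delay - ref  # samples by which this mic lags the closest one
        aligned.append(list(x[shift:]) + [0.0] * shift)
    return aligned
```

After alignment, the source of interest appears without relative delay across channels, which is the situation an instantaneous-mixture ICA such as FastICA assumes. Interfering sources at other positions generally stay misaligned, which this step does not address.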

University of Kentucky