230 research outputs found

A Blind Source Separation Framework for Ego-Noise Reduction on Multi-Rotor Drones

Author: Cavallaro A
Wang L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Crossref

Queen Mary Research Online

Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

Author: Asaei Afsaneh
Bourlard Hervé
Cevher Volkan
Golbabaee Mohammad
Publication date: 01/01/2012

We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources, relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization that exploits a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.

Comment: 31 pages

arXiv.org e-Print Archive

Edinburgh Research Explorer

Pseudo-Determined Blind Source Separation for Ad-hoc Microphone Networks

Author: Cavallaro A
Wang L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/02/2018

Crossref

Queen Mary Research Online

Over-Determined Source Separation and Localization Using Distributed Microphones

Author: Cavallaro A
Reiss JD
Wang L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/05/2016

This work was supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/K007491/1

Crossref

Queen Mary Research Online

Microphone-Array Ego-Noise Reduction Algorithms for Auditory Micro Aerial Vehicles

Author: Cavallaro A
Wang L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/02/2017

Crossref

Queen Mary Research Online

The 2015 Signal Separation Evaluation Campaign

Author: Ito Nobutaka
Kitamura Daichi
Liutkus Antoine
Ono Nobutaka
Rafii Zafar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

In this paper, we report the 2015 community-based Signal Separation Evaluation Campaign (SiSEC 2015). This SiSEC consists of four speech and music datasets, including two new datasets: "Professionally produced music recordings" and "Asynchronous recordings of speech mixtures". Focusing on them, we give an overview of the campaign specifications, such as the tasks, datasets and evaluation criteria. We also summarize the performance of the submitted systems.

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Independent Component Analysis Enhancements for Source Separation in Immersive Audio Environments

Author: Zhao Yue
Publication venue: UKnowledge
Publication date: 01/01/2013

In immersive audio environments with distributed microphones, Independent Component Analysis (ICA) can be applied to uncover signals from a mixture of other signals and noise, such as in a cocktail party recording. ICA algorithms have been developed for instantaneous source mixtures and convolutive source mixtures. While ICA for instantaneous mixtures works when no delays exist between the signals in each mixture, distributed microphone recordings typically result in various delays of the signals over the recorded channels. The convolutive ICA algorithm should account for delays; however, it requires many parameters to be set and often has stability issues. This thesis introduces the Channel Aligned FastICA (CAICA), which requires knowledge of the source distance to each microphone, but does not require knowledge of noise sources. Furthermore, the CAICA is combined with Time Frequency Masking (TFM), yielding better signal-of-interest (SOI) extraction even in low-SNR environments. Simulations were conducted in which ranking experiments tested the performance of three algorithms: Weighted Beamforming (WB), CAICA, and CAICA with TFM. The Closest Microphone (CM) recording is used as a reference for all three. Statistical analyses of the results demonstrated superior performance for the CAICA with TFM. The algorithms were applied to experimental recordings to support the conclusions of the simulations. These techniques can be deployed in mobile platforms, used in surveillance for capturing human speech, and potentially adapted to biomedical fields.
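
The abstract above notes that instantaneous ICA assumes no inter-channel delays, and that CAICA relies on known source-to-microphone distances. The following is a minimal sketch of what such a delay-compensation step could look like, not the thesis's actual implementation. The function name `align_channels`, the fixed speed of sound of 343 m/s, and the rounding to whole-sample shifts are all assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def align_channels(x, distances_m, fs):
    """Advance each channel so the source arrives at the same sample index.

    x           : array of shape (n_mics, n_samples), one row per microphone
    distances_m : source-to-microphone distance in metres, one per microphone
    fs          : sampling rate in Hz
    Returns the aligned array and the integer shift applied to each channel.
    """
    x = np.asarray(x, dtype=float)
    # Propagation delay of the source to each microphone, in samples.
    delays = np.asarray(distances_m, dtype=float) / SPEED_OF_SOUND * fs
    # Shift relative to the closest microphone, rounded to whole samples.
    shifts = np.rint(delays - delays.min()).astype(int)
    n = x.shape[1]
    aligned = np.zeros_like(x)
    for m, s in enumerate(shifts):
        s = min(int(s), n)  # guard against shifts longer than the signal
        aligned[m, : n - s] = x[m, s:]  # drop the first s samples, zero-pad the end
    return aligned, shifts
```

After such an alignment the target source is approximately an instantaneous mixture across channels, so an instantaneous ICA routine could then be applied to the aligned rows. Interfering sources at other positions remain misaligned.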

University of Kentucky

Automatic Speech Separation for Brain-Controlled Hearing Technologies

Author: Han Cong
Publication date: 01/01/2024

Speech perception in crowded acoustic environments is particularly challenging for hearing-impaired listeners. While assistive hearing devices can suppress background noises distinct from speech, they struggle to suppress interfering speakers without knowing the speaker on which the listener is focusing. The human brain has a remarkable ability to pick out individual voices in a noisy environment like a crowded restaurant or a busy city street. This inspires brain-controlled hearing technologies. A brain-controlled hearing aid acts as an intelligent filter, reading wearers' brainwaves and enhancing the voice they want to focus on. Two essential elements form the core of brain-controlled hearing aids: automatic speech separation (SS), which isolates individual speakers from mixed audio in an acoustic scene, and auditory attention decoding (AAD), in which the brainwaves of listeners are compared with separated speakers to determine the attended one, which can then be amplified to facilitate hearing. This dissertation focuses on speech separation and its integration with AAD, aiming to propel the evolution of brain-controlled hearing technologies. The goal is to help users engage in conversations with people around them seamlessly and efficiently. This dissertation is structured into two parts. The first part focuses on automatic speech separation models, beginning with the introduction of a real-time monaural speech separation model, followed by more advanced real-time binaural speech separation models. The binaural models use both spectral and spatial features to separate speakers and are more robust to noise and reverberation. Beyond performing speech separation, the binaural models preserve the interaural cues of separated sound sources, which is a significant step towards immersive augmented hearing. Additionally, the first part explores using speaker identification to improve the performance and robustness of models in long-form speech separation.
This part also delves into unsupervised learning methods for multi-channel speech separation, aiming to improve the models' ability to generalize to real-world audio. The second part of the dissertation integrates the speech separation introduced in the first part with auditory attention decoding (SS-AAD) to develop brain-controlled augmented hearing systems. It is demonstrated that auditory attention decoding with automatically separated speakers is as accurate and fast as using clean speech sounds. Furthermore, to better align the experimental environment of SS-AAD systems with real-life scenarios, the second part introduces a new AAD task that closely simulates real-world complex acoustic settings. The results show that the SS-AAD system is capable of improving speech intelligibility and facilitating tracking of the attended speaker in realistic acoustic environments. Finally, this part presents the use of self-supervised speech representations in SS-AAD systems to enhance the neural decoding of attentional selection.

Columbia University Academic Commons

Audio self-supervised learning: a survey

Author: Hu Bin
Jing Xin
Kathan Alexander
Liu Shuo
Mallol-Ragolta Adria
Parada-Cabaleiro Emilia
Qian Kun
Schuller Björn W.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

Inspired by humans' cognitive ability to generalise knowledge and skills, Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations, which are expensive and time-consuming to produce. Its success in the fields of computer vision and natural language processing has prompted its recent adoption in the field of audio and speech processing. Comprehensive reviews summarising the knowledge in audio SSL are currently missing. To fill this gap, in the present work, we provide an overview of the SSL methods used for audio and speech processing applications. Herein, we also summarise the empirical works that exploit the audio modality in multi-modal SSL frameworks, and the existing suitable benchmarks to evaluate the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out future directions for the development of audio SSL.

arXiv.org e-Print Archive

OPUS Augsburg