Speaker detection in the wild: Lessons learned from JSALT 2019
Submitted to ICASSP 2020. This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios. The main focus was to tackle a wide range of conditions, from meetings to speech in the wild. We describe the research threads we explored and a set of modules that proved successful in these scenarios. The ultimate goal was to explore speaker detection, but our first finding was that effective diarization improves detection, while omitting the diarization stage degrades performance. All the different configurations of our research agree on this fact and follow a common backbone that includes diarization as a preliminary stage. With this backbone, we analyzed the following problems: voice activity detection, how to deal with noisy signals, domain mismatch, how to improve clustering, and the overall impact of earlier stages on the final speaker detection. In this paper, we show partial results for speaker diarization to give a better understanding of the problem, and we present the final results for speaker detection.
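The diarization-first backbone described above can be sketched as a three-stage pipeline: voice activity detection, diarization of the detected speech, then scoring each diarized segment against a target speaker. The functions below are illustrative stand-ins (energy-threshold VAD, alternating speaker labels, a toy similarity score), not the actual JSALT systems.

```python
# Hypothetical sketch of the backbone: VAD -> diarization -> speaker detection.
# All thresholds, profiles, and function names are assumptions for illustration.

def voice_activity_detection(signal, threshold=0.1):
    """Mark samples whose magnitude exceeds a threshold as speech."""
    return [abs(x) > threshold for x in signal]

def diarize(speech_mask):
    """Toy diarization: group contiguous speech into segments and
    alternate speaker labels between them."""
    segments, start = [], None
    for i, is_speech in enumerate(speech_mask):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(speech_mask)))
    return [(s, e, f"spk{idx % 2}") for idx, (s, e) in enumerate(segments)]

def detect_speaker(signal, segments, target_profile, tol=0.2):
    """Score each diarized segment against a target profile; keep matches."""
    hits = []
    for start, end, label in segments:
        score = sum(abs(x) for x in signal[start:end]) / (end - start)
        if abs(score - target_profile) < tol:
            hits.append((start, end, label))
    return hits

signal = [0.0, 0.5, 0.6, 0.0, 0.0, 0.4, 0.5, 0.0]
mask = voice_activity_detection(signal)
segments = diarize(mask)                         # two speech segments
matches = detect_speaker(signal, segments, target_profile=0.5)
```

The point the abstract makes is visible in the structure: `detect_speaker` only scores segments that diarization has already isolated, so a missing diarization stage would force detection to operate on unsegmented audio.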
Latent Iterative Refinement for Modular Source Separation
Traditional source separation approaches train deep neural network models
end-to-end with all the data available at once by minimizing the empirical risk
on the whole training set. On the inference side, after training the model, the
user fetches a static computation graph and runs the full model on some
specified observed mixture signal to get the estimated source signals.
Additionally, many of those models consist of several basic processing blocks
which are applied sequentially. We argue that we can significantly increase
resource efficiency during both training and inference stages by reformulating
a model's training and inference procedures as iterative mappings of latent
signal representations. First, we can apply the same processing block more than
once to its own output to iteratively refine the signal and consequently
improve parameter efficiency. During training, we can follow a block-wise
procedure that reduces memory requirements; thus, one can train a very
complicated network structure using significantly less computation compared to
end-to-end training. During inference, we can dynamically adjust how many
processing blocks, and how many iterations of a specific block, an input signal
needs using a gating module.
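The inference-time mechanism described above can be sketched as a loop that reapplies one block until a gate decides the latent is refined enough. The block, gate criterion, and stopping rule below are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of iterative latent refinement with a gating module.
# processing_block and gate are hypothetical stand-ins for learned modules.

def processing_block(latent):
    """One refinement pass: here, a toy contraction toward zero residual."""
    return [x * 0.5 for x in latent]

def gate(latent, tol=0.05):
    """Decide whether the latent still needs another refinement pass."""
    return max(abs(x) for x in latent) > tol

def iterative_refine(latent, max_iters=10):
    """Reapply the same block until the gate stops it, reusing one set of
    parameters instead of stacking distinct blocks."""
    iters = 0
    while gate(latent) and iters < max_iters:
        latent = processing_block(latent)
        iters += 1
    return latent, iters

refined, n = iterative_refine([1.0, -0.8])  # easy inputs stop after few passes
```

Because the gate is evaluated per input, an already-clean latent exits immediately while a hard one consumes more passes, which is the dynamic compute allocation the abstract describes.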