Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space
In many applications of multi-microphone multi-device processing, the
synchronization among different input channels can be affected by the lack of a
common clock and isolated drops of samples. In this work, we address the issue
of sample drop detection in the context of a conversational speech scenario,
recorded by a set of microphones distributed in space. The goal is to design a
neural-based model that, given a short window in the time domain, detects
whether one or more devices have been subjected to a sample drop event. The
candidate time windows are selected from a set of large time intervals, each
possibly including a sample drop, by means of a preprocessing step. This step
applies normalized cross-correlation between signals
acquired by different devices. The architecture of the neural network relies on
a CNN-LSTM encoder, followed by multi-head attention. The experiments are
conducted using both artificial and real data. Our proposed approach obtained
an F1 score of 88% on an evaluation set extracted from the CHiME-5 corpus.
Comparable performance was found in a larger set of experiments conducted on a
set of multi-channel artificial scenes.
Comment: Submitted to ICASSP 202
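
The preprocessing idea described in the abstract, normalized cross-correlation
between signals from different devices, can be illustrated with a minimal
sketch. It is not the authors' implementation: the function names (ncc_lag,
candidate_intervals), the block size, the lag search range, and the white-noise
test signal are all assumptions made for illustration. The sketch estimates the
inter-device lag block by block and flags the interval around any block where
the lag changes, which is what a sample drop on one device would cause.

```python
import numpy as np


def ncc_lag(ref, sig, max_lag):
    """Return (lag, peak) maximizing the normalized cross-correlation.

    A positive lag means `sig` is delayed with respect to `ref`;
    a negative lag means `sig` runs ahead of `ref`.
    Both inputs must have the same length.
    """
    n = len(ref)
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Keep only the overlapping parts of the two signals at this lag.
        if lag >= 0:
            a, b = ref[:n - lag], sig[lag:]
        else:
            a, b = ref[-lag:], sig[:n + lag]
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        val = float(np.dot(a, b) / denom) if denom > 0 else 0.0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


def candidate_intervals(ref, sig, win, max_lag):
    """Estimate the lag in consecutive blocks and flag lag changes.

    Returns the per-block lags and a list of (start, end) sample
    intervals spanning the two blocks around each lag change.
    """
    n = min(len(ref), len(sig))
    lags = [ncc_lag(ref[s:s + win], sig[s:s + win], max_lag)[0]
            for s in range(0, n - win + 1, win)]
    intervals = [((i - 1) * win, (i + 1) * win)
                 for i in range(1, len(lags)) if lags[i] != lags[i - 1]]
    return lags, intervals


# Hypothetical test scene: white noise stands in for the shared acoustic signal.
rng = np.random.default_rng(0)
clean = rng.standard_normal(8000)
device_a = clean
# Device B loses 5 samples starting at index 4100.
device_b = np.concatenate([clean[:4100], clean[4105:]])

lags, intervals = candidate_intervals(device_a, device_b, win=1000, max_lag=20)
print("per-block lags:", lags)
print("candidate intervals:", intervals)
```

In this toy scene the estimated lag moves from 0 to -5 at the block containing
the drop, so the flagged interval covers the true drop position. Real recordings
would also need to handle clock drift, reverberation, and differing
source-to-microphone delays, none of which this sketch models.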