155,928 research outputs found
On the effect of SNR and superdirective beamforming in speaker diarisation in meetings
This paper examines the effect of sensor performance on speaker diarisation in meetings and investigates the use of more advanced beamforming techniques, beyond the typically employed delay-sum beamformer, for mitigating the effects of poorer sensor performance. We present superdirective beamforming and investigate how different time difference of arrival (TDOA) smoothing and beamforming techniques influence the performance of state-of-the-art diarisation systems. We produced and transcribed a new corpus of meetings recorded in the instrumented meeting room using a high SNR analogue and a newly developed low SNR digital MEMS microphone array (DMMA.2). This research demonstrates that TDOA smoothing has a significant effect on the diarisation error rate and that simple noise reduction and beamforming schemes suffice to overcome audio signal degradation due to the lower SNR of modern MEMS microphones. Index Terms — Speaker diarisation in meetings, digital MEMS microphone array, time difference of arrival (TDOA), superdirective beamforming 1
Phonetic Temporal Neural Model for Language Identification
Deep neural models, particularly the LSTM-RNN model, have shown great
potential for language identification (LID). However, the use of phonetic
information has been largely overlooked by most existing neural LID methods,
although this information has been used very successfully in conventional
phonetic LID systems. We present a phonetic temporal neural model for LID,
which is an LSTM-RNN LID system that accepts phonetic features produced by a
phone-discriminative DNN as the input, rather than raw acoustic features. This
new model is similar to traditional phonetic LID methods, but the phonetic
knowledge here is much richer: it is at the frame level and involves compacted
information of all phones. Our experiments conducted on the Babel database and
the AP16-OLR database demonstrate that the temporal phonetic neural approach is
very effective, and significantly outperforms existing acoustic neural models.
It also outperforms the conventional i-vector approach on short utterances and
in noisy conditions.Comment: Submitted to TASL
- …