Unsupervised crosslingual adaptation of tokenisers for spoken language recognition
Phone tokenisers are used in spoken language recognition (SLR) to obtain elementary
phonetic information. We present a study on the use of deep neural
network (DNN) tokenisers. The baseline tokeniser, trained on English
conversational telephone speech, was adapted to different languages by
unsupervised crosslingual adaptation. Two training and adaptation approaches, namely
cross-entropy adaptation and state-level minimum Bayes risk adaptation, were
tested in a bottleneck i-vector and a phonotactic SLR system. The SLR systems
using the tokenisers adapted to different languages were combined using score
fusion, giving a 7-18% reduction in the minimum detection cost function (minDCF)
compared with the baseline configurations without adapted tokenisers. Analysis
of the results showed that the ensemble tokenisers gave diverse representations of
phonemes, which made SLR systems built on different tokenisers complementary when
combined. SLR performance was also shown to be related to the quality of the
adapted tokenisers.
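The abstract reports gains from score fusion measured by minDCF. The sketch below illustrates both ideas on binary detection trials. It is not the authors' setup. It assumes weighted linear fusion with hand-set weights (`fuse_scores`, hypothetical; in practice fusion weights are usually trained on development data). It also assumes a detection cost with illustrative parameters `p_target=0.5`, `c_miss=1.0`, `c_fa=1.0`, normalised by the cost of the best trivial system (`min_dcf`, hypothetical). The evaluation's actual cost parameters are not stated here.

```python
import numpy as np


def fuse_scores(score_sets, weights, bias=0.0):
    """Weighted linear fusion of per-trial scores from several subsystems.

    Hypothetical helper: weights and bias are assumed to be given.
    """
    scores = np.asarray(score_sets, dtype=float)  # shape: (n_systems, n_trials)
    w = np.asarray(weights, dtype=float)          # shape: (n_systems,)
    return w @ scores + bias                      # shape: (n_trials,)


def min_dcf(scores, labels, p_target=0.5, c_miss=1.0, c_fa=1.0):
    """Minimum normalised detection cost over all decision thresholds.

    Cost parameters are illustrative assumptions, not the evaluation's values.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_tar = labels.sum()
    n_non = (~labels).sum()
    # Accept a trial when score >= t; +inf lets the sweep reject every trial,
    # and the lowest score accepts every trial, so all operating points are covered.
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    best = np.inf
    for t in thresholds:
        accept = scores >= t
        p_miss = np.sum(labels & ~accept) / n_tar
        p_fa = np.sum(~labels & accept) / n_non
        dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        best = min(best, dcf)
    # Normalise by the cost of always accepting or always rejecting.
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))


# Toy trials: two target and two non-target trials, scored by two subsystems
# whose errors fall on different trials.
labels = [True, True, False, False]
system_a = [2.0, 0.0, 1.0, -1.0]
system_b = [0.5, 2.0, -1.0, 1.0]
fused = fuse_scores([system_a, system_b], [0.5, 0.5])
print(min_dcf(system_a, labels), min_dcf(system_b, labels), min_dcf(fused, labels))
```

On this toy data each subsystem alone reaches a normalised minDCF of 0.5, while the equally weighted fusion separates the trials perfectly (0.0). This is a contrived example of the complementarity effect the abstract describes, not a reproduction of its results.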