834 research outputs found

    Predictability of the effects of phoneme merging on speech recognition performance by quantifying phoneme relations

    Get PDF
    To investigate whether the impact of phoneme merging on recognition rate can be predicted, different measures to quantify the relationship between two phonemes a and b were compared: (1) the functional load of their opposition, (2) the bigram type preservation, (3) their information radius, (4) their distance within an information gain tree induced from a distinctive feature matrix, and (5) the symmetric Kullback-Leibler divergence. For each of 25 phoneme pairs we trained a speech recognizer on data in which the respective pair was merged. Based on correlation analyses and predictor selection in stepwise regression modelling we found that the impact of phoneme merging on accuracy can tentatively be captured in terms of functional load and tree distance between the merged phonemes

    Unsupervised crosslingual adaptation of tokenisers for spoken language recognition

    Get PDF
    Phone tokenisers are used in spoken language recognition (SLR) to obtain elementary phonetic information. We present a study on the use of deep neural network tokenisers. Unsupervised crosslingual adaptation was performed to adapt the baseline tokeniser trained on English conversational telephone speech data to different languages. Two training and adaptation approaches, namely cross-entropy adaptation and state-level minimum Bayes risk adaptation, were tested in a bottleneck i-vector and a phonotactic SLR system. The SLR systems using the tokenisers adapted to different languages were combined using score fusion, giving 7-18% reduction in minimum detection cost function (minDCF) compared with the baseline configurations without adapted tokenisers. Analysis of results showed that the ensemble tokenisers gave diverse representation of phonemes, thus bringing complementary effects when SLR systems with different tokenisers were combined. SLR performance was also shown to be related to the quality of the adapted tokenisers
    • …
    corecore