2 research outputs found

    n-gram Frequency Ranking with additional sources of information in a multiple-Gaussian classifier for Language Identification

    We present new results for our n-gram frequency ranking approach to language identification. We use a parallel phone recognizer (as in PPRLM), but instead of a language model we build a ranking of the most frequent n-grams. We then compute the distance between the ranking of the input sentence and the ranking of each language, based on the difference in relative position of each n-gram. The aim of this ranking is to reliably model a longer span than PPRLM does. The approach outperforms PPRLM (15% relative improvement) thanks to the inclusion of 4-grams and 5-grams in the classifier. We also show that combining this technique with other sources of information (feature vectors in our classifier) is advantageous over PPRLM, and we provide a detailed analysis of the relevance of these sources together with a simple feature selection technique to cope with long feature vectors. The test database has been significantly enlarged using cross-fold validation, so comparisons are now more reliable.
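
The ranking comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact distance formula, so the code below assumes an out-of-place style measure (mean absolute difference in rank position, with a fixed penalty for n-grams missing from the language ranking) and a single ranking that mixes all n-gram orders. The function names, `top_k`, and the penalty are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter

def ngram_ranking(tokens, n_max=5, top_k=100):
    """Map each of the top_k most frequent n-grams (n = 1..n_max) to its rank."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    # Most frequent first; ties are broken by the n-gram itself so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ordered[:top_k])}

def ranking_distance(input_rank, language_rank, penalty=None):
    """Mean absolute rank difference (assumed measure, not taken from the paper)."""
    if not input_rank:
        return 0.0
    if penalty is None:
        # Assumption: an n-gram absent from the language ranking costs the ranking's length.
        penalty = len(language_rank)
    total = 0
    for gram, pos in input_rank.items():
        if gram in language_rank:
            total += abs(pos - language_rank[gram])
        else:
            total += penalty
    return total / len(input_rank)

def identify(tokens, language_ranks):
    """Return the language whose ranking is closest to the input's ranking."""
    input_rank = ngram_ranking(tokens)
    return min(language_ranks,
               key=lambda lang: ranking_distance(input_rank, language_ranks[lang]))
```

In this sketch `tokens` stands for the phone sequence produced by a recognizer, and `language_ranks` is a dictionary of rankings built beforehand from training sequences, one per language.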

    Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion

    We present the GT-IIR language recognition system submitted to the 2005 NIST Language Recognition Evaluation. Unlike conventional frame-based feature extraction, our system takes a collection of broad output scores from different language recognition systems to form utterance-level score distribution feature vectors over all competing languages. It then builds vector-based spoken language recognizers by fusing two distinct verifiers, one based on a simple linear discriminant function (LDF) and the other on a more complex artificial neural network (ANN), to make the final language recognition decisions. The diverse error patterns of the individual LDF and ANN systems give the combined system smaller overall verification errors than either system obtains on its own.
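
As a rough illustration of the pipeline this abstract outlines, the sketch below stacks per-language scores from several recognizers into one utterance-level vector, scores it with a linear discriminant, and combines that score with a second verifier's output. The abstract does not specify the fusion rule or the network, so the weighted average in `fuse` and the externally supplied `ann_prob` are assumptions, and all names are hypothetical.

```python
import math

def score_vector(system_scores, languages):
    """Concatenate each system's per-language scores into one utterance-level vector."""
    return [scores[lang] for scores in system_scores for lang in languages]

def ldf_verifier(vec, weights, bias=0.0):
    """Linear discriminant score squashed to (0, 1) so it can be compared with another verifier."""
    z = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fuse(ldf_prob, ann_prob, alpha=0.5):
    """Assumed fusion rule: weighted average of the two verifier outputs."""
    return alpha * ldf_prob + (1.0 - alpha) * ann_prob
```

Here `ann_prob` would come from a separately trained neural network verifier, which is left out of the sketch.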