14 research outputs found

    Acoustic Modelling for Under-Resourced Languages

    Get PDF
    Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time and cost effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages

    Grapheme and multilingual posterior features for under-resourced speech recognition: a study on Scottish Gaelic

    Get PDF
    Standard automatic speech recognition (ASR) systems use phonemes as subword units. Thus, one of the primary resource required to build a good ASR system is a well developed phoneme pronunciation lexicon. However, under-resourced languages typically lack such lexical resources. In this paper, we investigate recently proposed grapheme-based ASR in the framework of Kullback-Leibler divergence based hidden Markov model (KL-HMM) for under-resourced languages, particularly Scottish Gaelic which has no lexical resources. More specifically, we study the use of grapheme and multilingual phoneme class conditional probabilities (posterior features) as feature observations in KL-HMM. ASR studies conducted show that the proposed approach yields better system compared to the conventional HMM/GMM approach using cepstral features. Furthermore, grapheme posterior features estimated using both auxiliary data and Gaelic data yield the best system

    Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling

    Get PDF
    Automatic speech recognition (ASR) systems incorporate expert knowledge of language or the linguistic expertise through the use of phone pronunciation lexicon (or dictionary) where each word is associated with a sequence of phones. The creation of phone pronunciation lexicon for a new language or domain is costly as it requires linguistic expertise, and includes time and money. In this thesis, we focus on effective building of ASR systems in the absence of linguistic expertise for a new domain or language. Particularly, we consider graphemes as alternate subword units for speech recognition. In a grapheme lexicon, pronunciation of a word is derived from its orthography. However, modeling graphemes for speech recognition is a challenging task for two reasons. Firstly, grapheme-to-phoneme (G2P) relationship can be ambiguous as languages continue to evolve after their spelling has been standardized. Secondly, as elucidated in this thesis, typically ASR systems directly model the relationship between graphemes and acoustic features; and the acoustic features depict the envelope of speech, which is related to phones. In this thesis, a grapheme-based ASR approach is proposed where the modeling of the relationship between graphemes and acoustic features is factored through a latent variable into two models, namely, acoustic model and lexical model. In the acoustic model the relationship between latent variables and acoustic features is modeled, while in the lexical model a probabilistic relationship between latent variables and graphemes is modeled. We refer to the proposed approach as probabilistic lexical modeling based ASR. In the thesis we show that the latent variables can be phones or multilingual phones or clustered context-dependent subword units; and an acoustic model can be trained on domain-independent or language-independent resources. The lexical model is trained on transcribed speech data from the target domain or language. In doing so, the parameters of the lexical model capture a probabilistic relationship between graphemes and phones. In the proposed grapheme-based ASR approach, lexicon learning is implicitly integrated as a phase in ASR system training as opposed to the conventional approach where first phone pronunciation lexicon is developed and then a phone-based ASR system is trained. The potential and the efficacy of the proposed approach is demonstrated through experiments and comparisons with other standard approaches on ASR for resource rich languages, nonnative and accented speech, under-resourced languages, and minority languages. The studies revealed that the proposed framework is particularly suitable when the task is challenged by the lack of both linguistic expertise and transcribed data. Furthermore, our investigations also showed that standard ASR approaches in which the lexical model is deterministic are more suitable for phones than graphemes, while probabilistic lexical model based ASR approach is suitable for both. Finally, we show that the captured grapheme-to-phoneme relationship can be exploited to perform acoustic data-driven G2P conversion

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Get PDF

    Computational Intelligence and Human- Computer Interaction: Modern Methods and Applications

    Get PDF
    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations

    The tablet teacher: learning literacy through technology in Northern Sotho

    Get PDF
    This study evaluates the efficacy of the Bridges to the Future Initiative - South Africa 2 (BFI) tablet program on early literacy skills, as well as the ways in which learner-operated technology interacts with a traditional South African education system. The BFI is a curriculum-aligned early literacy development intervention implemented through technology in grades 2 and 3 in Northern Sotho1 first-language schools. A mixed-methods research design was utilized, involving three components: a literacy skills test administered through a time sequence trial design; a curriculum-aligned uptake and retention test using a pre-post design; and a qualitative research component including classroom observation, participant interviews and prompted drawings by learners. Paired sample t-tests show significantly higher gains during the treatment period in fluency and comprehension, and significantly higher gains in the control period in decoding individual words. It is theorized that this is due to teacher emphasis on emergent literacy. When initial ability is taken into consideration, all ability levels gain more on average during the treatment period in at least one measured skill. Regression analysis determines that time spent on the BFI program is not the most significant determiner of gains in the intervention period. Qualitative analysis supports this finding and suggests that program use cannot replace quality classroom practice in advancing literacy skills. Learners performed better after a delayed retention period than in an initial uptake test, indicating high rates of retention of knowledge gained through program use and traditional instruction, but inconsistent access to literacy skills gained

    Reconnaissance de l'écriture manuscrite en-ligne par approche combinant systèmes à vastes marges et modèles de Markov cachés

    Get PDF
    Handwriting recognition is one of the leading applications of pattern recognition and machine learning. Despite having some limitations, handwriting recognition systems have been used as an input method of many electronic devices and helps in the automation of many manual tasks requiring processing of handwriting images. In general, a handwriting recognition system comprises three functional components; preprocessing, recognition and post-processing. There have been improvements made within each component in the system. However, to further open the avenues of expanding its applications, specific improvements need to be made in the recognition capability of the system. Hidden Markov Model (HMM) has been the dominant methods of recognition in handwriting recognition in offline and online systems. However, the use of Gaussian observation densities in HMM and representational model for word modeling often does not lead to good classification. Hybrid of Neural Network (NN) and HMM later improves word recognition by taking advantage of NN discriminative property and HMM representational capability. However, the use of NN does not optimize recognition capability as the use of Empirical Risk minimization (ERM) principle in its training leads to poor generalization. In this thesis, we focus on improving the recognition capability of a cursive online handwritten word recognition system by using an emerging method in machine learning, the support vector machine (SVM). We first evaluated SVM in isolated character recognition environment using IRONOFF and UNIPEN character databases. SVM, by its use of principle of structural risk minimization (SRM) have allowed simultaneous optimization of representational and discriminative capability of the character recognizer. We finally demonstrate the various practical issues in using SVM within a hybrid setting with HMM. In addition, we tested the hybrid system on the IRONOFF word database and obtained favourable results.Nos travaux concernent la reconnaissance de l'écriture manuscrite qui est l'un des domaines de prédilection pour la reconnaissance des formes et les algorithmes d'apprentissage. Dans le domaine de l'écriture en-ligne, les applications concernent tous les dispositifs de saisie permettant à un usager de communiquer de façon transparente avec les systèmes d'information. Dans ce cadre, nos travaux apportent une contribution pour proposer une nouvelle architecture de reconnaissance de mots manuscrits sans contrainte de style. Celle-ci se situe dans la famille des approches hybrides locale/globale où le paradigme de la segmentation/reconnaissance va se trouver résolu par la complémentarité d'un système de reconnaissance de type discriminant agissant au niveau caractère et d'un système par approche modèle pour superviser le niveau global. Nos choix se sont portés sur des Séparateurs à Vastes Marges (SVM) pour le classifieur de caractères et sur des algorithmes de programmation dynamique, issus d'une modélisation par Modèles de Markov Cachés (HMM). Cette combinaison SVM/HMM est unique dans le domaine de la reconnaissance de l'écriture manuscrite. Des expérimentations ont été menées, d'abord dans un cadre de reconnaissance de caractères isolés puis sur la base IRONOFF de mots cursifs. Elles ont montré la supériorité des approches SVM par rapport aux solutions à bases de réseaux de neurones à convolutions (Time Delay Neural Network) que nous avions développées précédemment, et leur bon comportement en situation de reconnaissance de mots
    corecore