822 research outputs found

    UmobiTalk: Ubiquitous Mobile Speech Based Learning Language Translator for Sesotho Language

    Get PDF
    Published ThesisThe need to conserve the under-resourced languages is becoming more urgent as some of them are becoming extinct; natural language processing can be used to redress this. Currently, most initiatives around language processing technologies are focusing on western languages such as English and French, yet resources for such languages are already available. The Sesotho language is one of the under-resourced Bantu languages; it is mostly spoken in Free State province of South Africa and in Lesotho. Like other parts of South Africa, Free State has experienced high number of migrants and non-Sesotho speakers from neighboring provinces and countries; such people are faced with serious language barrier problems especially in the informal settlements where everyone tends to speak only Sesotho. Non-Sesotho speakers refers to the racial groups such as Xhosas, Zulus, Coloureds, Whites and more, in which Sesotho language is not their native language. As a solution to this, we developed a parallel corpus that has English as source and Sesotho as a target language and packaged it in UmobiTalk - Ubiquitous mobile speech based learning translator. UmobiTalk is a mobile-based tool for learning Sesotho for English speakers. The development of this tool was based on the combination of automatic speech recognition, machine translation and speech synthesis

    A Framework for Personalized Content Recommendations to Support Informal Learning in Massively Diverse Information WIKIS

    Get PDF
    Personalization has proved to achieve better learning outcomes by adapting to specific learners’ needs, interests, and/or preferences. Traditionally, most personalized learning software systems focused on formal learning. However, learning personalization is not only desirable for formal learning, it is also required for informal learning, which is self-directed, does not follow a specified curriculum, and does not lead to formal qualifications. Wikis among other informal learning platforms are found to attract an increasing attention for informal learning, especially Wikipedia. The nature of wikis enables learners to freely navigate the learning environment and independently construct knowledge without being forced to follow a predefined learning path in accordance with the constructivist learning theory. Nevertheless, navigation on information wikis suffer from several limitations. To support informal learning on Wikipedia and similar environments, it is important to provide easy and fast access to relevant content. Recommendation systems (RSs) have long been used to effectively provide useful recommendations in different technology enhanced learning (TEL) contexts. However, the massive diversity of unstructured content as well as user base on such information oriented websites poses major challenges when designing recommendation models for similar environments. In addition to these challenges, evaluation of TEL recommender systems for informal learning is rather a challenging activity due to the inherent difficulty in measuring the impact of recommendations on informal learning with the absence of formal assessment and commonly used learning analytics. In this research, a personalized content recommendation framework (PCRF) for information wikis as well as an evaluation framework that can be used to evaluate the impact of personalized content recommendations on informal learning from wikis are proposed. The presented recommendation framework models learners’ interests by continuously extrapolating topical navigation graphs from learners’ free navigation and applying graph structural analysis algorithms to extract interesting topics for individual users. Then, it integrates learners’ interest models with fuzzy thesauri for personalized content recommendations. Our evaluation approach encompasses two main activities. First, the impact of personalized recommendations on informal learning is evaluated by assessing conceptual knowledge in users’ feedback. Second, web analytics data is analyzed to get an insight into users’ progress and focus throughout the test session. Our evaluation revealed that PCRF generates highly relevant recommendations that are adaptive to changes in user’s interest using the HARD model with rank-based mean average precision (MAP@k) scores ranging between 100% and 86.4%. In addition, evaluation of informal learning revealed that users who used Wikipedia with personalized support could achieve higher scores on conceptual knowledge assessment with average score of 14.9 compared to 10.0 for the students who used the encyclopedia without any recommendations. The analysis of web analytics data show that users who used Wikipedia with personalized recommendations visited larger number of relevant pages compared to the control group, 644 vs 226 respectively. In addition, they were also able to make use of a larger number of concepts and were able to make comparisons and state relations between concepts

    International Summerschool Computer Science 2014: Proceedings of Summerschool 7.7. - 13.7.2014

    Get PDF
    Proceedings of International Summerschool Computer Science 201

    Nodalida 2005 - proceedings of the 15th NODALIDA conference

    Get PDF

    Creating expert knowledge by relying on language learners : a generic approach for mass-producing language resources by combining implicit crowdsourcing and language learning

    Get PDF
    We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how much these challenges have been addressed at present. Accordingly, we also report on on-going proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of the generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.peer-reviewe

    The interaction of knowledge sources in word sense disambiguation

    Get PDF
    Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial in telligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus.Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    Get PDF
    Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to translation and reordering models; each of them corresponds to a morphological variation of the source/target/both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations and results showed improvements of the performance in terms of automatic scores (BLEU and Meteor) and reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.JRC.G.2-Global security and crisis managemen

    Mixed-Language Arabic- English Information Retrieval

    Get PDF
    Includes abstract.Includes bibliographical references.This thesis attempts to address the problem of mixed querying in CLIR. It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve most relevant documents, regardless of their languages. To achieve this goal, however, it is essential firstly to suppress the impact of most problems that are caused by the mixed-language feature in both queries and documents and which result in biasing the final ranked list. Therefore, a cross-lingual re-weighting model was developed. In this cross-lingual model, term frequency, document frequency and document length components in mixed queries are estimated and adjusted, regardless of languages, while at the same time the model considers the unique mixed-language features in queries and documents, such as co-occurring terms in two different languages. Furthermore, in mixed queries, non-technical terms (mostly those in non-English language) would likely overweight and skew the impact of those technical terms (mostly those in English) due to high document frequencies (and thus low weights) of the latter terms in their corresponding collection (mostly the English collection). Such phenomenon is caused by the dominance of the English language in scientific domains. Accordingly, this thesis also proposes reasonable re-weighted Inverse Document Frequency (IDF) so as to moderate the effect of overweighted terms in mixed queries
    • …
    corecore