270 research outputs found

    English in Ethiopia

    No Abstract

    Unsupervised Machine Learning Approach for Tigrigna Word Sense Disambiguation

    All human languages have words that can mean different things in different contexts. Word sense disambiguation (WSD) is an open problem in natural language processing: identifying which sense (i.e. meaning) of a word is used in a sentence when the word has multiple meanings (polysemy). We use unsupervised machine learning techniques to automatically decide the correct sense of an ambiguous word in Tigrigna texts based on its surrounding context. Owing to the lack of sufficient training data, we report experiments on four selected ambiguous Tigrigna words: መደብ, read as “medeb”, which has three different meanings (Program, Traditional bed and Grouping); ሓለፈ, read as “halefe”, which has four different meanings (Pass, Promote, Boss and Pass away); ሃደመ, read as “hademe”, which has two different meanings (Running and Building house); and ከበረ, read as “kebere”, which has two different meanings (Respecting and Expensive). We tested five clustering algorithms (simple k-means; hierarchical agglomerative clustering with single, average and complete link; and Expectation Maximization (EM)) using the existing implementations in the Weka 3.8.1 package. The “Use training set” evaluation mode was selected to train the algorithms on the preprocessed dataset. We evaluated the algorithms on the four ambiguous words; EM achieved the best accuracy, in the range of 67 to 83.3, which is an encouraging result. Keywords: Attribute-Relation File Format, Cross Validation, Consonant Vowel, Machine Readable Dictionary, Natural Language Processing, System for Ethiopic Representation in ASCII, Word Sense Disambiguation
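The clustering idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' Weka 3.8.1 setup: it is a plain-Python simple k-means over bag-of-words context vectors, and the English toy sentences, the stopword list, the fixed seed choice and all function names are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical toy data: four contexts of the ambiguous English word "bank".
# (The abstract works on Tigrigna words; English is used here only as a stand-in.)
TARGET = "bank"
STOPWORDS = {"the", "at", "and", "of", "was", "from", "after", "she", "they", "near"}
sentences = [
    "she deposited money at the bank near the office",
    "the bank approved the loan and the money arrived",
    "they fished from the bank of the river",
    "the river bank was muddy after the flood",
]


def context_words(sentence):
    """Return the content words surrounding the target word."""
    return [w for w in sentence.split() if w != TARGET and w not in STOPWORDS]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, max_iter=20):
    """Simple k-means; `seeds` are indices of the vectors used as initial centroids."""
    centroids = [list(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Build one count vector per context over a shared, sorted vocabulary.
contexts = [context_words(s) for s in sentences]
vocab = sorted({w for ctx in contexts for w in ctx})
vectors = [[float(Counter(ctx)[w]) for w in vocab] for ctx in contexts]

# Fixed seeds keep the run deterministic; real runs usually use random restarts.
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
```

Each cluster label stands for one induced sense of the target word. Mapping clusters to dictionary senses and measuring accuracy, as the abstract reports, would need labelled examples that this sketch does not include.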

    GENERATING AMHARIC PRESENT TENSE VERBS: A NETWORK MORPHOLOGY & DATR ACCOUNT

    In this thesis I attempt to model, that is, computationally reproduce, the natural transmission (i.e. inflectional regularities) of twenty present tense Amharic verbs (i.e. triradicals beginning with consonants) as used by the language’s speakers. I root my approach in the linguistic theory of network morphology (NM) and model it using the DATR evaluator. In Chapter 1, I provide an overview of Amharic and discuss the fidel as an abugida, the verb system’s root-and-pattern morphology, and how the radicals of each lexeme interact with prefixes and suffixes. I offer an overview of NM in Chapter 2 and of DATR in Chapter 3. In both chapters I draw attention to, and help interpret, key terms used by scholars working in these fields. In Chapter 4 I set forth my full theory, along with notation, for generating the paradigms of twenty present tense Amharic verbs that follow four different patterns. Chapter 5, the final chapter, contains a summary and offers several conclusions. I provide the DATR output in the Appendix. My main hope in writing is that this project will make a contribution, however minimal or sizeable, that might advance the field of Amharic studies in particular and (computational) linguistics in general.

    Development of Multilingual Resource Management Mechanisms for Libraries

    Multilingual support is an important concept in any library. This study is based on global recommendations and local requirements for libraries. It selects the multilingual components needed to set up a multilingual cluster in different libraries, and it develops a multilingual environment in which both users and library professionals can access and retrieve library resources. The methodology for integrating Google Indic Transliteration follows these steps: (i) selection of transliteration tools; (ii) comparison of tools; (iii) integration methods in Koha; (iv) development of Google Indic Transliteration in Koha for users; (v) testing; (vi) results. Developing a multilingual framework is also an important task in an integrated library system, and this part of the study follows these steps: (i) Bengali language installation in Koha; (ii) setting the multilingual system preferences in Koha; (iii) translating the modules; (iv) the Bengali interface in Koha. The study also shows the Bengali data entry process in Koha, both through Ibus Avro Phonetics and through a virtual keyboard. Multilingual digital resource management is developed using DSpace and Greenstone. Multilingual management is covered in further areas such as federated searching (the VuFind multilingual discovery tool; multilingual retrieval with an OAI-PMH tool; multilingual data import through a Z39.50 server). Multilingual bibliographic data are edited with MarcEditor for better management of the integrated library management system. Finally, content is created and edited with a content management system tool so that users can retrieve multilingual digital content resources efficiently and effectively.

    Epistemic modality in Amharic

    Wydział Neofilologii. This thesis is devoted to a description and analysis of the category of epistemic modality in contemporary Amharic, the dominant Ethiosemitic language spoken in Ethiopia. Epistemic modality is understood as the speaker’s assessment of her/his insufficient knowledge with respect to the proposition. The thesis deals with 70-odd Amharic epistemic expressions (epistemificators) that have been classified into grammatical, lexical, copular (an intermediate category), and parenthetical, according to their degree of grammaticalization/lexicalization. The epistemificators go considerably beyond the simple concept of “modal verbs”. They were studied semantically by means of the following research procedures: tests of falsification/verification and substitution, the concept of epistemic dimensions and their values, and the analysis of sentences in terms of thematic-rhematic structure. The epistemic expressions are described primarily in terms of eight Amharic-specific dimensions, three formal and five semantic. In addition to the dimensional analysis, prose semantic discussions are provided for selected epistemic expressions (both grammatical and lexical), and three types of complement clauses are examined in greater detail. The final part of the thesis discusses the interaction between epistemic modality and two other, non-modal categories, namely time and negation. The study is based on a corpus of written and spoken texts taken from both printed and electronic media, which have been interpreted and analyzed with the help of Amharic native-speaker informants.

    Crowdsourcing for Speech: Economic, Legal and Ethical analysis

    No full text
    With respect to spoken language resource production, crowdsourcing - the process of distributing tasks to an open, unspecified population via the internet - offers a wide range of opportunities: populations with specific skills are potentially instantaneously accessible somewhere on the globe for any spoken language. As is the case for most newly introduced high-tech services, crowdsourcing raises both hopes and doubts, certainties and questions. A general analysis of crowdsourcing for speech processing can be found in (Eskenazi et al., 2013). This article focuses on the ethical, legal and economic issues of crowdsourcing in general (Zittrain, 2008a) and of crowdsourcing services such as Amazon Mechanical Turk (Fort et al., 2011; Adda et al., 2011), a major platform for multilingual language resources (LR) production.