
    Internationalisation, Localisation and Customisation Aspects of the Dictionary Application "TshwaneLex" *

    TshwaneLex is the world's only lexicography software suite with which the entire lexicographic process, from initial compilation all the way to final product, may be conducted in the language of one's choice. This is possible thanks to various aspects of internationalisation, localisation and customisation that are built into TshwaneLex. These are discussed by means of examples drawn from a wide variety of projects and languages. Keywords: internationalisation, localisation, mainstream localisation, development localisation, blowback localisation, customisation, tshwanelex, lexicography, dictionary, software, language interface pack (lip), cilubà, isizulu, sesotho sa leboa, setswana, swahili, wels

    Online Dictionaries on the Internet: An Overview for the African Languages

    The main purpose of this research article is rather bold, in that an attempt is made at a comprehensive overview of all currently available African-language Internet dictionaries. Quite surprisingly, a substantial number of such dictionaries is already available, for a large number of languages, with a relatively large number of users. The key characteristics of these dictionaries and various cross-language distributions are expounded on. In a second section the first South African online dictionary interface is introduced. Although compiled by just a small number of scholars, this dictionary contains a world's first in that lexicographic customisation is implemented on various levels in real time on the Internet. Keywords: lexicography, terminology, dictionaries, internet, online, look-up mode, browse mode, african languages, sesotho sa leboa, simultaneous feedback, fuzzy sf, customisatio

    Dictionary Writing System (DWS) + Corpus Query Package (CQP): The Case of "TshwaneLex"

    In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching. Keywords: lexicography, dictionary, software, dictionary writing system (dws), corpus query package (cqp), tshwanelex, corpus, corpus annotation, part-of-speech tagger (pos-tagger), machine learning, northern sotho (sesotho sa leboa

    From Corpus to Dictionary: A Hybrid Prescriptive, Descriptive and Proscriptive Undertaking

    Despite some heroic efforts over the past few years, Lusoga remains mostly underdeveloped. It is under continuous pressure from more prestigious languages, such as the neighbouring Luganda and especially the only official language in Uganda, English. Lusoga is undergoing rapid language shifts, with new concepts entering the language daily. Ironically, this process is taking place before Lusoga has even been properly reduced to writing. There is no single official orthography that is truly being enforced; people who do write, write as they think fit. Language data is needed for the production of reliable reference works. In the absence of a substantial body of published material in Lusoga, the researcher can resort to recording and transcribing the living language. This opens Pandora's box, in that spoken language (which is meant to be heard, and is typically less formal) is far more complex than written language (which is meant to be read, and is typically more formalised). Spoken and written variants are, by definition, different. And yet one wants to move the language forward, in a way, before the time is ripe. But then, with over two million speakers, how much longer can one wait? This article reports on the building of a new Lusoga corpus, nearly half of which consists of transcribed oral data. The writing problems encountered during the transcription effort are given detailed attention. Dealing with those writing problems in lexicography requires a multipronged approach. While most could be solved by laying down a norm, and thus through prescriptive lexicography, others need a more cautionary approach, and thus descriptive lexicography. Others still can only sensibly be solved when the lexicographer proposes certain options in defiance of existing norms and assumptions, at which point proscriptive lexicography needs to be called in

    From "TshwaneLex to TshwanePedia": Creating and Flexibly Maintaining Online Encyclopaedias*

    The addition of a restricted number of features to the dictionary (compilation) software TshwaneLex suffices to turn this application into a tool for the creation and maintenance of encyclopaedias. This article gives a brief overview of those extra features, using the online encyclopaedia of the James Randi Educational Foundation (JREF) as case study. Keywords: lexicography, dictionary, encyclopaedia, software, online, tshwanelex, tshwanepedia, james randi educational foundation

    From "TshwaneLex to TshwaneTerm": Tailoring Terminology Management for South Africa*

    The addition of a restricted number of features to the dictionary (compilation) software TshwaneLex suffices to turn this application into a terminology management system. This article gives a brief overview of those extra features, using the Department of Arts and Culture (DAC) AIDS list as case study. Keywords: lexicography, terminology list, terminology management system, software, tshwanelex, tshwaneterm, dac aids lis

    The lexicographic treatment of the demonstrative copulative in Sesotho sa Leboa — an exercise in multiple cross-referencing

    In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach follows a discussion, in Sections 1 and 2 respectively, of the present and missing directions in African-language metalexicography. A theoretical conspectus of the DC in Sesotho sa Leboa is then offered in Section 3, while Section 4 examines the treatment of the DC in the four existing desktop dictionaries for this language. The outcomes from the two latter sections are then used in Section 5, which analyses the problems of and options for a sound lexicographic treatment of the DC in bilingual and monolingual dictionaries. The next two sections proceed with a review of the practical implementation of the DC lemmatisation suggestions in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa 'Explanatory Sesotho sa Leboa Dictionary' — with Section 6 focussing on the hardcopy and Section 7 on the online version. In the process, the very first fully monolingual African-language dictionary on the Internet is introduced. Section 8, finally, concludes briefly. Keywords: lexicography, paradigmatic lemmatisation, african languages, sesotho sa leboa (northern sotho, sepedi), demonstrative copulative, cross-referencing, corpus, monolingual dictionary, bilingual dictionar

    Do dictionary users really look up frequent words? — on the overestimation of the value of corpus-based lexicography

    An innovative online Swahili–English dictionary project is presented. A careful study of some of the log files attached to this reference work reveals some hitherto unknown aspects of true dictionary look-up behaviour, which results in the depreciation of the importance of corpora for dictionary making. Three lexicography software modules are advanced to further enhance the success of the online dictionary. Keywords: lexicography, software, online, dictionary, log file, corpus, frequency, rank, correlation, swahili, english, tshwanele