    Semi-automatic retrieval of definitional information: a Northern Sotho case study

    Corpus-based terminology is currently gaining ground internationally. It is therefore important that terminologists working on the South African Bantu languages not only take note of this development but also follow this trend, even if they do not have the same access to highly sophisticated software. The aim of this article is to establish whether it is possible to retrieve definitional information on key concepts from untagged, running text by means of affordable and easily accessible software such as WordSmith Tools. To answer this question, a case study is conducted in Northern Sotho, using textual material on linguistics as the basis for a special-field corpus. Syntactic and lexical patterns that serve as textual markers of definitional information are identified, and the success rate of the computational retrieval of definitional information is analysed and evaluated. Attention is also paid to the retrieval of specifically conceptual information, which turned out to be a fortunate by-product of the semi-automatic retrieval of definitional information. Finally, it is illustrated how the retrieved definitional information can be utilised in writing a formal terminological definition. Keywords: terminology, South African Bantu languages, definitional information, semi-automatic information retrieval, terminological definitions, conceptual relationships, lexical patterns, syntactic patterns, textual markers, keyword-in-context (KWIC), WordSmith Tools
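    The abstract describes retrieving definitional contexts from untagged running text by means of keyword-in-context (KWIC) concordances filtered on lexical patterns. The Python sketch below illustrates only that general idea; it is not WordSmith Tools and not the study's actual procedure. The functions `tokenize`, `kwic` and `candidate_definitions` are hypothetical, and the English marker "is" merely stands in for the Northern Sotho definitional patterns identified in the study, which are not reproduced here.

```python
import re
from typing import List, Set, Tuple

Line = Tuple[str, str, str]  # (left context, node word, right context)


def tokenize(text: str) -> List[str]:
    """Split untagged running text into word and punctuation tokens (naive)."""
    # \w is Unicode-aware in Python 3, so characters such as "š" stay inside words.
    return re.findall(r"\w+|[^\w\s]", text)


def kwic(tokens: List[str], node: str, span: int = 5) -> List[Line]:
    """Return one concordance line per case-insensitive occurrence of `node`."""
    target = node.lower()
    lines: List[Line] = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines


def candidate_definitions(tokens: List[str], term: str,
                          markers: Set[str], span: int = 8) -> List[Line]:
    """Keep only concordance lines whose right context opens with a marker."""
    hits: List[Line] = []
    for left, node, right in kwic(tokens, term, span):
        first = right.split(" ", 1)[0].lower() if right else ""
        if first in markers:
            hits.append((left, node, right))
    return hits


# Toy English text; a real run would use a Northern Sotho special-field corpus.
text = ("A noun is a word that names a thing. The noun phrase follows. "
        "A verb is a word that denotes an action.")
tokens = tokenize(text)
print(candidate_definitions(tokens, "noun", {"is"}, span=4))
# [('A', 'noun', 'is a word that')]
```

    Of the two KWIC lines for "noun", only the one followed by the marker survives; the other ("noun phrase follows") is discarded. Because the filter is purely lexical, a human terminologist still has to verify each hit, which is what makes the retrieval semi-automatic rather than fully automatic.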

    Issues in the planning and design of a bilingual (English–Northern Sotho) explanatory dictionary for industrial electronics

    The focus of this article is the planning and design of a bilingual, explanatory dictionary for industrial electronics with a clearly delimited and very specific target user in mind. Since the number of lemmas to be treated in the dictionary is limited to 500, special care must be taken to select those lemmas that are relevant to both the purpose of the dictionary and the needs of the target user. It is pointed out that the main consideration in planning the envisaged dictionary is user-friendliness, as dictated by the intended target users. The article also describes a novel approach to the semi-automatic selection of lemmas for inclusion in an LSP dictionary and explains the procedure used to extract definitional information from an electronic corpus. Keywords: LSP lexicography, dictionary planning, lemma selection, semi-automatic term extraction, definitional information, industrial electronics, corpus-based lexicography

    The lexicographic treatment of the demonstrative copulative in Sesotho sa Leboa — an exercise in multiple cross-referencing

    This research article presents an in-depth investigation of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This case study serves as an example of the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach emerges from the discussion, in Sections 1 and 2 respectively, of the present and the missing directions in African-language metalexicography. Section 3 then offers a theoretical conspectus of the DC in Sesotho sa Leboa, while Section 4 examines the treatment of the DC in the four existing desktop dictionaries for this language. The outcomes of these two sections are used in Section 5, which analyses the problems of, and options for, a sound lexicographic treatment of the DC in bilingual and monolingual dictionaries. The next two sections review the practical implementation of the DC lemmatisation suggestions in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa ('Explanatory Sesotho sa Leboa Dictionary'), with Section 6 focussing on the hardcopy and Section 7 on the online version. In the process, the very first fully monolingual African-language dictionary on the Internet is introduced. Section 8 briefly concludes. Keywords: lexicography, paradigmatic lemmatisation, African languages, Sesotho sa Leboa (Northern Sotho, Sepedi), demonstrative copulative, cross-referencing, corpus, monolingual dictionary, bilingual dictionary

    The use of LSP dictionaries in secondary schools – a South African case study

    This article reports on the results of a broad evaluation of the efficacy of the Multilingual Explanatory Science Dictionary and the Multilingual Explanatory Math Dictionary in a multilingual educational environment. The aim of the investigation is to ascertain (a) whether the target users possess the dictionary-using skills necessary to use the dictionaries effectively, and (b) whether exposure to definitions of terms in the home language is of significant benefit in decoding the meaning of science and mathematical terms. Data were collected by means of two questionnaires completed by members of the intended target user group. Participants in the study proved to be inexperienced and untrained dictionary users with rudimentary dictionary-using skills. They were able to perform simple look-up procedures but performed poorly where a more sophisticated approach was called for.

    On the development of a tagset for Northern Sotho with special reference to the issue of standardisation

    Work with corpora in the South African Bantu languages has until now been limited to raw corpora. Such corpora, however, have limited functionality. The next logical step in any NLP application is therefore the development of software for the automatic tagging of electronic texts, and the development of a tagset is one of the first steps in corpus annotation. The authors of this article argue that the design of a tagset cannot be isolated from its purpose, or from the place of the tagset and its design within the bigger picture of the architecture of corpus annotation. Usage-related aspects therefore feature prominently in the design of the tagset for Northern Sotho. It is explained why the proposed tagset is biased towards human readability rather than machine readability, the choice of a stochastic tagger is motivated, and the relationship between tokenising, tagging, morphological analysis and parsing is discussed. To account, at least to some extent, for the morphological complexity of Northern Sotho at the tagging level, a multilevel annotation is opted for: the first level comprises obligatory information and the second optional and recommended information. Finally, aspects of standardisation are considered against the background of reuse, of resource sharing, and of possible adaptation for use by other disjunctively written South African Bantu languages. The article does not aim to evaluate the results of any tagging procedure using the proposed tagset; it only describes the design and motivates the choices made with regard to it. An evaluation is, however, under way, and its results will be published in the near future (cf. Faaß et al., s.a.).
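    The two-level annotation described in the abstract (obligatory first-level information, optional and recommended second-level information) can be pictured as a small data structure. This is a sketch under stated assumptions: the class `TaggedToken`, the tag "N", the feature name "class", its value, and the bracketed output format are all hypothetical and are not taken from the proposed Northern Sotho tagset.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaggedToken:
    """One token with a two-level annotation (hypothetical encoding)."""
    token: str
    level1: str  # obligatory information, e.g. a word-class tag (hypothetical label)
    level2: Dict[str, str] = field(default_factory=dict)  # optional/recommended features

    def __post_init__(self) -> None:
        # Level 1 is obligatory, so an empty tag is rejected.
        if not self.level1:
            raise ValueError(f"token {self.token!r} lacks an obligatory level-1 tag")

    def render(self) -> str:
        """Human-readable form: token/L1, plus [feature=value,...] if level 2 is filled."""
        if not self.level2:
            return f"{self.token}/{self.level1}"
        feats = ",".join(f"{k}={v}" for k, v in sorted(self.level2.items()))
        return f"{self.token}/{self.level1}[{feats}]"


minimal = TaggedToken("lentšu", "N")                    # level 1 only
enriched = TaggedToken("lentšu", "N", {"class": "5"})   # level 1 plus optional level 2
print(minimal.render())   # lentšu/N
print(enriched.render())  # lentšu/N[class=5]
```

    Keeping level 2 as an optional mapping lets a tagger emit only the obligatory level while richer information is added later, and the plain-text rendering leans towards human readability, in line with the design priority stated in the abstract.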