27 research outputs found
Semi-automatic retrieval of definitional information: a Northern Sotho case study
Corpus-based terminology is currently gaining ground on the international front. It is therefore important that terminologists working on the South African Bantu languages not only take note of this development, but that they should also follow this trend, even if they do not have the same measure of access to highly sophisticated software. The aim of this article is therefore to establish whether it is possible to retrieve definitional information on key concepts from untagged, running text by making use of affordable and easily accessible software such as WordSmith Tools. In order to answer this question, a case study is done in Northern Sotho, using textual material on linguistics as basis for a special field corpus. Syntactic and lexical patterns serving as textual markers of definitional information are identified and the success rate of the computational retrieval of definitional information is analysed and evaluated. Attention is also paid to the retrieval of specifically conceptual information, which turned out to be a fortunate by-product of semi-automatic retrieval of definitional information. Finally, it is illustrated how definitional information retrieved can be utilised in the writing of a formal terminological definition.
Keywords: terminology, South African Bantu languages, definitional information, semi-automatic information retrieval, terminological definitions, conceptual relationships, lexical patterns, syntactic patterns, textual markers, keyword-in-context (KWIC), WordSmith Tools
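As a rough illustration of the keyword-in-context (KWIC) retrieval this abstract describes, the Python sketch below scans untagged running text for a term that is directly followed by a definitional marker. The marker list, the `kwic_definitions` function, and the English sample sentence are hypothetical stand-ins: the article works on Northern Sotho text with WordSmith Tools, and its actual patterns are not given in the abstract.

```python
import re

# Hypothetical English markers standing in for the article's
# Northern Sotho lexical patterns (not given in the abstract).
MARKERS = ["is defined as", "refers to", "is called", "means"]


def kwic_definitions(text, term, width=40):
    """Return (left, hit, right) triples where `term` is followed by a marker."""
    marker_alt = "|".join(re.escape(m) for m in MARKERS)
    pattern = re.compile(
        r"\b" + re.escape(term) + r"\s+(?:" + marker_alt + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        # Keep `width` characters of context on either side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits


sample = (
    "In phonology, a phoneme is defined as the smallest contrastive unit. "
    "A phoneme can have several allophones."
)
hits = kwic_definitions(sample, "phoneme")
```

Only the first sentence of the sample matches, because the second mention of the term is not followed by any of the assumed markers. A human terminologist would still need to review each hit, which is what makes the procedure semi-automatic.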
Issues in the planning and design of a bilingual (English–Northern Sotho) explanatory dictionary for industrial electronics
The focus of this article is the planning and design of a bilingual, explanatory dictionary for industrial electronics with a clearly delimited and very specific target user in mind. Since the number of lemmas to be treated in the dictionary is limited to 500, special care must be taken to select those lemmas that are relevant for both the purpose of the dictionary and the needs of the target user. It is indicated that the main consideration in the planning of the envisaged dictionary is user-friendliness, as dictated by the intended target users. In this article, a novel approach to the semi-automatic selection of lemmas for inclusion in an LSP dictionary is described. The procedure that is used for the extraction of definitional information from an electronic corpus is also explained.
Keywords: LSP lexicography, dictionary planning, lemma selection, semi-automatic term extraction, definitional information, industrial electronics, corpus-based lexicography
The lexicographic treatment of the demonstrative copulative in Sesotho sa Leboa — an exercise in multiple cross-referencing
In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach follows a discussion, in Sections 1 and 2 respectively, of the present and missing directions in African-language metalexicography. A theoretical conspectus of the DC in Sesotho sa Leboa is then offered in Section 3, while Section 4 examines the treatment of the DC in the four existing desktop dictionaries for this language. The outcomes from the two latter sections are then used in Section 5, which analyses the problems of and options for a sound lexicographic treatment of the DC in bilingual and monolingual dictionaries. The next two sections proceed with a review of the practical implementation of the DC lemmatisation suggestions in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa 'Explanatory Sesotho sa Leboa Dictionary', with Section 6 focussing on the hardcopy and Section 7 on the online version. In the process, the very first fully monolingual African-language dictionary on the Internet is introduced. Section 8, finally, concludes briefly.
Keywords: lexicography, paradigmatic lemmatisation, African languages, Sesotho sa Leboa (Northern Sotho, Sepedi), demonstrative copulative, cross-referencing, corpus, monolingual dictionary, bilingual dictionary
The use of LSP dictionaries in secondary schools – a South African case study
This article reports on the results of a broad evaluation of the efficacy of the Multilingual Explanatory Science Dictionary and Multilingual Explanatory Math Dictionary in a multilingual educational environment. The aim of the investigation is to ascertain (a) whether the target users possess the necessary dictionary-using skills to make use of the dictionaries effectively, and (b) whether the benefit of exposure to definitions of terms in the home language is significant in the decoding of the meaning of science and mathematical terms. Data were collected by means of two questionnaires that were completed by members of the intended target user group. Participants in the study revealed themselves to be inexperienced and untrained dictionary users with rudimentary dictionary-using skills. They were able to perform simple look-up procedures but performed badly in cases where a more sophisticated approach is called for.
On the development of a tagset for Northern Sotho with special reference to the issue of standardisation
Working with corpora in the South African Bantu languages has up till now been limited to the utilisation of raw corpora. Such corpora, however, have limited functionality. Thus the next logical step in any NLP application is the development of software for automatic tagging of electronic texts. The development of a tagset is one of the first steps in corpus annotation. The authors of this article argue that the design of a tagset cannot be isolated from the purpose of the tagset, or from the place of the tagset and its design within the bigger picture of the architecture of corpus annotation. Usage-related aspects therefore feature prominently in the design of the tagset for Northern Sotho. It is explained why this proposed tagset is biased towards human readability, rather than machine readability; the choice of a stochastic tagger is motivated, and the relationship between tokenising, tagging, morphological analysis and parsing is discussed. In order to account at least to some extent for the morphological complexity of Northern Sotho at the tagging level, a multilevel annotation is opted for: the first level comprising obligatory information and the second optional and recommended information. Finally, aspects of standardisation are considered against the background of reuse, of sharing of resources, and of possible adaptation for use by other disjunctively written South African Bantu languages. It is not the aim of this article to evaluate the results of any tagging procedure using the proposed tagset. It only describes the design and motivates the choices made with regard to the tagset design. However, an evaluation is in process and results will be published in the near future (cf. Faaß et al., s.a.).
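The two-level annotation described in this abstract can be pictured as a small data structure: an obligatory first-level tag plus optional second-level information. The Python sketch below is illustrative only. The `Token` class, the tag labels `N` and `CS`, and the `class` feature key are hypothetical and are not taken from the proposed tagset.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    tag: str  # level 1: obligatory information (hypothetical labels)
    features: dict = field(default_factory=dict)  # level 2: optional information

    def render(self, detailed=False):
        """Render as form/TAG, appending level-2 features only on request."""
        if detailed and self.features:
            feats = ",".join(f"{k}={v}" for k, v in sorted(self.features.items()))
            return f"{self.form}/{self.tag}[{feats}]"
        return f"{self.form}/{self.tag}"


# Two word forms with assumed tags and assumed noun-class features.
tokens = [
    Token("monna", "N", {"class": "1"}),
    Token("o", "CS", {"class": "1"}),
]
coarse = " ".join(t.render() for t in tokens)
fine = " ".join(t.render(detailed=True) for t in tokens)
```

Keeping the second level in a separate, optional field means that a consumer who needs only the coarse tags can ignore it, which is one plausible way to support the reuse and sharing of resources that the abstract mentions.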