Article thumbnail

Semi-automatic Term Extraction for the African Languages, with Special

By Reference To Northern Sotho


Abstract: Worldwide, semi-automatically extracting terms from corpora is becoming the norm for the compilation of terminology lists, term banks or dictionaries for special purposes. If African-language terminologists are willing to take their rightful place in the new millennium, they must not only take cognisance of this trend but also be ready to implement the new technology. In this article it is advocated that the best way to do the latter two at this stage, is to opt for computation-ally straightforward alternatives (i.e. use 'raw corpora') and to make use of widely available soft-ware tools (e.g. WordSmith Tools). The main aim is therefore to discover whether or not the semi-automatic extraction of terminology from untagged and unmarked running text by means of basic corpus query software is feasible for the African languages. In order to answer this question a full-blown case study revolving around Northern Sotho linguistic texts is discussed in great detail. The computational results are compared throughout with the outcome of a manual excerption, and vice versa. Attention is given to the concepts 'recall ' and 'precision'; different approaches are suggested for the treatment of single-word terms versus multi-word terms; and the various findings are sum

Year: 2008
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • (external link)
  • (external link)
  • Suggested articles

    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.