
KB-N: Computerized extraction, representation and dissemination of special terminology.

By Magnar Brekke

Abstract

This paper reports early results of a 3-year project aiming to establish a knowledge-bank for economic-administrative domains. Special knowledge is embedded in text produced typically by experts, captured in language-independent concepts and expressed as language-specific terminology, which is stratified with respect to domain specificity, ranging from general shared terms to unique domain-focal terms. KB-N refines and integrates computational strategies and tools in NLP for corpus design and analysis; automatic and semi-automatic extraction, representation, and retrieval of terminology; dynamic thesaurus creation; dynamic display of authentic collocational and phraseological evidence; etc. In Phase I of the project we are capturing introductory textbook text across 30-odd subdomains. Texts are XML-coded and POS-tagged, and strictly parallel texts are aligned for equivalence mining. Term extraction from English text exploits System Quirk’s built-in functions, while term extraction from Norwegian is being developed from scratch. Pruning and supplementation of candidate lists requires man/machine interaction where expert knowledge intersects with terminological principles. The system allows dynamic development of conceptual hierarchies. A range of applications is envisaged for the knowledge bank. The theoretically most interesting use of the KB-N Termbank will be in the context of Norwegian-to-English automatic translation. On the didactic side, KB-N will be integrated with an established e-learning system. The concept of a text-based knowledge-bank builds on the underlying assumption that domain-focal special knowledge is embedded in text produced typically by domain experts for documentary, argumentative, didactic or general communicative purposes.
It further assumes that the essential knowledge content is embedded in relatively language-independent concepts and manifested through relatively language-specific terminology (in this case, English and Norwegian as used in economic-administrative domains), and that such terminology is stratified with respect to domain specificity, ranging from general shared terms down to a small set of domain-focal terms.

Year: 2008
OAI identifier: oai:CiteSeerX.psu:10.1.1.134.923
Provided by: CiteSeerX