Methods for Efficient Ontology Lexicalization for Non-Indo-European Languages: The Case of Japanese

Abstract

Lanser B. Methods for Efficient Ontology Lexicalization for Non-Indo-European Languages: The Case of Japanese. Bielefeld: Universität Bielefeld; 2017.

To make the growing amount of conceptual knowledge available in ontologies and datasets accessible to humans, NLP applications need information on how this knowledge can be verbalized in natural language. One way to provide this information is through ontology lexicons, which, apart from the actual verbalizations in a given target language, can provide further rich linguistic information about them. Compiling such lexicons manually is very time-consuming and requires expertise in both Semantic Web technologies and lexicon engineering, as well as very good knowledge of the target language. In this thesis we present two alternative approaches to generating ontology lexicons: crowdsourcing and the M-ATOLL framework. So far, M-ATOLL has been used with a number of Indo-European languages that share a large set of common characteristics. Another focus of this work is therefore the generation of ontology lexicons specifically for Non-Indo-European languages. To explore these two topics, we use both approaches to generate Japanese ontology lexicons for the DBpedia ontology. First, we use CrowdFlower to generate a small Japanese ontology lexicon for ten exemplary ontology elements in a two-stage workflow whose main idea is to turn the task of generating lexicon entries into a translation task; the starting point of this translation task is a manually created English lexicon for DBpedia. Next, we adapt M-ATOLL's corpus-based approach to Japanese and use the adapted system to generate two lexicons, each for five example properties.
Aspects of the M-ATOLL system that require modification for use with Japanese include the dependency patterns that M-ATOLL employs to extract candidate verbalizations from corpus data and the templates used to generate the actual lexicon entries. Comparing the lexicons generated by both approaches with manually created gold standards shows that both are viable options for generating ontology lexicons for Non-Indo-European languages as well.
