14 research outputs found

    ATOLL - A framework for the automatic induction of ontology lexica

    Walter S, Unger C, Cimiano P. ATOLL - A framework for the automatic induction of ontology lexica. Data & Knowledge Engineering. 2014;94:148-162. There is a range of large knowledge bases, such as Freebase and DBpedia, as well as linked data sets available on the web, but they typically lack lexical information stating how the properties and classes they comprise are realized lexically. Often only one label is attached, if at all, lacking rich linguistic information, e.g. about morphological forms, syntactic arguments or possible lexical variants and paraphrases. While ontology lexicon models like lemon allow for defining such linguistic information with respect to a given ontology, the cost involved in creating and maintaining such lexica is substantial, requiring a high manual effort. Towards lowering this effort we present ATOLL, a framework for the automatic induction of ontology lexica, based both on existing labels and on dependency paths extracted from a text corpus. We instantiate ATOLL with DBpedia as dataset and Wikipedia as corresponding corpus, and evaluate it by comparing the automatically generated lexicon with a manually constructed one. Our results corroborate that the approach has high potential for semi-automatic use, in which a lexicon engineer validates, rejects or refines the automatically generated lexical entries, thus contributing to a reduction of the overall cost of creating ontology lexica.
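    To make the corpus-based half of this approach concrete, below is a minimal sketch of extracting the dependency path between two entity mentions as a candidate lexicalization of a property. It uses spaCy as a stand-in parser; the helper names (find_token, dep_path) and the example sentence are illustrative assumptions, not ATOLL's actual API.

        # Sketch: dependency path between two mentions as a candidate lexicalization.
        # spaCy stands in for the parser; requires: python -m spacy download en_core_web_sm
        import spacy

        nlp = spacy.load("en_core_web_sm")

        def find_token(doc, text):
            """Return the first token matching `text` (case-insensitive), or None."""
            return next((t for t in doc if t.text.lower() == text.lower()), None)

        def dep_path(a, b):
            """Dependency path between tokens a and b through their lowest
            common ancestor, as (lemma, dependency-relation) pairs."""
            a_anc = [a] + list(a.ancestors)   # a, a's head, ..., sentence root
            b_anc = [b] + list(b.ancestors)
            a_ids = [t.i for t in a_anc]
            b_ids = [t.i for t in b_anc]
            common = next(t for t in a_anc if t.i in b_ids)
            cut_a, cut_b = a_ids.index(common.i), b_ids.index(common.i)
            path = [(t.lemma_, t.dep_) for t in a_anc[:cut_a] + b_anc[:cut_b]]
            path.append((common.lemma_, common.dep_))
            return path

        # One sentence supporting the DBpedia triple (Inferno, dbo:author, Dan_Brown):
        doc = nlp("Dan Brown wrote Inferno.")
        print(dep_path(find_token(doc, "Brown"), find_token(doc, "Inferno")))
        # [('Brown', 'nsubj'), ('Inferno', 'dobj'), ('write', 'ROOT')]
        # The verb on the path, "write", is a candidate lexicalization of dbo:author.

    In the paper's setting, such paths are collected over many sentence-triple pairs and aggregated into lexical entries rather than read off a single sentence.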

    Methods for Efficient Ontology Lexicalization for Non-Indo-European Languages: The Case of Japanese

    Lanser B. Methods for Efficient Ontology Lexicalization for Non-Indo-European Languages: The Case of Japanese. Bielefeld: Universität Bielefeld; 2017. In order to make the growing amount of conceptual knowledge available through ontologies and datasets accessible to humans, NLP applications need access to information on how this knowledge can be verbalized in natural language. One way to provide this kind of information is through ontology lexicons, which, apart from the actual verbalizations in a given target language, can provide further rich linguistic information about them. Compiling such lexicons manually is a very time-consuming task that requires expertise in Semantic Web technologies and lexicon engineering, as well as a very good knowledge of the target language at hand. In this thesis we present two alternative approaches to generating ontology lexicons: by means of crowdsourcing on the one hand, and through the framework M-ATOLL on the other. So far, M-ATOLL has been used with a number of Indo-European languages that share a large set of common characteristics. Another focus of this work is therefore the generation of ontology lexicons specifically for non-Indo-European languages. In order to explore these two topics, we use both approaches to generate Japanese ontology lexicons for the DBpedia ontology: First, we use CrowdFlower to generate a small Japanese ontology lexicon for ten exemplary ontology elements according to a two-stage workflow, the main underlying idea of which is to turn the task of generating lexicon entries into a translation task; the starting point of this translation task is a manually created English lexicon for DBpedia. Next, we adapt M-ATOLL's corpus-based approach to Japanese and use the adapted system to generate two lexicons, each covering five example properties. Aspects of the system that require modification for Japanese include the dependency patterns employed by M-ATOLL to extract candidate verbalizations from corpus data, and the templates used to generate the actual lexicon entries. Comparison of the lexicons generated by both approaches against manually created gold standards shows that both are viable options for generating ontology lexicons for non-Indo-European languages as well.

    Generation of multilingual ontology lexica with M-ATOLL: a corpus-based approach for the induction of ontology lexica

    Walter S. Generation of multilingual ontology lexica with M-ATOLL: a corpus-based approach for the induction of ontology lexica. Bielefeld: Universität Bielefeld; 2017. There is an increasing interest in providing common web users with access to structured knowledge bases such as DBpedia, for example by means of question answering systems. All such question answering systems have in common that they have to map a natural language input, be it spoken or written, to a formal representation in order to extract the correct answer from the target knowledge base. The same holds for systems which generate natural language text from a given knowledge base. The main challenge is how to map natural language to structured data and vice versa. To this end, question answering systems require knowledge about how the vocabulary elements used in the available datasets are verbalized in natural language, covering different verbalization variants. Multilinguality, of course, increases the complexity of this challenge. In this thesis we introduce M-ATOLL, a framework for automatically inducing ontology lexica in multiple languages, to find such verbalization variants. We have instantiated the system for three languages (English, German and Spanish) by exploiting a set of language-specific dependency patterns for finding lexicalizations in text corpora. Additionally, we extended the framework to extract complex adjective lexicalizations with a machine-learning-based approach. M-ATOLL is the first open-source and multilingual approach for the generation of ontology lexica. We present the grammatical patterns for the three languages on which the extraction of lexicalizations relies, provide an analysis of these patterns, and compare them with those proposed by other state-of-the-art systems. Additionally, we present a detailed evaluation comparing the different approaches with different settings on a publicly available gold standard, and discuss their potential and limitations.
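    As a rough illustration of the machine-learning-based extension mentioned above, the sketch below scores candidate adjective lexicalizations with a simple binary classifier. The features and the tiny training set are invented purely for illustration; the thesis's actual feature set and learner differ.

        # Sketch: filter candidate (adjective, property) pairs with a classifier.
        # All feature definitions and training rows below are invented assumptions.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Assumed per-candidate features: log corpus frequency, fraction of
        # occurrences modifying entities in the property's domain, label overlap.
        X = np.array([
            [5.1, 0.80, 1.0],   # "French" for dbo:nationality  -> accept
            [4.2, 0.75, 1.0],   # "Spanish" for dbo:nationality -> accept
            [6.0, 0.05, 0.0],   # "new" for dbo:nationality     -> reject
            [3.0, 0.10, 0.0],   # "famous" for dbo:nationality  -> reject
        ])
        y = np.array([1, 1, 0, 0])

        clf = LogisticRegression().fit(X, y)
        # Probability of acceptance for a new candidate, e.g. "German":
        print(clf.predict_proba([[4.8, 0.70, 1.0]])[0, 1])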

    Applying Semantic Parsing to Question Answering Over Linked Data: Addressing the Lexical Gap

    Hakimov S, Unger C, Walter S, Cimiano P. Applying Semantic Parsing to Question Answering Over Linked Data: Addressing the Lexical Gap. In: Biemann C, Handschuh S, Freitas A, Meziane F, Metais E, eds. Natural Language Processing and Information Systems: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Passau, Germany, June 17-19, 2015, Proceedings. LNCS. Vol 9103. Springer International Publishing; 2015: 103-109. Question answering over linked data has emerged in the past years as an important topic of research in order to provide natural language access to a growing body of linked open data on the Web. In this paper we focus on analyzing the lexical gap that arises as a challenge for any such question answering system. The lexical gap refers to the mismatch between the vocabulary used in a user question and the vocabulary used in the relevant dataset. We implement a semantic parsing approach and evaluate it on the QALD-4 benchmark, showing that the performance of such an approach suffers from training data sparseness. Its performance can, however, be substantially improved if the right lexical knowledge is available. To show this, we model a set of lexical entries by hand to quantify the number of entries that would be needed. Further, we analyze whether a state-of-the-art tool for inducing ontology lexica from corpora can derive these lexical entries automatically. We conclude that further research and investment are needed to derive such lexical knowledge automatically or semi-automatically.
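    For a sense of what "the right lexical knowledge" looks like, here is a minimal lemon-style entry built with rdflib, mapping the verb "write" to the DBpedia property dbo:author. The ex: namespace and entry URIs are made up for illustration; a full entry would also specify a syntactic frame and how its arguments map onto the property's subject and object.

        # Sketch: a minimal lemon lexical entry bridging a verb and a DBpedia property.
        # The ex: namespace and entry URIs are invented for illustration.
        from rdflib import Graph, Literal, Namespace, RDF

        LEMON = Namespace("http://lemon-model.net/lemon#")
        DBO = Namespace("http://dbpedia.org/ontology/")
        EX = Namespace("http://example.org/lexicon#")

        g = Graph()
        g.bind("lemon", LEMON)

        g.add((EX.write, RDF.type, LEMON.LexicalEntry))
        g.add((EX.write, LEMON.canonicalForm, EX.write_form))
        g.add((EX.write_form, LEMON.writtenRep, Literal("write", lang="en")))
        g.add((EX.write, LEMON.sense, EX.write_sense))
        g.add((EX.write_sense, LEMON.reference, DBO.author))

        print(g.serialize(format="turtle"))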

    Can predicate lexicalizations help in named entity disambiguation?

    Paulheim H, Unger C. Can predicate lexicalizations help in named entity disambiguation? In: NLP & DBpedia 2015: Proceedings of the Third NLP & DBpedia Workshop (NLP & DBpedia 2015), co-located with the 14th International Semantic Web Conference 2015 (ISWC 2015), Bethlehem, Pennsylvania, USA, October 11, 2015. CEUR Workshop Proceedings. Vol 1581. 2016: 92-97.

    M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple Languages

    Walter S, Unger C, Cimiano P. M-ATOLL: A Framework for the Lexicalization of Ontologies in Multiple Languages. In: Mika P, Tudorache T, Bernstein A, et al., eds. The Semantic Web – ISWC 2014. Lecture Notes in Computer Science. Vol 8796. Cham: Springer International Publishing; 2014: 472-486. Many tasks in which a system needs to mediate between natural language expressions and elements of a vocabulary in an ontology or dataset require knowledge about how the elements of the vocabulary (i.e. classes, properties, and individuals) are expressed in natural language. In a multilingual setting, such knowledge is needed for each of the supported languages. In this paper we present M-ATOLL, a framework for automatically inducing ontology lexica in multiple languages on the basis of a multilingual corpus. The framework exploits a set of language-specific dependency patterns which are formalized as SPARQL queries and run over a parsed corpus. We have instantiated the system for two languages, German and English, and evaluate it in terms of precision, recall and F-measure by comparing the automatically induced lexica to manually constructed ontology lexica for DBpedia. In particular, we investigate the contribution of each single dependency pattern and analyze the impact of different parameters.
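    The sketch below illustrates the mechanism described above under invented assumptions: a toy parsed sentence stored as RDF triples, and a transitive-verb dependency pattern expressed as a SPARQL query over it. M-ATOLL's actual vocabulary for representing parses, and its real patterns, are more elaborate.

        # Sketch: a dependency pattern as a SPARQL query over a parsed corpus in RDF.
        # The ex: vocabulary for parse triples is invented; M-ATOLL defines its own.
        from rdflib import Graph, Literal, Namespace

        EX = Namespace("http://example.org/parse#")
        g = Graph()

        # "Dan Brown wrote Inferno", flattened into head-dependent triples.
        g.add((EX.tok_wrote, EX.lemma, Literal("write")))
        g.add((EX.tok_wrote, EX.nsubj, EX.tok_brown))
        g.add((EX.tok_wrote, EX.dobj, EX.tok_inferno))
        g.add((EX.tok_brown, EX.lemma, Literal("Brown")))
        g.add((EX.tok_inferno, EX.lemma, Literal("Inferno")))

        # Transitive-verb pattern: a verb with both a subject and a direct object
        # is a candidate lexicalization whose arguments align with the property's.
        q = """
        PREFIX ex: <http://example.org/parse#>
        SELECT ?verb ?subj ?obj WHERE {
            ?v ex:lemma ?verb ; ex:nsubj ?s ; ex:dobj ?o .
            ?s ex:lemma ?subj . ?o ex:lemma ?obj .
        }"""
        for row in g.query(q):
            print(f"candidate: {row.verb} ({row.subj} -> {row.obj})")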

    Lexicalização de ontologias: o relacionamento entre conteúdo e significado no contexto da Recuperação da Informação [Ontology lexicalization: the relationship between content and meaning in the context of Information Retrieval]

    The proposal presented in this study seeks to represent natural language in a form suited to ontologies and vice versa. To this end, it proposes the semi-automatic creation of a lexical database in Brazilian Portuguese containing morphological, syntactic, and semantic information that can be read by machines, allowing structured and unstructured data to be linked and integrated into an information retrieval model to improve precision. The results obtained demonstrate the use of the methodology in the financial risk domain in Portuguese for the construction of an ontology, a lexical-semantic database, and a proposed semantic information retrieval model. In order to evaluate the performance of the proposed model, documents containing the main definitions of the financial risk domain were selected and indexed with and without semantic annotation. To enable a comparison between the approaches, two databases were created: the first representing the traditional search, and the second containing an index built from the semantically annotated texts to represent the semantic search. The evaluation of the proposal is based on recall and precision. The queries submitted to the model show that the semantic search outperforms the traditional search, validating the methodology used. Although it adds complexity, the procedure can be reproduced in any other domain.
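    As a minimal sketch of the evaluation just described, the snippet below computes precision and recall for two hypothetical retrieval runs against a shared relevance set; all document IDs and numbers are invented for illustration.

        # Sketch: precision and recall of two retrieval runs; data is invented.
        def precision_recall(retrieved, relevant):
            hits = len(retrieved & relevant)
            p = hits / len(retrieved) if retrieved else 0.0
            r = hits / len(relevant) if relevant else 0.0
            return p, r

        relevant = {"d1", "d2", "d3"}
        runs = {
            "semantic search": {"d1", "d2", "d3", "d5"},
            "traditional search": {"d1", "d4", "d6", "d7"},
        }
        for name, retrieved in runs.items():
            p, r = precision_recall(retrieved, relevant)
            print(f"{name}: precision={p:.2f} recall={r:.2f}")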

    The WebNLG Challenge: Generating Text from DBPedia Data
