
    Semantic Role Labeling for Knowledge Graph Extraction from Text

    This paper introduces TakeFive, a new semantic role labeling method that transforms a text into a frame-oriented knowledge graph. It performs dependency parsing, identifies the words that evoke lexical frames, locates the roles and fillers for each frame, runs coercion techniques, and formalizes the results as a knowledge graph. This formal representation complies with the frame semantics used in Framester, a factual-linguistic linked data resource. We tested our method on the WSJ section of the Penn Treebank, annotated with VerbNet and PropBank labels, and on the Brown corpus. The evaluation was performed according to the CoNLL Shared Task on Joint Parsing of Syntactic and Semantic Dependencies. The precision, recall, and F1 values obtained indicate that TakeFive is competitive with existing methods such as SEMAFOR, Pikes, PathLSTM, and FRED. We finally discuss how to combine TakeFive and FRED, obtaining higher values of precision, recall, and F1 measure.
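
    The abstract outlines a five-stage pipeline (dependency parsing, frame evocation, role/filler identification, coercion, graph formalization). Below is a minimal sketch of such a pipeline, assuming spaCy for the parsing stage; the heuristics, role mapping, and function names are ours for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a TakeFive-style pipeline; names and heuristics are ours.
# Assumes spaCy and its small English model for the dependency-parsing stage.
import spacy

nlp = spacy.load("en_core_web_sm")

def frame_evoking_words(doc):
    """Toy heuristic: treat verbs as frame-evoking words."""
    return [tok for tok in doc if tok.pos_ == "VERB"]

def roles_and_fillers(verb):
    """Map core dependents of a verb to coarse role labels (illustrative mapping)."""
    role_map = {"nsubj": "Agent", "dobj": "Theme", "obj": "Theme", "iobj": "Recipient"}
    return {role_map[c.dep_]: c for c in verb.children if c.dep_ in role_map}

def to_knowledge_graph(doc):
    """Formalize each frame instance as (frame, role, filler) triples."""
    triples = []
    for verb in frame_evoking_words(doc):
        frame = verb.lemma_.capitalize()  # stand-in for a Framester frame lookup
        for role, filler in roles_and_fillers(verb).items():
            triples.append((frame, role, filler.text))
    return triples

print(to_knowledge_graph(nlp("The committee approved the proposal yesterday.")))
# e.g. [('Approve', 'Agent', 'committee'), ('Approve', 'Theme', 'proposal')]
```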

    Extraction of Axioms and Logical Rules from Natural-Language Wikipedia Definitions

    The Semantic Web relies on the creation of rich knowledge bases that link data on the Web. DBpedia started as a community effort and is today considered the central interlinking hub of the emerging Web of Data. However, DBpedia relies on a lightweight ontology, has substantial limitations, and lacks important information that can be found in the text and unstructured data of Wikipedia. Moreover, the DBpedia ontology contains mainly taxonomical links and data about instances, and lacks class definitions. The objective of this work is to interpret the natural-language text of Wikipedia in order to enrich DBpedia with class definitions, a richer class hierarchy (taxonomical links), and new information about instances. For this purpose, we rely on a pattern-based approach: syntactic patterns implemented as SPARQL queries are executed over RDF graphs representing the syntactic analysis of textual definitions extracted from Wikipedia. This work resulted in the creation of AXIOpedia, an expressive knowledge base containing complex axioms defining classes and rdf:type triples relating instances to their classes.
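
    To make the pattern-based approach concrete, here is a small hedged sketch using Python and rdflib: a toy RDF encoding of the parsed definition "A dog is a mammal" is queried with a SPARQL pattern for copular definitions. The dep:/ex: vocabulary and the pattern itself are invented for illustration and are not the AXIOpedia patterns.

```python
# Illustrative only: a SPARQL pattern over an RDF encoding of a parsed definition.
# The dep:/ex: vocabulary and the pattern are invented, not the AXIOpedia ones.
from rdflib import Graph

# Toy RDF graph for the definition "A dog is a mammal" (UD-style copular structure).
data = """
@prefix ex:  <http://example.org/> .
@prefix dep: <http://example.org/dep/> .
ex:mammal dep:nsubj ex:dog ;
          dep:cop   ex:is .
"""
g = Graph().parse(data=data, format="turtle")

# Pattern: "X is a Y" copular definitions yield subclass axioms.
query = """
PREFIX dep: <http://example.org/dep/>
SELECT ?sub ?super WHERE {
    ?super dep:nsubj ?sub ;
           dep:cop   ?c .
}
"""
for row in g.query(query):
    print(f"{row.sub} rdfs:subClassOf {row.super}")
# -> http://example.org/dog rdfs:subClassOf http://example.org/mammal
```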

    Ontology evolution: a process-centric survey

    Ontology evolution aims at keeping an ontology up to date with respect to changes in the domain that it models or novel requirements of the information systems that it enables. The recent industrial adoption of Semantic Web techniques, which rely on ontologies, has increased the importance of ontology evolution research. Typical approaches to ontology evolution are designed as multiple-stage processes combining techniques from a variety of fields (e.g., natural language processing and reasoning). However, the few existing surveys on this topic lack an in-depth analysis of the various stages of the ontology evolution process. This survey extends the literature by adopting a process-centric view of ontology evolution. Accordingly, we first provide an overall process model synthesized from an overview of the existing models in the literature. We then survey the major approaches to each step in this process and conclude with the future challenges facing techniques for that particular stage.
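
    The abstract does not spell out the individual stages, but a multiple-stage evolution process of the kind surveyed can be pictured as a small pipeline skeleton. The stage names and toy change operations below are our assumption, purely for illustration.

```python
# Skeleton of a multiple-stage ontology-evolution process; the stage names and
# toy change operations are our assumption, not the survey's actual model.

def detect_changes(ontology: dict) -> list:
    """Stage 1: detect drift between the ontology and its domain
    (in practice: NLP over domain corpora, usage logs, etc.)."""
    return [{"op": "add_class", "name": "ElectricVehicle"}]  # toy suggestion

def validate_changes(ontology: dict, changes: list) -> list:
    """Stage 2: keep only changes that preserve consistency
    (in practice: reasoning over the changed ontology)."""
    return [c for c in changes if c["name"] not in ontology["classes"]]

def apply_changes(ontology: dict, changes: list) -> dict:
    """Stage 3: enact the validated changes."""
    for c in changes:
        if c["op"] == "add_class":
            ontology["classes"].append(c["name"])
    return ontology

onto = {"classes": ["Vehicle"]}
onto = apply_changes(onto, validate_changes(onto, detect_changes(onto)))
print(onto)  # {'classes': ['Vehicle', 'ElectricVehicle']}
```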

    Automatic refinement of large-scale cross-domain knowledge graphs

    Knowledge graphs are a way to represent complex structured and unstructured information integrated into an ontology, with which one can reason about the existing information to deduce new information or highlight inconsistencies. Knowledge graphs are divided into the terminology box (TBox), also known as the ontology, and the assertion box (ABox). The former consists of a set of schema axioms defining the classes and properties that describe the data domain, whereas the ABox consists of a set of facts describing instances in terms of the TBox vocabulary. In recent years, there have been several initiatives for creating large-scale cross-domain knowledge graphs, both free and commercial, with DBpedia, YAGO, and Wikidata being amongst the most successful free datasets. Those graphs are often constructed by extracting information from semi-structured knowledge, such as Wikipedia, or from unstructured text on the web using NLP methods. It is unlikely, in particular when heuristic methods are applied and unreliable sources are used, that the knowledge graph is fully correct or complete; there is a tradeoff between completeness and correctness, which each knowledge graph's construction approach addresses differently. Knowledge graphs have a wide variety of applications, e.g., semantic search and discovery, question answering, recommender systems, expert systems, and personal assistants, and the quality of a knowledge graph is crucial for these applications. In order to further increase the quality of such large-scale knowledge graphs, various automatic refinement methods have been proposed; they try to infer and add missing knowledge to the graph or detect erroneous pieces of information. In this thesis, we investigate the problem of automatic knowledge graph refinement and propose methods that address it from two directions: automatic refinement of the TBox and of the ABox. In Part I we address the ABox refinement problem. We propose a method for predicting missing type assertions using hierarchical multilabel classifiers with ingoing/outgoing links as features. We also present an approach to detecting relation assertion errors that exploits type and path patterns in the graph, and we propose an approach to correcting relation errors originating from confusions between entities. Also in the ABox refinement direction, we propose a knowledge graph model and a process for synthesizing knowledge graphs for benchmarking ABox completion methods. In Part II we address the TBox refinement problem. We propose methods for inducing flexible relation constraints from the ABox, expressed using SHACL. We introduce an ILP refinement step that exploits correlations between numerical attributes and relations in order to efficiently learn Horn rules with numerical attributes. Finally, we investigate introducing lexical information from textual corpora into the ILP algorithm in order to improve the quality of induced class expressions.
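
    As an illustration of the Part I type-prediction idea, the sketch below trains a multilabel classifier on binary ingoing/outgoing-link features, assuming scikit-learn; the relation and type names are toy stand-ins, and the hierarchical aspect of the classifiers is omitted.

```python
# Sketch of ABox type prediction from link features, assuming scikit-learn.
# Relation/type names are toy stand-ins; the hierarchical aspect is omitted.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Rows = entities; columns = binary indicators for relations the entity takes
# part in, e.g. [out:director, out:starring, in:birthPlaceOf, in:capitalOf].
X = np.array([
    [1, 1, 0, 0],   # film-like link profile
    [1, 1, 0, 0],
    [0, 0, 1, 1],   # place-like link profile
    [0, 0, 1, 0],
])
# Multilabel targets, e.g. [dbo:Work, dbo:Film, dbo:Place] (hypothetical types).
Y = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict(np.array([[1, 0, 0, 0]])))  # likely [[1 1 0]] -> Work, Film
```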

    Biomedical Language Understanding and Extraction (BLUE-TEXT): A Minimal Syntactic, Semantic Method

    Clinical text understanding (CTU) is of interest to health informatics because critical clinical information, frequently represented as unconstrained text in electronic health records, is extensively used by human experts to guide clinical practice and decision making and to document delivery of care, but is largely unusable by information systems for queries and computations. Recent initiatives advocating translational research call for technologies that can integrate structured clinical data with unstructured data, provide a unified interface to all data, and contextualize clinical information for reuse in the multidisciplinary and collaborative environment envisioned by the CTSA program. This implies that technologies for the processing and interpretation of clinical text should be evaluated not only in terms of their validity and reliability in their intended environment, but also in light of their interoperability and their ability to support information integration and contextualization in a distributed and dynamic environment. This vision adds a new layer of information representation requirements that needs to be accounted for when conceptualizing the implementation or acquisition of clinical text processing tools and technologies for multidisciplinary research. On the other hand, electronic health records frequently contain unconstrained clinical text with high variability in the use of terms and documentation practices, and without commitment to the grammatical or syntactic structure of the language (e.g., triage notes, physician and nurse notes, chief complaints). This hinders the performance of natural language processing technologies, which typically rely heavily on the syntax and grammatical structure of the text. This document introduces our method to transform unconstrained clinical text found in electronic health information systems into a formal (computationally understandable) representation that is suitable for querying, integration, contextualization, and reuse, and is resilient to the grammatical and syntactic irregularities of clinical text. We present our design rationale, our method, and the results of an evaluation on chief complaints and triage notes from 8 different emergency departments in Houston, Texas. Finally, we discuss the significance of our contribution in enabling the use of clinical text in a practical bio-surveillance setting.
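
    As a rough illustration of mapping grammatically irregular clinical text to a formal, queryable representation, the sketch below codes free-text chief complaints by normalized lexicon lookup. The lexicon, codes, and matching strategy are toy stand-ins, not the BLUE-TEXT method itself.

```python
# Illustrative sketch only: a grammar-agnostic mapping from free-text chief
# complaints to coded concepts via normalized lexicon lookup. The lexicon and
# UMLS-style codes are toy stand-ins, not the BLUE-TEXT system itself.
import re

LEXICON = {
    "sob": "C0013404",                  # shortness of breath (toy code)
    "shortness of breath": "C0013404",
    "chest pain": "C0008031",
    "n/v": "C0027498",                  # nausea/vomiting
}

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace; no reliance on grammar or syntax."""
    return re.sub(r"\s+", " ", text.lower().strip())

def code_complaint(text: str) -> list:
    """Return (term, code) pairs found in the note, longest match first."""
    t = normalize(text)
    found = []
    for term in sorted(LEXICON, key=len, reverse=True):
        if term in t:
            found.append((term, LEXICON[term]))
            t = t.replace(term, " ")  # avoid re-matching inside a longer term
    return found

print(code_complaint("c/o SOB and chest pain x2 days"))
# -> [('chest pain', 'C0008031'), ('sob', 'C0013404')]
```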

    Semantic Systems. The Power of AI and Knowledge Graphs

    This open access book constitutes the refereed proceedings of the 15th International Conference on Semantic Systems, SEMANTiCS 2019, held in Karlsruhe, Germany, in September 2019. The 20 full papers and 8 short papers presented in this volume were carefully reviewed and selected from 88 submissions. They cover topics such as web semantics and linked (open) data; machine learning and deep learning techniques; semantic information management and knowledge integration; terminology, thesaurus and ontology management; data mining and knowledge discovery; and semantics in blockchain and distributed ledger technologies.