55 research outputs found

    Entity Discovery and Annotation in Tables

    International audience. The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Unlike unstructured text, tables usually favour the automatic extraction of data because of their regular structure and properties. Data extraction is usually complemented by the annotation of the table, which determines its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, we describe an algorithm that identifies the rows of a table that contain information on entities of specific types (e.g., restaurant, museum, theatre) derived from an ontology and determines the cells in which the names of those entities occur. We implemented this algorithm while developing a faceted browser over a repository of RDF data on points of interest of cities that we extracted from Google Fusion Tables. We claim that our algorithm complements the existing approaches, which annotate entities in a table based on a pre-compiled reference catalogue that lists the types of a finite set of entities; as a result, they are unable to discover and annotate entities that do not belong to the reference catalogue. Instead, we train our algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.

    Analyzing the Evolution of Semantic Correspondences between SNOMED CT and ICD-9-CM

    International audience. The combined use of Knowledge Organization Systems (KOS), including ontologies, terminologies and codification schemas, has become widespread in e-health systems over the past decades for reasons of semantic interoperability. However, the dynamic nature of KOS forces knowledge engineers to keep KOS elements, as well as the semantic correspondences between KOS, up to date. This is crucial to keep the underlying systems that exploit these KOS consistent over time. In this paper, we provide a pragmatic analysis of how the mappings between SNOMED CT and ICD-9-CM evolve as a result of the evolution of these two KOS.

    Requirements for Implementing Mappings Adaptation Systems

    International audience. Ontologies, or more generally Knowledge Organization Systems (KOS), have been developed to support the correct interpretation of shared data in collaborative applications. The quantity and heterogeneity of domain knowledge often require several KOS to describe its content. To ensure unambiguous interpretation, overlapping concepts of different but domain-related KOS are semantically connected via mappings. However, in various domains, KOS evolve periodically, which makes it necessary to review the validity of the associated mappings. The size of KOS remains a barrier to a manual review of mappings and instead calls for (semi-)automatic solutions. This article describes our experience in understanding how KOS evolution affects mappings. We present the lessons learned from various empirical experiments, and we derive primary elements and requirements for improving the automation of mapping maintenance.

    On the enrichment of a RDF Repository of City Points of Interest based on Social Data

    International audience. Points of interest (POIs) in a city are specific locations that hold some significance for people; examples include restaurants, museums, hotels, theatres and landmarks, to name just a few. Due to their role in our social and economic life, POIs have increasingly gained the attention of location-based applications, such as online maps and social networking sites. While it is relatively easy to find basic information about a POI on the Web, such as its geographic location, telephone number and opening hours, it is more challenging to gain deeper knowledge of what other people say about it. What if a person wants to know all the restaurants in Paris that serve good seafood and provide friendly service? Typically, the answer to this question has to be sought on websites that let people leave comments and opinions on POIs, a time-consuming manual task that few are willing to do. This search would be better supported by search engines if information mined from opinions were available in a structured form, such as RDF. In this position paper, we describe a general approach to enrich an existing RDF repository about POIs with data obtained from social networking sites.

    An Ontology-Driven Approach for Semantic Annotation of Documents with Specific Concepts

    International audience. This paper deals with an ontology-driven approach for the semantic annotation of documents from a corpus in which each document describes an entity of the same domain. The goal is to annotate each document with concepts that are too specific to be explicitly mentioned in the texts. The only thing we know about the concepts is their labels; they have no definitions. Moreover, their characteristics in the texts are incomplete. We propose an ontology-based approach, named SAUPODOC, that performs this particular annotation process by combining several approaches. SAUPODOC relies on a domain ontology for the field under study, which plays a pivotal role, on its population with property assertions coming from documents and external resources, and on its enrichment with formal definitions of the specific concepts. Experiments have been carried out in two application domains, showing the benefit of the approach compared to well-known classifiers.

    Une approche combinée pour l’enrichissement d’ontologie à partir de textes et de données du LOD

    National audience. This article addresses the automatic labelling of documents describing products with highly specific concepts that express precise user needs. The setting poses a threefold difficulty: 1) the concepts used for labelling have no direct terminological realizations in the documents, 2) their formal definitions are not known at the outset, and 3) not all of the necessary information is present in the documents themselves. To solve this problem, we propose a two-step annotation process guided by an ontology. The first step populates the ontology with data extracted from the documents, supplemented with data from external resources. The second is a reasoning step over the extracted data, which covers either a phase of learning concept definitions or a phase of applying the learned definitions. The SAUPODOC approach is thus an original ontology enrichment approach that exploits the foundations of the Semantic Web by combining the contributions of the LOD with tools for text analysis, machine learning and reasoning. The evaluation, on two application domains, yields good-quality results and demonstrates the value of the approach.