3,244 research outputs found

    Graph-Embedding Empowered Entity Retrieval

    In this research, we improve upon the current state of the art in entity retrieval by re-ranking the result list using graph embeddings. The paper shows that graph embeddings are useful for entity-oriented search tasks. We demonstrate empirically that encoding information from the knowledge graph into (graph) embeddings contributes to a higher increase in effectiveness of entity retrieval results than using plain word embeddings. We analyze the impact of the accuracy of the entity linker on the overall retrieval effectiveness. Our analysis further deploys the cluster hypothesis to explain the observed advantages of graph embeddings over the more widely used word embeddings, for user tasks involving ranking entities

    Semantic enrichment for enhancing LAM data and supporting digital humanities. Review article

    With the rapid development of the digital humanities (DH) field, demands for historical and cultural heritage data have generated deep interest in the data provided by libraries, archives, and museums (LAMs). In order to enhance LAM data’s quality and discoverability while enabling a self-sustaining ecosystem, “semantic enrichment” has become a strategy increasingly used by LAMs in recent years. This article introduces a number of semantic enrichment methods and efforts that can be applied to LAM data at various levels, aiming to support deeper and wider exploration and use of LAM data in DH research. The real cases, research projects, experiments, and pilot studies shared in this article demonstrate the potential of LAM data, whether they are structured, semi-structured, or unstructured, regardless of what types of original artifacts carry the data. Following their roadmaps would encourage more effective initiatives and strengthen this effort to maximize LAM data’s discoverability, use- and reuse-ability, and their value in the mainstream of DH and Semantic Web

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects

    Searching COVID-19 clinical research using graphical abstracts

    Objective. Graphical abstracts are small graphs of concepts that visually summarize the main findings of scientific articles. While graphical abstracts are customarily used in scientific publications to anticipate and summarize their main results, we propose them as a means for expressing graph searches over existing literature. Materials and methods. We consider the COVID-19 Open Research Dataset (CORD-19), a corpus of more than one million abstracts; each of them is described as a graph of co-occurring ontological terms, selected from the Unified Medical Language System (UMLS) and the Ontology of Coronavirus Infectious Disease (CIDO). Graphical abstracts are also expressed as graphs of ontological terms, possibly augmented by utility terms describing their interactions (e.g., "associated with", "increases", "induces"). We build a co-occurrence network of concepts mentioned in the corpus; we then identify the best matches of graphical abstracts on the network. We exploit graph database technology and shortest-path queries. Results. We build a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships. A well-designed interface allows users to explore the network by formulating or adapting queries in the form of an abstract; it produces a bibliography of publications, globally ranked; each publication is further associated with the specific parts of the abstract that it explains, thereby allowing the user to understand each aspect of the matching. Discussion and Conclusion. Our approach supports the process of scientific hypothesis formulation and evidence search; it can be reapplied to any scientific domain, although our mastery of UMLS makes it most suited to clinical domains. Comment: 12 pages, 6 figures
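
    The co-occurrence network and shortest-path idea described in this abstract can be illustrated with a small sketch. This is not the authors' implementation, which relies on graph database technology. It is a plain in-memory version, and the function names `build_cooccurrence` and `shortest_path` and the toy terms are assumptions made up for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence(docs):
    """Link every pair of terms that appear together in the same document."""
    graph = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of terms, or None if unreachable."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in sorted(graph.get(node, ())):
            if nxt in parents:
                continue
            parents[nxt] = node
            if nxt == target:
                # Walk the parent links back to the source.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Toy documents (hypothetical terms, not taken from CORD-19).
docs = [
    {"covid-19", "fever"},
    {"fever", "ibuprofen"},
    {"ibuprofen", "kidney injury"},
]
graph = build_cooccurrence(docs)
path = shortest_path(graph, "covid-19", "ibuprofen")
```

    Here two terms that never co-occur directly are still connected through an intermediate term, which is the kind of indirect evidence a shortest-path query can surface.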

    Generating Semantic Snapshots of Newscasts Using Entity Expansion

    TV newscasts report the latest event-related facts occurring in the world. Relying exclusively on them is, however, insufficient to fully grasp the context of the story being reported. In this paper, we propose an approach that retrieves and analyzes related documents from the Web to automatically generate semantic annotations that provide viewers and experts with comprehensive information about the news. We detect named entities in the retrieved documents that further disclose relevant concepts that were not explicitly mentioned in the original newscast. A ranking algorithm based on entity frequency, popularity peak analysis, and domain experts’ rules sorts those annotations to generate what we call the Semantic Snapshot of a Newscast (NSS). We benchmark this method against a gold standard generated by domain experts and assessed via a user survey over five BBC newscasts. Results of the experiments show the robustness of our approach, which achieves an Average Normalized Discounted Cumulative Gain of 66.6%
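
    For readers unfamiliar with the metric reported above, the following is a minimal sketch of Normalized Discounted Cumulative Gain using the common log2 discount. The abstract does not say which NDCG variant or cutoff the authors used, so the formula choice and the function names `dcg` and `ndcg` are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance at rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Graded relevance of ranked annotations, best possible order first.
perfect = ndcg([3, 2, 1])
# The only relevant item is ranked second instead of first.
swapped = ndcg([0, 1])
```

    A ranking that already lists items in decreasing relevance scores 1.0, and any misordering lowers the score toward 0.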

    Annotation of Toponyms in TEI Digital Literary Editions and Linking to the Web of Data

    This paper aims to discuss the challenges and benefits of the annotation of place names in literary texts and literary criticism. We shall first highlight the problems of encoding spatial information in digital editions using the TEI format by means of two manual annotation experiments and the discussion of various cases. This will lead to the question of how to use existing semantic web resources to complement and enrich toponym markup, in particular to provide mentions with precise geo-referencing. Finally, the automatic annotation of a large corpus will show the potential of visualizing places from texts, by illustrating an analysis of the evolution of literary life from the spatial and geographical point of view. Keywords: digital literary studies; toponyms; semantic web; geographic databases; maps and visualizations

    Unifying context with labeled property graph: A pipeline-based system for comprehensive text representation in NLP

    Extracting valuable insights from vast amounts of unstructured digital text presents significant challenges across diverse domains. This research addresses this challenge by proposing a novel pipeline-based system that generates domain-agnostic and task-agnostic text representations. The proposed approach leverages labeled property graphs (LPG) to encode contextual information, facilitating the integration of diverse linguistic elements into a unified representation. The proposed system enables efficient graph-based querying and manipulation by addressing the crucial aspect of comprehensive context modeling and fine-grained semantics. The effectiveness of the proposed system is demonstrated through the implementation of NLP components that operate on LPG-based representations. Additionally, the proposed approach introduces specialized patterns and algorithms to enhance specific NLP tasks, including nominal mention detection, named entity disambiguation, event enrichments, event participant detection, and temporal link detection. The evaluation of the proposed approach, using the MEANTIME corpus comprising manually annotated documents, provides encouraging results and valuable insights into the system's strengths. The proposed pipeline-based framework serves as a solid foundation for future research, aiming to refine and optimize LPG-based graph structures to generate comprehensive and semantically rich text representations, addressing the challenges associated with efficient information extraction and analysis in NLP