120 research outputs found

    Named Entity Resolution in Personal Knowledge Graphs

    Entity Resolution (ER) is the problem of determining when two entities refer to the same underlying entity. The problem has been studied for over 50 years and has recently taken on new importance in an era of large, heterogeneous 'knowledge graphs' published on the Web and used widely in domains as wide-ranging as social media, e-commerce and search. This chapter discusses the specific problem of named ER in the context of personal knowledge graphs (PKGs). We begin with a formal definition of the problem and of the components necessary for high-quality and efficient ER. We also discuss some challenges that are expected to arise for Web-scale data. Next, we provide a brief literature review, with a special focus on how existing techniques can potentially apply to PKGs. We conclude the chapter by covering some applications, as well as promising directions for future research. Comment: To appear as a book chapter by the same name in an upcoming (Oct. 2023) book 'Personal Knowledge Graphs (PKGs): Methodology, tools and applications' edited by Tiwari et al.

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Several scholars have designed tools for the automatic scansion of poetry in many languages, but none of these tools handles Old Saxon or Old English. This project is a first attempt to create such a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as the labeled dataset to train the model. In evaluation, the model reached 97% accuracy and a weighted average of 99% for precision, recall and F1 score. In addition, we tested the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
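
The weighted-average metrics reported in this abstract can be illustrated with a minimal sketch. It assumes the usual definition, in which each class's precision, recall and F1 are weighted by that class's share of the true labels. The function name `weighted_scores` and the labels "A" and "B" are hypothetical; the abstract does not list the actual metrical pattern labels or the evaluation code.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall, weighted F1).

    Per-class scores are weighted by the class's support in y_true.
    """
    total = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total

    w_prec = w_rec = w_f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[label] / total
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return accuracy, w_prec, w_rec, w_f1


# Hypothetical toy labels, not data from the paper.
acc, prec, rec, f1 = weighted_scores(["A", "A", "A", "B"], ["A", "A", "B", "B"])
print(acc, prec, rec, f1)
```

Under this definition, weighted recall always equals accuracy, so a weighted recall above the reported accuracy suggests the two figures were computed on different subsets or rounded differently; the abstract does not say which.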

    Geographic information extraction from texts

    A large volume of unstructured texts containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.

    Sensing the Cultural Significance with AI for Social Inclusion

    Social inclusion has been growing as a goal in heritage management. Whereas the 2011 UNESCO Recommendation on the Historic Urban Landscape (HUL) called for tools of knowledge documentation, social media already functions as a platform for online communities to actively involve themselves in heritage-related discussions. Such discussions happen both in “baseline scenarios”, when people calmly share their experiences about the cities they live in or travel to, and in “activated scenarios”, when radical events trigger their emotions. To organize, process, and analyse the massive unstructured multi-modal (mainly images and texts) user-generated data from social media efficiently and systematically, Artificial Intelligence (AI) is shown to be indispensable. This thesis explores the use of AI in a methodological framework to include the contribution of a larger and more diverse group of participants with user-generated data. It is an interdisciplinary study integrating methods and knowledge from heritage studies, computer science, social sciences, network science, and spatial analysis. AI models were applied, nurtured, and tested, helping to analyse the massive information content to derive the knowledge of cultural significance perceived by online communities. The framework was tested in case study cities including Venice, Paris, Suzhou, Amsterdam, and Rome for the baseline and/or activated scenarios. The AI-based methodological framework proposed in this thesis is shown to be able to collect information in cities and map the knowledge of the communities about cultural significance, fulfilling the expectations and requirements of HUL and informing future socially inclusive heritage management processes.

    Heterogeneous data to knowledge graphs matching

    Many applications rely on the existence of reusable data. The FAIR (Findability, Accessibility, Interoperability, and Reusability) principles identify detailed descriptions of data and metadata as the core ingredients for achieving reusability. However, creating descriptive data requires massive manual effort. One way to ensure that data is reusable is by integrating it into Knowledge Graphs (KGs). The semantic foundation of these graphs provides the necessary description for reuse. The Open Research KG proposes to model artifacts of scientific endeavors, including publications and their key messages. Datasets supporting these publications are essential carriers of scientific knowledge and should be included in KGs. We focus on biodiversity research as an example domain to develop and evaluate our approach. Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. Understanding such a domain and its mechanisms is essential to preserving this vital foundation of human well-being. It is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving and preserving life in all its variety and richness. This need has resulted in numerous works being published in this field. For example, a large amount of tabular data (datasets), textual data (publications), and metadata (e.g., dataset descriptions) has been generated. It is therefore a data-rich domain with an exceptionally high need for data reuse. Managing and integrating these heterogeneous data of biodiversity research remains a big challenge. Our core research problem is how to enable the reusability of tabular data, which is one aspect of the FAIR data principles. In this thesis, we provide an answer to this research problem.

    Decisioning 2022 : Collaboration in knowledge discovery and decision making: Applications to sustainable agriculture

    Sustainable agriculture is one of the Sustainable Development Goals (SDG) proposed by the United Nations (UN), but little systematic work on Knowledge Discovery and Decision Making has been applied to it. Knowledge discovery and decision making have become active research areas in recent years. We are in the era of FAIR (Findable, Accessible, Interoperable, Reusable) data science, in which linked data with a high degree of variety and different degrees of veracity can be easily correlated and put in perspective to gain an empirical and scientific understanding of best practices in the sustainable agriculture domain. This requires combining multiple methods such as elicitation, specification, validation, technologies from semantic web, information retrieval, formal concept analysis, collaborative work, semantic interoperability, ontological matching, smart contracts, and multiple decision making. Decisioning 2022 is the first workshop on Collaboration in knowledge discovery and decision making: Applications to sustainable agriculture. It has been organized by six research teams from France, Argentina, Colombia and Chile to explore the current frontier of knowledge and applications in different areas related to knowledge discovery and decision making. The format of this workshop aims at discussion and knowledge exchange between academia and industry members. Laboratorio de Investigación y Formación en Informática Avanzada

    Human History and Digital Future

    Corrected reprint. In the chapter "Wallace/Moullou: Viability of Production and Implementation of Retrospective Photogrammetry in Archaeology", the acknowledgements were removed. The Proceedings of the 46th Annual Conference on Computer Applications and Quantitative Methods in Archaeology, held between March 19th and 23rd, 2018 at the University of Tübingen, Germany, discuss current questions concerning digital recording, computer analysis, graphic and 3D visualization, data management and communication in the field of archaeology. Through a selection of diverse case studies from all over the world, the proceedings give an overview of new technical approaches and best practice from various archaeological and computer-science disciplines.

    User-centered semantic dataset retrieval

    Finding relevant research data is an increasingly important but time-consuming task in daily research practice. Several studies report difficulties in dataset search, e.g., scholars retrieve only part of the pertinent data, and important information cannot be displayed in the user interface. Overcoming these problems has motivated a number of research efforts in computer science, such as text mining and semantic search. In particular, the emergence of the Semantic Web opens a variety of novel research perspectives. Motivated by these challenges, the overall aim of this work is to analyze the current obstacles in dataset search and to propose and develop a novel semantic dataset search. The studied domain is biodiversity research, a domain that explores the diversity of life, habitats and ecosystems. This thesis has three main contributions: (1) We evaluate the current situation in dataset search in a user study, and we compare a semantic search with a classical keyword search to explore the suitability of semantic web technologies for dataset search. (2) We generate a question corpus and develop an information model to determine which scientific topics scholars in biodiversity research are interested in. Moreover, we analyze the gap between current metadata and scholarly search interests, and we explore whether metadata and user interests match. (3) We propose and develop an improved dataset search based on three components: (A) a text mining pipeline, enriching metadata and queries with semantic categories and URIs, (B) a retrieval component with a semantic index over categories and URIs, and (C) a user interface that enables a search within categories and a search including further hierarchical relations. Following user-centered design principles, we ensure user involvement in various user studies during the development process.
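
Components (A) and (B) of this abstract can be pictured with a small sketch: a dictionary lookup that enriches text with semantic categories and URIs, and an inverted index over those categories and URIs. This is an illustration under stated assumptions, not the thesis's implementation. The term dictionary, the category names, the `example.org` URIs and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical term dictionary: surface term -> (semantic category, URI).
TERM_DICTIONARY = {
    "oak": ("ORGANISM", "http://example.org/taxon/quercus"),
    "beech": ("ORGANISM", "http://example.org/taxon/fagus"),
    "forest": ("ENVIRONMENT", "http://example.org/env/forest"),
}


def annotate(text):
    """(A) Enrich text with the categories and URIs of known terms."""
    tokens = text.lower().split()
    return {TERM_DICTIONARY[t] for t in tokens if t in TERM_DICTIONARY}


def build_index(datasets):
    """(B) Build an inverted index from categories and URIs to dataset ids."""
    index = defaultdict(set)
    for dataset_id, description in datasets.items():
        for category, uri in annotate(description):
            index[category].add(dataset_id)
            index[uri].add(dataset_id)
    return index


def search(index, query):
    """Return ids of datasets that share at least one URI with the query."""
    hits = set()
    for _category, uri in annotate(query):
        hits |= index.get(uri, set())
    return sorted(hits)


def search_category(index, category):
    """Return ids of all datasets annotated with the given category."""
    return sorted(index.get(category, set()))


# Hypothetical dataset descriptions standing in for real metadata.
datasets = {
    "d1": "Oak growth in temperate forest plots",
    "d2": "Beech leaf traits",
}
index = build_index(datasets)
print(search(index, "oak"))
print(search_category(index, "ORGANISM"))
```

Because queries and metadata are both mapped to URIs before matching, a query matches any dataset annotated with the same concept rather than the same string. A real pipeline would need proper entity recognition and the hierarchical relations mentioned under (C), which this sketch omits.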