
    Case-based reasoning: The marriage of knowledge base and data base

    The coupling of data and knowledge has a synergistic effect when building an intelligent data base. The goal is to integrate the data and knowledge almost to the point of indistinguishability, permitting them to be used interchangeably. Examples given in this paper suggest that Case-Based Reasoning is a more integrated way to link data and knowledge than pure rule-based reasoning.
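    The integration the abstract describes can be sketched in miniature: in case-based reasoning, the "data" is the case base itself and the "knowledge" is the similarity function used to retrieve it, so the two are used interchangeably at query time. The cases, attributes, and solutions below are hypothetical, purely for illustration.

```python
# Minimal case-based retrieval sketch: stored cases are the data,
# the similarity function is the knowledge; retrieval combines both.

def similarity(a, b):
    """Fraction of attributes on which two cases agree."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

def retrieve(case_base, query):
    """Return the stored case most similar to the query problem."""
    return max(case_base, key=lambda case: similarity(case["problem"], query))

case_base = [
    {"problem": {"symptom": "no_boot", "beep_code": 3}, "solution": "reseat RAM"},
    {"problem": {"symptom": "no_boot", "beep_code": 0}, "solution": "check PSU"},
]

best = retrieve(case_base, {"symptom": "no_boot", "beep_code": 3})
print(best["solution"])  # reseat RAM
```

    Unlike a rule base, nothing here enumerates conditions explicitly; adding a new case extends both the data and, implicitly, the system's competence.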

    Fonduer: Knowledge Base Construction from Richly Formatted Data

    We focus on knowledge base construction (KBC) from richly formatted data. In contrast to KBC from text or tabular data, KBC from richly formatted data aims to extract relations conveyed jointly via textual, structural, tabular, and visual expressions. We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. Fonduer presents a new data model that accounts for three challenging characteristics of richly formatted data: (1) prevalent document-level relations, (2) multimodality, and (3) data variety. Fonduer uses a new deep-learning model to automatically capture the representation (i.e., features) needed to learn how to extract relations from richly formatted data. Finally, Fonduer provides a new programming model that enables users to convert domain expertise, based on multiple modalities of information, to meaningful signals of supervision for training a KBC system. Fonduer-based KBC systems are in production for a range of use cases, including at a major online retailer. We compare Fonduer against state-of-the-art KBC approaches in four different domains. We show that Fonduer achieves an average improvement of 41 F1 points on the quality of the output knowledge base (and in some cases produces up to 1.87x the number of correct entries) compared to expert-curated public knowledge bases. We also conduct a user study to assess the usability of Fonduer's new programming model. We show that after using Fonduer for only 30 minutes, non-domain experts are able to design KBC systems that achieve on average 23 F1 points higher quality than traditional machine-learning-based KBC approaches.
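    The "programming model" the abstract mentions follows the weak-supervision pattern: experts write small functions that vote on candidate relation mentions using multimodal cues. The sketch below illustrates that general pattern only; the candidate fields, rules, and voting scheme are hypothetical and are not Fonduer's actual API.

```python
# Hedged sketch of labeling-function-style supervision over candidate
# relation mentions (e.g., part/voltage pairs from a datasheet table).

ABSTAIN, FALSE, TRUE = -1, 0, 1

def lf_same_row(candidate):
    """Structural cue: part and value appearing in the same table row."""
    return TRUE if candidate["same_row"] else ABSTAIN

def lf_unit_mismatch(candidate):
    """Textual cue: reject candidates whose value lacks a voltage unit."""
    return FALSE if not candidate["value"].endswith("V") else ABSTAIN

def majority_label(candidate, lfs):
    """Combine the non-abstaining votes into one training label."""
    votes = [v for lf in lfs if (v := lf(candidate)) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return TRUE if sum(votes) > len(votes) / 2 else FALSE

cand = {"part": "BC546", "value": "65V", "same_row": True}
print(majority_label(cand, [lf_same_row, lf_unit_mismatch]))  # 1
```

    Labels produced this way can then train an extraction model without hand-annotating individual documents, which is what lets non-experts build a working system quickly.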

    Demand-Weighted Completeness Prediction for a Knowledge Base

    In this paper we introduce the notion of Demand-Weighted Completeness, allowing estimation of the completeness of a knowledge base with respect to how it is used. Defining an entity by its classes, we employ usage data to predict the distribution over relations for that entity. For example, instances of person in a knowledge base may require a birth date, name and nationality to be considered complete. These predicted relation distributions enable detection of important gaps in the knowledge base, and define the required facts for unseen entities. Such characterisation of the knowledge base can also quantify how usage and completeness change over time. We demonstrate a method to measure Demand-Weighted Completeness, and show that a simple neural network model performs well at this prediction task.
    Comment: To appear in NAACL-HLT 201
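    Under this framing, an entity's score is the share of predicted relation demand its stored facts cover. A minimal sketch, assuming a hypothetical demand distribution for the class person (the weights and relations below are illustrative, not from the paper):

```python
# Demand-weighted completeness sketch: usage data yields a distribution
# over relations expected for a class; an entity scores the demand mass
# covered by the relations it actually has.

# Hypothetical demand distribution for class "person" (weights sum to 1),
# e.g. derived from query logs.
demand = {"person": {"birth_date": 0.5, "name": 0.3, "nationality": 0.2}}

def completeness(entity_relations, entity_class):
    """Share of predicted demand covered by the entity's stored relations."""
    dist = demand[entity_class]
    return sum(w for rel, w in dist.items() if rel in entity_relations)

entity = {"name": "Ada Lovelace", "birth_date": "1815-12-10"}
print(completeness(entity, "person"))  # 0.8
```

    The missing nationality fact accounts for the remaining 0.2 of demand, which is exactly the kind of gap the paper's predicted distributions are meant to surface.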

    Reasoning with Data Flows and Policy Propagation Rules

    Data-oriented systems and applications are at the centre of current developments of the World Wide Web. In these scenarios, assessing what policies propagate from the licenses of data sources to the output of a given data-intensive system is an important problem. Both policies and data flows can be described with Semantic Web languages. Although it is possible to define Policy Propagation Rules (PPR) by associating policies to data flow steps, this activity results in a huge number of rules to be stored and managed. In a recent paper, we introduced strategies for reducing the size of a PPR knowledge base by using an ontology of the possible relations between data objects, the Datanode ontology, and applying the (A)AAAA methodology, a knowledge engineering approach that exploits Formal Concept Analysis (FCA). In this article, we investigate whether this reasoning is feasible and how it can be performed. For this purpose, we study the impact of compressing a rule base associated with an inference mechanism on the performance of the reasoning process. Moreover, we report on an extension of the (A)AAAA methodology that includes a coherency check algorithm, which makes this reasoning possible. We show how this compression, in addition to being beneficial to the management of the knowledge base, also has a positive impact on the performance and resource requirements of the reasoning process for policy propagation.
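    The compression idea can be illustrated simply: if several data-flow relations propagate exactly the same set of policies, they can be merged under one abstraction, shrinking the rule base. A minimal sketch; the relation names and policies below are hypothetical, not actual Datanode terms or the (A)AAAA algorithm itself.

```python
# Rule-base compression sketch: group data-flow relations that
# propagate identical policy sets under a single abstract relation.

from collections import defaultdict

# Policy Propagation Rules: relation -> set of policies that propagate.
ppr = {
    "hasCopy":     {"attribution", "share-alike"},
    "hasSnapshot": {"attribution", "share-alike"},
    "hasSample":   {"attribution"},
}

def compress(rules):
    """Merge relations with identical policy sets into one abstract rule."""
    groups = defaultdict(list)
    for relation, policies in rules.items():
        groups[frozenset(policies)].append(relation)
    return {tuple(sorted(rels)): set(pols) for pols, rels in groups.items()}

compressed = compress(ppr)
print(len(ppr), "rules ->", len(compressed), "abstract rules")  # 3 rules -> 2 abstract rules
```

    At reasoning time, a query against an abstract rule answers for every concrete relation it covers, which is why compression can reduce both storage and inference cost.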

    FlaBase: towards the creation of a flamenco music knowledge base

    Online information about flamenco music is scattered over different sites and knowledge bases. Unfortunately, there is no common repository that indexes all these data. In this work, information related to flamenco music is gathered from general knowledge bases (e.g., Wikipedia, DBpedia), music encyclopedias (e.g., MusicBrainz), and specialized flamenco websites, and is then integrated into a new knowledge base called FlaBase. As resources from different data sources do not share common identifiers, a process of pair-wise entity resolution has been performed. FlaBase contains information about 1,174 artists, 76 palos (flamenco genres), 2,913 albums, 14,078 tracks, and 771 Andalusian locations. It is freely available in RDF and JSON formats. In addition, a method for entity recognition and disambiguation for FlaBase has been created. The system can recognize and disambiguate FlaBase entity references in Spanish texts with an f-measure value of 0.77. We applied it to biographical texts present in FlaBase. By using the extracted information, the knowledge base is populated with relevant information and a semantic graph is created connecting the entities of FlaBase. Artists' relevance is then computed over the graph and evaluated according to flamenco expert criteria. Accuracy of results shows a high degree of quality and completeness of the knowledge base.
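    The pair-wise entity resolution step the abstract mentions can be sketched with plain string similarity: resources from different sources lack shared identifiers, so candidate pairs are matched by name. The names and threshold below are illustrative; this is not FlaBase's actual resolution procedure.

```python
# Pair-wise entity resolution sketch: match records across two sources
# by normalized name similarity above a threshold.

from difflib import SequenceMatcher

def resolve(source_a, source_b, threshold=0.85):
    """Return (a, b) name pairs similar enough to merge into one entity."""
    matches = []
    for a in source_a:
        for b in source_b:
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                matches.append((a, b))
    return matches

dbpedia_names = ["Paco de Lucía", "Camarón de la Isla"]
musicbrainz_names = ["Paco de Lucia", "Enrique Morente"]
print(resolve(dbpedia_names, musicbrainz_names))
```

    Real pipelines typically add blocking to avoid the quadratic comparison and use richer features (birthplace, aliases), but the core decision is this pairwise similarity test.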