Case-based reasoning: The marriage of knowledge base and data base
The coupling of data and knowledge has a synergistic effect when building an intelligent data base. The goal is to integrate the data and knowledge almost to the point of indistinguishability, permitting them to be used interchangeably. Examples given in this paper suggest that Case-Based Reasoning is a more integrated way to link data and knowledge than pure rule-based reasoning.
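To make the contrast with rule-based reasoning concrete, here is a minimal Python sketch of case-based reasoning in which stored cases serve as both data and knowledge: a query is answered by retrieving the most similar case rather than firing rules. The Case structure, the attributes and the overlap similarity are illustrative assumptions, not taken from the paper.

    # Illustrative sketch (not from the paper): cases double as data records
    # and as knowledge, because answering a query means reusing a similar case.
    from dataclasses import dataclass

    @dataclass
    class Case:
        features: dict   # the "data base" part: stored attribute values
        solution: str    # the "knowledge base" part: what was concluded or done

    def similarity(a: dict, b: dict) -> float:
        """Fraction of shared attribute-value pairs (a deliberately simple measure)."""
        keys = set(a) | set(b)
        return sum(a.get(k) == b.get(k) for k in keys) / len(keys) if keys else 0.0

    def solve(query: dict, case_base: list[Case]) -> str:
        """Reuse the solution of the most similar stored case."""
        best = max(case_base, key=lambda c: similarity(query, c.features))
        return best.solution

    cases = [
        Case({"symptom": "no power", "device": "printer"}, "check fuse"),
        Case({"symptom": "paper jam", "device": "printer"}, "open tray B"),
    ]
    print(solve({"symptom": "no power", "device": "scanner"}, cases))  # -> "check fuse"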
Fonduer: Knowledge Base Construction from Richly Formatted Data
We focus on knowledge base construction (KBC) from richly formatted data. In contrast to KBC from text or tabular data, KBC from richly formatted data aims to extract relations conveyed jointly via textual, structural, tabular, and visual expressions. We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. Fonduer presents a new data model that accounts for three challenging characteristics of richly formatted data: (1) prevalent document-level relations, (2) multimodality, and (3) data variety. Fonduer uses a new deep-learning model to automatically capture the representation (i.e., features) needed to learn how to extract relations from richly formatted data. Finally, Fonduer provides a new programming model that enables users to convert domain expertise, based on multiple modalities of information, to meaningful signals of supervision for training a KBC system. Fonduer-based KBC systems are in production for a range of use cases, including at a major online retailer. We compare Fonduer against state-of-the-art KBC approaches in four different domains. We show that Fonduer achieves an average improvement of 41 F1 points on the quality of the output knowledge base, and in some cases produces up to 1.87x the number of correct entries, compared to expert-curated public knowledge bases. We also conduct a user study to assess the usability of Fonduer's new programming model. We show that after using Fonduer for only 30 minutes, non-domain experts are able to design KBC systems that achieve on average 23 F1 points higher quality than traditional machine-learning-based KBC approaches.
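The programming model mentioned above converts multimodal domain expertise into supervision signals. The following Python sketch shows what such rules might look like in a labeling-function style; the candidate structure, field names and combination scheme are assumptions made for illustration, not Fonduer's actual API.

    # Hypothetical sketch of turning multimodal domain expertise into weak
    # supervision signals. The candidate layout and helper fields are invented.
    from types import SimpleNamespace

    ABSTAIN, FALSE, TRUE = 0, -1, 1

    def lf_same_table_row(cand):
        # Structural cue: a part number and a voltage in the same table row
        # are likely related.
        return TRUE if cand.part.row == cand.voltage.row else ABSTAIN

    def lf_header_mentions_max(cand):
        # Textual cue: the voltage's column header should mention "max".
        return TRUE if "max" in cand.voltage.column_header.lower() else FALSE

    def lf_different_page(cand):
        # Visual cue: mentions on different pages are unlikely to be related.
        return FALSE if cand.part.page != cand.voltage.page else ABSTAIN

    def label(cand, lfs=(lf_same_table_row, lf_header_mentions_max, lf_different_page)):
        """Combine noisy votes into one training signal (simple majority here)."""
        total = sum(lf(cand) for lf in lfs)
        return TRUE if total > 0 else FALSE if total < 0 else ABSTAIN

    part = SimpleNamespace(row=3, page=1)
    voltage = SimpleNamespace(row=3, page=1, column_header="Max Voltage (V)")
    print(label(SimpleNamespace(part=part, voltage=voltage)))  # -> 1 (TRUE)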
DKA-robo: dynamically updating time-invalid knowledge bases using robots
In this paper we present the DKA-robo framework, where a mobile agent is used to update those statements of a knowledge base that have lost validity in time. Managing the dynamic information of knowledge bases constitutes a key issue in many real-world scenarios, because constantly reevaluating data requires effort in terms of knowledge acquisition and representation. Our solution to this problem is to use RDF and SPARQL to represent and manage the time-validity of information, combined with an agent acting as a mobile sensor that updates the outdated statements in the knowledge base, thereby always guaranteeing time-valid results against user queries. This demo shows the implementation of our approach in the working environment of our research lab, where a robot is used to sense temperature, humidity, wifi signal and number of people on demand, updating the lab knowledge base with time-valid information.
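As an illustration of the core idea of time-valid statements refreshed on demand, here is a toy Python sketch; the paper itself works with RDF and SPARQL, and the class and method names below are invented for the example.

    # Toy sketch of time-validity management (the paper uses RDF/SPARQL; this
    # in-memory version only illustrates the idea). Statements carry an expiry
    # time; a query first asks the mobile agent to re-sense any stale values.
    import time

    class TimeValidKB:
        def __init__(self):
            self.facts = {}  # (subject, property) -> (value, expires_at)

        def assert_fact(self, subj, prop, value, valid_for_s):
            self.facts[(subj, prop)] = (value, time.time() + valid_for_s)

        def query(self, subj, prop, sense):
            value, expires_at = self.facts.get((subj, prop), (None, 0.0))
            if time.time() >= expires_at:
                # Statement lost its validity: dispatch the robot / sensor.
                value = sense(subj, prop)
                self.assert_fact(subj, prop, value, valid_for_s=300)
            return value

    kb = TimeValidKB()
    kb.assert_fact("lab", "temperature", 21.5, valid_for_s=0)   # already stale
    fresh = kb.query("lab", "temperature", sense=lambda s, p: 22.1)
    print(fresh)  # -> 22.1, re-sensed because the stored value had expired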
Demand-Weighted Completeness Prediction for a Knowledge Base
In this paper we introduce the notion of Demand-Weighted Completeness, allowing estimation of the completeness of a knowledge base with respect to how it is used. Defining an entity by its classes, we employ usage data to predict the distribution over relations for that entity. For example, instances of person in a knowledge base may require a birth date, name and nationality to be considered complete. These predicted relation distributions enable detection of important gaps in the knowledge base, and define the required facts for unseen entities. Such characterisation of the knowledge base can also quantify how usage and completeness change over time. We demonstrate a method to measure Demand-Weighted Completeness, and show that a simple neural network model performs well at this prediction task.
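A small Python sketch of how such a demand-weighted score could be computed; the relation weights and the example entity are invented for illustration and are not taken from the paper.

    # Illustrative computation of a demand-weighted completeness score.
    # The relation weights (a usage-derived distribution for the class "person")
    # and the example entity are invented.
    expected = {  # P(relation | class=person), estimated from usage logs
        "birth_date": 0.4,
        "name": 0.35,
        "nationality": 0.25,
    }

    entity_relations = {"name", "nationality"}  # facts the KB currently holds

    completeness = sum(w for rel, w in expected.items() if rel in entity_relations)
    print(f"demand-weighted completeness: {completeness:.2f}")  # -> 0.60

    # Missing relations ranked by demand expose the most important gaps:
    gaps = sorted((rel for rel in expected if rel not in entity_relations),
                  key=lambda r: -expected[r])
    print("most important gaps:", gaps)  # -> ['birth_date']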
Reasoning with Data Flows and Policy Propagation Rules
Data-oriented systems and applications are at the centre of current developments of the World Wide Web. In these scenarios, assessing what policies propagate from the licenses of data sources to the output of a given data-intensive system is an important problem. Both policies and data flows can be described with Semantic Web languages. Although it is possible to define Policy Propagation Rules (PPR) by associating policies to data flow steps, this activity results in a huge number of rules to be stored and managed. In a recent paper, we introduced strategies for reducing the size of a PPR knowledge base by using an ontology of the possible relations between data objects, the Datanode ontology, and applying the (A)AAAA methodology, a knowledge engineering approach that exploits Formal Concept Analysis (FCA). In this article, we investigate whether this reasoning is feasible and how it can be performed. For this purpose, we study the impact of compressing a rule base associated with an inference mechanism on the performance of the reasoning process. Moreover, we report on an extension of the (A)AAAA methodology that includes a coherency check algorithm, which makes this reasoning possible. We show how this compression, in addition to being beneficial to the management of the knowledge base, also has a positive impact on the performance and resource requirements of the reasoning process for policy propagation.
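To illustrate one way a PPR knowledge base can shrink, the Python sketch below groups relations that propagate identical policy sets under a single abstract rule; the relation and policy names are invented, and the paper's (A)AAAA/FCA-based approach is considerably more involved than this.

    # Hedged sketch of compressing a policy-propagation rule base: relations that
    # propagate exactly the same set of policies are grouped under one abstract
    # relation, so one rule can stand for many.
    from collections import defaultdict

    rules = {  # relation between data objects -> policies propagating across it
        "copiedFrom":     {"attribution", "share-alike"},
        "selectedFrom":   {"attribution", "share-alike"},
        "aggregatedFrom": {"attribution"},
        "anonymizedFrom": set(),                      # nothing propagates
    }

    # Group relations by the policy set they propagate.
    groups = defaultdict(list)
    for relation, policies in rules.items():
        groups[frozenset(policies)].append(relation)

    compressed = {tuple(sorted(rels)): set(pols) for pols, rels in groups.items()}
    for abstract_relation, policies in compressed.items():
        print(abstract_relation, "->", policies or "{}")
    # Four concrete rules collapse to three abstract ones in this toy example.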
FlaBase: towards the creation of a flamenco music knowledge base
Online information about flamenco music is scattered over different sites and knowledge bases. Unfortunately, there is no common repository that indexes all these data. In this work, information related to flamenco music is gathered from general knowledge bases (e.g., Wikipedia, DBpedia), music encyclopedias (e.g., MusicBrainz), and specialized flamenco websites, and is then integrated into a new knowledge base called FlaBase. As resources from different data sources do not share common identifiers, a process of pair-wise entity resolution has been performed. FlaBase contains information about 1,174 artists, 76 palos (flamenco genres), 2,913 albums, 14,078 tracks, and 771 Andalusian locations. It is freely available in RDF and JSON formats. In addition, a method for entity recognition and disambiguation for FlaBase has been created. The system can recognize and disambiguate FlaBase entity references in Spanish texts with an F-measure of 0.77. We applied it to biographical texts present in FlaBase. Using the extracted information, the knowledge base is populated with relevant facts and a semantic graph is created connecting the entities of FlaBase. Artist relevance is then computed over the graph and evaluated against flamenco expert criteria. The accuracy of the results shows a high degree of quality and completeness of the knowledge base.
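As an illustration of computing artist relevance over a semantic graph, the following Python sketch applies PageRank-style centrality to a tiny invented graph; the specific algorithm and the graph contents are assumptions, not necessarily what was used for FlaBase.

    # Illustrative ranking of artists over a small semantic graph using
    # PageRank as a stand-in relevance measure (the graph is invented).
    import networkx as nx

    G = nx.DiGraph()
    G.add_edges_from([
        ("Album: La Leyenda del Tiempo", "Artist: Camarón"),   # album -> performer
        ("Track: Volando Voy",           "Artist: Camarón"),
        ("Artist: Camarón",              "Palo: Bulería"),      # artist -> genre
        ("Artist: Tomatito",             "Palo: Bulería"),
        ("Album: Spain",                 "Artist: Tomatito"),
    ])

    scores = nx.pagerank(G, alpha=0.85)
    artists = {n: s for n, s in scores.items() if n.startswith("Artist:")}
    for name, score in sorted(artists.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")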
