Completeness and Consistency Analysis for Evolving Knowledge Bases
Assessing the quality of an evolving knowledge base is a challenging task, as
it often requires identifying the correct quality assessment procedures.
Since data is often derived from autonomous and increasingly large data
sources, it is impractical to curate the data manually, and challenging to
assess its quality continuously and automatically.
In this paper, we explore two main areas of quality assessment related to
evolving knowledge bases: (i) identification of completeness issues using
knowledge base evolution analysis, and (ii) identification of consistency
issues based on integrity constraints, such as minimum and maximum cardinality,
and range constraints.
For completeness analysis, we use data profiling information from consecutive
knowledge base releases to estimate completeness measures that allow predicting
quality issues. Then, we perform consistency checks to validate the results of
the completeness analysis using integrity constraints and learning models.
The approach has been tested both quantitatively and qualitatively by using a
subset of datasets from both DBpedia and 3cixty knowledge bases. The
performance of the approach is evaluated using precision, recall, and F1 score.
From completeness analysis, we observe a 94% precision for the English DBpedia
KB and 95% precision for the 3cixty Nice KB. We also assessed the performance
of our consistency analysis by using five learning models over three sub-tasks,
namely minimum cardinality, maximum cardinality, and range constraint. We
observed that the best performing model in our experimental setup is the Random
Forest, reaching an F1 score greater than 90% for minimum and maximum
cardinality and 84% for range constraints. Comment: Accepted for the Journal of Web Semantics
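The three integrity constraints named in this abstract (minimum cardinality, maximum cardinality, and range) can be illustrated with a minimal sketch. It is not the paper's implementation, which also relies on learning models. The function `check_constraints`, the `ex:` identifiers, and the toy triples are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def check_constraints(triples, prop, subjects, min_card=0, max_card=None, value_ok=None):
    """Map each subject to the list of constraints it violates for `prop`."""
    values = defaultdict(list)
    for s, p, o in triples:
        if p == prop:
            values[s].append(o)

    violations = {}
    for s in subjects:
        objs = values.get(s, [])
        found = []
        if len(objs) < min_card:
            found.append("min_cardinality")
        if max_card is not None and len(objs) > max_card:
            found.append("max_cardinality")
        if value_ok is not None and not all(value_ok(o) for o in objs):
            found.append("range")
        if found:
            violations[s] = found
    return violations


def is_iso_date(value):
    """Range check: the value must parse as an ISO date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical toy data, not taken from DBpedia or 3cixty.
triples = [
    ("ex:Alice", "ex:birthDate", "1980-01-01"),
    ("ex:Bob", "ex:birthDate", "1975-05-05"),
    ("ex:Bob", "ex:birthDate", "1976-06-06"),
    ("ex:Carol", "ex:birthDate", "unknown"),
]
subjects = ["ex:Alice", "ex:Bob", "ex:Carol", "ex:Dave"]

report = check_constraints(
    triples, "ex:birthDate", subjects, min_card=1, max_card=1, value_ok=is_iso_date
)
print(report)
```

In this toy case, `ex:Bob` has two birth dates (maximum cardinality), `ex:Carol` has a value that is not a date (range), and `ex:Dave` has no value at all (minimum cardinality).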
Discovering Implicational Knowledge in Wikidata
Knowledge graphs have recently become the state-of-the-art tool for
representing the diverse and complex knowledge of the world. Examples include
the proprietary knowledge graphs of companies such as Google, Facebook, IBM, or
Microsoft, but also freely available ones such as YAGO, DBpedia, and Wikidata.
A distinguishing feature of Wikidata is that the knowledge is collaboratively
edited and curated. While this greatly enhances the scope of Wikidata, it also
makes it impossible for a single individual to grasp complex connections
between properties or understand the global impact of edits in the graph. We
apply Formal Concept Analysis to efficiently identify comprehensible
implications that are implicitly present in the data. Although the complex
structure of data modelling in Wikidata is not amenable to a direct approach,
we overcome this limitation by extracting contextual representations of parts
of Wikidata in a systematic fashion. We demonstrate the practical feasibility
of our approach through several experiments and show that the results may lead
to the discovery of interesting implicational knowledge. Besides providing a
method for obtaining large real-world data sets for FCA, we sketch potential
applications in offering semantic assistance for editing and curating Wikidata.
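As a rough illustration of what an implication means in Formal Concept Analysis, the sketch below checks whether "every object with attributes A also has attributes B" holds in a small formal context. The context, the item names, and the helper functions are hypothetical. The sketch does not reproduce how the paper extracts contexts from Wikidata.

```python
def extent(context, attrs):
    """Objects that have every attribute in `attrs`."""
    return {obj for obj, have in context.items() if attrs <= have}


def closure(context, attrs):
    """Attributes shared by all objects that have every attribute in `attrs`."""
    objs = extent(context, attrs)
    if not objs:
        # No object matches, so every attribute is shared vacuously.
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objs))


def holds(context, premise, conclusion):
    """True if the implication premise -> conclusion is valid in the context."""
    return conclusion <= closure(context, premise)


# Hypothetical formal context: objects are items, attributes are properties used on them.
context = {
    "item1": {"date of birth", "place of birth", "date of death"},
    "item2": {"date of birth", "place of birth"},
    "item3": {"date of birth"},
    "item4": {"date of birth", "date of death"},
}

print(holds(context, {"date of death"}, {"date of birth"}))   # True
print(holds(context, {"date of birth"}, {"place of birth"}))  # False
```

The first implication holds because both items with a date of death also have a date of birth. The second fails because `item3` and `item4` have a date of birth but no place of birth.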
Enriching Knowledge Bases with Counting Quantifiers
Information extraction traditionally focuses on extracting relations between
identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts
often also contain counting information, stating that a subject is in a
specific relation with a number of objects, without mentioning the objects
themselves, for example, "California is divided into 58 counties". Such
counting quantifiers can help in a variety of tasks such as query answering or
knowledge base curation, but are neglected by prior work. This paper develops
the first full-fledged system for extracting counting information from text,
called CINEX. We employ distant supervision using fact counts from a knowledge
base as training seeds, and develop novel techniques for dealing with several
challenges: (i) non-maximal training seeds due to the incompleteness of
knowledge bases, (ii) sparse and skewed observations in text sources, and (iii)
high diversity of linguistic patterns. Experiments with five human-evaluated
relations show that CINEX can achieve 60% average precision for extracting
counting information. In a large-scale experiment, we demonstrate the potential
for knowledge base enrichment by applying CINEX to 2,474 frequent relations in
Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct
relations, which is 28% more than the existing Wikidata facts for these
relations. Comment: 16 pages, The 17th International Semantic Web Conference (ISWC 2018)
Fact-Free Learning
People may be surprised by noticing certain regularities that hold in existing knowledge they have had for some time. That is, they may learn without getting new factual information. We argue that this can be partly explained by computational complexity. We show that, given a knowledge base, finding a small set of variables that obtain a certain value of R² is computationally hard, in the sense that this term is used in computer science. We discuss some of the implications of this result and of fact-free learning in general. Learning, Behavioral Economics
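To make the search problem concrete, here is a minimal brute-force sketch, assuming ordinary least squares with an intercept. It tries every subset of variables in order of size until one reaches a target R². The function names and the toy data are hypothetical. The number of subsets grows combinatorially with the number of variables, which is the intuition behind the hardness result, not a proof of it. The small solver assumes the chosen columns are not collinear.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum(
        (yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(rows, y)
    )
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def smallest_subset(variables, y, target, max_size):
    """Return the first variable subset (by size) whose R^2 reaches `target`."""
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(variables)), size):
            if r_squared([variables[j] for j in subset], y) >= target:
                return subset
    return None


# Hypothetical toy data: y is exactly x0 + 2 * x2, and x1 is a distractor.
variables = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 4, 3, 6, 5],
    [3, 1, 4, 1, 5, 9],
]
y = [7, 4, 11, 6, 15, 24]

print(smallest_subset(variables, y, target=0.99, max_size=2))
```

With this data no single variable reaches the target, and the regularity only appears when `x0` and `x2` are considered together, although every value was already present in the "knowledge base".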
The Ontic Probability Interpretation of Quantum Theory - Part I: The Meaning of Einstein's Incompleteness Claim
Ignited by Einstein and Bohr a century ago, the philosophical struggle about Reality is yet unfinished, with no signs of a swift resolution. Despite vast technological progress fueled by the iconic EPR paper (EPR), the intricate link between ontic and epistemic aspects of Quantum Theory (QT) has greatly hindered our grip on Reality and further progress in physical theory. Fallacies concealed by tortuous logical negations made EPR comprehension much harder than it could have been had Einstein written it himself in German. It is plagued with preconceptions about what a physical property is, the 'Uncertainty Principle', and the Principle of Locality. Numerous interpretations of QT vis-à-vis Reality exist and are keenly disputed. This is the first of a series of articles arguing for a physical interpretation called "The Ontic Probability Interpretation" (TOPI). A gradual explanation of TOPI is given intertwined with a meticulous logico-philosophical scrutiny of EPR. Part I focuses on the meaning of Einstein's "Incompleteness" claim. A conceptual confusion, a preconception about Reality, and a flawed dichotomy are shown to be severe obstacles for the EPR argument to succeed. Part II analyzes Einstein's "Incompleteness/Nonlocality Dilemma". Future articles will further explain TOPI, demonstrating its soundness and potential for nurturing theoretical progress.