24 research outputs found
9th International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI 2021)
Formal Concept Analysis (FCA) is a mathematically well-founded theory aimed at classification and knowledge discovery that can be used for many purposes in Artificial Intelligence (AI). The objective of the ninth edition of the FCA4AI workshop (see http://www.fca4ai.hse.ru/) is to investigate several issues, such as: how FCA can support various AI activities (knowledge discovery, knowledge engineering, machine learning, data mining, information retrieval, recommendation...), how FCA can be extended in order to help AI researchers solve new and complex problems in their domains, and how FCA can play a role in current trends in AI such as explainable AI and fairness of algorithms in decision making. The workshop was held in co-location with IJCAI 2021, Montréal, Canada, August 28, 2021
Discovery of Link Keys in RDF Data Based on Pattern Structures: Preliminary Steps
In this paper, we are interested in the discovery of link keys between two different RDF datasets based on FCA and pattern structures. A link key identifies individuals that represent the same real-world entity. Two main strategies are used to automatically discover link keys, either ignoring or taking into account the classes to which the individuals belong. Indeed, a link key may be relevant for one pair of classes and irrelevant for another. Consequently, discovering link keys for one pair of classes at a time may be computationally expensive if every pair must be considered. To overcome such limitations, we introduce a specific and original pattern structure in which link keys can be discovered in one pass while specifying the pair of classes associated with each link key, focusing on the discovery process and allowing more flexibility
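As an informal illustration of the link key idea described above: a link key is a set of property pairs, and two individuals from different datasets are linked when their values agree on every pair. The sketch below is not the paper's pattern-structure algorithm; the datasets, property names, and identifiers are invented for illustration.

```python
# Toy sketch of applying a candidate link key (invented data, not the
# paper's FCA-based discovery method).

def links_from_link_key(dataset1, dataset2, link_key):
    """Return pairs of individuals identified by the candidate link key:
    (id1, id2) is a link when the records agree on every property pair."""
    links = []
    for id1, rec1 in dataset1.items():
        for id2, rec2 in dataset2.items():
            if all(rec1.get(p1) == rec2.get(p2) for p1, p2 in link_key):
                links.append((id1, id2))
    return links

# Two toy RDF-like datasets using different vocabularies.
d1 = {"a1": {"title": "Dune", "creator": "Herbert"},
      "a2": {"title": "Solaris", "creator": "Lem"}}
d2 = {"b1": {"name": "Dune", "author": "Herbert"},
      "b2": {"name": "Ubik", "author": "Dick"}}

# Candidate link key: compare title with name, and creator with author.
key = [("title", "name"), ("creator", "author")]
print(links_from_link_key(d1, d2, key))  # [('a1', 'b1')]
```

Discovering such keys automatically, rather than applying a given one, is the hard part the paper addresses; class-aware discovery matters because a key that works for book-like classes may produce spurious links for other class pairs.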
Uncertain Temporal Knowledge Graphs
Temporal data can be found in various sources, from patient histories, purchase histories, and employee histories to web logs. Recent advances in open information extraction have paved the way for the automatic construction of knowledge graphs (KGs) from such sources. Often the extraction tools used to construct KGs produce facts and rules along with their confidence scores, leading to the notion of uncertain temporal KGs. The facts and rules contained in these graphs tend to be noisy and erroneous, due either to the accuracy of the extraction tools or to uncertainty in the source data. In this work, we use a numerical extension of Markov logic networks to provide a formal syntax and semantics for uncertain temporal KGs. Moreover, we propose a set of Datalog constraints with inequalities that extend the underlying schema of the KGs and help in resolving conflicting facts. Finally, we characterize the complexity of two important queries, maximum a posteriori and conditional probability inference, for uncertain temporal KGs
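To make the setting concrete, an uncertain temporal fact can be pictured as a triple annotated with a validity interval and a confidence score, and a temporal constraint can rule out overlapping contradictory facts. The following sketch is a simplified invention in the spirit of the abstract, not the paper's Markov-logic formalism: the facts, the "functional over time" constraint, and the highest-confidence resolution policy are all assumptions for illustration.

```python
# Hypothetical encoding of uncertain temporal facts and a toy conflict-
# resolution step (invented example, not the paper's semantics).
from dataclasses import dataclass

@dataclass
class TemporalFact:
    subj: str
    pred: str
    obj: str
    start: int   # validity interval, e.g. years
    end: int
    conf: float  # confidence score from the extraction tool

def overlaps(f, g):
    """True when the validity intervals of two facts intersect."""
    return f.start <= g.end and g.start <= f.end

def resolve_conflicts(facts, functional_pred):
    """Invented constraint: functional_pred admits at most one object per
    subject over overlapping intervals; keep the higher-confidence fact
    of each conflicting pair."""
    kept = list(facts)
    for f in facts:
        for g in facts:
            if (f is not g and f.pred == g.pred == functional_pred
                    and f.subj == g.subj and f.obj != g.obj
                    and overlaps(f, g)):
                loser = f if f.conf < g.conf else g
                if loser in kept:
                    kept.remove(loser)
    return kept

facts = [TemporalFact("alice", "worksFor", "AcmeCorp", 2010, 2015, 0.9),
         TemporalFact("alice", "worksFor", "Globex", 2013, 2014, 0.4)]
print([f.obj for f in resolve_conflicts(facts, "worksFor")])  # ['AcmeCorp']
```

The paper's Datalog constraints with inequalities generalize this kind of rule declaratively over the whole graph, rather than hard-coding one predicate at a time.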
Explaining differences between unaligned table snapshots
We study the problem of explaining the differences between two snapshots of the same database table, including record insertions, deletions, and, in particular, record updates. Unlike existing alternatives, our solution induces transformation functions and does not require knowledge of the correct alignment between the record sets. This allows profiling snapshots of tables with unspecified or modified primary keys. In such a problem setting, there are always multiple explanations for the differences; our goal is to find the simplest one. We propose to measure the complexity of explanations on the basis of minimum description length in order to formulate the task as an optimization problem. We show that the problem is NP-hard and propose a heuristic search algorithm to solve practical problem instances. We implement a prototype called Affidavit to assess the explanatory qualities of our approach in experiments based on different real-world data sets. We show that it scales to both a large number of records and attributes and reliably provides correct explanations under practical levels of modification
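The minimum-description-length principle behind this abstract can be illustrated with a toy scoring function: among explanations that account for the same difference, prefer the one that is cheapest to describe. The cost model, operation encoding, and data below are invented for illustration and are not the Affidavit algorithm.

```python
# Toy MDL-style scoring of competing explanations (invented cost model,
# not the paper's encoding).

def cost(explanation):
    """Description length in invented units: a transformation rule costs
    a flat 2; re-stating a record (insert/delete) costs 1 per field."""
    total = 0
    for op in explanation:
        if op[0] == "transform":   # e.g. ("transform", "price", "*1.1")
            total += 2
        else:                      # ("insert", record) / ("delete", record)
            total += len(op[1])
    return total

# Snapshot B raises every price by 10%. Two competing explanations:
e1 = [("transform", "price", "*1.1")]            # one rule for all records
e2 = [("delete", {"id": 1, "price": 10}),        # re-state each record
      ("insert", {"id": 1, "price": 11}),
      ("delete", {"id": 2, "price": 20}),
      ("insert", {"id": 2, "price": 22})]

print(cost(e1), cost(e2))  # 2 8 -> the single transformation wins
```

The hard part, which the paper shows to be NP-hard, is searching the space of such explanations when the alignment between the two record sets is itself unknown.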
Knowledge-Based Matching of n-ary Tuples
An increasing number of data and knowledge sources are accessible by human
and software agents in the expanding Semantic Web. Sources may differ in
granularity or completeness, and thus be complementary. Consequently, they
should be reconciled in order to unlock the full potential of their conjoint
knowledge. In particular, units should be matched within and across sources,
and their level of relatedness should be classified into equivalent, more
specific, or similar. This task is challenging since knowledge units can be
heterogeneously represented in sources (e.g., in terms of vocabularies). In
this paper, we focus on matching n-ary tuples in a knowledge base with a
rule-based methodology. To alleviate heterogeneity issues, we rely on domain
knowledge expressed by ontologies. We tested our method on the biomedical
domain of pharmacogenomics by searching alignments among 50,435 n-ary tuples
from four different real-world sources. Results highlight noteworthy agreements
and particularities within and across sources
Linkex: A Tool for Link Key Discovery Based on Pattern Structures
Links constitute the core of the Linked Data philosophy. With the high growth of data published on the web, many frameworks have been proposed to deal with the link discovery problem, and particularly with identity links. Finding such links between different RDF datasets is a critical task. In this position paper, we focus on link keys, which consist of sets of pairs of properties identifying the same entities across heterogeneous datasets. We also propose to formalize the problem of link key discovery using Pattern Structures (PS), the generalization of Formal Concept Analysis to non-binary datasets. After providing the proper definitions of link keys and setting the problem in terms of PS, we show that the intents of the pattern concepts correspond to link keys and their extents to the sets of identity links generated by their intents. Finally, we discuss an implementation of this framework and show the applicability and scalability of the proposed method
A guided walk into link key candidate extraction with relational concept analysis
Data interlinking is an important task for linked data interoperability. One possible technique for finding links is the use of link keys, which generalise relational keys to pairs of RDF models. We show how link key candidates may be extracted directly from RDF datasets by encoding the extraction problem in relational concept analysis. This method deals with non-functional properties and circularly dependent link key expressions. As such, it generalises those presented for non-dependent link keys and for link keys over the relational model. The proposed method is able to return link key candidates involving several classes at once
ArCo: the Italian Cultural Heritage Knowledge Graph
ArCo is the Italian Cultural Heritage knowledge graph, consisting of a
network of seven vocabularies and 169 million triples about 820 thousand
cultural entities. It is distributed jointly with a SPARQL endpoint, software
for converting catalogue records to RDF, and a rich suite of documentation
material (testing, evaluation, how-to, examples, etc.). ArCo is based on the
official General Catalogue of the Italian Ministry of Cultural Heritage and
Activities (MiBAC) - and its associated encoding regulations - which collects
and validates the catalogue records of (ideally) all Italian Cultural Heritage
properties (excluding libraries and archives), contributed by CH administrators
from all over Italy. We present its structure, design methods and tools, and its
growing community, and discuss its importance, quality, and impact
Reasoning for the description logic ALC with link keys
Data interlinking is a critical task for widening and enhancing linked open data. One way to tackle data interlinking is to use link keys, which generalise keys to the case of two RDF datasets described using different ontologies. Link keys specify pairs of properties to compare in order to find same-as links between instances of two classes of two different datasets. Hence, they can be used for finding links. Link keys can also be considered as logical axioms, just like keys, ontologies, and ontology alignments. We introduce the logic ALC+LK, extending the description logic ALC with link keys. It may be used to reason about and infer entailed link keys that may be more useful for a particular data interlinking task. We show that link key entailment can be reduced to consistency checking without introducing the negation of link keys. For deciding the consistency of an ALC+LK ontology, we introduce a new tableau-based algorithm. Contrary to the classical ones, the completion rules concerning link keys apply to pairs of individuals that are not directly related. We show that this algorithm is sound, complete, and always terminates