
11 research outputs found

A Declarative Framework for Linking Entities

Author: Burdick Douglas
Fagin Ronald
Kolaitis Phokion G.
Popa Lucian
Tan Wang-Chiew
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Conference on Database Theory (ICDT 2015)
Publication date: 01/01/2015

The aim of this paper is to introduce and develop a truly declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on the systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons as to why two entities should be linked. We also consider extensions of the core language that capture collective entity resolution, by allowing inter-dependence between links. We identify a class of "good" solutions for entity linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the "good" solutions, and the problem of finding the certain links, which are the links that appear in every "good" solution. We show that these problems are tractable for the core language, but may become intractable once we allow inter-dependence between link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.

Dagstuhl Research Online Publication Server

Knowledge Refinement via Rule Selection

Author: Kolaitis Phokion G.
Popa Lucian
Qian Kun
Publication date: 28/01/2019

In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one minimizes first the total error and then the size.
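The rule selection problem stated in this abstract can be illustrated with a small brute-force sketch. This is not the authors' algorithm: it assumes each rule is modelled as a plain Python function from an input database to a set of output facts (a stand-in for a Horn formula), and the names `same_name`, `same_city`, `SOURCE`, and `TARGET` are hypothetical toy data. The search enumerates every subset, so it takes exponential time and is only meant to make the error measure concrete.

```python
from itertools import combinations

# Hypothetical toy input database: (id, name, city) records.
SOURCE = {
    ("a1", "Ann Lee", "NYC"),
    ("a2", "Ann Lee", "NYC"),
    ("a3", "Ann Li", "NYC"),
    ("a4", "Bob Roy", "LA"),
}

# Hypothetical expected output database: the links we want to obtain.
TARGET = {("a1", "a2")}


def same_name(db):
    """Toy rule: link two records that share a name."""
    return {(x[0], y[0]) for x in db for y in db
            if x[0] < y[0] and x[1] == y[1]}


def same_city(db):
    """Toy rule: link two records that share a city."""
    return {(x[0], y[0]) for x in db for y in db
            if x[0] < y[0] and x[2] == y[2]}


RULES = [same_name, same_city]


def total_error(selected, source, target):
    """False positives plus false negatives of the selected rules."""
    produced = set()
    for rule in selected:
        produced |= rule(source)
    false_positives = len(produced - target)
    false_negatives = len(target - produced)
    return false_positives + false_negatives


def best_subset(rules, source, target):
    """Exhaustive search over all subsets, smallest subsets first.

    Returns (error, indices). Because subsets are visited by increasing
    size and only a strictly smaller error replaces the current best,
    the result is a smallest subset among those with minimum error.
    """
    best = None
    for size in range(len(rules) + 1):
        for indices in combinations(range(len(rules)), size):
            chosen = [rules[i] for i in indices]
            error = total_error(chosen, source, target)
            if best is None or error < best[0]:
                best = (error, indices)
    return best


result = best_subset(RULES, SOURCE, TARGET)
print(result)  # (0, (0,)): selecting only same_name gives zero error
```

Visiting subsets in order of increasing size mirrors the bi-level variant mentioned in the abstract (minimize the total error first, then the size). The hardness results reported there suggest that exhaustive search of this kind should not be expected to scale beyond toy inputs.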

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Combining global and local merges in logic-based entity resolution

Author: Bienvenu Meghyn
Cima Gianluca
Gutierrez Basulto Victor
Ibanez Garcia Yazmin

Online Research @ Cardiff

Combining Global and Local Merges in Logic-based Entity Resolution

Author: Bienvenu Meghyn
Cima Gianluca
Gutiérrez-Basulto Víctor
Ibáñez-García Yazmín
Publication date: 29/05/2023

In the recently proposed Lace framework for collective entity resolution, logical rules and constraints are used to identify pairs of entity references (e.g. author or paper ids) that denote the same entity. This identification is global: all occurrences of those entity references (possibly across multiple database tuples) are deemed equal and can be merged. By contrast, a local form of merge is often more natural when identifying pairs of data values, e.g. some occurrences of 'J. Smith' may be equated with 'Joe Smith', while others should merge with 'Jane Smith'. This motivates us to extend Lace with local merges of values and explore the computational properties of the resulting formalism. Comment: Accepted at KR 202

arXiv.org e-Print Archive

A Framework for Combining Entity Resolution and Query Answering in Knowledge Bases

Author: Fagin Ronald
Kolaitis Phokion G.
Lembo Domenico
Popa Lucian
Scafoglieri Federico
Publication date: 13/03/2023

We propose a new framework for combining entity resolution and query answering in knowledge bases (KBs) with tuple-generating dependencies (tgds) and equality-generating dependencies (egds) as rules. We define the semantics of the KB in terms of special instances that involve equivalence classes of entities and sets of values. Intuitively, the former collect all entities denoting the same real-world object, while the latter collect all alternative values for an attribute. This approach allows us to both resolve entities and bypass possible inconsistencies in the data. We then design a chase procedure that is tailored to this new framework and has the feature that it never fails; moreover, when the chase procedure terminates, it produces a universal solution, which in turn can be used to obtain the certain answers to conjunctive queries. We finally discuss challenges arising when the chase does not terminate.

arXiv.org e-Print Archive

LACE: A Logical Approach to Collective Entity resolution

Author: Bienvenu Meghyn
Cima Gianluca
Gutierrez Basulto Victor

Online Research @ Cardiff

Recommended from our members

Relational Learning over Dirty Data Using Data Constraints

Author: Lee Ga Young
Publication venue: Oregon State University

Real-world datasets are dirty and contain many errors. Examples of these issues are violations of integrity constraints, duplicates, and inconsistencies in representing data values and entities. Applying machine learning on dirty databases may lead to inaccurate results. Users have to spend a lot of time and effort repairing data errors and creating a clean learning database. Moreover, as the information required to fix these errors is often not available, there may be numerous possible clean versions for a dirty database. We propose DLearn, a novel relational learning system that learns directly over dirty databases effectively and efficiently without any preprocessing. DLearn leverages database constraints, such as functional dependency and matching dependency, to learn accurate relational models over inconsistent and heterogeneous data. Its learned models using the unique data properties represent patterns over all possible clean instances of the data in a usable form. Our empirical study indicates that DLearn learns accurate models over large real-world databases efficiently.

ScholarsArchive@OSU

REPLACE: A logical framework for combining collective entity resolution and repairing

Author: Bienvenu Meghyn
Cima Gianluca
Gutierrez Basulto Victor

Online Research @ Cardiff