ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is
about detecting duplicate data representations of the same external entities,
and merging them into single representations. Relatively recently, declarative
rules called matching dependencies (MDs) have been proposed for specifying
similarity conditions under which attribute values in database records are
merged. In this work we show the process and the benefits of integrating three
components of ER: (a) Classifiers for duplicate/non-duplicate record pairs
built using machine learning (ML) techniques, (b) MDs for supporting both the
blocking phase of ML and the merge itself; and (c) The use of the declarative
language LogiQL, an extended form of Datalog supported by the LogicBlox
platform, for data processing, and the specification and enforcement of MDs.
Comment: To appear in Proc. SUM, 201
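As a rough illustration of the blocking phase mentioned in (b), the sketch below groups records by a blocking key so that only pairs inside the same block are passed on to a duplicate/non-duplicate classifier. The blocking rule, the field names (`id`, `surname`, `year`), and the helper names are hypothetical; the abstract does not give the MD-based blocking rules, and the paper expresses them in LogiQL rather than Python. The classifier step itself is left out.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    # Assumed blocking rule (not from the paper): records may only match
    # if they share the first letter of the surname and the same year.
    return (record["surname"][:1].lower(), record["year"])


def candidate_pairs(records):
    """Return the id pairs that fall in the same block, in sorted order."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec["id"])
    pairs = []
    for ids in blocks.values():
        # Only pairs inside one block are compared later by the classifier.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


records = [
    {"id": 1, "surname": "Smith", "year": 2015},
    {"id": 2, "surname": "Smyth", "year": 2015},
    {"id": 3, "surname": "Smith", "year": 2016},
    {"id": 4, "surname": "Jones", "year": 2015},
]
pairs = candidate_pairs(records)
```

With four records there are six possible pairs; under this assumed rule only records 1 and 2 share a block, so a single pair would reach the classifier.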
Efficient Discovery of Ontology Functional Dependencies
Poor data quality has become a pervasive issue due to the increasing
complexity and size of modern datasets. Constraint based data cleaning
techniques rely on integrity constraints as a benchmark to identify and correct
errors. Data values that do not satisfy the given set of constraints are
flagged as dirty, and data updates are made to re-align the data and the
constraints. However, many errors require user input to resolve, because
specific terminology and relationships are defined by domain expertise. For example,
in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be
captured in a pharmaceutical ontology. While functional dependencies (FDs) have
traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., is-a)
defined by an ontology. In this paper, we take a first step towards extending
the set of data quality constraints used in data cleaning by defining and
discovering Ontology Functional Dependencies (OFDs). We lay out
theoretical and practical foundations for OFDs, including a set of sound and
complete axioms, and a linear inference procedure. We then develop effective
algorithms for discovering OFDs, and a set of optimizations that efficiently
prune the search space. Our experimental evaluation using real data shows the
scalability and accuracy of our algorithms.
Comment: 12 page
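The idea of an FD that tolerates ontology-equivalent values can be sketched as follows. This is a simplified check under stated assumptions, not the paper's discovery algorithm: the toy ontology, the attribute names, and the functions `concepts_of` and `ofd_violations` are hypothetical, and the rule used here is that tuples agreeing on the left-hand side must have right-hand-side values that share at least one ontology concept.

```python
from collections import defaultdict

# Hypothetical toy ontology: each concept maps to the names that denote it.
ONTOLOGY = {
    "ibuprofen": {"ibuprofen", "Advil", "Motrin"},
    "acetaminophen": {"acetaminophen", "Tylenol"},
}


def concepts_of(value):
    """Return the concepts whose name set contains the value."""
    found = {c for c, names in ONTOLOGY.items() if value in names}
    # A value unknown to the ontology only matches itself (plain FD behaviour).
    return found or {value}


def ofd_violations(rows, lhs, rhs):
    """Group rows by the lhs attributes and report groups whose rhs values
    have no concept in common (assumed simplified OFD semantics)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])
    violations = {}
    for key, values in groups.items():
        common = set.intersection(*(concepts_of(v) for v in values))
        if not common:
            violations[key] = sorted(set(values))
    return violations


rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "fever", "drug": "Tylenol"},
    {"symptom": "fever", "drug": "Advil"},
]
result = ofd_violations(rows, ["symptom"], "drug")
```

A traditional FD symptom -> drug would flag both groups, since the drug strings differ. Under the ontology-aware check only the "fever" group is reported, because 'Advil' and 'ibuprofen' denote the same concept.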
Detailed Investigation on Strategies Developed for Effective Discovery of Matching Dependencies
This paper surveys the methods proposed in the literature for the efficient discovery of matching dependencies. The concept of matching dependencies (MDs) has recently been proposed for specifying matching rules for object identification. Similar to functional dependencies with conditions, MDs can also be applied to various data quality applications such as detecting violations of integrity constraints. The survey considers the problem of discovering similarity constraints for matching dependencies from a given database instance, and it is intended to encourage further research in the area of information mining
On Multiple Semantics for Declarative Database Repairs
We study the problem of database repairs through a rule-based framework that
we refer to as Delta Rules. Delta Rules are highly expressive and allow
specifying complex, cross-relation repair logic associated with Denial
Constraints and Causal Rules, and capturing Database Triggers of
interest. We show that there is no one-size-fits-all semantics for repairs in
this inclusive setting, and we consequently introduce multiple alternative
semantics, presenting the case for using each of them. We then study the
relationships between the semantics in terms of their output and the complexity
of computation. Our results formally establish the tradeoff between the
permissiveness of the semantics and its computational complexity. We
demonstrate the usefulness of the framework in capturing multiple data repair
scenarios for an Academic Search database and the TPC-H databases, showing how
using different semantics affects the repair in terms of size and runtime, and
examining the relationships between the repairs. We also compare our approach
with SQL triggers and a state-of-the-art data repair system
Data Cleaning and Query Answering with Matching Dependencies and Matching Functions
Matching dependencies were recently introduced as declarative rules for data cleaning and entity resolution. Enforcing a matching dependency on a database instance identifies the values of some attributes for two tuples, provided that the values of some other attributes are sufficiently similar. Assuming the existence of matching functions for making two attribute values equal, we formally introduce the process of cleaning an instance using matching dependencies, as a chase-like procedure. We show that matching functions naturally introduce a lattice structure on attribute domains, and a partial order of semantic domination between instances. Using the latter, we define the semantics of clean query answering in terms of certain/possible answers as the greatest lower bound/least upper bound of all possible answers obtained from the clean instances. We show that clean query answering is intractable in general. Then we study queries that behave monotonically w.r.t. the semantic domination order, and show that we can provide an under/over approximation for clean answers to monotone queries. Moreover, non-monotone positive queries can be relaxed into monotone queries
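The chase-like cleaning procedure can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the similarity test (a `difflib` ratio with a 0.8 threshold), the matching function `match` (keep the longer string), and the attribute names. The sketch also applies the rule deterministically until nothing changes, whereas the paper considers all clean instances that different chase sequences can produce.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.8):
    # Assumed similarity relation: case-insensitive difflib ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match(a, b):
    """Hypothetical matching function: keep the more informative value.
    Taking the maximum under the total order (length, value) is idempotent,
    commutative and associative, so it induces a semilattice on the domain."""
    return max(a, b, key=lambda v: (len(v), v))


def chase(rows, sim_attr, merge_attr):
    """Enforce the MD 'similar sim_attr => identify merge_attr' to a fixpoint."""
    rows = [dict(r) for r in rows]  # work on a copy of the instance
    changed = True
    while changed:
        changed = False
        for r1, r2 in combinations(rows, 2):
            if similar(r1[sim_attr], r2[sim_attr]):
                merged = match(r1[merge_attr], r2[merge_attr])
                if r1[merge_attr] != merged or r2[merge_attr] != merged:
                    r1[merge_attr] = r2[merge_attr] = merged
                    changed = True
    return rows


rows = [
    {"name": "John Doe", "address": "25 Main St."},
    {"name": "Jon Doe", "address": "25 Main Street, Ottawa"},
    {"name": "Jane Roe", "address": "10 Elm Ave"},
]
cleaned = chase(rows, "name", "address")
```

The loop terminates because each update replaces a value with one that is higher in the order induced by `match`, and only finitely many values occur in the instance. In this example the first two tuples end up with the same address, while the third is left unchanged.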
Employee Job Satisfaction and Employees' Voluntary Turnover Intentions (VTIs)
Within the U.S. sales industry, organizational productivity has decreased due to employee job dissatisfaction and increased voluntary turnover intentions (VTIs). Some leaders in the industry lack knowledge about the relationship between intrinsic and extrinsic job satisfaction, and the negative effect on employees' VTIs. The purpose of this correlational study was to examine whether intrinsic and extrinsic job satisfaction significantly predicted retail sales employees' VTIs. The Minnesota Satisfaction Questionnaire (MSQ) and the Turnover Intentions Scale (TIS-6) were used to collect data from full- or part-time employees in the U.S. retail sales industry. The theoretical framework was based on Herzberg's motivation-hygiene theory. The results of a multiple regression analysis indicated that a combination of intrinsic and extrinsic job satisfaction significantly predicted employees' VTIs, F(2, 87) = 3.51, p = .034, R² = .08. However, extrinsic job satisfaction (t = 2.05, p = .034) was the only statistically significant predictor. Business leaders who understand the factors that increase extrinsic job satisfaction may increase retention within the organization, provide workforce stability, improve organizational and economic growth, and decrease costs related to job satisfaction and VTIs. The implications for social change include helping to reduce the economy's unemployment rate and improving relationships between employees, their families, and their communities by (a) improving employees' and stakeholders' perceptions of their organization in the community and (b) improving employees' well-being by understanding the job satisfaction factors that improve their morale