13 research outputs found

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL (an extended form of Datalog supported by the LogicBlox platform) for data processing, and for the specification and enforcement of MDs. Comment: To appear in Proc. SUM, 201

    Efficient Discovery of Ontology Functional Dependencies

    Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve, because domain expertise defines specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms. Comment: 12 page
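
The contrast between an ordinary FD and an ontology-aware one can be illustrated with a small Python sketch. It is not the paper's algorithm or its formal definition of OFDs. It assumes a simplified reading in which a dependency X -> A is satisfied when, within each group of tuples that agree on X, all A-values share at least one ontology concept. The `ONTOLOGY` mapping, the attribute names, and the helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical toy ontology: each surface value maps to the concepts it may denote.
ONTOLOGY = {
    "Advil": {"ibuprofen"},
    "Motrin": {"ibuprofen"},
    "ibuprofen": {"ibuprofen"},
    "Tylenol": {"acetaminophen"},
}


def concepts(value):
    """Concepts a value may denote; unknown values denote only themselves."""
    return ONTOLOGY.get(value, {value})


def fd_holds(rows, lhs, rhs):
    """Classical FD check: tuples equal on lhs must be syntactically equal on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def ofd_holds(rows, lhs, rhs):
    """Simplified ontology-aware check (an assumption, not the paper's definition):
    tuples equal on lhs must have rhs values that share a common concept."""
    shared = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        c = concepts(row[rhs])
        # Intersect the candidate concepts seen so far for this group.
        shared[key] = shared[key] & c if key in shared else set(c)
        if not shared[key]:
            return False
    return True


rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "fever", "drug": "Tylenol"},
]

print(fd_holds(rows, ["symptom"], "drug"))   # syntactic check
print(ofd_holds(rows, ["symptom"], "drug"))  # ontology-aware check
```

On this toy data the syntactic check flags 'Advil' and 'ibuprofen' as a violation, while the ontology-aware check accepts them because both denote the same concept.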

    Detailed Investigation on Strategies Developed for Effective Discovery of Matching Dependencies

    ABSTRACT: This paper surveys the methods proposed in the literature for efficient discovery of matching dependencies. The concept of matching dependencies (MDs) has recently been proposed for specifying matching rules for object identification. Like conditional functional dependencies, MDs can also be applied to various data quality applications, such as detecting violations of integrity constraints. The paper considers the problem of discovering similarity constraints for matching dependencies from a given database instance. The survey is intended to encourage further research in the area of information mining.

    On Multiple Semantics for Declarative Database Repairs

    We study the problem of database repairs through a rule-based framework that we refer to as Delta Rules. Delta Rules are highly expressive: they allow specifying complex, cross-relation repair logic associated with Denial Constraints and Causal Rules, and they can capture Database Triggers of interest. We show that there is no one-size-fits-all semantics for repairs in this inclusive setting, and we consequently introduce multiple alternative semantics, presenting the case for using each of them. We then study the relationships between the semantics in terms of their output and the complexity of computation. Our results formally establish the tradeoff between the permissiveness of a semantics and its computational complexity. We demonstrate the usefulness of the framework in capturing multiple data repair scenarios for an Academic Search database and the TPC-H databases, showing how using different semantics affects the repair in terms of size and runtime, and examining the relationships between the repairs. We also compare our approach with SQL triggers and a state-of-the-art data repair system.

    Data Cleaning and Query Answering with Matching Dependencies and Matching Functions

    Matching dependencies were recently introduced as declarative rules for data cleaning and entity resolution. Enforcing a matching dependency on a database instance identifies the values of some attributes for two tuples, provided that the values of some other attributes are sufficiently similar. Assuming the existence of matching functions for making two attribute values equal, we formally introduce the process of cleaning an instance using matching dependencies as a chase-like procedure. We show that matching functions naturally introduce a lattice structure on attribute domains, and a partial order of semantic domination between instances. Using the latter, we define the semantics of clean query answering in terms of certain/possible answers as the greatest lower bound/least upper bound of all possible answers obtained from the clean instances. We show that clean query answering is intractable in general. Then we study queries that behave monotonically w.r.t. the semantic domination order, and show that we can provide an under/over approximation for clean answers to monotone queries. Moreover, non-monotone positive queries can be relaxed into monotone queries.
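
The enforcement step described in this abstract can be sketched in Python as a toy chase-like loop. This is only an illustration under stated assumptions, not the authors' procedure: the similarity test (a `difflib` ratio with a threshold of 0.8), the matching function `merge` (keep the longer string), and the attribute names are all hypothetical choices. The loop applies the rule in one fixed order, so it yields a single clean instance, whereas the abstract speaks of a set of clean instances.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.8):
    """Assumed similarity relation: case-insensitive difflib ratio above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge(a, b):
    """Hypothetical matching function: keep the longer string, ties broken
    lexicographically. Taking a maximum under a total order is idempotent,
    commutative, and associative."""
    return max(a, b, key=lambda s: (len(s), s))


def enforce_md(rows, sim_attr, merge_attr):
    """Chase-like enforcement of one MD: whenever two tuples are similar on
    sim_attr, make their merge_attr values equal via the matching function.
    Repeats until no further change is possible."""
    rows = [dict(r) for r in rows]  # work on a copy of the instance
    changed = True
    while changed:
        changed = False
        for r1, r2 in combinations(rows, 2):
            if similar(r1[sim_attr], r2[sim_attr]) and r1[merge_attr] != r2[merge_attr]:
                r1[merge_attr] = r2[merge_attr] = merge(r1[merge_attr], r2[merge_attr])
                changed = True
    return rows


people = [
    {"name": "John Smith", "addr": "10 Main St"},
    {"name": "Jon Smith", "addr": "10 Main Street, Ottawa"},
    {"name": "Alice Wong", "addr": "5 Oak Ave"},
]

for row in enforce_md(people, "name", "addr"):
    print(row)
```

In this example the two similar names end up with the same, longer address, and the unrelated tuple is left untouched. The loop terminates because each merge only moves a value upward in a finite total order.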

    Employee Job Satisfaction and Employees' Voluntary Turnover Intentions (VTIs)

    Within the U.S. sales industry, organizational productivity has decreased due to employee job dissatisfaction and increased voluntary turnover intentions (VTIs). Some leaders in the industry lack knowledge about the relationship between intrinsic and extrinsic job satisfaction, and the negative effect on employees' VTIs. The purpose of this correlational study was to examine whether intrinsic and extrinsic job satisfaction significantly predicted retail sales employees' VTIs. The Minnesota Satisfaction Questionnaire (MSQ) and the Turnover Intentions Scale (TIS-6) were used to collect data from full- or part-time employees in the U.S. retail sales industry. The theoretical framework was based on Herzberg's motivation-hygiene theory. The results of a multiple regression analysis indicated that a combination of intrinsic and extrinsic job satisfaction, F(2, 87) = 3.51, p = .034, R² = .08, significantly predicted employees' VTIs. However, extrinsic job satisfaction (t = 2.05, p = .034) was the only statistically significant predictor. Business leaders who understand the factors that increase extrinsic job satisfaction may increase retention within the organization, provide workforce stability, improve organizational and economic growth, and decrease costs related to job satisfaction and VTIs. The implications for social change include helping to reduce the economy's unemployment rate and improving relationships between employees, their families, and their communities by (a) improving employees' and stakeholders' perceptions of their organization in the community and (b) improving employees' well-being by understanding the job satisfaction factors that improve their morale.