24 research outputs found

    Investigations on Methods Developed for Effective Discovery of Functional Dependencies

    Get PDF
    ABSTRACT: This paper details about various methods to discover functional dependencies from data.Effective pruning for the discovery of conditional functional dependencies is discussed in detail. Di conditional Functional Dependencies and Fast FDs a heuristic-driven, Depth-first algorithm for mining FD from relation instances are elaborated. Privacy preserving publishing micro data with Full Functional Dependencies and Conditional functional dependencies for capturing data inconsistencies are examined. The approximation measures for functional dependencies and the complexity of inferring functional dependencies are also observed. Compression -Based Evaluation of partial determinations is portrayed. This survey would promote a lot of research in the area of mining functional dependencies from data

    Efficient Discovery of Ontology Functional Dependencies

    Full text link
    Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors often require user input to resolve due to domain expertise defining specific terminology and relationships. For example, in pharmaceuticals, 'Advil' \emph{is-a} brand name for 'ibuprofen' that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering \emph{Ontology Functional Dependencies} (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data show the scalability and accuracy of our algorithms.Comment: 12 page

    Language-independent link key-based data interlinking

    No full text
    david2015aInternational audienceLinks are important for the publication of RDF data on the web. Yet, establishing links between data sets is not an easy task. We develop an approach for that purpose which extracts weak link keys. Link keys extend the notion of a key to the case of different data sets. They are made of a set of pairs of properties belonging to two different classes. A weak link key holds between two classes if any resources having common values for all of these properties are the same resources. An algorithm is proposed to generate a small set of candidate link keys. Depending on whether some of the, valid or invalid, links are known, we define supervised and non supervised measures for selecting the appropriate link keys. The supervised measures approximate precision and recall, while the non supervised measures are the ratio of pairs of entities a link key covers (coverage), and the ratio of entities from the same data set it identifies (discrimination). We have experimented these techniques on two data sets, showing the accuracy and robustness of both approaches

    Approximation Measures for Conditional Functional Dependencies Using Stripped Conditional Partitions

    Get PDF
    Conditional functional dependencies (CFDs) have been used to improve the quality of data, including detecting and repairing data inconsistencies. Approximation measures have significant importance for data dependencies in data mining. To adapt to exceptions in real data, the measures are used to relax the strictness of CFDs for more generalized dependencies, called approximate conditional functional dependencies (ACFDs). This paper analyzes the weaknesses of dependency degree, confidence and conviction measures for general CFDs (constant and variable CFDs). A new measure for general CFDs based on incomplete knowledge granularity is proposed to measure the approximation of these dependencies as well as the distribution of data tuples into the conditional equivalence classes. Finally, the effectiveness of stripped conditional partitions and this new measure are evaluated on synthetic and real data sets. These results are important to the study of theory of approximation dependencies and improvement of discovery algorithms of CFDs and ACFDs

    Conceptual Based Hidden Data Analytics and Reduction Method for System Interface Enhancement Through Handheld devices

    Get PDF
    With the increasing demand placed on online systems by users, many organizations and companies are seeking to enhance their online interfaces to facilitate the search process on their hidden databases. Usually, users issue queries to a hidden database by using the search template provided by the system. In this thesis, a new approach based mainly on hidden database reduction preserving functional dependencies is developed for enhancing the online system interface through a small screen device. The developed approach is applied to online market systems like eBay. Offline hidden data analysis is used to discover attributes and their domains and different functional dependencies. In this thesis, a comparative study between several methods for mining functional dependencies shows the advantage of conceptual methods for data reduction. In addition, by using online consecutive reductions on search results, we adopted a method of displaying results in order of decreasing relevance. The validation of the proposed designed and developed methods prove their generality and suitability for system interfacing through continuous data reductions.NPRP-07-794-1-145 grant from the Qatar National Research Fund (a member of Qatar foundation

    Détection de clefs pour l'interconnexion et le nettoyage de jeux de données

    Get PDF
    International audienceCet article propose une méthode d'analyse de jeux de données du Web publiés en RDF basée sur les dépendances de clefs. Ce type particulier de dépendances fonctionnelles, largement étudié dans la théorie des bases de données, permet d'évaluer si un ensemble de propriétés constitue une clef pour l'ensemble de données considéré. Si c'est le cas, il n'y aura alors pas deux instances possédant les mêmes valeurs pour ces propriétés. Après avoir donné les définitions nécessaires, nous proposons un algorithme de détection des clefs minimales sur un jeu de données RDF. Nous utilisons ensuite cet algorithme pour détecter les clefs de plusieurs jeux de données publiées sur le Web et appliquons notre approche pour deux applications : (1) réduire le nombre de propriétés à comparer dans le but de détecter des ressources identiques entre deux jeux de données, et (2) détecter des erreurs à l'intérieur d'un jeu de données
    corecore