
24 research outputs found

Investigations on Methods Developed for Effective Discovery of Functional Dependencies

Author: J Anishkumar
P Andrew
Prof S Balamurugan
S Charanyaa
Publication date: 03/04/2020

ABSTRACT: This paper describes various methods for discovering functional dependencies from data. Effective pruning for the discovery of conditional functional dependencies is discussed in detail. Methods for discovering conditional functional dependencies are elaborated, as is FastFDs, a heuristic-driven, depth-first algorithm for mining FDs from relation instances. Privacy-preserving publishing of microdata with full functional dependencies, and conditional functional dependencies for capturing data inconsistencies, are examined. Approximation measures for functional dependencies and the complexity of inferring functional dependencies are also reviewed. Compression-based evaluation of partial determinations is described. This survey aims to promote further research on mining functional dependencies from data.

CiteSeerX
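A minimal sketch of the partition test that many FD discovery algorithms build on (a TANE-style check, shown for illustration and not taken from the surveyed paper): X -> A holds exactly when grouping rows by X ∪ {A} yields the same number of equivalence classes as grouping by X alone. The `g3_error` function computes the common g3 approximation measure (the fraction of rows to remove for the FD to hold). The table and column names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def partition(rows: Sequence[Row], attrs: Tuple[str, ...]) -> List[List[int]]:
    """Group row indices into equivalence classes by their values on attrs."""
    classes: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def fd_holds(rows: Sequence[Row], lhs: Tuple[str, ...], rhs: str) -> bool:
    """X -> A holds iff adding A to X does not split any equivalence class."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + (rhs,)))


def g3_error(rows: Sequence[Row], lhs: Tuple[str, ...], rhs: str) -> float:
    """Fraction of rows that must be removed for X -> A to hold exactly."""
    if not rows:
        return 0.0
    removed = 0
    for cls in partition(rows, lhs):
        counts = Counter(rows[i][rhs] for i in cls)
        removed += len(cls) - max(counts.values())  # keep the majority A value
    return removed / len(rows)


# Hypothetical example table with one misspelled city.
rows = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "60601", "city": "Chicago", "state": "IL"},
    {"zip": "60601", "city": "Chicgo", "state": "IL"},
]
print(fd_holds(rows, ("zip",), "state"))  # True
print(fd_holds(rows, ("zip",), "city"))   # False
print(g3_error(rows, ("zip",), "city"))   # 0.25
```

Real discovery algorithms avoid recomputing partitions from scratch by intersecting partitions of smaller attribute sets and pruning the lattice of candidate left-hand sides.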

Efficient Discovery of Ontology Functional Dependencies

Author: Baskaran Sridevi
Chiang Fei
Keller Alexander
Lukasz Golab
Szlichta Jaroslaw
Publication date: 23/05/2017

Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve, because the specific terminology and relationships involved are defined by domain expertise. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they cannot model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms. Comment: 12 pages

arXiv.org e-Print Archive

Crossref
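A simplified sketch of checking a synonym-style OFD, assuming a hypothetical dictionary-based ontology (`ONTOLOGY`) that maps each surface value to the concepts it can denote. Under this reading, X -> A holds when, within every group of rows sharing the same X values, all A values share at least one concept. This illustrates the idea only; it is not the paper's discovery algorithm and covers synonyms, not is-a inheritance.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]

# Hypothetical toy ontology: each surface value maps to the concept(s) it can denote.
ONTOLOGY: Dict[str, Set[str]] = {
    "Advil": {"ibuprofen"},
    "Motrin": {"ibuprofen"},
    "ibuprofen": {"ibuprofen"},
    "Tylenol": {"acetaminophen"},
}


def synonym_fd_holds(
    rows: Sequence[Row], lhs: Tuple[str, ...], rhs: str, ontology: Dict[str, Set[str]]
) -> bool:
    """Check X -> A up to synonyms: each X-group's A values must share a concept."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])
    for values in groups.values():
        # Values missing from the ontology are treated as denoting only themselves.
        shared = set.intersection(*(ontology.get(v, {v}) for v in values))
        if not shared:
            return False
    return True


rows = [
    {"code": "D1", "drug": "Advil"},
    {"code": "D1", "drug": "ibuprofen"},  # a plain FD code -> drug would fail here
    {"code": "D2", "drug": "Tylenol"},
]
print(synonym_fd_holds(rows, ("code",), "drug", ONTOLOGY))  # True
```

Adding a row `{"code": "D2", "drug": "Motrin"}` makes the check fail, since acetaminophen and ibuprofen share no concept in the toy ontology.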

Language-independent link key-based data interlinking

Author: Atencia Manuel
David Jérôme
Euzenat Jérôme
Publication venue: HAL CCSD
Publication date: 20/03/2015

Links are important for the publication of RDF data on the web. Yet, establishing links between data sets is not an easy task. We develop an approach for this purpose that extracts weak link keys. Link keys extend the notion of a key to the case of different data sets. They are made of a set of pairs of properties belonging to two different classes. A weak link key holds between two classes if any resources having common values for all of these properties are the same resources. An algorithm is proposed to generate a small set of candidate link keys. Depending on whether some valid or invalid links are known, we define supervised and non-supervised measures for selecting the appropriate link keys. The supervised measures approximate precision and recall, while the non-supervised measures are the ratio of pairs of entities a link key covers (coverage) and the ratio of entities from the same data set it identifies (discrimination). We have evaluated these techniques on two data sets, showing the accuracy and robustness of both approaches.

Hal - Université Grenoble Alpes
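A minimal sketch of applying a candidate link key and scoring it without supervision. Data sets are modeled as hypothetical Python dictionaries (resource IRI to property-value sets), and a pair of resources is linked when they share a value for every property pair in the key. The `coverage` and `discrimination` formulas below are one plausible reading of the abstract's descriptions, not necessarily the paper's exact definitions.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Description = Dict[str, Set[str]]  # property -> set of values (RDF allows several)
Dataset = Dict[str, Description]   # resource IRI -> description
Link = Tuple[str, str]


def links_for_key(left: Dataset, right: Dataset, key: List[Tuple[str, str]]) -> Set[Link]:
    """Link (a, b) when, for every property pair (p, q), a and b share a value."""
    links: Set[Link] = set()
    for (a, da), (b, db) in product(left.items(), right.items()):
        if all(da.get(p, set()) & db.get(q, set()) for p, q in key):
            links.add((a, b))
    return links


def coverage(links: Set[Link], left: Dataset, right: Dataset) -> float:
    """Share of all resources (both sides) that appear in at least one link."""
    total = len(left) + len(right)
    if total == 0:
        return 0.0
    linked = len({a for a, _ in links}) + len({b for _, b in links})
    return linked / total


def discrimination(links: Set[Link]) -> float:
    """1.0 when links are one-to-one; lower when a resource gets several partners."""
    if not links:
        return 0.0
    lefts = {a for a, _ in links}
    rights = {b for _, b in links}
    return min(len(lefts), len(rights)) / len(links)


left = {
    "ex1:alice": {"name": {"Alice"}, "born": {"1990"}},
    "ex1:bob": {"name": {"Bob"}, "born": {"1985"}},
}
right = {
    "ex2:a": {"label": {"Alice"}, "birthYear": {"1990"}},
    "ex2:b": {"label": {"Bob"}, "birthYear": {"1985"}},
    "ex2:c": {"label": {"Carol"}, "birthYear": {"1985"}},
}
strong = links_for_key(left, right, [("name", "label"), ("born", "birthYear")])
weak = links_for_key(left, right, [("born", "birthYear")])
print(coverage(strong, left, right), discrimination(strong))  # 0.8 1.0
```

With only the birth-year pair, the key links more resources (coverage 1.0) but pairs `ex1:bob` with two candidates, so its discrimination drops to 2/3; this trade-off is what the two measures are meant to expose.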

Approximation Measures for Conditional Functional Dependencies Using Stripped Conditional Partitions

Author: Arch-int Ngamnij
Arch-int Somjit
Duy Tran Anh
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/06/2017

Conditional functional dependencies (CFDs) have been used to improve the quality of data, including detecting and repairing data inconsistencies. Approximation measures are of significant importance for data dependencies in data mining. To accommodate exceptions in real data, these measures are used to relax the strictness of CFDs into more general dependencies, called approximate conditional functional dependencies (ACFDs). This paper analyzes the weaknesses of the dependency degree, confidence and conviction measures for general CFDs (constant and variable CFDs). A new measure for general CFDs based on incomplete knowledge granularity is proposed to measure the approximation of these dependencies as well as the distribution of data tuples into the conditional equivalence classes. Finally, the effectiveness of stripped conditional partitions and of this new measure is evaluated on synthetic and real data sets. These results are important to the theory of approximate dependencies and to improving discovery algorithms for CFDs and ACFDs.

Crossref

Institute of Advanced Engineering and Science
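A minimal sketch of the conventional confidence measure for a CFD, one of the measures the paper analyzes (the paper's own granularity-based measure is not reproduced here). The sketch groups only the tuples that match the left-hand-side pattern into conditional equivalence classes, then keeps, per class, the tuples consistent with the right-hand side: the majority value for a variable (`_`) pattern, or tuples equal to the constant otherwise. Stripping singleton classes, as in the paper's stripped conditional partitions, is an optimization omitted here. The wildcard symbol and the sample table are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]
WILDCARD = "_"  # assumed symbol for an unnamed variable in a pattern tuple


def matches(row: Row, attrs: Tuple[str, ...], pattern: Tuple[str, ...]) -> bool:
    """True if the row agrees with every constant in the pattern."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))


def conditional_partition(
    rows: Sequence[Row], lhs: Tuple[str, ...], lhs_pattern: Tuple[str, ...]
) -> List[List[int]]:
    """Equivalence classes on lhs, restricted to rows matching the LHS pattern."""
    classes: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        if matches(row, lhs, lhs_pattern):
            classes[tuple(row[a] for a in lhs)].append(i)
    return list(classes.values())


def cfd_confidence(rows, lhs, lhs_pattern, rhs, rhs_pattern) -> float:
    """Max fraction of LHS-matching tuples that can be kept so the CFD holds."""
    classes = conditional_partition(rows, lhs, lhs_pattern)
    n_match = sum(len(c) for c in classes)
    if n_match == 0:
        return 1.0  # vacuously satisfied
    kept = 0
    for c in classes:
        values = Counter(rows[i][rhs] for i in c)
        if rhs_pattern == WILDCARD:
            kept += max(values.values())  # variable CFD: keep the majority value
        else:
            kept += values.get(rhs_pattern, 0)  # constant CFD: keep matching tuples
    return kept / n_match


rows = [
    {"country": "UK", "area": "020", "city": "London"},
    {"country": "UK", "area": "020", "city": "London"},
    {"country": "UK", "area": "020", "city": "Londn"},
    {"country": "US", "area": "212", "city": "NYC"},
]
print(cfd_confidence(rows, ("country", "area"), ("UK", WILDCARD), "city", WILDCARD))
```

Here the variable CFD ([country, area] -> city, (UK, _ || _)) has confidence 2/3, because one of the three matching UK tuples would have to be removed.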

Automatic Mapping in Databases Using Data Mining (Mapeamento Automático em Bases de Dados Usando Data Mining)

Author: Vítor Hugo Pereira Moreira da Silva
Publication date: 07/11/2013

Repositório Aberto da Universidade do Porto

Conceptual Based Hidden Data Analytics and Reduction Method for System Interface Enhancement Through Handheld devices

Author: Babi Syrinne Adnan
Publication date: 01/01/2016

With the increasing demand placed on online systems by users, many organizations and companies are seeking to enhance their online interfaces to facilitate searching their hidden databases. Usually, users issue queries to a hidden database through the search template provided by the system. In this thesis, a new approach, based mainly on hidden database reduction that preserves functional dependencies, is developed for enhancing online system interfaces on small-screen devices. The approach is applied to online market systems such as eBay. Offline hidden data analysis is used to discover attributes, their domains, and the functional dependencies among them. A comparative study of several methods for mining functional dependencies shows the advantage of conceptual methods for data reduction. In addition, by applying consecutive online reductions to search results, we adopted a method of displaying results in decreasing order of relevance. Validation of the designed and developed methods demonstrates their generality and suitability for system interfacing through continuous data reductions. Funding: NPRP-07-794-1-145 grant from the Qatar National Research Fund (a member of Qatar Foundation).

Qatar University Institutional Repository

Key Detection for Interlinking and Cleaning Datasets (Détection de clefs pour l'interconnexion et le nettoyage de jeux de données)

Author: David Jérôme
Scharffe François
Publication venue: HAL CCSD
Publication date: 25/06/2012

This article proposes a method for analyzing Web datasets published in RDF, based on key dependencies. This particular type of functional dependency, widely studied in database theory, makes it possible to evaluate whether a set of properties constitutes a key for the dataset under consideration. If it does, no two instances will have the same values for these properties. After giving the necessary definitions, we propose an algorithm for detecting minimal keys in an RDF dataset. We then use this algorithm to detect the keys of several datasets published on the Web and apply our approach to two applications: (1) reducing the number of properties to compare in order to detect identical resources across two datasets, and (2) detecting errors within a dataset.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes
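A minimal, brute-force sketch of minimal key detection in the spirit of the abstract (not the authors' algorithm). Instances are modeled as hypothetical dictionaries from property to value set. A candidate property set is treated as a key when no two instances have identical value sets on all of its properties, and candidates are explored by increasing size so that supersets of an already-found key are skipped as non-minimal. Treating a missing property as an empty value set is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Instance = Dict[str, Set[str]]  # property -> set of values


def is_key(instances: Dict[str, Instance], props: Tuple[str, ...]) -> bool:
    """True if no two instances share identical value sets on all props."""
    seen = set()
    for desc in instances.values():
        # Assumption: a missing property counts as the empty value set.
        signature = tuple(frozenset(desc.get(p, set())) for p in props)
        if signature in seen:
            return False
        seen.add(signature)
    return True


def minimal_keys(instances: Dict[str, Instance]) -> List[Tuple[str, ...]]:
    """Level-wise search: smaller candidates first, skipping supersets of keys."""
    props = sorted({p for desc in instances.values() for p in desc})
    keys: List[Tuple[str, ...]] = []
    for size in range(1, len(props) + 1):
        for cand in combinations(props, size):
            if any(set(k) <= set(cand) for k in keys):
                continue  # a superset of a known key is not minimal
            if is_key(instances, cand):
                keys.append(cand)
    return keys


instances = {
    "ex:p1": {"name": {"Ann"}, "city": {"Lyon"}, "email": {"ann@x.org"}},
    "ex:p2": {"name": {"Ann"}, "city": {"Paris"}, "email": {"ann2@x.org"}},
    "ex:p3": {"name": {"Bob"}, "city": {"Lyon"}, "email": {"bob@x.org"}},
}
print(minimal_keys(instances))  # [('email',), ('city', 'name')]
```

The search is exponential in the number of properties, so it only suits small property sets; a detected key such as (city, name) could then restrict which properties are compared when matching resources across datasets.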