5 research outputs found

    SAKey: Scalable almost key discovery in RDF data

    Get PDF
    Exploiting identity links among RDF resources allows applications to efficiently integrate data. Keys can be very useful to discover these identity links. A set of properties is considered as a key when its values uniquely identify resources. However, these keys are usually not available. The approaches that attempt to automatically discover keys can easily be overwhelmed by the size of the data and require clean data. We present SAKey, an approach that discovers keys in RDF data in an efficient way. To prune the search space, SAKey exploits characteristics of the data that are dynamically detected during the process. Furthermore, our approach can discover keys in datasets where erroneous data or duplicates exist (i.e., almost keys). The approach has been evaluated on different synthetic and real datasets. The results show both the relevance of almost keys and the efficiency of discovering them

    KD2R: a Key Discovery method for semantic Reference Reconciliation in OWL

    Get PDF
    The reference reconciliation problem consists of deciding whether different identifiers refer to the same world entity. Some existing reference reconciliation approaches use key constraints to infer reconciliation decisions. In the context of the Linked Open Data, this knowledge is not available. In this master thesis we propose KD2R, a method which allows automatic discovery of key constraints associated to OWL2 classes. These keys are discovered from RDF data which can be incomplete. The proposed algorithm allows this discovery without having to scan all the data. KD2R has been tested on data sets of the international contest OAEI and obtains promising results.Le problème de réconciliation de référence consiste à décider si des identifiants différents référé à la même entité du monde réel. Certaines approches de réconciliation de référence utilisent des contraintes des clé pour déduire des décisions de réconciliation des références. Dans le contexte des données liées, cette connaissance n'est pas disponible. Dans ce stage de master nous proposons KD2R, une méthode qui permet la découverte automatique des contraintes de clé associées à des classes OWL2. Cette contraintes de cl'e sont découvertes a' partir de données RDF qui peuvent être incomplètes. L'algorithme propos'e permet cette découverte, sans avoir à passer en revue toutes les données. KD2R a été testé sur des jeux de données du concours international OAEI et obtient des résultats prometteurs

    Keys and Pseudo-keys Detection for Web Datasets Cleansing and Interlinking

    Get PDF
    scharffe2012bThis report introduces a novel method for analysing web datasets based on key dependencies. This particular kind of functional dependencies, widely studied in the field of database theory, allows to evaluate if a set of properties constitutes a key for the set of data considered. When this is the case, there won't be any two instances having identical values for these properties. After giving necessary definitions, we propose an algorithm for detecting minimal keys and pseudo-keys in a RDF dataset. We then use this algorithm to detect keys in datasets published as web data and we apply this approach in two applications: (i) reducing the number of properties to compare in order to discover equivalent instances between two datasets, (ii) detecting errors inside a dataset

    Discovering keys in RDF/OWL dataset with KD2R

    Get PDF
    International audienceKD2R allows the automatic discovery of composite key constraints in RDF data sources that conform to a given ontol-ogy. We consider data sources for which the Unique Name Assumption is fulfilled. KD2R allows this discovery without having to scan all the data. Indeed, the proposed system looks for maximal non keys and derives minimal keys from this set of non keys. KD2R has been tested on several datasets available on the web of data and it has obtained promising results when the discovered keys are used to link data. In the demo, we will demonstrate the functionality of our tool and we will show on several datasets that the keys can be used in a datalinking tool
    corecore