
    RefConcile – automated online reconciliation of bibliographic references

    Comprehensive bibliographies often rely on community contributions. In such a setting, de-duplication is mandatory for the bibliography to be useful. Ideally, it works online, i.e., during the addition of new references, so the bibliography remains duplicate-free at all times. While de-duplication is well researched, generic approaches do not achieve the result quality required for automated reconciliation. To overcome this problem, we propose a new duplicate detection and reconciliation technique called RefConcile. Aimed specifically at bibliographic references, it uses dedicated blocking and matching techniques tailored to this type of data. Our evaluation, based on a large real-world collection of bibliographic references, shows that RefConcile scales well and that it detects and reconciles duplicates with high accuracy.
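    The abstract does not disclose RefConcile's concrete blocking and matching rules, so the following is only a generic sketch of the blocking-then-matching pattern it builds on: references are first grouped by a cheap key so that only plausible duplicates are ever compared, and candidate pairs within a block are then scored with a title similarity. All names (block_key, jaccard, find_duplicates) and the surname/year key are illustrative assumptions, not the paper's method.

        # Generic blocking + matching sketch; not RefConcile's actual rules.
        from collections import defaultdict
        from itertools import combinations

        def block_key(ref):
            # Cheap blocking key: normalized first-author surname plus year
            # (assumes authors are stored as "Surname, Given").
            surname = ref["authors"][0].split(",")[0].strip().lower()
            return (surname, ref.get("year"))

        def jaccard(a, b):
            # Token-set similarity between two titles.
            ta, tb = set(a.lower().split()), set(b.lower().split())
            return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

        def find_duplicates(refs, threshold=0.8):
            # Compare pairs only within a block, never across the whole set.
            blocks = defaultdict(list)
            for ref in refs:
                blocks[block_key(ref)].append(ref)
            return [(r1, r2)
                    for group in blocks.values()
                    for r1, r2 in combinations(group, 2)
                    if jaccard(r1["title"], r2["title"]) >= threshold]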

    A Latent Class Approach for Allocation of Employees to Local Units


    Towards Evaluating an Ontology-Based Data Matching Strategy for Retrieval and Recommendation of Security Annotations for Business Process Models

    In the Trusted Architecture for Securely Shared Services (TAS3) EC FP7 project, we have developed a method to provide semantic support to the process modeler during the design of secure business process models. Its supporting tool, called Knowledge Annotator (KA), uses ontology-based data matching algorithms and a matching strategy to infer, from a dedicated knowledge base, the recommendations best fitted to the user's design intent. The paper illustrates how the strategy is used to perform the similarity (matching) check in order to retrieve the best design recommendation. We illustrate the concept with trust policy specification in the security and privacy domain. Finally, the paper discusses the evaluation of the results using the Ontology-based Data Matching Framework evaluation benchmark.
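    As a rough illustration of the similarity check described above (the abstract does not give the KA's actual algorithms), one plausible reading is that both the modeler's design intent and each knowledge-base entry are described by sets of ontology concepts, and stored recommendations are ranked by concept overlap. Every name below (concept_overlap, recommend, the (recommendation, concept_set) representation) is a hypothetical stand-in.

        # Hypothetical ontology-based matching step, not the actual KA code.
        def concept_overlap(intent, annotation):
            # Jaccard overlap between two sets of ontology concept IDs.
            if not (intent or annotation):
                return 0.0
            return len(intent & annotation) / len(intent | annotation)

        def recommend(intent, knowledge_base, top_k=3):
            # knowledge_base: list of (recommendation, concept_set) pairs.
            scored = sorted(((concept_overlap(intent, concepts), rec)
                             for rec, concepts in knowledge_base),
                            reverse=True, key=lambda pair: pair[0])
            return [rec for score, rec in scored[:top_k] if score > 0]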

    Outlier Protection in Continuous Microdata Masking

    Masking methods protect data sets against disclosure by perturbing the original values before publication. Masking causes some information loss (masked data are not exactly the same as the original data) and does not completely suppress the risk of disclosure for the individuals behind the data set. Information loss can be measured by observing the differences between original and masked data, while disclosure risk can be measured by means of record linkage and confidentiality intervals.
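    A minimal sketch of the two measurements, under simplifying assumptions not taken from the paper: records are numeric vectors aligned by index, information loss is the mean absolute difference between original and masked values, and disclosure risk is the fraction of masked records whose nearest original record (by squared Euclidean distance) is their true source, i.e., a naive record-linkage attack.

        # Toy measures of information loss and record-linkage disclosure risk.
        def information_loss(original, masked):
            # Mean absolute difference across all values.
            diffs = [abs(o - m)
                     for orig_rec, msk_rec in zip(original, masked)
                     for o, m in zip(orig_rec, msk_rec)]
            return sum(diffs) / len(diffs)

        def linkage_risk(original, masked):
            # Share of masked records re-identified by nearest-neighbor linkage.
            def dist(a, b):
                return sum((x - y) ** 2 for x, y in zip(a, b))
            hits = sum(
                min(range(len(original)),
                    key=lambda j: dist(rec, original[j])) == i
                for i, rec in enumerate(masked))
            return hits / len(masked)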

    An Efficient Duplicate Record Detection Using q-Grams Array Inverted Index

    Duplicate record detection is a crucial task in the data cleaning process of data warehouse systems. Many approaches have been presented to address this problem: some focus on the accuracy of the results, others on the efficiency of the comparison process. Following the first direction, we introduce two similarity functions based on the concept of q-grams that improve the accuracy of the duplicate detection process with respect to other well-known measures. We also reduce the number of record comparisons and their running time by building an inverted index on a sorted list of q-grams, called the q-grams array. We then extend this approach to perform a clustering process based on the proposed q-grams array. Finally, an experimental analysis on synthetic and real data shows the efficiency of the novel indexing method for both the record comparison process and clustering.
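    To make the q-gram machinery concrete, here is a small sketch of the general idea (not the paper's exact q-grams array or its two similarity functions, which the abstract does not specify): each record is indexed under its q-grams, so candidate duplicates are exactly the record pairs sharing at least one q-gram, and only those pairs need a full similarity comparison instead of all pairs.

        # Generic q-gram inverted index for candidate generation; illustrative only.
        from collections import defaultdict
        from itertools import combinations

        def qgrams(s, q=2):
            # Pad the string so boundary characters also form q-grams.
            padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
            return {padded[i:i + q] for i in range(len(padded) - q + 1)}

        def candidate_pairs(records, q=2):
            index = defaultdict(set)  # q-gram -> ids of records containing it
            for rid, text in enumerate(records):
                for gram in qgrams(text, q):
                    index[gram].add(rid)
            pairs = set()
            for ids in index.values():
                pairs.update(combinations(sorted(ids), 2))
            return pairs

    A pair surviving this filter would then be scored, for instance with the Jaccard coefficient of the two records' q-gram sets, before being declared a duplicate.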