5 research outputs found

    Managing uncertainty of XML schema matching

    Get PDF
    Despite of advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of "possible mappings" between the schemas may be derived from the matching result. In this paper, we study the problem of managing possible mappings between two heterogeneous XML schemas. We observe that for XML schemas, their possible mappings have a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner, and can be generated efficiently. Moreover, it supports the evaluation of probabilistic twig query (PTQ), which returns the probability of portions of an XML document that match the query pattern. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ, and present an efficient solution for it. The second challenge we have tackled is to efficiently generate possible mappings for a given schema matching. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-andconquer approach. An extensive evaluation on realistic datasets show that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings. © 2010 IEEE.published_or_final_versionThe IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA., 1-6 March 2010. In International Conference on Data Engineering. Proceedings, 2010, p. 297-30

    Evaluating probabilistic queries over uncertain matching

    Get PDF
    A matching between two database schemas, generated by machine learning techniques (e.g., COMA++), is often uncertain. Handling the uncertainty of schema matching has recently raised a lot of research interest, because the quality of applications rely on the matching result. We study query evaluation over an inexact schema matching, which is represented as a set of 'possible mappings', as well as the probabilities that they are correct. Since the number of possible mappings can be large, evaluating queries through these mappings can be expensive. By observing the fact that the possible mappings between two schemas often exhibit a high degree of overlap, we develop two efficient solutions. We also present a fast algorithm to compute answers with the k highest probabilities. An extensive evaluation on real schemas shows that our approaches improve the query performance by almost an order of magnitude. © 2012 IEEE.published_or_final_versionThe IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, D.C., 1-5 April 2012. In International Conference on Data Engineering Proceedings, 2012, p. 1096-110

    XML Matchers: approaches and challenges

    Full text link
    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was largely investigated especially for classical database models (e.g., E/R schemas, relational databases, etc.). However, in the latest years, the widespread adoption of XML in the most disparate application fields pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them on DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact on the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear as unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.Comment: 34 pages, 8 tables, 7 figure

    Managing Uncertainty of XML Schema Matching

    Get PDF
    Abstract — Despite of advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of “possible mappings ” between the schemas may be derived from the matching result. In this paper, we study the problem of managing possible mappings between two heterogeneous XML schemas. We observe that for XML schemas, their possible mappings have a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner, and can be generated efficiently. Moreover, it supports the evaluation of probabilistic twig query (PTQ), which returns the probability of portions of an XML document that match the query pattern. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ, and present an efficient solution for it. The second challenge we have tackled is to efficiently generate possible mappings for a given schema matching. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-andconquer approach. An extensive evaluation on realistic datasets show that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings. I
    corecore