    Automated schema matching techniques: an exploratory study

    Manual schema matching is a problem for many database applications that use multiple data sources, including data warehousing and e-commerce applications. Current research attempts to address this problem by developing algorithms to automate aspects of the schema-matching task. In this paper, an approach using an external dictionary facilitates automated discovery of the semantic meaning of database schema terms. An experimental study was conducted to evaluate the performance and accuracy of five schema-matching techniques with the proposed approach, called SemMA. The proposed approach and results are compared with two existing semi-automated schema-matching approaches, and suggestions for future research are made.

    Using Element Clustering to Increase the Efficiency of XML Schema Matching

    Schema matching attempts to discover semantic mappings between elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, the schema matching problem is a combinatorial problem with exponential complexity. This makes naive matching algorithms prohibitively inefficient for large schemas. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions the schemas, reduces the overall matching load, and makes it possible to trade off efficiency against effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of it, and open directions for future research.
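
    The idea of inserting clustering before pairwise comparison can be sketched as follows. This is a minimal illustration and not the paper's implementation: the element representation (`(name, data_type)` tuples), the choice of data type as the clustering key, the `difflib`-based name similarity, and the 0.6 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] between two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_by_key(elements, key):
    """Group elements that share the same key (a hypothetical clustering rule)."""
    clusters = {}
    for element in elements:
        clusters.setdefault(key(element), []).append(element)
    return clusters


def match_with_clustering(source, target, key, threshold=0.6):
    """Compare only elements in the same cluster.

    Returns the candidate name pairs and the number of comparisons performed.
    """
    source_clusters = cluster_by_key(source, key)
    target_clusters = cluster_by_key(target, key)
    pairs, comparisons = [], 0
    for cluster_key, source_group in source_clusters.items():
        target_group = target_clusters.get(cluster_key, [])
        for s, t in product(source_group, target_group):
            comparisons += 1
            if name_similarity(s[0], t[0]) >= threshold:
                pairs.append((s[0], t[0]))
    return pairs, comparisons


if __name__ == "__main__":
    # Hypothetical schemas: (element name, data type).
    source = [("customerName", "string"), ("orderDate", "date"),
              ("totalAmount", "decimal")]
    target = [("custName", "string"), ("dateOfOrder", "date"),
              ("amountTotal", "decimal"), ("shipDate", "date")]
    pairs, comparisons = match_with_clustering(source, target, key=lambda e: e[1])
    print(pairs, comparisons)
```

    With the sample schemas, only 4 comparisons are made instead of the 3 x 4 = 12 a naive cross-comparison would need. The cost is that any true match whose elements land in different clusters is never considered, which is the efficiency-versus-effectiveness trade the abstract refers to.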

    Nonparametric Bayesian Modeling for Automated Database Schema Matching

    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms, in part because of the use of nonparametric Bayesian models.

    The Role of Schema Salience in Ad Processing and Evaluation

    Advertising grids such as the Rossiter-Percy grid (Rossiter & Percy 1991, 1997) propose that brand-matching advertising is more effective than brand-mismatching advertising. However, for the match hypothesis to hold the brand schema needs to be salient in ad processing and evaluation. In this study we test how schema salience affects ad processing and evaluation. Two separate experiments were conducted, employing the same brand descriptions and ad scenarios. In the first experiment, the brand schema was made salient in ad processing, whereas in the second experiment the ad schema was made salient. In the first experiment brand-ad combinations were evaluated in line with the Rossiter-Percy advertising grid. If the brand schema was salient, consumers evaluated matching combinations of ad type and brand purchase motivation more favorably than mismatching combinations. In the second experiment, brand-ad combinations were evaluated in accordance with the existing ad schema. This implies that when the ad schema was salient, evaluations of brand-ad combinations were not affected by matches or mismatches between ads and purchase motivations for the brands. The two studies show that evaluation of brand-ad combinations depends on the schema that is salient at the time of information processing. Consequently, brand-matching advertising is effective only if consumers consciously relate ad information to brand knowledge, i.e., if the brand schema is salient in ad processing.
    Keywords: advertising; advertising grid; brand perception; matching hypothesis; purchase motivation

    Defining the XML schema matching problem for a personal schema based query answering system

    In this report, we analyze the problem of personal schema matching. We define the ingredients of the XML schema matching problem using constraint logic programming. This allows us to thoroughly investigate specific matching problems. We do not have the ambition to provide a formalism that covers all kinds of schema matching problems. The target is specifically personal schema matching using XML. The report is organized as follows. Chapter 2 provides a detailed description of our research domain, the Personal Schema Query Answering System. In chapter 3, we introduce a framework for defining the XML schema matching problem. The XML schema matching problem is defined using this framework in chapter 4. An important component of the XML schema matching problem is the objective function, which is investigated in chapter 5. Chapter 6 presents the related research, with conclusions and further research being discussed in chapter 7. Throughout the report, we use expressions like 'schema matching', 'XML schema matching' and 'semantic XML schema matching'. Unless explicitly stated otherwise or strongly suggested by the context, those expressions all refer to the same thing: semantic matching of XML schemas as used in personal schema querying. Furthermore, basic knowledge of the XML-schema language is assumed.

    Multilingual Schema Matching for Wikipedia Infoboxes

    Recent research has taken advantage of Wikipedia's multilingualism as a resource for cross-language information retrieval and machine translation, as well as proposed techniques for enriching its cross-language structure. The availability of documents in multiple languages also opens up new opportunities for querying structured Wikipedia content, and in particular, to enable answers that straddle different languages. As a step towards supporting such queries, in this paper, we propose a method for identifying mappings between attributes from infoboxes that come from pages in different languages. Our approach finds mappings in a completely automated fashion. Because it does not require training data, it is scalable: not only can it be used to find mappings between many language pairs, but it is also effective for languages that are under-represented and lack sufficient training samples. Another important benefit of our approach is that it does not depend on syntactic similarity between attribute names, and thus, it can be applied to language pairs that have distinct morphologies. We have performed an extensive experimental evaluation using a corpus consisting of pages in Portuguese, Vietnamese, and English. The results show that not only does our approach obtain high precision and recall, but it also outperforms state-of-the-art techniques. We also present a case study which demonstrates that the multilingual mappings we derive lead to substantial improvements in answer quality and coverage for structured queries over Wikipedia content. Comment: VLDB201

    Valentine: Evaluating Matching Techniques for Dataset Discovery

    Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method's success relies highly on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad-hoc fashion due to the lack of openly-available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to absence of open source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight on the strengths and weaknesses of existing techniques, that can serve as a guide for employing schema matching in future dataset discovery methods

    Schema Normalization for Improving Schema Matching

    Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the "hidden meaning" associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a "meaning" with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization, which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We show empirically that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.
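
    A rough sketch of what label normalization involves is given below. It is a simplified illustration under stated assumptions, not the method from the paper: the abbreviation table `ABBREVIATIONS` is a hypothetical hand-made dictionary, compound labels are split only on camelCase boundaries, underscores, hyphens, and whitespace, and no lexical resource is consulted.

```python
import re

# Hypothetical abbreviation table; a real system would derive or curate this.
ABBREVIATIONS = {
    "cust": "customer",
    "addr": "address",
    "qty": "quantity",
    "no": "number",
}


def split_label(label: str) -> list[str]:
    """Split a compound schema label into lowercase tokens."""
    # Insert a space at lowercase/digit -> uppercase boundaries (camelCase).
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    # Split on whitespace, underscores, and hyphens; drop empty tokens.
    return [tok.lower() for tok in re.split(r"[\s_\-]+", spaced) if tok]


def normalize_label(label: str, abbreviations=ABBREVIATIONS) -> list[str]:
    """Split a label and expand any token found in the abbreviation table."""
    return [abbreviations.get(tok, tok) for tok in split_label(label)]


if __name__ == "__main__":
    for label in ("custAddr", "ORDER_QTY", "customer_address"):
        print(label, "->", normalize_label(label))
```

    After normalization, `custAddr` and `customer_address` yield the same token list, so a dictionary-based matcher can compare them even though the raw labels share no dictionary word.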

    DYMS (Dynamic Matcher Selector) – Scenario-based Schema Matcher Selector

    Schema matching is one of the main challenges in different information system integration contexts. Over the past 20 years, different schema matching methods have been proposed and shown to be successful in various situations. Although numerous advanced matching algorithms have emerged, schema matching remains a critical research issue. Different algorithms are implemented to resolve different types of schema heterogeneities, including differences in design methodologies, naming conventions, and the level of specificity of schemas, amongst others. The algorithms are usually generic and do not take the schema matching scenario into account. This situation indicates that a single matcher cannot be optimized for all matching scenarios. In this research, I propose a dynamic matcher selector (DYMS) as a probable solution to the aforementioned problem. The proposed DYMS analyzes the schema matching scenario and selects the most appropriate matchers for a given scenario. The selected matchers are weighted through a parameter optimization process that adopts a heuristic learning approach. The DYMS returns the alignment result of the input schemas.