7 research outputs found

    Survey: Models and Prototypes of Schema Matching

    Schema matching is a critical problem in many applications: data and information integration, achieving interoperability, and other cases caused by schematic heterogeneity. Schema matching has evolved from manual, domain-specific approaches to new models and methods that are semi-automatic and more general, so that they can effectively guide the user in generating a mapping among the elements of two schemas or ontologies. This paper summarizes a literature review of models and prototypes for schema matching over the last 25 years, describing the progress made as well as research challenges and opportunities for new models, methods, and/or prototypes.

    Matcher Composition Methods for Automatic Schema Matching

    We address the problem of automating the process of deciding whether two data schema elements match (that is, refer to the same actual object or concept), and propose several methods for combining evidence computed by multiple basic matchers. One class of methods uses Bayesian networks to account for the conditional dependency between the similarity values produced by individual matchers that use the same or similar information, so as to avoid overconfidence in match probability estimates and improve the accuracy of matching. Another class of methods relies on optimization switches that mitigate this dependency in a domain-independent manner. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matchers can significantly exceed that of the individual component matchers, and the careful selection of optimization switches can improve matching accuracy even further.
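
The overconfidence mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's Bayesian-network method: it only shows the naive baseline in which each matcher's evidence is multiplied in as if independent. The function name `combine_independent`, the prior of 0.5, and the likelihood ratios of 4.0 are all assumptions chosen for illustration.

```python
from math import prod


def combine_independent(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior match probability, assuming matcher outputs are independent.

    Each likelihood ratio is P(score | match) / P(score | no match) for one
    matcher (hypothetical values, not taken from the paper).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Two matchers that each favor "match" 4:1, treated as independent evidence.
naive = combine_independent(0.5, [4.0, 4.0])

# If the second matcher merely repeats the first (same underlying information),
# only one of the two ratios is real evidence.
deduplicated = combine_independent(0.5, [4.0])

print(f"naive={naive:.3f} deduplicated={deduplicated:.3f}")
```

When the two matchers rely on the same information, the naive estimate is higher than the evidence supports, which is the kind of dependency a composite matcher has to account for.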

    Multifaceted exploitation of metadata for attribute match discovery in information integration

    Automating semantic matching of attributes for the purpose of information integration is challenging, and the dynamics of the Web further exacerbate this problem. Believing that many facets of metadata can contribute to a resolution, we present a framework for multifaceted exploitation of metadata in which we gather information about potential matches from various facets of metadata and combine this information to generate and place confidence values on potential attribute matches. To make the framework apply in the highly dynamic Web environment, we base our process largely on machine learning. Experiments we have conducted are encouraging, showing that when the combination of facets converges as expected, the results are highly reliable.

    Finding compositions of transformations for software re-use

    Thesis (S.M.)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2007. Includes bibliographical references (leaves 77-83). As organizations collect and store more information, data integration is becoming increasingly problematic. For example, nearly 70% of respondents to a recent global survey of IT workers and business users called data integration a high inhibitor of new application implementation. A number of frameworks and tools have been developed to enable data integration tasks. The most prominent include schema matching, use of ontologies, and logic-based techniques. A joint project by UFL and MIT, Morpheus, has attacked the same problem with a unique emphasis on re-use and sharing. In the first part of the thesis, we try to define software re-use and sharing in the context of data integration and contrast this approach with existing integration techniques. We synthesize previous work in the field with our experience demoing Morpheus to an audience of research labs and companies. At the heart of a system with re-usable components are browsing and searching capabilities. The second part of this thesis describes TransformScout, a transform composition search engine that automates the composition of re-usable components. Similarity and quality metrics have been formulated for recommending a ranked collection of composite transforms to users. In addition, the system learns from user feedback to improve the quality of the query results. We conducted a user study both to evaluate Morpheus as a system and to assess TransformScout's performance in helping users complete programming tasks. Results indicate that software re-use with Morpheus and TransformScout helped users perform the programming tasks faster. Moreover, TransformScout was useful in helping users complete the tasks more reliably. By Mujde Pamuk. S.M.

    Gestion de métadonnées utilisant tissage et transformation de modèles

    The interaction and interoperability between different data sources is a major concern in many organizations. The different data formats, APIs, and architectures increase the incompatibilities, so that interoperability and interaction between components become very difficult. Model driven engineering (MDE) is a paradigm that reduces interoperability problems by considering every entity as a model. MDE platforms are composed of different kinds of models. Some of the most important kinds of models are transformation models, which are used to define fixed operations between different models. In addition to fixed transformation operations, there are other kinds of interactions and relationships between models. A complete MDE solution must be capable of handling different kinds of relationships. Until now, most research has concentrated on studying transformation languages. This means additional efforts must be undertaken to study these relationships and their implications for an MDE platform. This thesis studies different forms of relationships between model elements. We show through extensive related work that the major limitation of current solutions is the lack of genericity, extensibility, and adaptability. We present a generic MDE solution for relationship management called model weaving. Model weaving proposes to capture different kinds of relationships between model elements in a weaving model. A weaving model conforms to extensions of a core weaving metamodel that supports basic relationship management. After proposing the unification of the conceptual foundations related to model weaving, we show how weaving models and transformation models are used as a generic approach for data interoperability. The weaving models are used to produce model transformations. Moreover, we present an adaptive framework for creating weaving models in a semi-automatic way.
We validate our approach by developing a generic and adaptive tool called ATLAS Model Weaver (AMW), and by implementing several use cases from different application scenarios.

    Geospatial Computing: Architectures and Algorithms for Mapping Applications

    Beginning with the MapTube website (1), which was launched in 2007 for crowd-sourcing maps, this project investigates approaches to exploratory Geographic Information Systems (GIS) using web-based mapping, or ‘web GIS’. Users can log in to upload their own maps and overlay different layers of GIS data sets. This work looks into the theory behind how web-based mapping systems function and whether their performance can be modelled and predicted. One of the important questions when dealing with different geospatial data sets is how they relate to one another. Internet data stores provide another source of information, which can be exploited if more generic geospatial data mining techniques are developed. The identification of similarities between thousands of maps is a GIS technique that can give structure to the overall fabric of the data, once the problems of scalability and comparisons between different geographies are solved. After running MapTube for nine years to crowd-source data, this would mark a natural progression from visualisation of individual maps to wider questions about what additional knowledge can be discovered from the data collected. In the new ‘data science’ age, real-time data sets pose a new challenge for web-based mapping applications. The mapping of real-time geospatial systems is technically challenging, but has the potential to show inter-dependencies as they emerge in the time series. Combined geospatial and temporal data mining of real-time sources can provide archives of transport and environmental data from which to accurately model the systems under investigation. By using techniques from machine learning, the models can be built directly from the real-time data stream. These models can then be used for analysis and experimentation, being derived directly from city data. This then leads to an analysis of the behaviours of the interacting systems. (1) The MapTube website: http://www.maptube.org