
    XML Matchers: approaches and challenges

    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in the Database and Artificial Intelligence research areas for many years. In the past, it was investigated largely for classical database models (e.g., E/R schemas, relational databases, etc.). In recent years, however, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, which aim to find semantic matches between concepts defined in DTDs and XSDs. XML Matchers do not simply take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called the XML Matcher Template, that describes the main components of an XML Matcher, their role and their behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template a baseline for objectively comparing approaches that, at first glance, might appear unrelated. The template can also be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.
    Comment: 34 pages, 8 tables, 7 figures
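    The sketch below illustrates what "exploiting the hierarchical structure of a DTD/XSD" can mean in practice. It is a minimal Python example and is not taken from any XML Matcher surveyed in the paper. Each element's score blends two parts: a linguistic similarity of element names, computed with the standard library's difflib.SequenceMatcher, and the similarity of the element's ancestor chain. The tuple-based schema representation, the weight `alpha` and the `threshold` are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterator, Tuple

# Hypothetical stand-in for a parsed DTD/XSD: (element name, [child subtrees]).
SchemaTree = Tuple[str, list]
Path = Tuple[str, ...]


def element_paths(tree: SchemaTree, prefix: Path = ()) -> Iterator[Path]:
    """Yield the root-to-element path of every element in the tree."""
    name, children = tree
    path = prefix + (name,)
    yield path
    for child in children:
        yield from element_paths(child, path)


def name_sim(a: str, b: str) -> float:
    """Linguistic similarity of two element names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ancestor_sim(p: Path, q: Path) -> float:
    """Structural similarity: compare ancestors level by level, parent first."""
    pa, qa = p[:-1], q[:-1]
    if not pa and not qa:
        return 1.0  # both elements are roots
    if not pa or not qa:
        return 0.0  # a root is never structurally similar to a nested element
    total = sum(name_sim(x, y) for x, y in zip(reversed(pa), reversed(qa)))
    return total / max(len(pa), len(qa))  # missing levels count as 0


def match(s1: SchemaTree, s2: SchemaTree, alpha: float = 0.6,
          threshold: float = 0.5) -> Dict[str, Tuple[str, float]]:
    """Map each element of s1 to its best-scoring element of s2 (if above threshold)."""
    targets = list(element_paths(s2))
    result = {}
    for p in element_paths(s1):
        best_score, best_q = max(
            (alpha * name_sim(p[-1], q[-1]) + (1 - alpha) * ancestor_sim(p, q), q)
            for q in targets
        )
        if best_score >= threshold:
            result["/".join(p)] = ("/".join(best_q), round(best_score, 3))
    return result


# Two hypothetical purchase-order schemas with different vocabularies.
PO1 = ("order", [("customer", [("name", [])]), ("item", [("price", [])])])
PO2 = ("purchaseOrder", [("client", [("name", [])]), ("product", [("price", [])])])

if __name__ == "__main__":
    for source, (target, score) in match(PO1, PO2).items():
        print(f"{source} -> {target} ({score})")
```

    The name term alone cannot distinguish between same-named leaves in different contexts. The ancestor term breaks such ties by rewarding elements whose parents and grandparents also look alike.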

    Information Integration - the process of integration, evolution and versioning

    At present, many information sources are available wherever you are. Most of the time, the information needed is spread across several of those sources. Gathering this information is a tedious and time-consuming job, and automating the process would assist users in their task. Integrating the information sources yields a single global information source that contains all the needed information. These information sources also change over time, and each change may alter the schema of the source as well. The data contained in a source, however, cannot be converted after every change, because of the huge amount of data that would have to be transformed to conform to the most recent schema.
    In this report we describe the current methods for information integration, evolution and versioning. We distinguish between the integration of schemas and the integration of the actual data. We also outline some key issues in integrating XML data sources.
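    The abstract notes that stored data cannot be converted after every schema change. One common way to cope with this is to tag each stored record with its schema version and upgrade it lazily on read through a chain of per-version migrations. This is offered as an assumption, not as the report's own method. The field names and the two migrations below are hypothetical.

```python
from typing import Callable, Dict

Record = Dict[str, object]


def v1_to_v2(r: Record) -> Record:
    """Hypothetical schema change: field 'name' renamed to 'full_name'."""
    out = {k: v for k, v in r.items() if k != "name"}
    out["full_name"] = r["name"]
    return out


def v2_to_v3(r: Record) -> Record:
    """Hypothetical schema change: new field 'country' with a default value."""
    return {**r, "country": r.get("country", "unknown")}


# MIGRATIONS[v] upgrades a record from schema version v to version v + 1.
MIGRATIONS: Dict[int, Callable[[Record], Record]] = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def read(stored: Record, version: int, target: int = CURRENT_VERSION) -> Record:
    """Return a copy of `stored` that conforms to schema version `target`.

    The stored record itself is never rewritten, so existing data does not
    have to be bulk-converted whenever the schema evolves."""
    record = dict(stored)
    for v in range(version, target):
        record = MIGRATIONS[v](record)
    return record


if __name__ == "__main__":
    print(read({"name": "Ada"}, version=1))
```

    The trade-off is that every read pays for the migrations between the stored version and the current one. In exchange, no bulk conversion of the stored data is ever needed.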

    Automated schema matching techniques: an exploratory study

    Manual schema matching is a problem for many database applications that use multiple data sources, including data warehousing and e-commerce applications. Current research attempts to address this problem by developing algorithms that automate aspects of the schema-matching task. This paper proposes an approach, called SemMA, that uses an external dictionary to facilitate automated discovery of the semantic meaning of database schema terms. An experimental study was conducted to evaluate the performance and accuracy of five schema-matching techniques with the proposed approach. The approach and its results are compared with two existing semi-automated schema-matching approaches, and suggestions for future research are made.
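    The following is a minimal sketch of dictionary-based matching of schema terms. It does not reproduce SemMA. Each term is split into words (camelCase and snake_case), each word is mapped to a synonym group, and two terms are compared by the Jaccard similarity of their groups. The tiny `SYNONYM_GROUPS` table is a hypothetical stand-in for a real external dictionary or thesaurus, and the 0.5 threshold is an arbitrary choice.

```python
import re
from typing import Dict, FrozenSet, List, Set

# Hypothetical miniature synonym groups standing in for an external dictionary.
SYNONYM_GROUPS: List[Set[str]] = [
    {"customer", "client", "buyer"},
    {"phone", "telephone", "tel"},
    {"number", "num", "no"},
    {"address", "addr"},
]
LOOKUP: Dict[str, FrozenSet[str]] = {
    word: frozenset(group) for group in SYNONYM_GROUPS for word in group
}


def tokens(term: str) -> List[str]:
    """Split a term such as 'ClientTelNum' or 'customer_phone_number' into words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", term)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def concepts(term: str) -> Set[FrozenSet[str]]:
    """Map each word to its synonym group; unknown words form their own group."""
    return {LOOKUP.get(t, frozenset([t])) for t in tokens(term)}


def semantic_sim(a: str, b: str) -> float:
    """Jaccard similarity between the concept sets of two schema terms."""
    ca, cb = concepts(a), concepts(b)
    return len(ca & cb) / len(ca | cb) if ca | cb else 0.0


def match_attributes(src: List[str], dst: List[str],
                     threshold: float = 0.5) -> Dict[str, str]:
    """Pair each source attribute with its most similar target attribute."""
    pairs = {}
    for a in src:
        score, best = max((semantic_sim(a, b), b) for b in dst)
        if score >= threshold:
            pairs[a] = best
    return pairs


if __name__ == "__main__":
    print(match_attributes(["customer_phone_number", "customer_address"],
                           ["ClientTelNum", "ClientAddr"]))
```

    A purely string-based comparison would rate `customer_phone_number` and `ClientTelNum` as dissimilar. The dictionary lookup is what recovers their shared meaning.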

    Data integration through service-based mediation for web-enabled information systems

    The Web and its underlying platform technologies have often been used to integrate existing software and information systems. Traditional techniques for data representation and for transformations between documents are not sufficient to support a flexible and maintainable data integration solution that meets the requirements of modern, complex Web-enabled software and information systems. The difficulty arises from the high complexity of data structures, for example in business and technology applications, and from the constant change of data and its representation. In the Web context, where the Web platform is used to integrate different organisations or software systems, the problem of heterogeneity also arises. We introduce a specific data integration solution for Web applications such as Web-enabled information systems. Our contribution is an integration technology framework for Web-enabled information systems comprising, firstly, a data integration technique based on the declarative specification of transformation rules and the construction of connectors that handle the integration and, secondly, a mediator architecture based on information services and the constructed connectors to handle the integration process.
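    The next sketch shows the general shape of the ideas the abstract combines: declarative transformation rules, connectors that apply them, and a mediator over the connectors. It is a generic illustration, not the paper's framework. The class names `Rule`, `Connector` and `Mediator`, the dotted-path rule syntax, and the two example sources are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def identity(value: Any) -> Any:
    return value


@dataclass(frozen=True)
class Rule:
    """Declarative rule: global-schema field <- source path (+ optional converter)."""
    target: str
    source: str  # dotted path into a nested, JSON-like source document
    convert: Callable[[Any], Any] = identity


def get_path(doc: Any, path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


class Connector:
    """Wraps one source: fetches its documents and applies its rules to each."""

    def __init__(self, name: str, rules: List[Rule],
                 fetch: Callable[[], Iterable[dict]]):
        self.name, self.rules, self.fetch = name, rules, fetch

    def records(self) -> Iterator[Dict[str, Any]]:
        for doc in self.fetch():
            out: Dict[str, Any] = {"_source": self.name}
            for rule in self.rules:
                value = get_path(doc, rule.source)
                if value is not None:
                    out[rule.target] = rule.convert(value)
            yield out


class Mediator:
    """Answers queries over the global schema by delegating to connectors."""

    def __init__(self, connectors: List[Connector]):
        self.connectors = connectors

    def query(self, predicate: Callable[[Dict[str, Any]], bool]) -> List[Dict[str, Any]]:
        return [rec for c in self.connectors for rec in c.records() if predicate(rec)]


# Two hypothetical sources with different document layouts.
SHOP_A = Connector("shopA", [Rule("order_id", "id"), Rule("customer", "customer.name"),
                             Rule("total", "total", float)],
                   lambda: [{"id": 1, "customer": {"name": "Ada"}, "total": "12.50"}])
SHOP_B = Connector("shopB", [Rule("order_id", "orderNo"), Rule("customer", "buyer"),
                             Rule("total", "amountCents", lambda c: c / 100)],
                   lambda: [{"orderNo": 7, "buyer": "Bo", "amountCents": 999}])

if __name__ == "__main__":
    print(Mediator([SHOP_A, SHOP_B]).query(lambda r: r.get("total", 0) > 10))
```

    Because the mappings live in data (the `Rule` lists) rather than in code, adapting to a changed source layout means editing that source's rules. The mediator does not have to change.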

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    With the growing popularity of XML as a data representation language, collections of XML data have grown explosively. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual similarity of elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
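    Here is a minimal sketch of clustering schemas by a combined measure, under stated assumptions. It is not the paper's similarity measure. The "linguistic" part is a crude stand-in: Jaccard overlap of lowercased element names. The hierarchical part is Jaccard overlap of parent-child edges. The clustering step is single-link: any two schemas with similarity at or above a threshold end up in the same cluster, implemented with a small union-find. The weight `w`, the threshold and the example schemas are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

SchemaTree = Tuple[str, list]  # hypothetical: (element name, [child subtrees])


def features(tree: SchemaTree) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collect lowercased element names and (parent, child) edges."""
    name, children = tree
    names, edges = {name.lower()}, set()
    for child in children:
        edges.add((name.lower(), child[0].lower()))
        child_names, child_edges = features(child)
        names |= child_names
        edges |= child_edges
    return names, edges


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def similarity(s: SchemaTree, t: SchemaTree, w: float = 0.5) -> float:
    """Weighted mix of name overlap (linguistic) and edge overlap (hierarchical)."""
    (n1, e1), (n2, e2) = features(s), features(t)
    return w * jaccard(n1, n2) + (1 - w) * jaccard(e1, e2)


def cluster(schemas: Dict[str, SchemaTree], threshold: float = 0.5) -> List[List[str]]:
    """Single-link clustering: link every pair whose similarity >= threshold."""
    parent = {k: k for k in schemas}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(schemas, 2):
        if similarity(schemas[a], schemas[b]) >= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for k in schemas:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())


SCHEMAS = {
    "book1": ("book", [("title", []), ("author", [("name", [])])]),
    "book2": ("book", [("title", []), ("author", [("name", [])]), ("year", [])]),
    "movie": ("movie", [("title", []), ("director", [])]),
}

if __name__ == "__main__":
    print(cluster(SCHEMAS))
```

    The edge term is what separates schemas that share vocabulary but arrange it differently. Two schemas using the same element names under different parents score lower than a plain name overlap would suggest.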

    Design of the shared Environmental Information System (SEIS) and development of a web-based GIS interface

    Chapter 5. The Shared Environmental Information System (SEIS) is a collaborative initiative of the European Commission (EC) and the European Environment Agency (EEA) aimed at establishing, together with the Member States, an integrated and shared EU-wide environmental information system. SEIS presents the European vision of environmental information interoperability. It is a set of high-level principles and workflow processes that organise the collection, exchange and use of environmental data and information, with the aims to:
    • Modernise the way in which information required by environmental legislation is made available to Member States or EC instruments;
    • Streamline reporting processes and repeal overlapping or obsolete reporting obligations;
    • Stimulate similar developments at international conventions;
    • Standardise according to INSPIRE when possible; and
    • Introduce the SDI (spatial data infrastructure) principle EU-wide.
    SEIS is a system and workflow of operations that offers technical capabilities geared to meeting the expectations of the concept. In that respect, SEIS shows the way and effectively sets up the workflow in a standardised way (e.g., INSPIRE) to:
    • Collect data from spatial databases, in situ sensors, statistical databases, earth observation readings (e.g., EOS, GMES) and marine observations using standard data transfer protocols (ODBC, SOS, FTP, etc.);
    • Harmonise collected data (including data checks and data integrity) following proven best practices and the INSPIRE Directive 2007/2/EC Annexes I, II and III, plus the INSPIRE Implementing Rules for data not covered by those Annexes;
    • Harmonise collected data according to WISE (Water Information System for Europe) or Ozone-web;
    • Process, aggregate and harmonise data so as to extract information in a format understandable by wider audiences (e.g., Eurostat, environmental indicators);
    • Document information to fulfil national reporting obligations towards EU bodies (e.g., the JRC, EEA, DG ENV, Eurostat);
    • Store and publish information for authorised end users (e.g., citizens, institutions).
    This paper presents the development and integration of the SEIS-Malta Geoportal. The first section outlines the EU regulations on the INSPIRE and Aarhus Directives. The second covers the architecture and implementation of the SEIS-Malta Geoportal. The third discusses the results and the successful implementation of the Geoportal.
    Peer-reviewed.