
    A Survey on Data Integration in Data Warehouse

    Data warehousing embraces the technology of integrating data from multiple distributed data sources and using it in an annotated and aggregated form to support business decision-making and enterprise management. Although many techniques have been revisited or newly developed in the context of data warehouses, such as view maintenance and OLAP, little attention has been paid to data mining techniques for supporting the most important and costly tasks of data integration for data warehouse design.

    Database Integration: an Overview of Issues and Approaches

    In many large companies, the widespread use of computers has led to the installation of a number of different application-specific databases. As company structures evolve, boundaries between departments move, creating new business units. Their new applications will use existing data from various data stores rather than new data entering the organization. Hence, the ability to make data stores interoperable becomes a crucial factor in the development of new information systems. Data interoperability may come in various degrees. At the lowest level, commercial gateways connect specific pairs of database management systems (DBMSs). Software providing facilities for defining persistent views over different databases [6] simplifies access to distant data but does not support automatic enforcement of consistency constraints among different databases. Full interoperability is achieved by distributed or federated database systems, which support integration of existing data into virtual databases (i.e., databases which are logically defined but not physically materialized). The latter allow existing databases to remain under the control of their respective owners, thus supporting a harmonious coexistence of scalable data integration and site autonomy requirements [9]. Federated systems are very popular today. However, before they become marketable, many issues remain to be solved. Design issues focus on either human-centered aspects (cooperative work, including autonomy issues and negotiation procedures) or database-centered aspects (data integration, schema/database evolution). Operational issues investigate system interoperability mainly in terms of support for new transaction types, new query processing algorithms, security concerns, etc. General overviews may be found elsewhere [4, 9]. This paper is devoted to database integration, possibly the most critical issue.
Simply stated, database integration is the process which takes as input a set of databases and produces as output a single unified description of the input schemas (the integrated schema) and the associated mapping information supporting integrated access to existing data through the integrated schema. As such, database integration is also used in the process of re-engineering an existing legacy system. Database integration has attracted many diverse and diverging contributions. The purpose and main intended contribution of this article is to provide a clear picture of what the approaches and current solutions are, and what remains to be achieved.
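
The input/output view of database integration described in this abstract (source schemas in; an integrated schema plus mapping information out) can be illustrated with a small Python sketch. It is not the paper's method: schemas are reduced to entity names with attribute sets, correspondences between source entities are assumed to be declared by hand, and the names `integrate` and `IntegrationResult` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model: a schema maps entity names to attribute sets.
Schema = dict[str, set[str]]


@dataclass
class IntegrationResult:
    integrated: Schema = field(default_factory=dict)
    # (integrated entity, attribute) -> list of (source db, source entity, source attribute)
    mapping: dict[tuple[str, str], list[tuple[str, str, str]]] = field(default_factory=dict)


def integrate(sources: dict[str, Schema],
              correspondences: dict[tuple[str, str], str]) -> IntegrationResult:
    """Merge source schemas; `correspondences` maps (db, entity) to an integrated entity name."""
    result = IntegrationResult()
    for db, schema in sources.items():
        for entity, attrs in schema.items():
            # Entities with no declared correspondence keep their own name, qualified by db.
            target = correspondences.get((db, entity), f"{db}.{entity}")
            result.integrated.setdefault(target, set()).update(attrs)
            for attr in attrs:
                # Record where each integrated attribute comes from, for query routing.
                result.mapping.setdefault((target, attr), []).append((db, entity, attr))
    return result


sources = {
    "sales": {"Customer": {"id", "name"}},
    "billing": {"Client": {"id", "address"}},
}
res = integrate(sources, {("sales", "Customer"): "Customer",
                          ("billing", "Client"): "Customer"})
# res.integrated == {"Customer": {"id", "name", "address"}}
```

The `mapping` dictionary is what lets a query on the integrated `Customer.id` be routed back to both source attributes; resolving naming, structural, or semantic conflicts between the sources is deliberately left out.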

    On Spatial Database Integration

    This paper investigates the problems that arise when application requirements command that autonomous spatial databases be integrated into a federated one. The paper focuses on the most critical issues raised by the integration of databases of different scales. A short presentation of approaches to interoperability and of the main steps composing the integration process is given first. Next, a general format is proposed for precisely defining correspondences between objects of two databases. The format can deal with a wide range of discrepancies in GIS data. Last, a solution is presented for aggregation conflicts, which arise when one object of one database corresponds to a set of objects in the other database, a very frequent case when the databases are of different scales. The method is applied to excerpts of real cartographic databases.
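
An aggregation conflict, where one object in a coarse-scale database corresponds to a set of objects in a fine-scale one, can be sketched as follows. This is an illustrative assumption, not the correspondence format proposed in the paper: roads in the coarse database are matched to road segments in the detailed database through a shared `road_id`, whereas real cartographic data would usually need geometric matching. The classes `Road` and `RoadSegment` and the functions `group_segments` and `compare_lengths` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Road:                 # hypothetical object in the coarse-scale database
    road_id: str
    length_km: float


@dataclass(frozen=True)
class RoadSegment:          # hypothetical object in the detailed database
    segment_id: str
    road_id: str            # assumed shared identifier declaring the 1:n correspondence
    length_km: float


def group_segments(segments: list[RoadSegment]) -> dict[str, list[RoadSegment]]:
    """Collect, for each road_id, the set of fine-scale segments it corresponds to."""
    groups: dict[str, list[RoadSegment]] = {}
    for s in segments:
        groups.setdefault(s.road_id, []).append(s)
    return groups


def compare_lengths(roads: list[Road],
                    segments: list[RoadSegment]) -> dict[str, tuple[float, float]]:
    """Pair each coarse road's stored length with the summed length of its segments."""
    groups = group_segments(segments)
    return {r.road_id: (r.length_km, sum(s.length_km for s in groups.get(r.road_id, [])))
            for r in roads}


roads = [Road("R1", 10.0)]
segments = [RoadSegment("s1", "R1", 4.0),
            RoadSegment("s2", "R1", 6.5),
            RoadSegment("s3", "R2", 1.0)]
result = compare_lengths(roads, segments)  # {'R1': (10.0, 10.5)}
```

The gap between 10.0 and 10.5 is the kind of discrepancy that can arise between representations at different scales; segment `s3` has no counterpart among the coarse roads and would need its own resolution rule.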

    Database Integration: the Key to Data Interoperability

    Most new databases are no longer built from scratch but re-use existing data from several autonomous data stores. To facilitate application development, the data to be re-used should preferably be redefined as a virtual database, providing for the logical unification of the underlying data sets. This unification process is called database integration. This chapter provides a global picture of the issues raised and the approaches that have been proposed to tackle the problem.
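
The idea of a virtual database, logically defined over existing data stores but not physically materialized, can be sketched as a view whose rows are recomputed from the sources on every scan. The class `VirtualTable`, the in-memory lists standing in for autonomous data stores, and the per-source mapping functions are all hypothetical simplifications.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]
# A source is a fetch function plus a function mapping its rows into the unified format.
Source = tuple[Callable[[], Iterable[Row]], Callable[[Row], Row]]


class VirtualTable:
    """A logically defined table: rows are derived from the sources on each scan, never stored."""

    def __init__(self, sources: list[Source]):
        self._sources = sources

    def scan(self) -> Iterator[Row]:
        for fetch, to_unified in self._sources:
            for row in fetch():
                yield to_unified(row)

    def select(self, predicate: Callable[[Row], bool]) -> list[Row]:
        return [r for r in self.scan() if predicate(r)]


# Two autonomous "data stores" using different attribute names.
hr_db = [{"emp_no": 1, "full_name": "Ada"}]
payroll_db = [{"id": 2, "name": "Alan"}]

employees = VirtualTable([
    (lambda: hr_db, lambda r: {"id": r["emp_no"], "name": r["full_name"]}),
    (lambda: payroll_db, lambda r: {"id": r["id"], "name": r["name"]}),
])

payroll_db.append({"id": 3, "name": "Grace"})
# The new row is visible immediately, because the view holds no copy of the data.
print(employees.select(lambda r: r["id"] > 1))
```

Each source keeps its own representation and stays under its owner's control; only the mapping functions know how to present its rows in the unified form.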

    Semantic validation in spatio-temporal schema integration

    This thesis proposes to address the well-known database integration problem with a new method that combines functionality from database conceptual modeling techniques with functionality from logic-based reasoners. We elaborate on a hybrid (modeling + validation) integration approach for spatio-temporal information integration at the schema level. The modeling part of our methodology is supported by the spatio-temporal conceptual model MADS, whereas the validation part of the integration process is delegated to description logic validation services. We therefore adhere to the principle that, rather than extending either formalism to try to cover all desirable functionality, a hybrid system, in which the database component and the logic component cooperate, each performing the tasks for which it is best suited, is a viable solution for semantically rich information management. First, we develop a MADS-based flexible integration approach in which the integrated schema designer has several viable ways to construct a final integrated schema. For different related schema elements, we provide the designer with four general policies and with a set of structural solutions, or structural patterns, within each policy. To always guarantee an integrated solution, we provide a preservation policy with a multi-representation structural pattern. To state the inter-schema mappings, we elaborate on a correspondence language with explicit spatial and temporal operators. Thus, our correspondence language has three facets (structural, spatial, and temporal), allowing the designer to relate the thematic representation as well as the spatial and temporal features. With the inter-schema mappings, the designer can state correspondences between related populations and define the conditions that rule the matching at the instance level. These matching rules can then be used in query rewriting procedures or to match the instances within the data integration process.
We associate a set of putative structural patterns with each type of population correspondence, providing the designer with a selection of patterns for flexible integrated schema construction. Second, we enhance our integration method by employing the validation services of the description logic formalism. It is not guaranteed that the designer can state all the inter-schema mappings manually, or that they are all correct. We add the validation phase to ensure the validity and completeness of the set of inter-schema mappings. Inter-schema mappings cannot be validated autonomously, i.e., they are validated against the data model and the schemas they link. Thus, to implement our validation approach, we translate the data model, the source schemas and the inter-schema mappings into a description logic formalism, preserving the spatial and temporal semantics of the MADS data model. Our modeling approach in description logic thereby ensures that the model designer will correctly define spatial and temporal schema elements and inter-schema mappings. The added value of the complete translation (i.e., including the data model and the source schemas) is that we validate not only the inter-schema mappings but also the compliance of the source schemas with the data model, and infer implicit relationships within them. As a result of the validation procedure, the schema designer obtains a complete and valid set of inter-schema mappings and a set of valid (flexible) schematic patterns to apply to construct an integrated schema that meets application requirements. To further our work, we model a framework in which a schema designer is able to follow our integration method and perform the schema integration task in an assisted way. We design two models, a UML model and a SEAM model, of a system that provides integration functionalities. The models describe a framework where several tools are employed together, each involved in the service it is best suited for.
We define the functionalities and the cooperation between the composing elements of the framework and detail the logic of the integration process in a UML activity diagram and in a SEAM operation model.
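
The thesis's idea of matching rules that relate populations through structural, spatial and temporal conditions can be approximated in plain Python. This sketch does not use MADS or a description logic reasoner; the `Instance` record, the point-distance test and the half-open validity intervals are assumptions chosen for brevity, and `matches` and `match_populations` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    key: str           # thematic attribute compared by the structural facet (hypothetical)
    x: float           # point geometry, assumed to share one planar coordinate system
    y: float
    valid_from: int    # validity interval [valid_from, valid_to), e.g. in years
    valid_to: int


def matches(a: Instance, b: Instance, max_dist: float) -> bool:
    """Hypothetical matching rule combining a structural, a spatial and a temporal condition."""
    same_key = a.key == b.key                                          # structural facet
    near = math.hypot(a.x - b.x, a.y - b.y) <= max_dist                # spatial facet
    overlap = a.valid_from < b.valid_to and b.valid_from < a.valid_to  # temporal facet
    return same_key and near and overlap


def match_populations(pop_a: list[Instance], pop_b: list[Instance],
                      max_dist: float) -> list[tuple[Instance, Instance]]:
    """Return every pair of instances from the two populations that satisfies the rule."""
    return [(a, b) for a in pop_a for b in pop_b if matches(a, b, max_dist)]


old_map = [Instance("bridge-7", 0.0, 0.0, 1990, 2010)]
new_map = [Instance("bridge-7", 3.0, 4.0, 2005, 2020),   # 5 units away, periods overlap
           Instance("bridge-7", 0.0, 0.0, 2010, 2015)]   # same place, periods only touch
pairs = match_populations(old_map, new_map, max_dist=10.0)
```

Only the first `new_map` instance matches. Rules of this kind could feed query rewriting or instance matching as the abstract describes; checking that a set of such rules is consistent with the schemas is the part the thesis delegates to description logics, which this sketch does not attempt.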

    Resolving Fragmentation Conflicts in Schema Integration

    Research on schema integration has led to the identification of many different conflict types. Some of them have received much attention, and many papers have proposed solutions for their resolution. However, the literature usually focuses on traditional problems, whilst new kinds of schema discrepancies, due to object orientation or the generalization concept, are not really treated. Moreover, most of the proposed methodologies and strategies only allow binary comparisons between items to be integrated. This paper discusses n-ary (also called one-many) conflicts, and particularly the three fragmentation conflict types. These conflict types need specific operators for schema comparison. We propose a simple unified language for easy specification of these conflicts, and give many different techniques to solve them. We emphasize the benefit of separating the declaration of the correspondences from the choice of resolution technique. Our discourse is illustrated with the entity-relati..
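
The abstract's point about separating the declaration of a correspondence from the choice of resolution technique can be illustrated with a small Python sketch: a one-to-many correspondence is declared once, and the resolution strategy is picked independently. The `OneToMany` record and both strategies are hypothetical and do not reproduce the paper's language or its three fragmentation conflict types.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OneToMany:
    """Declared correspondence: one element of schema 1 matches several elements of schema 2."""
    one: str
    many: tuple[str, ...]


# Integrated schema, simplified to: element name -> names of the elements it is composed of.
IntegratedSchema = dict[str, list[str]]


def merge_fragments(c: OneToMany) -> IntegratedSchema:
    # Strategy A: keep only the single element; the fragments are recorded as its components.
    return {c.one: list(c.many)}


def keep_fragments(c: OneToMany) -> IntegratedSchema:
    # Strategy B: keep every fragment as an element and add the single element as a composite.
    return {**{m: [] for m in c.many}, c.one: list(c.many)}


RESOLVERS: dict[str, Callable[[OneToMany], IntegratedSchema]] = {
    "merge": merge_fragments,
    "keep": keep_fragments,
}


def resolve_all(correspondences: list[OneToMany], strategy: str) -> IntegratedSchema:
    """Apply the chosen resolution strategy to every declared correspondence."""
    schema: IntegratedSchema = {}
    for c in correspondences:
        schema.update(RESOLVERS[strategy](c))
    return schema


decl = [OneToMany("Address", ("Street", "City"))]
print(resolve_all(decl, "merge"))  # {'Address': ['Street', 'City']}
print(resolve_all(decl, "keep"))   # {'Street': [], 'City': [], 'Address': ['Street', 'City']}
```

Because `decl` is unchanged between the two calls, switching the resolution technique touches only the `strategy` argument, which reflects the separation the abstract argues for.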