
    Rule-based information integration

    In this report we describe the process of information integration, focusing on the language used for integration. We show that integration consists of two phases: the schema mapping phase and the data integration phase. We formally define transformation rules, conversion, evolution and versioning, and we further discuss the integration process from a data point of view.
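    As a rough sketch of the two-phase idea described above, the following Python fragment separates a schema mapping phase, in which transformation rules are declared, from a data integration phase, in which the rules are applied to source records. All field names and rules are invented for illustration and are not taken from the report.

```python
# Phase 1: schema mapping -- declare transformation rules from source to target fields.
# (Field names are hypothetical; they only illustrate the two-phase structure.)
rules = {
    "customer_name": lambda rec: rec["name"].strip().title(),
    "customer_email": lambda rec: rec["email"].lower(),
    "joined_year": lambda rec: int(rec["signup_date"][:4]),
}

# Phase 2: data integration -- apply the rules to every source record.
def integrate(record: dict) -> dict:
    return {target: rule(record) for target, rule in rules.items()}

source = [{"name": " alice smith ", "email": "Alice@Example.COM", "signup_date": "2019-05-01"}]
print([integrate(r) for r in source])
# [{'customer_name': 'Alice Smith', 'customer_email': 'alice@example.com', 'joined_year': 2019}]
```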

    The schema coercion problem

    Over the past decade, the ability to incorporate data from a wide variety of sources has become increasingly important to database users. To meet this need, significant effort has been expended on automatic database schema manipulation. To date, however, this effort has focused on two aspects of the problem: schema integration and schema evolution. Schema integration results in a unified view of several databases, while schema evolution enhances an existing database design to represent additional information. This work defines and addresses a third problem, schema coercion, which defines a mapping from one database to another. This paper presents an overview of the problems associated with schema coercion and how they correspond to the problems encountered in schema integration and schema evolution. In addition, our approach to this problem is outlined. The feasibility of this approach is demonstrated by a tool which reduces the human interaction required at all steps in the integration process. The database schemata are automatically read and converted into corresponding ER representations. Then a correspondence identification heuristic is used to identify similar concepts and create mappings between them. Finally, a program is generated to perform the data transfer. This tool has been used successfully to coerce the Haemophilus and Methanococcus genomes from the GenBank ASN.1 database to the Utah Center for Human Genome Research database. Our comprehensive approach to the schema coercion problem has proven extremely valuable in reducing the interaction required to define coercions, particularly when the heuristics are unsuccessful.
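    The correspondence identification step lends itself to a small illustration: the sketch below pairs attributes by name similarity and leaves low-confidence pairs for human review, in the spirit of the heuristic described above. The attribute names, threshold and scoring function are assumptions, not the tool's actual implementation.

```python
# Hypothetical correspondence-identification heuristic: pair source and target
# attributes by name similarity; unmatched attributes are left for human review.
from difflib import SequenceMatcher

source_attrs = ["locus_name", "sequence", "organism"]
target_attrs = ["gene_locus", "dna_sequence", "species"]

def best_match(attr, candidates, threshold=0.5):
    scored = [(SequenceMatcher(None, attr, c).ratio(), c) for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None  # None -> needs human interaction

mapping = {a: best_match(a, target_attrs) for a in source_attrs}
print(mapping)
# e.g. {'locus_name': 'gene_locus', 'sequence': 'dna_sequence', 'organism': None}
```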

    Information Integration - the process of integration, evolution and versioning

    At present, many information sources are available wherever you are. Most of the time, the information needed is spread across several of those sources, and gathering it is a tedious and time-consuming job; automating this process would assist the user in this task. Integration of the information sources provides a global information source in which all the needed information is present. All of these information sources also change over time, and with each change of a source its schema can change as well. The data contained in the information source, however, cannot be converted every time, due to the huge amount of data that would have to be converted to conform to the most recent schema. In this report we describe current methods for information integration, evolution and versioning. We distinguish between integration of schemas and integration of the actual data, and we also show some key issues when integrating XML data sources.
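    One way to picture why the data need not be converted with every schema change is lazy, versioned upgrading: records keep the schema version they were written under and are only upgraded when read. The sketch below is purely illustrative and uses invented field names and upgrade steps.

```python
# Illustrative schema versioning: stored records are never rewritten; they are
# upgraded on read by chaining version-to-version conversion functions.
def v1_to_v2(r):
    r = dict(r)
    r["full_name"] = r.pop("name")            # v2 renamed the attribute
    r["_version"] = 2
    return r

def v2_to_v3(r):
    r = dict(r)
    r["email"] = r.get("email", "").lower()   # v3 normalizes e-mail addresses
    r["_version"] = 3
    return r

upgrades = {1: v1_to_v2, 2: v2_to_v3}

def read(record, latest=3):
    while record["_version"] < latest:
        record = upgrades[record["_version"]](record)
    return record

old = {"_version": 1, "name": "Alice", "email": "Alice@Example.COM"}
print(read(old))  # {'_version': 3, 'email': 'alice@example.com', 'full_name': 'Alice'}
print(old)        # the stored record is still at version 1, untouched
```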

    Distribution of the Object Oriented Databases. A Viewpoint of the MVDB Model's Methodology and Architecture

    In databases, much work has been done on extending models with advanced tools such as view technology, schema evolution support, multiple classification, role modeling and viewpoints. Over the past years, most research on multiple object representation and evolution has proposed enriching the monolithic vision of the classical object approach, in which an object belongs to a single class hierarchy. In particular, integrating the viewpoint mechanism into the conventional object-oriented data model adds flexibility and improves the modeling power of objects. The viewpoint paradigm refers to the multiple descriptions, the distribution, and the evolution of objects, and it can make a valuable contribution to the distributed design of complex databases. The aim of this paper is to define an object data model integrating viewpoints in databases and to present a federated database architecture that integrates multiple viewpoint sources following a local-as-extended-view data integration approach.
    Keywords: object-oriented data model, OQL language, LAEV data integration approach, MVDB model, federated databases, Local-As-View strategy.
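    The sketch below is a hedged illustration of the viewpoint idea only, not the MVDB model or its OQL integration: one shared object identity carrying several viewpoint-specific descriptions, with class and attribute names invented for the example.

```python
# Illustrative multiple representation: a single object identity with one local
# description per viewpoint.
from dataclasses import dataclass, field

@dataclass
class MultiViewObject:
    oid: str
    viewpoints: dict = field(default_factory=dict)  # viewpoint name -> local description

    def describe(self, viewpoint: str, **attrs):
        self.viewpoints.setdefault(viewpoint, {}).update(attrs)

    def view(self, viewpoint: str) -> dict:
        return self.viewpoints.get(viewpoint, {})

person = MultiViewObject("p42")
person.describe("hr", name="Alice", salary_band="B")
person.describe("medical", name="Alice", blood_type="O+")
print(person.view("hr"))       # {'name': 'Alice', 'salary_band': 'B'}
print(person.view("medical"))  # {'name': 'Alice', 'blood_type': 'O+'}
```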

    An Approach to Conceptual Schema Evolution

    In this work we will analyse the conceptual foundations of user-centric content management. Content management often involves integration of content that was created from different points of view. Current modeling techniques, and especially current systems, lack sufficient support for handling these situations. Although schema integration is undecidable in general, we will introduce a conceptual model together with a modeling and maintenance methodology that simplifies content integration in many practical situations. We will define a conceptual model based on the Higher-Order Entity Relationship Model that combines advantages of schema-oriented modeling techniques like ER modeling with element-driven paradigms such as approaches to semistructured data management. This model is ready to support contextual reasoning based on local model semantics. For the special case of schema evolution based on schema versioning, we will derive the compatibility relation between local models by tracking dependencies of schema revisions. Additionally, we will discuss implementation facets, such as storage aspects for structurally flexible content or the generation of adaptive user interfaces based on a conceptual interaction model.
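    The compatibility-by-revision-tracking idea can be pictured with a small sketch: treat schema revisions as a dependency graph and call a revision compatible with an ancestor it was derived from. The revision names and the reachability-based notion of compatibility are assumptions made for illustration, not the work's formal definition.

```python
# Assumed illustration: schema revisions form a dependency graph, and compatibility
# between local models is derived by tracking which revision was derived from which.
revisions = {
    "v1": [],
    "v2": ["v1"],          # v2 was derived from v1
    "v3": ["v2"],
    "v3-branch": ["v2"],   # a parallel revision, also derived from v2
}

def compatible(newer: str, older: str) -> bool:
    """True if `older` is reachable from `newer` through the revision dependencies."""
    stack = [newer]
    while stack:
        rev = stack.pop()
        if rev == older:
            return True
        stack.extend(revisions.get(rev, []))
    return False

print(compatible("v3", "v1"))         # True: v3 descends from v1
print(compatible("v3", "v3-branch"))  # False: parallel branches, no derivation link
```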

    SemLinker: automating big data integration for casual users

    A data integration approach combines data from different sources and builds a unified view for the users. Big data integration is inherently a complex task, and existing approaches either are limited or rely invariably on manual input and intervention from experts or skilled users. SemLinker, an ontology-based data integration system, is part of a metadata management framework for the personal data lake (PDL), a personal store-everything architecture. PDL is aimed at casual and unskilled users, so SemLinker adopts an automated data integration workflow to minimize the need for manual input. To support the flat architecture of a lake, SemLinker builds and maintains a schema metadata level without any physical transformation of data during integration, preserving the data in their native formats while still allowing them to be queried and analyzed. Scalability, heterogeneity, and schema evolution are big data integration challenges addressed by SemLinker. Large, real-world datasets of substantial heterogeneity are used to evaluate SemLinker. The results demonstrate the integration efficiency and robustness of SemLinker, especially its capability for the automatic handling of data heterogeneity and schema evolution.
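    A very reduced sketch of a schema-metadata layer of the kind the abstract describes: a global attribute is mapped onto per-source accessors, and the data stay in their native formats until queried. The attribute names, formats and mapping structure are assumptions, not SemLinker's actual design.

```python
# Illustrative metadata level: no physical transformation of the stored data; each
# source keeps its native format and is parsed only when a global attribute is queried.
import csv
import io
import json

json_source = '{"temp_c": 21.5, "city": "Leeds"}'   # native JSON record
csv_source = "city,temperature\nYork,19.0\n"         # native CSV record

mappings = {  # global attribute -> per-source accessor over the raw, untouched data
    "temperature": {
        "weather_json": lambda raw: json.loads(raw)["temp_c"],
        "weather_csv": lambda raw: float(next(csv.DictReader(io.StringIO(raw)))["temperature"]),
    }
}

def query(attribute: str, sources: dict) -> dict:
    return {name: mappings[attribute][name](raw) for name, raw in sources.items()}

print(query("temperature", {"weather_json": json_source, "weather_csv": csv_source}))
# {'weather_json': 21.5, 'weather_csv': 19.0}
```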

    Data access and integration in the ISPIDER proteomics grid

    Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are being developed rapidly with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories, which need to be integrated to enable uniform access to them. A number of technologies exist that enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture that supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture to the integration of several autonomous proteomics data resources.
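    The following sketch is not the OGSA-DAI, OGSA-DQP or AutoMed API; it only illustrates, with invented repository stubs, the kind of fan-out query and merge over autonomous resources that such an architecture supports.

```python
# Illustrative distributed query: ask several autonomous repositories for the same
# protein accession in parallel and reconcile their heterogeneous local schemas.
from concurrent.futures import ThreadPoolExecutor

def repo_a(accession):  # stand-in for one proteomics repository
    return {"accession": accession, "peptides": 12}

def repo_b(accession):  # another repository with its own attribute names
    return {"accession": accession, "peptide_count": 9}

def merged_identifications(accession):
    with ThreadPoolExecutor() as pool:
        a, b = pool.map(lambda repo: repo(accession), [repo_a, repo_b])
    # map both local attribute names onto one integrated record
    return {"accession": accession, "peptide_evidence": [a["peptides"], b["peptide_count"]]}

print(merged_identifications("P12345"))
# {'accession': 'P12345', 'peptide_evidence': [12, 9]}
```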

    An Integration-Oriented Ontology to Govern Evolution in Big Data Ecosystems

    Big Data architectures allow heterogeneous data from multiple sources to be flexibly stored and processed in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving, so data analysts need to adapt their analytical processes after each API release. This becomes more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper we present the Big Data Integration ontology, the core construct for governing the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology into queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version, as well as the correctness of historical queries. A functional and performance evaluation on real-world APIs is performed to validate our approach. (Comment: preprint submitted to Information Systems, 35 pages.)
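    As an illustration of ontology-mediated query rewriting under schema evolution, the sketch below keeps, per API release, an annotation of how each global attribute is named at the source and rewrites a global query accordingly. The attributes and release labels are invented, not taken from the paper.

```python
# Assumed annotations: for every ontology attribute, the name it carries in each
# release of the source API. Rewriting a query is a lookup against these annotations.
annotations = {
    "userName":  {"v1": "name",    "v2": "user_name",  "v3": "username"},
    "createdAt": {"v1": "created", "v2": "created_at", "v3": "created_at"},
}

def rewrite(global_attrs, release):
    """Rewrite a query posed over the ontology into attribute names of one release."""
    return [annotations[attr][release] for attr in global_attrs]

query = ["userName", "createdAt"]
print(rewrite(query, "v1"))  # ['name', 'created']         -- historical query
print(rewrite(query, "v3"))  # ['username', 'created_at']  -- most recent release
```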