
    Information Integration - the process of integration, evolution and versioning

    At present, many information sources are available wherever you are. Most of the time, the information needed is spread across several of those sources, and gathering it is a tedious and time-consuming job. Automating this process would assist the user in this task. Integrating the information sources provides a global information source in which all of the needed information is present. All of these information sources also change over time, and with each change the schema of the source may change as well. The data contained in the information source, however, cannot be converted with every change, because of the huge amount of data that would have to be transformed to conform to the most recent schema. In this report we describe current methods for information integration, evolution and versioning. We distinguish between integration of schemas and integration of the actual data, and we also show some key issues that arise when integrating XML data sources.
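    A minimal sketch of the two levels of integration distinguished above, under illustrative assumptions: two hypothetical XML sources describe the same kind of entity under different element names, a hand-written schema-level mapping relates their tags to a global vocabulary, and the data-level step copies every record into one global document. None of the names below come from the report itself.

        # Two hypothetical XML sources with heterogeneous schemas (illustrative only).
        import xml.etree.ElementTree as ET

        SOURCE_A = "<staff><person><name>Ada</name><phone>555-1</phone></person></staff>"
        SOURCE_B = "<employees><emp><fullName>Bob</fullName><tel>555-2</tel></emp></employees>"

        # Schema-level integration: map each local element name to the global schema.
        MAPPINGS = {
            "person": {"name": "name", "phone": "phone"},
            "emp": {"fullName": "name", "tel": "phone"},
        }

        def integrate(*xml_docs):
            """Data-level integration: copy records from every source into one global document."""
            global_root = ET.Element("contacts")
            for doc in xml_docs:
                for record in ET.fromstring(doc):
                    mapping = MAPPINGS[record.tag]
                    merged = ET.SubElement(global_root, "contact")
                    for child in record:
                        ET.SubElement(merged, mapping[child.tag]).text = child.text
            return ET.tostring(global_root, encoding="unicode")

        print(integrate(SOURCE_A, SOURCE_B))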

    Integration of Legacy and Heterogeneous Databases


    The schema coercion problem

    Over the past decade, the ability to incorporate data from a wide variety of sources has become increasingly important to database users. To meet this need, significant effort has been expended on automatic database schema manipulation. To date, however, this effort has focused on two aspects of the problem: schema integration and schema evolution. Schema integration results in a unified view of several databases, while schema evolution enhances an existing database design to represent additional information. This work defines and addresses a third problem, schema coercion, which defines a mapping from one database to another. This paper presents an overview of the problems associated with schema coercion and how they correspond to the problems encountered in schema integration and schema evolution, and outlines our approach to the problem. The feasibility of this approach is demonstrated by a tool which reduces the human interaction required at all steps of the integration process. The database schemata are automatically read and converted into corresponding ER representations. A correspondence-identification heuristic is then used to identify similar concepts and create mappings between them, and finally a program is generated to perform the data transfer. This tool has been used successfully to coerce the Haemophilus and Methanococcus genomes from the GenBank ASN.1 database to the Utah Center for Human Genome Research database. Our comprehensive approach to the schema coercion problem has proven extremely valuable in reducing the interaction required to define coercions, particularly when the heuristics are unsuccessful.
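    A minimal sketch of the correspondence-identification step described above, under illustrative assumptions: attributes of a hypothetical source and target schema (stand-ins for the ER representations) are paired by name similarity, and the surviving pairs form the coercion mapping that a generated transfer program could follow. The schema contents and the 0.6 threshold are invented for the example.

        # Pair source attributes with their most similar target attributes (illustrative heuristic).
        from difflib import SequenceMatcher

        SOURCE_SCHEMA = {"Gene": ["gene_name", "locus_tag", "sequence"]}
        TARGET_SCHEMA = {"GeneRecord": ["name", "locus", "dna_sequence"]}

        def similarity(a, b):
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def identify_correspondences(source, target, threshold=0.6):
            """Map each source attribute to its best-matching target attribute above the threshold."""
            mapping = {}
            for s_entity, s_attrs in source.items():
                for t_entity, t_attrs in target.items():
                    for s_attr in s_attrs:
                        best = max(t_attrs, key=lambda t: similarity(s_attr, t))
                        if similarity(s_attr, best) >= threshold:
                            mapping[(s_entity, s_attr)] = (t_entity, best)
            return mapping

        print(identify_correspondences(SOURCE_SCHEMA, TARGET_SCHEMA))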

    A semantic and agent-based approach to support information retrieval, interoperability and multi-lateral viewpoints for heterogeneous environmental databases

    Data stored in individual autonomous databases often needs to be combined and interrelated. For example, in the Inland Water (IW) environmental monitoring domain, the spatial and temporal variation of measurements of different water quality indicators stored in different databases are of interest. Data from multiple data sources is more complex to combine when there is a lack of metadata in a computational form and when the syntax and semantics of the stored data models are heterogeneous. The main types of information retrieval (IR) requirements are query transparency and data harmonisation for data interoperability, and support for multiple user views. A combined Semantic Web based and agent based distributed system framework has been developed to support these IR requirements and has been implemented using the Jena ontology and JADE agent toolkits. The semantic part supports the interoperability of autonomous data sources by merging their intensional data, using a Global-As-View (GAV) approach, into a global semantic model represented in DAML+OIL and in OWL; this is used to mediate between different local database views. The agent part provides the semantic services to import, align and parse semantic metadata instances, to support data mediation, and to reason about data mappings during alignment. The framework has been applied to support information retrieval, interoperability and multi-lateral viewpoints for four European environmental agency databases. An extended GAV approach has been developed and applied to handle queries that can be reformulated over multiple user views of the stored data. This allows users to retrieve data in a conceptualisation that is better suited to them, rather than having to understand the entire detailed global view conceptualisation. User viewpoints are derived from the global ontology or from existing viewpoints of it, which has the advantage of keeping the number of potential conceptualisations, and their associated mappings, computationally manageable. Whereas an ad hoc framework based upon a conventional distributed programming language and a rule framework could be used to support user views and adaptation to them, a more formal framework has the benefit that it can support reasoning about consistency, equivalence, containment and conflict resolution when traversing data models. A preliminary formulation of the formal model has been undertaken; it is based upon extending a Datalog-style algebra with hierarchical, attribute and instance-value operators, which can be applied to support compositional mapping and consistency checking of data views. The multiple-viewpoint system was implemented as a Java-based application consisting of two sub-systems, one for viewpoint adaptation and management, the other for query processing and query result adjustment.
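    A minimal sketch of the Global-As-View idea used above, under illustrative assumptions: each global relation is defined as a query over the local sources, so a request against the global model is answered by unfolding that definition and translating local column names along the way. The two agency tables and all field names are invented for the example.

        # Two hypothetical environmental-agency sources with different column names.
        AGENCY_A = [{"station": "S1", "determinand": "nitrate", "value": 2.1}]
        AGENCY_B = [{"site": "S2", "param": "nitrate", "reading": 1.7}]

        # GAV definition: the global relation Measurement(site, indicator, value)
        # is the union of one translation query per local source.
        GLOBAL_VIEW = {
            "Measurement": [
                lambda: [{"site": r["station"], "indicator": r["determinand"], "value": r["value"]}
                         for r in AGENCY_A],
                lambda: [{"site": r["site"], "indicator": r["param"], "value": r["reading"]}
                         for r in AGENCY_B],
            ]
        }

        def query(relation, predicate):
            """Unfold the GAV definition of the relation and filter the combined rows."""
            rows = [row for source_query in GLOBAL_VIEW[relation] for row in source_query()]
            return [row for row in rows if predicate(row)]

        print(query("Measurement", lambda row: row["indicator"] == "nitrate"))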

    Proceedings of the ECSCW'95 Workshop on the Role of Version Control in CSCW Applications

    The workshop entitled "The Role of Version Control in Computer Supported Cooperative Work Applications" was held on September 10, 1995 in Stockholm, Sweden in conjunction with the ECSCW'95 conference. Version control, the ability to manage relationships between successive instances of artifacts, organize those instances into meaningful structures, and support navigation and other operations on those structures, is an important problem in CSCW applications. It has long been recognized as a critical issue for inherently cooperative tasks such as software engineering, technical documentation, and authoring. The primary challenge for versioning in these areas is to support opportunistic, open-ended design processes that require the preservation of historical perspectives in the design process, the reuse of previous designs, and the exploitation of alternative designs. The primary goal of this workshop was to bring together a diverse group of individuals interested in examining the role of versioning in Computer Supported Cooperative Work. Participation was encouraged from members of the research community currently investigating the versioning process in CSCW as well as from application designers and developers who are familiar with the real-world requirements for versioning in CSCW. Both groups were represented at the workshop, resulting in an exchange of ideas and information that helped familiarize developers with the most recent research results in the area and provide researchers with an updated view of the needs and challenges faced by application developers. In preparing for this workshop, the organizers were able to build upon the results of their previous workshop, "The Workshop on Versioning in Hypertext", held in conjunction with the ECHT'94 conference. The following section of this report contains a summary in which the workshop organizers report the major results of the workshop. The summary is followed by a section containing the position papers that were accepted to the workshop; these provide more detailed information describing recent research efforts of the workshop participants as well as current challenges being encountered in the development of CSCW applications. A list of workshop participants is provided at the end of the report. The organizers would like to thank all of the participants for their contributions, which were, of course, vital to the success of the workshop. We would also like to thank the ECSCW'95 conference organizers for providing a forum in which this workshop was possible.
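    A minimal sketch of the notion of versioning defined above, under illustrative assumptions: successive instances of an artifact are kept in a graph whose derivation links support navigation back through the design history and preserve alternative designs. The class and names are invented for the example.

        # A tiny version graph: each version records the instances it was derived from.
        class Version:
            def __init__(self, content, parents=()):
                self.content = content
                self.parents = list(parents)  # earlier instances this one was derived from

            def history(self):
                """Walk the derivation links, yielding this version and every ancestor."""
                seen, stack = set(), [self]
                while stack:
                    version = stack.pop()
                    if id(version) not in seen:
                        seen.add(id(version))
                        yield version
                        stack.extend(version.parents)

        v1 = Version("first draft")
        v2 = Version("revised draft", parents=[v1])        # the main line of the design
        alt = Version("alternative design", parents=[v1])  # an explored alternative
        print([v.content for v in v2.history()])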

    The mediated data integration (MeDInt): An approach to the integration of database and legacy systems

    The information required for decision making by executives in organizations is normally scattered across disparate data sources, including databases and legacy systems. To gain a competitive advantage, it is extremely important for executives to be able to obtain one unique view of information in an accurate and timely manner. To do this, it is necessary to interoperate multiple data sources, which differ structurally and semantically. Particular problems occur when applying traditional integration approaches; for example, the global schema needs to be recreated whenever a component schema has been modified. This research investigates the following heterogeneities between heterogeneous data sources: data model heterogeneities, schematic heterogeneities and semantic heterogeneities. The problems of existing integration approaches are reviewed and solved by introducing and designing a new integration approach that logically interoperates heterogeneous data sources and resolves the three previously classified heterogeneities. The research attempts to reduce the complexity of the integration process by maximising the degree of automation, employing mediation and wrapping techniques. The Mediated Data Integration (MeDInt) architecture has been introduced to integrate heterogeneous data sources. Three major elements, the MeDInt Mediator, wrappers, and the Mediated Data Model (MDM), play important roles in the integration of heterogeneous data sources. The MeDInt Mediator acts as an intermediate layer that transforms queries into sub-queries, resolves conflicts, and consolidates the conflict-resolved results. Wrappers serve as translators between the MeDInt Mediator and the data sources. Both the mediator and the wrappers are well supported by the MDM, a semantically rich data model which can describe or represent heterogeneous data schematically and semantically. Some organisational information systems have been tested and evaluated using the MeDInt architecture. The results address all the research questions regarding the interoperability of heterogeneous data sources and also confirm that the MeDInt architecture is able to provide integration that is transparent to users and that schema evolution does not affect the integration.
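    A minimal sketch of the mediator/wrapper arrangement described above, under illustrative assumptions: each wrapper translates a global request into its source's own field names, and the mediator fans the request out as sub-queries, resolves conflicts, and consolidates the results. The sources, field names, and the "prefer the newer update stamp" conflict rule are invented for the example.

        # Wrappers translate between the mediator's vocabulary and each source's own fields.
        class Wrapper:
            def __init__(self, rows, field_map):
                self.rows, self.field_map = rows, field_map

            def answer(self, wanted_fields):
                """Answer a sub-query: look up each requested global field under its local name."""
                return [{f: row[self.field_map[f]] for f in wanted_fields} for row in self.rows]

        class Mediator:
            def __init__(self, wrappers):
                self.wrappers = wrappers

            def query(self, wanted_fields, key):
                merged = {}
                for wrapper in self.wrappers:  # one sub-query per source
                    for row in wrapper.answer(wanted_fields):
                        previous = merged.get(row[key])
                        # Conflict resolution: keep the row with the later update stamp.
                        if previous is None or row["updated"] > previous["updated"]:
                            merged[row[key]] = row
                return list(merged.values())

        crm = Wrapper([{"cust": "C1", "ts": 2}], {"customer": "cust", "updated": "ts"})
        erp = Wrapper([{"id": "C1", "modified": 5}], {"customer": "id", "updated": "modified"})
        print(Mediator([crm, erp]).query(["customer", "updated"], key="customer"))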

    Ontological View-driven Semantic Integration in Open Environments

    In an open computing environment, such as the World Wide Web or an enterprise intranet, various information systems are expected to work together to support information exchange, processing, and integration. However, information systems are usually built by different people, at different times, to fulfil different requirements and goals. Consequently, in the absence of an architectural framework geared toward semantic integration, there are widely varying viewpoints and assumptions regarding what is essentially the same subject, and communication among the components supporting various applications is not possible without at least some translation. This problem, however, is much more than a simple agreement on tags or mappings between roughly equivalent sets of tags in related standards. Industry-wide initiatives and academic studies have shown that complex representation issues can arise. To deal with these issues, a deep understanding and appropriate treatment of semantic integration is needed. Ontology is an important and widely accepted approach to semantic integration. Usually, however, there are no explicit ontologies attached to information systems; rather, the associated semantics are implied within the supporting information model, which reflects a specific view of the conceptualization and thus implicitly defines an ontological view. This research proposes to adopt ontological views to facilitate semantic integration for information systems in open environments. It proposes a theoretical foundation of ontological views, practical assumptions, and related solutions for research issues. The proposed solutions focus mainly on three aspects: the architecture of a semantic-integration-enabled environment, ontological view modeling and representation, and semantic equivalence relationship discovery. The solutions are applied to the collaborative intelligence project for the collaborative promotion / advertisement domain. Various quality aspects of the solutions are evaluated and future directions of the research are discussed.
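    A minimal sketch of the semantic-equivalence discovery step mentioned above, under illustrative assumptions: two hypothetical ontological views are compared concept by concept, and a pair is proposed as equivalent when their normalised property labels overlap strongly enough. The views, the synonym table and the 0.5 threshold are invented for the example and are not the thesis's actual algorithm.

        # Two hypothetical ontological views of the promotion/advertisement domain.
        VIEW_A = {"Promotion": {"title", "start_date", "discount"}}
        VIEW_B = {"Advert": {"title", "startDate", "rebate"},
                  "Invoice": {"number", "total"}}

        SYNONYMS = {"startdate": "start_date", "rebate": "discount"}

        def normalise(label):
            label = label.lower()
            return SYNONYMS.get(label, label)

        def equivalences(view_a, view_b, threshold=0.5):
            """Propose concept pairs whose normalised property sets overlap (Jaccard) above the threshold."""
            pairs = []
            for a, props_a in view_a.items():
                for b, props_b in view_b.items():
                    na, nb = {normalise(p) for p in props_a}, {normalise(p) for p in props_b}
                    overlap = len(na & nb) / len(na | nb)
                    if overlap >= threshold:
                        pairs.append((a, b, round(overlap, 2)))
            return pairs

        print(equivalences(VIEW_A, VIEW_B))  # [('Promotion', 'Advert', 1.0)]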

    An extensible view system for supporting the integration and interoperation of heterogeneous, autonomous, and distributed database management systems

    In this thesis the problem of integrating heterogeneous, autonomous and distributed database management systems (DBMSs) is addressed. To provide a solution, we have developed an approach, a design method, and a view system. Our approach is based on the invention of abstract view constructs that have uniform and stable representations for supporting semantic relativism and distributed abstraction modeling. Our design method applies object-oriented techniques and software engineering concepts to manage the system's complexity. Our view system has been constructed upon established experience with the development of large-scale distributed systems, in a distributed object infrastructure provided by the Common Object Request Broker Architecture (CORBA). The scope of our research is defined by the goals of Project Zeus, in which we have created the Zeus View Mechanism (ZVM) as the theoretical foundation of our approach. The notion of frameworks has been introduced as part of our design methodology to promote code and design reuse and to enhance the portability and extensibility of the architectural design. A multidatabase system, the Zeus Multidatabase System (ZMS), has provided a test bed for our concept. Project Zeus has exciting prospects: the foundation established in this research has created new directions in multidatabase research and will have a significant impact on future integration and interoperation technologies.
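    A minimal sketch of the uniform view-construct idea described above, under illustrative assumptions: every participating DBMS is exposed through the same abstract view interface, and derived views are composed from other views without regard to which system sits underneath. The class and method names are invented for the example and are not the actual ZVM interfaces.

        # A uniform view interface that local sources and derived views both implement.
        from abc import ABC, abstractmethod

        class AbstractView(ABC):
            @abstractmethod
            def rows(self):
                """Yield records in the view's agreed-upon shape."""

        class RelationalSourceView(AbstractView):
            def __init__(self, table):
                self.table = table

            def rows(self):
                yield from self.table

        class UnionView(AbstractView):
            """A derived view composed from other views through the same interface."""
            def __init__(self, *views):
                self.views = views

            def rows(self):
                for view in self.views:
                    yield from view.rows()

        payroll = RelationalSourceView([{"emp": "Ada", "dept": "IT"}])
        hr = RelationalSourceView([{"emp": "Bob", "dept": "Ops"}])
        print(list(UnionView(payroll, hr).rows()))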