3,670 research outputs found

    Distributed Heterogeneous Relational Data Warehouse In A Grid Environment

    Get PDF
    This paper examines how a "Distributed Heterogeneous Relational Data Warehouse" can be integrated in a Grid environment that will provide physicists with efficient access to large and small object collections drawn from databases at multiple sites. This paper investigates the requirements of Grid-enabling such a warehouse, and explores how these requirements may be met by extensions to existing Grid middleware. We present initial results obtained with a working prototype warehouse of this kind using both SQLServer and Oracle9i, where a Grid-enabled web-services interface makes it easier for web-applications to access the distributed contents of the databases securely. Based on the success of the prototype, we proposes a framework for using heterogeneous relational data warehouse through the web-service interface and create a single "Virtual Database System" for users. The ability to transparently access data in this way, as shown in prototype, is likely to be a very powerful facility for HENP and other grid users wishing to collate and analyze information distributed over Grid.Comment: 4 pages, 6 figure

    Heterogeneous Relational Databases for a Grid-enabled Analysis Environment

    Get PDF
    Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid

    Impliance: A Next Generation Information Management Appliance

    Full text link
    ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements, namely the ideas of: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massive parallel processing, and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises.Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, US

    Data access and integration in the ISPIDER proteomics grid

    Get PDF
    Grid computing has great potential for supporting the integration of complex, fast changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources

    On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research

    Full text link
    Scientific research requires access, analysis, and sharing of data that is distributed across various heterogeneous data sources at the scale of the Internet. An eager ETL process constructs an integrated data repository as its first step, integrating and loading data in its entirety from the data sources. The bootstrapping of this process is not efficient for scientific research that requires access to data from very large and typically numerous distributed data sources. a lazy ETL process loads only the metadata, but still eagerly. Lazy ETL is faster in bootstrapping. However, queries on the integrated data repository of eager ETL perform faster, due to the availability of the entire data beforehand. In this paper, we propose a novel ETL approach for scientific data integration, as a hybrid of eager and lazy ETL approaches, and applied both to data as well as metadata. This way, Hybrid ETL supports incremental integration and loading of metadata and data from the data sources. We incorporate a human-in-the-loop approach, to enhance the hybrid ETL, with selective data integration driven by the user queries and sharing of integrated data between users. We implement our hybrid ETL approach in a prototype platform, Obidos, and evaluate it in the context of data sharing for medical research. Obidos outperforms both the eager ETL and lazy ETL approaches, for scientific research data integration and sharing, through its selective loading of data and metadata, while storing the integrated data in a scalable integrated data repository.Comment: Pre-print Submitted to the DMAH Special Issue of the Springer DAPD Journa

    Heterogeneous data source integration for smart grid ecosystems based on metadata mining

    Get PDF
    The arrival of new technologies related to smart grids and the resulting ecosystem of applications andmanagement systems pose many new problems. The databases of the traditional grid and the variousinitiatives related to new technologies have given rise to many different management systems with several formats and different architectures. A heterogeneous data source integration system is necessary toupdate these systems for the new smart grid reality. Additionally, it is necessary to take advantage of theinformation smart grids provide. In this paper, the authors propose a heterogeneous data source integration based on IEC standards and metadata mining. Additionally, an automatic data mining framework isapplied to model the integrated information.Ministerio de EconomĂ­a y Competitividad TEC2013-40767-
    • …
    corecore