39,055 research outputs found

    A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

    In today’s scenario, extraction–transformation–loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, hard and time-consuming. Organisations nowadays are concerned about vast quantities of data. Data quality is concerned with technical issues in the data warehouse environment. Research in the last few decades has laid more stress on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data prior to loading it into a warehouse. Since the data is collected from various sources, it comes in various formats. Standardizing the formats and cleaning such data are therefore necessary for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency and timeliness are required for a knowledge discovery process. The purpose of this state-of-the-art research work is to address data quality issues at all stages of data warehousing, namely 1) data sources, 2) data integration, 3) data staging, and 4) data warehouse modelling and schematic design, and to formulate a descriptive classification of their causes. The discovered knowledge is used to repair the data deficiencies. This work proposes a framework for the quality of extraction, transformation and loading of data into a warehouse.

    Improvement of information support for formation of management reporting on the example of the activities of an energy company

    The purpose of the study is to develop a system for generating management reporting on the production activities of an energy company on the SAP Business Objects Platform. To build the reporting system, the following tasks were set and solved: the conceptual and datalogical models of the data warehouse were created, data areas were selected from the general data model, the data warehouse was designed and developed, universes for report groups were created, and a mechanism for integrating data with the data warehouse was implemented. The paper analyses the information and technological infrastructure of an energy company and formulates the basic requirements for the reporting system being created. Two main subsystems have been designed: data storage and integration. The physical implementation of the designed subsystems is described using the appropriate software products: SAP HANA, SAP Universe Designer, SAP Data Services and SAP Business Intelligence. Thanks to the system configured through the corporate data bus, a relatively simple data integration mechanism became possible. Within its information and technological architecture, the company managed to model an acceptable data warehouse and set up the appropriate data flows. A complex data warehouse model has been implemented, and a convenient platform for further data processing has been provided. A clear data integration scheme is configured using SAP Data Services, with the ability to scale and to configure the data loading schedule. The developed system has been put into operation and is used by employees to make management decisions within the framework of their professional activities.

    Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

    Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)