4 research outputs found

    A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

    Today, extraction–transformation–loading (ETL) tools are important pieces of software responsible for integrating heterogeneous information from several sources, and carrying out the ETL process is potentially complex, difficult, and time-consuming. Organisations are increasingly concerned with the vast quantities of data they manage, and data quality raises technical issues throughout the data warehouse environment. Research over the last few decades has placed growing emphasis on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data prior to loading it into the warehouse. Because the data is collected from various sources, it arrives in various formats; standardizing those formats and cleaning the data are therefore prerequisites for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency, and timeliness are required for a knowledge discovery process. This work examines data quality issues at four stages of data warehousing: 1) data sources, 2) data integration, 3) data staging, and 4) data warehouse modelling and schematic design, and formulates a descriptive classification of their causes. The discovered knowledge is used to repair data deficiencies. The work proposes a framework for quality in the extraction, transformation, and loading of data into a warehouse.
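
    The abstract's core prescription, cleaning and standardizing data before it is loaded, can be illustrated with a short sketch. The snippet below is not from the paper: the field name order_date and the accepted date formats are hypothetical; it merely shows format standardization plus duplicate removal ahead of a warehouse load.

    # Minimal pre-load cleaning sketch: normalize dates, then drop duplicates.
    from datetime import datetime

    DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")  # assumed source formats

    def standardize_date(raw: str) -> str:
        # Normalize a date string from any known source format to ISO 8601.
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
        raise ValueError(f"unrecognized date format: {raw!r}")

    def clean(rows):
        # Standardize formats, then keep only the first copy of each row.
        seen, cleaned = set(), []
        for row in rows:
            row = dict(row, order_date=standardize_date(row["order_date"]))
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                cleaned.append(row)
        return cleaned

    # Two source rows that are the same record in different formats collapse to one.
    print(clean([{"id": 1, "order_date": "03/04/2021"},
                 {"id": 1, "order_date": "2021-04-03"}]))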

    Data quality management in a business intelligence environment : from the lens of metadata

    Business Intelligence is becoming more pervasive in many large and medium-sized organisations. Being a long-term undertaking, Business Intelligence raises many issues that an organisation must address in order to improve its decision-making processes. Data quality is one of the main issues exposed by Business Intelligence: within the organisation, data quality can affect attitudes towards Business Intelligence itself, especially among business users. Comprehensive management of data quality is therefore a crucial part of any Business Intelligence endeavour, and it is important to address all types of data quality issues with an all-in-one solution. We believe that an extensive metadata infrastructure is the primary technical means of managing data quality in Business Intelligence; moreover, metadata has a broader application in improving the Business Intelligence environment. After identifying the sources of data quality issues in Business Intelligence, we propose a concept of data quality management by means of a metadata framework and discuss the recommended solution.
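
    One way to picture the metadata idea is to store quality rules as declarative metadata records and evaluate them uniformly over any dataset. The sketch below is an assumption-laden illustration, not the paper's actual framework: QualityRule, the rule names, and the columns are all invented for the example.

    # Data quality checks driven by metadata records rather than hard-coded logic.
    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class QualityRule:
        # Metadata describing one data quality check.
        name: str
        column: str
        check: Callable[[Any], bool]

    RULES = [  # illustrative rules; a real system would load these from a metadata repository
        QualityRule("non_null_customer", "customer_id", lambda v: v is not None),
        QualityRule("positive_amount", "amount", lambda v: v is not None and v > 0),
    ]

    def profile(rows):
        # Return the pass rate of each metadata-defined rule over a dataset.
        return {rule.name: sum(rule.check(r.get(rule.column)) for r in rows) / len(rows)
                for rule in RULES}

    print(profile([{"customer_id": 7, "amount": 12.5},
                   {"customer_id": None, "amount": -1}]))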

    A Generic Procedure for Integration Testing of ETL Procedures

    In order to attain a certain degree of confidence in the quality of the data in the data warehouse, it is necessary to perform a series of tests. Many components (and aspects) of the data warehouse can be tested; in this paper we focus on the ETL procedures. Due to the complexity of the ETL process, ETL procedure tests are usually custom written and have a very low level of reusability. In this paper we address this issue and work towards establishing a generic procedure for integration testing of certain aspects of ETL procedures. In this approach, ETL procedures are treated as a black box and are tested by comparing their input and output datasets. Datasets from three locations are compared: the relational source(s), the staging area, and the data warehouse. The proposed procedure is generic and can be implemented on any data warehouse that employs a dimensional model and has relational database(s) as a source. Our work pertains only to certain aspects of the data quality problems found in DW systems; it provides a basic testing foundation or augments an existing data warehouse system's testing capabilities. We comment on the proposed mechanisms in terms of both full reload and incremental loading.
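
    The black-box comparison the abstract describes can be sketched in a few lines: fingerprint the rows extracted from the source, the staging area, and the warehouse, then diff the three sets. This sketch assumes a pass-through transformation for simplicity (a real ETL test would apply the expected transformations before comparing), and all names are illustrative.

    # Compare datasets from source, staging, and warehouse by row fingerprints.
    import hashlib

    def fingerprint(rows):
        # Hash each row so datasets from different stores become comparable sets.
        return {hashlib.sha256(repr(sorted(r.items())).encode()).hexdigest()
                for r in rows}

    def compare(source_rows, staging_rows, warehouse_rows):
        src, stg, dw = map(fingerprint, (source_rows, staging_rows, warehouse_rows))
        return {
            "lost_in_staging": len(src - stg),    # extracted but never staged
            "lost_in_warehouse": len(stg - dw),   # staged but never loaded
            "unexpected_in_dw": len(dw - src),    # loaded rows with no source match
        }

    rows = [{"id": 1, "qty": 2}, {"id": 2, "qty": 5}]
    print(compare(rows, rows, rows[:1]))  # full reload check: one row went missing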