4 research outputs found
A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment
In today's scenario, extraction-transformation-loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, hard and time consuming. Organisations nowadays are concerned about vast quantities of data, and data quality is a central technical issue in the data warehouse environment. Research over the last few decades has laid increasing stress on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data prior to loading it into the warehouse. Since the data is collected from various sources, it arrives in various formats; standardising these formats and cleaning the data is a prerequisite for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency and timeliness are required for a knowledge discovery process. The state-of-the-art purpose of this research is to deal with data quality issues at all the aforementioned stages of data warehousing, namely 1) data sources, 2) data integration, 3) data staging, and 4) data warehouse modelling and schematic design, and to formulate a descriptive classification of their causes. The discovered knowledge is used to repair the data deficiencies. This work proposes a framework for quality of extraction, transformation and loading of data into a warehouse.
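The cleaning-and-standardisation step this abstract describes (normalising heterogeneous source formats before loading) can be sketched as follows. This is a minimal illustration, not the paper's framework; the field names, date formats and rules are assumptions invented for the example.

```python
from datetime import datetime

# Hypothetical source rows arriving in heterogeneous formats (illustrative only).
rows = [
    {"id": "1", "name": " Alice ", "joined": "2021-03-05"},
    {"id": "2", "name": "BOB", "joined": "05/03/2021"},
    {"id": "2", "name": "BOB", "joined": "05/03/2021"},  # duplicate record
]

# Date formats observed across the assumed sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def standardise_date(value):
    """Try each known source format; return ISO-8601, or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flagged for repair rather than loaded dirty

def clean(rows):
    """Deduplicate on the key and normalise each field before loading."""
    seen, out = set(), []
    for r in rows:
        if r["id"] in seen:      # drop duplicates (consistency)
            continue
        seen.add(r["id"])
        out.append({
            "id": int(r["id"]),
            "name": r["name"].strip().title(),        # normalise case/whitespace
            "joined": standardise_date(r["joined"]),  # standardise date format
        })
    return out

print(clean(rows))
```

With the sample rows above, both dates end up as `2021-03-05` and the duplicate record is dropped, so the warehouse receives one consistent representation per entity.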
Data quality management in a business intelligence environment: from the lens of metadata
Business Intelligence is becoming more pervasive in many large and medium-sized organisations. Being a long-term undertaking, Business Intelligence raises many issues that an organisation has to deal with in order to improve its decision-making processes. Data quality is one of the main issues exposed by Business Intelligence. Within the organisation, data quality can affect attitudes to Business Intelligence itself, especially among business users. Comprehensive management of data quality is a crucial part of any Business Intelligence endeavour: it is important to address all types of data quality issues and come up with an all-in-one solution. We believe that an extensive metadata infrastructure is the primary technical solution for management of data quality in Business Intelligence. Moreover, metadata has a broader application for improving the Business Intelligence environment. Upon identifying the sources of data quality issues in Business Intelligence, we propose a concept of data quality management by means of a metadata framework and discuss the recommended solution.
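The core idea of metadata-driven data quality management can be sketched as follows: quality rules live in metadata records rather than being hard-coded into each pipeline, so they can be managed centrally across the Business Intelligence environment. This is a minimal sketch under assumed rule names and schemas, not the framework the paper proposes.

```python
# Hypothetical quality-rule metadata (table, column, rule); illustrative only.
dq_metadata = [
    {"table": "customer", "column": "email", "rule": "not_null"},
    {"table": "customer", "column": "age", "rule": "range", "min": 0, "max": 120},
]

def check(row, meta):
    """Return the list of metadata rules that the given row violates."""
    violations = []
    for m in meta:
        value = row.get(m["column"])
        if m["rule"] == "not_null" and value is None:
            violations.append(m)
        elif m["rule"] == "range" and value is not None:
            if not (m["min"] <= value <= m["max"]):
                violations.append(m)
    return violations

good = {"email": "a@example.com", "age": 34}
bad = {"email": None, "age": 150}
print(len(check(good, dq_metadata)), len(check(bad, dq_metadata)))  # prints "0 2"
```

Because the rules are data rather than code, adding a new check is a metadata update, which is what makes centralised, organisation-wide data quality management feasible.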
A Generic Procedure for Integration Testing of ETL Procedures (Općeniti postupak za integracijsko testiranje ETL procedura)
In order to attain a certain degree of confidence in the quality of the data in the data warehouse, it is necessary to perform a series of tests. There are many components (and aspects) of the data warehouse that can be tested, and in this paper we focus on the ETL procedures. Due to the complexity of the ETL process, ETL procedure tests are usually custom written and have a very low level of reusability. In this paper we address this issue and work towards establishing a generic procedure for integration testing of certain aspects of ETL procedures. In this approach, ETL procedures are treated as a black box and are tested by comparing their inputs and outputs: datasets. Datasets from three locations are compared: datasets from the relational source(s), datasets from the staging area and datasets from the data warehouse. The proposed procedure is generic and can be implemented on any data warehouse employing a dimensional model and having relational database(s) as a source. Our work pertains only to certain aspects of data quality problems that can be found in DW systems. It provides a basic testing foundation or augments an existing data warehouse system's testing capabilities. We comment on the proposed mechanisms in terms of both full reload and incremental loading.
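The black-box comparison described in this abstract (checking datasets at the source, staging area and data warehouse against each other) can be sketched as follows. The datasets and key columns are invented for illustration; the actual procedure operates on relational databases, not in-memory lists.

```python
# A minimal sketch of black-box ETL integration testing: the ETL chain is
# checked by comparing row sets extracted at three points of the pipeline.
# Dataset contents and column names are illustrative assumptions.

def as_key_set(rows, key_columns):
    """Reduce a dataset to a comparable set of business-key tuples."""
    return {tuple(row[c] for c in key_columns) for row in rows}

def compare_stages(source, staging, warehouse, key_columns):
    """Report rows lost (or spuriously added) between consecutive stages."""
    s, t, w = (as_key_set(d, key_columns) for d in (source, staging, warehouse))
    return {
        "missing_in_staging": s - t,
        "extra_in_staging": t - s,
        "missing_in_warehouse": t - w,
        "extra_in_warehouse": w - t,
    }

source = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
staging = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
warehouse = [{"order_id": 1}, {"order_id": 2}]  # order 3 lost during load

report = compare_stages(source, staging, warehouse, ["order_id"])
print(report["missing_in_warehouse"])  # keys of the rows lost at the load step
```

Comparing consecutive stages pairwise, as above, localises where a discrepancy was introduced, which is what makes such a black-box check useful for both full-reload and incremental-load scenarios.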