A comprehensive approach to data warehouse testing
Testing is an essential part of the design life-cycle of any software product. Nevertheless, while most phases of data warehouse design have received considerable attention in the literature, not much has been said about data warehouse testing. In this paper we introduce a number of data mart-specific testing activities, classify them in terms of what is tested and how it is tested, and discuss how they can be framed within a reference design methodology.
An approach for testing the extract-transform-load process in data warehouse systems
Enterprises use data warehouses to accumulate data from multiple sources for data analysis and research. Since organizational decisions are often made based on the data stored in a data warehouse, all its components must be rigorously tested. In this thesis, we first present a comprehensive survey of data warehouse testing approaches, and then develop and evaluate an automated testing approach for validating the Extract-Transform-Load (ETL) process, which is a common activity in data warehousing. In the survey we present a classification framework that categorizes the testing and evaluation activities applied to the different components of data warehouses. These approaches include dynamic analysis as well as static evaluation and manual inspections. The classification framework uses information related to what is tested in terms of the data warehouse component that is validated, and how it is tested in terms of various types of testing and evaluation approaches. We discuss the specific challenges and open problems for each component and propose research directions. The ETL process involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex one-to-one, many-to-one, and many-to-many transformations involving sources and targets that use different schemas, databases, and technologies. Since faulty implementations in any of the ETL steps can result in incorrect information in the target data warehouse, ETL processes must be thoroughly validated. In this thesis, we propose automated balancing tests that check for discrepancies between the data in the source databases and that in the target warehouse. Balancing tests ensure that the data obtained from the source databases is not lost or incorrectly modified by the ETL process.
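The extract-transform-load pipeline described above can be illustrated with a minimal sketch. This is a toy in-memory example, not the thesis's implementation: the table layout, field names, and the many-to-one attribute mapping (two source name fields combined into one target field) are all hypothetical.

```python
# Toy ETL sketch: extract rows from a "source", transform them, load into a
# "target" keyed by id. All names and schemas here are illustrative only.

def extract(source_rows):
    """Extract: pull raw records from the source system."""
    return list(source_rows)

def transform(rows):
    """Transform: normalize casing/whitespace and derive full_name
    (a many-to-one attribute mapping: two source fields -> one target field)."""
    out = []
    for r in rows:
        out.append({
            "id": r["id"],
            "full_name": f'{r["first"].strip().title()} {r["last"].strip().title()}',
        })
    return out

def load(target, rows):
    """Load: insert transformed records into the target warehouse table."""
    for r in rows:
        target[r["id"]] = r
    return target

source = [{"id": 1, "first": " ada ", "last": "lovelace"},
          {"id": 2, "first": "alan", "last": "TURING"}]
warehouse = load({}, transform(extract(source)))
```

A fault in any of the three steps (e.g. a transform that drops or corrupts a field) would surface only in the target data, which is why the balancing tests below compare target contents back against the sources.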
First, we categorize and define a set of properties to be checked in balancing tests. We identify various types of discrepancies that may exist between the source and the target data, and formalize three categories of properties, namely completeness, consistency, and syntactic validity, that must be checked during testing. Next, we automatically identify source-to-target mappings from ETL transformation rules provided in the specifications. We identify one-to-one, many-to-one, and many-to-many mappings for tables, records, and attributes involved in the ETL transformations. We automatically generate test assertions to verify the properties for balancing tests, using the source-to-target mappings to generate assertions corresponding to each property. The assertions compare the data in the target data warehouse with the corresponding data in the sources to verify the properties. We evaluate our approach on a health data warehouse that uses data sources with different data models running on different platforms. We demonstrate that our approach can find previously undetected real faults in the ETL implementation. We also provide an automatic mutation testing approach to evaluate the fault-finding ability of our balancing tests. Using mutation analysis, we demonstrate that our auto-generated assertions can detect faults in the data inside the target data warehouse when faulty ETL scripts execute on mock source data.
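The three property categories can be sketched as simple assertions over source and target row sets. This is a hedged illustration under assumed toy data; the attribute names, the date format, and the one-to-one mapping are stand-ins, not the thesis's generated assertions.

```python
# Illustrative balancing-test checks over toy source/target row sets.
import re

source = [{"id": 1, "dob": "1990-01-15"}, {"id": 2, "dob": "1985-07-30"}]
target = [{"id": 1, "dob": "1990-01-15"}, {"id": 2, "dob": "1985-07-30"}]

def check_completeness(src, tgt):
    """Completeness: every source record reaches the target (nothing lost)."""
    return {r["id"] for r in src} == {r["id"] for r in tgt}

def check_consistency(src, tgt, attr):
    """Consistency: attribute values survive a one-to-one mapping unchanged."""
    tgt_by_id = {r["id"]: r for r in tgt}
    return all(r[attr] == tgt_by_id[r["id"]][attr] for r in src)

def check_syntactic_validity(tgt, attr, pattern):
    """Syntactic validity: target values conform to the expected format."""
    return all(re.fullmatch(pattern, r[attr]) for r in tgt)

assert check_completeness(source, target)
assert check_consistency(source, target, "dob")
assert check_syntactic_validity(target, "dob", r"\d{4}-\d{2}-\d{2}")
```

Mutation testing then amounts to corrupting the ETL step (e.g. dropping a row or reformatting a date) and confirming that at least one of these checks fails.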
HealthCare Partners: Building on a Foundation of Global Risk Management to Achieve Accountable Care
Describes the progress of a medical group and independent practice association in forming an accountable care organization by working with insurers as part of the Brookings-Dartmouth ACO Pilot Program. Lists lessons learned and elements of success
Evaluating the reading performance of semi-passive RFID tags to enhance locating of warehouse resources: An experiment design
Copyright @ 2011 8th European, Mediterranean and Middle Eastern Conference on Information Systems (EMCIS 2011). In the supply chain, a warehouse is a crucial component for linking all chain parties. It is necessary to track real-time resource location and status to support warehouse operations effectively. Therefore, RFID technology has been adopted to facilitate the collection and sharing of data in a warehouse environment. However, an essential decision must be made on the type of RFID tags warehouse managers should adopt, because it is very important to implement RFID tags that work in the warehouse environment. As a result, warehouse resources will be easily tracked and accurately located, which will improve the visibility of warehouse operations, enhance productivity, and reduce the operating costs of the warehouse. Therefore, it is crucial to evaluate the reading performance of all types of RFID tags in a warehouse environment in order to choose the most appropriate RFID tags to enhance the operational efficiency of a warehouse. The reading performance of active and passive RFID tags has been evaluated before, while the semi-passive RFID tag, which is battery-assisted, more sensitive than passive tags, and cheaper than active tags, has not yet been examined in a warehouse environment. This is in-progress research aiming to perform tests evaluating the reading performance of semi-passive RFID apparatus, to provide an extensive RFID performance comparison for formulating an efficient RFID solution in a warehousing environment.
Radio frequency identification (RFID) technologies for locating warehouse resources: A conceptual framework
Copyright @ 2012 Information Technology Society. In the supply chain, a warehouse is a crucial component for linking all chain parties. It is necessary to track real-time resource location and status to support warehouse operations effectively. Therefore, RFID technology has been adopted to facilitate the collection and sharing of data in a warehouse environment. However, an essential decision must be made on the type of RFID tags warehouse managers should adopt, because it is very important to implement RFID tags that work in the warehouse environment. As a result, warehouse resources will be easily tracked and accurately located, which will improve the visibility of warehouse operations, enhance productivity, and reduce the operating costs of the warehouse. Therefore, it is crucial to evaluate the reading performance of all types of RFID tags in a warehouse environment in order to choose the most appropriate RFID tags to enhance the operational efficiency of a warehouse. The reading performance of active and passive RFID tags has been evaluated before, while the semi-passive RFID tag, which is battery-assisted, more sensitive than passive tags, and cheaper than active tags, has not yet been examined in a warehouse environment. This is in-progress research that seeks to (i) provide a general overview of existing real-time data management techniques for tracking warehouse resource locations, (ii) provide an overall conceptual framework that can help warehouse managers choose the best RFID technologies for a warehouse environment, and (iii) present an experiment design for evaluating the reading performance of semi-passive RFID tags in a warehouse environment.
High-Level Object Oriented Genetic Programming in Logistic Warehouse Optimization
This work is focused on work-flow optimization in logistic warehouses and distribution centers. The main aim is to optimize process planning, scheduling, and dispatching. The problem has received considerable attention in recent years; it belongs to the NP-hard class of problems, so finding an optimal solution is computationally very demanding. The main motivation for solving this problem is to fill the gap between the new optimization methods developed by researchers in the academic world and the methods used in the business world. The core of the optimization algorithm is built on genetic programming driven by a context-free grammar.
The main contributions of the thesis are a) to propose a new optimization algorithm that respects the makespan, resource utilization, and the aisle congestion that may occur during task processing, b) to analyze historical operational data from a warehouse and develop a set of benchmarks that can serve as reference baseline results for further research, and c) to try to outperform the baseline results set by a skilled and trained operational manager of one of the biggest warehouses in Central Europe.
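The core idea of genetic programming driven by a context-free grammar can be sketched briefly: candidate solutions are derived by expanding grammar rules, so every individual in the population is syntactically valid by construction. The grammar below is a toy stand-in (a flat "plan" of warehouse actions), not the dissertation's actual warehouse model.

```python
# Minimal sketch of grammar-guided derivation, the generation step of
# grammatical genetic programming. Grammar and symbols are illustrative only.
import random

GRAMMAR = {
    "plan":   [["action"], ["action", "plan"]],   # first rule is non-recursive
    "action": [["pick"], ["store"], ["move"]],
}

def derive(symbol, rng, depth=0, max_depth=6):
    """Expand a nonterminal by randomly choosing one of its productions;
    at the depth limit, force the first (non-recursive) production."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    rules = GRAMMAR[symbol]
    rule = rules[0] if depth >= max_depth else rng.choice(rules)
    out = []
    for s in rule:
        out.extend(derive(s, rng, depth + 1, max_depth))
    return out

rng = random.Random(42)
plan = derive("plan", rng)  # e.g. a random sequence of pick/store/move actions
```

In a full grammatical GP loop, such derivations seed the initial population, and crossover and mutation operate on the derivation choices so that offspring also remain grammar-conformant; fitness would score a plan against makespan, utilization, and congestion.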
The Dag-Brucken ASRS Case Study
In 1996 an agreement was made between a well-known beverage manufacturer, Super-Cola Taiwan (SCT), and a small Australian electrical engineering company, Dag-Brücken ASRS Pty Ltd (DB), to provide an automated storage and retrieval system (ASRS) facility as part of SCT's production facilities in Asia. Recognising the potential of their innovative and technically advanced design, DB was awarded a State Premiers Export Award and was a finalist in that year's National Export Awards. The case tracks the development and subsequent implementation of the SCT ASRS project, setting out to highlight how the lack of appropriate IT development processes contributed to the ultimate failure of the project and the subsequent winding up of DB only one year after being honoured with these prestigious awards. The case provides compelling evidence of the types of project management incompetency that, from the literature, appear to contribute to the high failure rate in IT projects. For confidentiality reasons, the names of the principal parties are changed, but the case covers actual events documented by one of the project team members as part of his postgraduate studies, providing an example of the special mode of evidence collection that Yin (1994) calls "participant-observation".