
    Quality measures for ETL processes: from goals to implementation

    Extraction transformation loading (ETL) processes play an increasingly important role in supporting modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for the evaluation and analysis of alternative design decisions.
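
    A rough sketch of how quantitative indicators might be attached to quality characteristics when comparing alternative ETL designs. The characteristic names, weights, and scores below are hypothetical illustrations, not the paper's actual model.

    # Minimal sketch: score alternative ETL designs against weighted quality
    # characteristics. Names, weights, and scores are illustrative only.

    # Hypothetical quality characteristics with weights reflecting their priority.
    weights = {"reliability": 0.4, "freshness": 0.35, "maintainability": 0.25}

    # Hypothetical indicator values (normalised to [0, 1]) for two design alternatives.
    alternatives = {
        "incremental_load": {"reliability": 0.9, "freshness": 0.6, "maintainability": 0.8},
        "full_reload":      {"reliability": 0.7, "freshness": 0.9, "maintainability": 0.6},
    }

    def overall_score(indicators):
        """Weighted sum of normalised indicator values."""
        return sum(weights[c] * v for c, v in indicators.items())

    for name, indicators in alternatives.items():
        print(f"{name}: {overall_score(indicators):.2f}")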

    A family of experiments to validate measures for UML activity diagrams of ETL processes in data warehouses

    In data warehousing, Extract, Transform, and Load (ETL) processes are in charge of extracting the data from the data sources that will be contained in the data warehouse. Their design and maintenance is thus a cornerstone in any data warehouse development project. Due to their relevance, the quality of these processes should be formally assessed early in the development in order to avoid populating the data warehouse with incorrect data. To this end, this paper presents a set of measures with which to evaluate the structural complexity of ETL process models at the conceptual level. This study is, moreover, accompanied by the application of formal frameworks and a family of experiments whose aim is to theoretically and empirically validate the proposed measures, respectively. Our experiments show that the use of these measures can aid designers to predict the effort associated with the maintenance tasks of ETL processes and to make ETL process models more usable. Our work is based on Unified Modeling Language (UML) activity diagrams for modeling ETL processes, and on the Framework for the Modeling and Evaluation of Software Processes (FMESP) for the definition and validation of the measures.
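
    To illustrate the general idea of structural-complexity measures over an activity-diagram-like ETL model, the sketch below counts element types in a toy process graph. The element types and measure names are generic examples, not the exact FMESP-derived measures from the paper.

    # Illustrative sketch: simple size/complexity counts over an ETL process model
    # represented as a UML-activity-diagram-like graph.

    from collections import Counter

    # Hypothetical model: each node has a type; edges are control flows.
    nodes = {
        "extract_customers": "activity",
        "extract_orders": "activity",
        "valid?": "decision",
        "merge": "merge",
        "transform": "activity",
        "load_dw": "activity",
    }
    edges = [
        ("extract_customers", "valid?"),
        ("extract_orders", "valid?"),
        ("valid?", "merge"),
        ("valid?", "transform"),
        ("merge", "transform"),
        ("transform", "load_dw"),
    ]

    counts = Counter(nodes.values())
    measures = {
        "NA (number of activities)": counts["activity"],
        "ND (number of decision nodes)": counts["decision"],
        "NCF (number of control flows)": len(edges),
    }
    for name, value in measures.items():
        print(f"{name}: {value}")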

    A quality-aware spatial data warehouse for querying hydroecological data

    Addressing data quality issues in information systems remains a challenging task. Many approaches only tackle this issue at the extract, transform and load steps. Here we define a comprehensive method to gain greater insight into data quality characteristics within a data warehouse. Our novel architecture was implemented for a hydroecological case study in which massive French watercourse sampling data are collected. The method models and makes effective use of spatial, thematic and temporal accuracy, consistency and completeness for multidimensional data in order to offer analysts a "data quality" oriented framework. The results obtained in experiments carried out on the Saône River dataset demonstrate the relevance of our approach.
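
    As one small example of a quality characteristic computed over multidimensional data, the sketch below derives a completeness indicator per station and year from sampling records. The column names, the example records, and the expected sampling frequency are hypothetical, not taken from the case study.

    # Sketch: a simple completeness indicator per station and year, assuming
    # watercourse sampling records in a pandas DataFrame.

    import pandas as pd

    samples = pd.DataFrame({
        "station": ["S1", "S1", "S1", "S2", "S2"],
        "year":    [2020, 2020, 2021, 2020, 2021],
        "value":   [3.2, None, 2.8, 4.1, 3.9],   # one missing measurement
    })

    EXPECTED_PER_YEAR = 4  # hypothetical sampling plan: 4 samples per station-year

    quality = (
        samples.dropna(subset=["value"])
               .groupby(["station", "year"])
               .size()
               .div(EXPECTED_PER_YEAR)
               .rename("completeness")
               .reset_index()
    )
    print(quality)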

    An Integrated Framework to Assess ‘Leanness’ Performance in Distribution Centres

    The theory behind lean philosophy is to create more value with less. Effective lean management enables organisations to exceed customer expectations while reducing costs. Although numerous practices and approaches are used in the process of implementing lean philosophy and reducing waste within supply chain systems, little effort has been directed into assessing the leanness level of distribution and its impact on overall performance. Given the vital role of distribution units within supply chains, this research aims to develop a comprehensive lean assessment framework that integrates a selected set of statistical, analytical, and mathematical techniques in order to assess the ‘leanness’ level in the distribution business. Due to the limited number of published articles in the area of lean distribution, there are no clear definitions of the underlying factors and practices. Therefore, the primary phase of the proposed framework addresses the identification of the lean distribution dimensional structure and practices. The other two phases of the framework discuss the development of a structured model for lean distribution and address the process of finding a quantitative lean index for benchmarking lean implementation in distribution centres. Integrating the three phases provides decision makers with an indicator of performance, subject to applying various lean practices. Incorporating the findings of a survey sent to 700 distribution businesses in Ireland, along with value stream mapping, modelling, simulation, and data envelopment analysis, has given the framework strength in the assessment of leanness. Research outcomes show that lean distribution consists of five key dimensions: workforce management, item replenishment, customers, transportation, and process quality. Lean practices associated with these dimensions mainly focus on enhancing communication channels with customers, simplifying the distribution network structure, involving people in problem solving and continuous improvement, and increasing the reliability and efficiency of distribution operations. The final output of the framework is two key leanness indices: one measures the tactical leanness level, while the second represents leanness at the operational level. Both indices can effectively be used in evaluating the lean implementation process and conducting benchmarking based on the leanness level.
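
    A minimal sketch of how the five dimensions named in the abstract could be aggregated into a single leanness index. The equal weights and the example scores are hypothetical and stand in for the framework's statistical and data-envelopment-analysis machinery.

    # Minimal sketch: aggregate dimension scores into one leanness index for a
    # distribution centre. Weights and scores (0-1 scale) are hypothetical.

    dimension_weights = {
        "workforce_management": 0.20,
        "item_replenishment":   0.20,
        "customers":            0.20,
        "transportation":       0.20,
        "process_quality":      0.20,
    }

    # Hypothetical assessment scores for one distribution centre.
    scores = {
        "workforce_management": 0.7,
        "item_replenishment":   0.6,
        "customers":            0.8,
        "transportation":       0.5,
        "process_quality":      0.9,
    }

    leanness_index = sum(dimension_weights[d] * scores[d] for d in dimension_weights)
    print(f"Leanness index: {leanness_index:.2f}")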

    Energy and Carbon Dioxide Impacts from Lean Logistics and Retailing Systems: A Discrete-event Simulation Approach for the Consumer Goods Industry

    Consumer goods supply chains have gradually incorporated lean manufacturing principles to identify and reduce non-value-added activities. Companies implementing lean practices have experienced improvements in cost, quality, and demand responsiveness. However, certain elements of these practices, especially those related to transportation and distribution, may have a detrimental impact on the environment. This study asks: What impact do current best practices in lean logistics and retailing have on environmental performance? The research hypothesis of this dissertation establishes that lean distribution of durable and consumable goods can result in an increased amount of carbon dioxide emissions, leading to climate change and natural resource depletion impacts, while lean retailing operations can reduce carbon emissions. The distribution and retailing phases of the life cycle are characterized in a two-echelon supply chain discrete-event simulation modeled after current operations of leading organizations based in the U.S. Southwest. By conducting an overview of critical sustainability issues and their relationship with consumer products, it is possible to address the environmental implications of lean logistics and retailing operations. Given the waste-reduction focus of lean manufacturing, four lean best practices are examined in detail in order to formulate specific research propositions. These propositions are integrated into an experimental design linking annual carbon dioxide equivalent emissions to: (1) shipment frequency between supply chain partners, (2) proximity between the decoupling point of products and final customers, (3) inventory turns at the warehousing level, and (4) degree of supplier integration. All propositions are tested through the use of the simulation model. Results confirmed the four research propositions. Furthermore, they suggest synergy between product shipment frequency among supply chain partners and product management due to lean retailing practices. In addition, the study confirms prior research speculations about the potential carbon intensity of transportation operations subject to lean principles.
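
    A toy sketch of the kind of relationship the first proposition examines: how shipment frequency between supply chain partners drives annual transport emissions. All parameters (demand, emission factor per trip, simulation length) are hypothetical and much simpler than the dissertation's two-echelon discrete-event model.

    # Toy simulation sketch: relate shipment frequency between a warehouse and a
    # store to annual transport CO2-equivalent emissions.

    import random

    def simulate_emissions(shipments_per_week, days=365, seed=42):
        random.seed(seed)
        kg_co2e_per_trip = 120.0                 # hypothetical emissions per truck trip
        ship_interval = 7 // shipments_per_week or 1
        pending_units, trips = 0, 0
        for day in range(days):
            pending_units += random.randint(80, 120)   # daily store demand
            if day % ship_interval == 0:
                trips += 1                       # one consolidated shipment covers demand
                pending_units = 0
        return trips * kg_co2e_per_trip

    for freq in (1, 3, 7):                       # shipments per week
        print(f"{freq}/week -> {simulate_emissions(freq):,.0f} kg CO2e per year")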

    Adding semantic modules to improve goal-oriented analysis of data warehouses using I-star

    The success rate of data warehouse (DW) development is improved by performing a requirements elicitation stage in which the users’ needs are modeled. Currently, among the different proposals for modeling requirements, there is a special focus on goal-oriented models, and in particular on the i* framework. In order to adapt this framework for DW development, we previously developed a UML profile for DWs. However, like the general i* framework, the proposal lacks modularity. This has an especially negative impact on DW development, since DW requirement models tend to include a huge number of elements with crossed relationships between them. In turn, the readability of the models is decreased, harming their utility and increasing the error rate and development time. In this paper, we propose an extension of our i* profile for DWs considering the modularization of goals. We provide a set of guidelines in order to correctly apply our proposal. Furthermore, we have performed an experiment in order to assess the validity of our proposal. The benefits of our proposal are an increase in the modularity and scalability of the models which, in turn, increases the error correction capability and makes complex models easier to understand by DW developers and non-expert users. This work has been partially supported by the ProS-Req (TIN2010-19130-C02-01) and by the MESOLAP (TIN2010-14860) and SERENIDAD (PEII-11-0327-7035) projects from the Spanish Ministry of Education and the Junta de Comunidades de Castilla La Mancha respectively. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).

    Dimensional enrichment of statistical linked open data

    On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new and publicly available data on the web that are to be analyzed in the same way. The use of semantic web technologies in this context is especially encouraged by the Linked Data initiative. There is already a considerable amount of statistical linked open data sets published using the RDF Data Cube Vocabulary (QB), which is designed for these purposes. However, QB lacks some essential schema constructs (e.g., dimension levels) needed to support OLAP. Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and be fully compliant with OLAP. In this paper, we focus on the enrichment of an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits of QB4OLAP. Then, we propose a series of steps to automate the enrichment of QB data sets with specific QB4OLAP semantics, the most important being the definition of aggregate functions and the detection of new concepts in the dimension hierarchy construction. The proposed steps form a semi-automatic enrichment method, which is implemented in a tool that enables the enrichment in an interactive and iterative fashion. The user can enrich the QB data set with QB4OLAP concepts (e.g., full-fledged dimension hierarchies) by choosing among the candidate concepts automatically discovered with the proposed steps. Finally, we conduct experiments with 25 users and use three real-world QB data sets to evaluate our approach. The evaluation demonstrates the feasibility of our approach and shows that, in practice, our tool facilitates, speeds up, and guarantees the correct results of the enrichment process.
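
    A possible first step before any QB4OLAP enrichment is simply listing the dimensions declared in a QB data structure definition, which the sketch below does with rdflib and a standard SPARQL query over the qb: vocabulary. The input file name is hypothetical; this is not the paper's tool.

    # Sketch: list the dimensions declared in a QB data structure definition, as a
    # starting point for attaching QB4OLAP levels and hierarchies.

    from rdflib import Graph

    g = Graph()
    g.parse("statistical_dataset.ttl", format="turtle")   # hypothetical QB data set

    query = """
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT DISTINCT ?dsd ?dimension WHERE {
      ?dsd a qb:DataStructureDefinition ;
           qb:component ?c .
      ?c qb:dimension ?dimension .
    }
    """
    for dsd, dimension in g.query(query):
        print(f"{dsd} uses dimension {dimension}")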

    Managing Warehouse Utilization: An Analysis of Key Warehouse Resources

    The warehousing industry is extremely important to businesses and the economy as a whole, and while there is a great deal of literature exploring individual operations within warehouses, such as warehouse layout and design, order picking, etc., there is very little literature exploring warehouse operations from a systems approach. This study uses the Theory of Constraints (TOC) to develop a focused resource management approach to increasing warehouse capacity and throughput, and thus overall warehouse performance, in an environment of limited warehouse resources. While TOC was originally developed for reducing operational bottlenecks in manufacturing, it has allowed companies in other industries, such as banking, health care, and the military, to save millions of dollars. However, the use of TOC has been limited to case studies and individual situations, which typically are not generalizable. Since the basic steps of TOC are iterative in nature and were not designed for survey research, modifications to the original theory are necessary in order to provide insight into industry-wide problems. This study further develops TOC's logistics paradigm and modifies it for use with survey data, which was collected from a sample of warehouse managers. Additionally, it provides a process for identifying potentially constrained key warehouse resources, which served as a foundation of this study. The findings confirm that TOC's methods of focused resource capacity management and of coordinating goods flow scheduling with supply chain partners can be an important approach for warehouse managers to use in overcoming resource capacity constraints and increasing warehouse performance.
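
    As a simple illustration of the TOC-style first step (identify the constraint), the sketch below compares the utilisation of a few key warehouse resources and flags the most heavily loaded one. Resource names and figures are hypothetical, not the study's survey data.

    # Sketch: flag the most utilised warehouse resource as the likely constraint.

    resources = {
        "dock_doors":    {"demand_hours": 310, "available_hours": 320},
        "forklifts":     {"demand_hours": 450, "available_hours": 400},
        "pickers":       {"demand_hours": 780, "available_hours": 800},
        "storage_space": {"demand_hours": 500, "available_hours": 600},
    }

    utilisation = {name: r["demand_hours"] / r["available_hours"]
                   for name, r in resources.items()}
    constraint = max(utilisation, key=utilisation.get)

    for name, u in sorted(utilisation.items(), key=lambda kv: -kv[1]):
        flag = "  <-- constraint" if name == constraint else ""
        print(f"{name}: {u:.0%}{flag}")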

    Synthesis of Optimization and Simulation for Multi-Period Supply Chain Planning with Consideration of Risks

    Solutions to deterministic optimizing models for supply chains can be very sensitive to the formulation of the objective function and the choice of planning horizon. We illustrate how multi-period optimizing models may be counterproductive if traditional accounting of revenue and costs is performed and planning occurs with too short a planning horizon. We propose a “value added” complement to traditional financial accounting that allows planning to occur with shorter horizons than previously thought necessary. This dissertation presents a simulation model with an embedded optimizer that can help organizations develop strategies that minimize expected costs or maximize expected contributions to profit while maintaining a designated level of service. Plans are developed with a deterministic optimizing model, and each of the decisions for the first period in the planning horizon is implemented within the simulator. Random deviations in demands and in upstream and downstream shipping times are imposed, and the state of the system is updated at the end of each simulated period of activity. This process continues iteratively for a chosen number of periods (90 days for this research). Multiple replications are performed using unique random number seeds for each replication. The simulation model generates detailed event logs for each period of simulated activity that are used to analyze supply-chain performance and supply-chain risk. Supply-chain performance is measured with eleven key performance indicators that reveal system behavior at the overall supply-chain level, as well as performance related to individual plants, warehouses, and products. There are three key findings from this research. First, a value-added complement in an optimization model’s objective function can allow planning to occur effectively with a significantly shorter horizon than required when traditional accounting of costs and revenues is employed. Second, solutions with the value-added complement are robust for situations where supply-chain disruptions cause unexpected depletions in inventories at production facilities and warehouses. Third, ceteris paribus, the hybrid multi-period planning approach generates solutions with higher service levels for products with greater revenue per average production-minute, shorter average upstream lead times, and lower coefficients of variation for daily demand.
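
    A skeletal sketch of the rolling-horizon loop described above: optimise a plan, implement only the first period's decision, impose random deviations, update the system state, and repeat. The "optimiser" here is a trivial order-up-to stub and every number is hypothetical; the dissertation's model is far richer.

    # Rolling-horizon simulate-then-optimise loop (toy version).

    import random

    random.seed(1)
    TARGET, HORIZON, PERIODS = 500, 14, 90   # order-up-to level, plan length, sim length
    inventory, logs = 400, []

    def plan(inv, horizon):
        """Stand-in for the deterministic optimizer: order up to TARGET each period."""
        return [max(TARGET - inv, 0)] + [0] * (horizon - 1)

    for day in range(PERIODS):
        decision = plan(inventory, HORIZON)[0]     # implement first-period decision only
        demand = random.gauss(100, 20)             # random deviation in demand
        inventory = inventory + decision - demand  # update system state
        logs.append({"day": day, "order": decision, "demand": demand, "on_hand": inventory})

    service = sum(1 for rec in logs if rec["on_hand"] >= 0) / PERIODS
    print(f"Service level over {PERIODS} days: {service:.1%}")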