
    Quality measures for ETL processes: from goals to implementation

    Extraction transformation loading (ETL) processes play an increasingly important role in supporting modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for the evaluation and analysis of alternative design decisions.
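
    A rough sketch of how quantitative indicators might be attached to quality characteristics when comparing alternative ETL designs. The characteristic names, weights, and scores below are hypothetical illustrations, not the paper's actual model.

    # Minimal sketch: score alternative ETL designs against weighted quality
    # characteristics. Names, weights, and scores are illustrative only.

    # Hypothetical quality characteristics with weights reflecting their priority.
    weights = {"reliability": 0.4, "freshness": 0.35, "maintainability": 0.25}

    # Hypothetical indicator values (normalised to [0, 1]) for two design alternatives.
    alternatives = {
        "incremental_load": {"reliability": 0.9, "freshness": 0.6, "maintainability": 0.8},
        "full_reload":      {"reliability": 0.7, "freshness": 0.9, "maintainability": 0.6},
    }

    def overall_score(indicators):
        """Weighted sum of normalised indicator values."""
        return sum(weights[c] * v for c, v in indicators.items())

    for name, indicators in alternatives.items():
        print(f"{name}: {overall_score(indicators):.2f}")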

    A family of experiments to validate measures for UML activity diagrams of ETL processes in data warehouses

    In data warehousing, Extract, Transform, and Load (ETL) processes are in charge of extracting the data from the data sources that will be contained in the data warehouse. Their design and maintenance is thus a cornerstone in any data warehouse development project. Due to their relevance, the quality of these processes should be formally assessed early in the development in order to avoid populating the data warehouse with incorrect data. To this end, this paper presents a set of measures with which to evaluate the structural complexity of ETL process models at the conceptual level. This study is, moreover, accompanied by the application of formal frameworks and a family of experiments whose aim is to theoretically and empirically validate the proposed measures, respectively. Our experiments show that the use of these measures can aid designers to predict the effort associated with the maintenance tasks of ETL processes and to make ETL process models more usable. Our work is based on Unified Modeling Language (UML) activity diagrams for modeling ETL processes, and on the Framework for the Modeling and Evaluation of Software Processes (FMESP) for the definition and validation of the measures.
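
    To illustrate the general idea of structural-complexity measures over an activity-diagram-like ETL model, the sketch below counts element types in a toy process graph. The element types and measure names are generic examples, not the exact FMESP-derived measures from the paper.

    # Illustrative sketch: simple size/complexity counts over an ETL process model
    # represented as a UML-activity-diagram-like graph.

    from collections import Counter

    # Hypothetical model: each node has a type; edges are control flows.
    nodes = {
        "extract_customers": "activity",
        "extract_orders": "activity",
        "valid?": "decision",
        "merge": "merge",
        "transform": "activity",
        "load_dw": "activity",
    }
    edges = [
        ("extract_customers", "valid?"),
        ("extract_orders", "valid?"),
        ("valid?", "merge"),
        ("valid?", "transform"),
        ("merge", "transform"),
        ("transform", "load_dw"),
    ]

    counts = Counter(nodes.values())
    measures = {
        "NA (number of activities)": counts["activity"],
        "ND (number of decision nodes)": counts["decision"],
        "NCF (number of control flows)": len(edges),
    }
    for name, value in measures.items():
        print(f"{name}: {value}")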

    A quality-aware spatial data warehouse for querying hydroecological data

    Addressing data quality issues in information systems remains a challenging task. Many approaches only tackle this issue at the extract, transform and load steps. Here we define a comprehensive method to gain greater insight into data quality characteristics within a data warehouse. Our novel architecture was implemented for a hydroecological case study in which massive French watercourse sampling data are collected. The method models and makes effective use of spatial, thematic and temporal accuracy, consistency and completeness for multidimensional data in order to offer analysts a "data quality" oriented framework. The results obtained in experiments carried out on the Saône River dataset demonstrate the relevance of our approach.
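
    As one small example of a quality characteristic computed over multidimensional data, the sketch below derives a completeness indicator per station and year from sampling records. The column names, the example records, and the expected sampling frequency are hypothetical, not taken from the case study.

    # Sketch: a simple completeness indicator per station and year, assuming
    # watercourse sampling records in a pandas DataFrame.

    import pandas as pd

    samples = pd.DataFrame({
        "station": ["S1", "S1", "S1", "S2", "S2"],
        "year":    [2020, 2020, 2021, 2020, 2021],
        "value":   [3.2, None, 2.8, 4.1, 3.9],   # one missing measurement
    })

    EXPECTED_PER_YEAR = 4  # hypothetical sampling plan: 4 samples per station-year

    quality = (
        samples.dropna(subset=["value"])
               .groupby(["station", "year"])
               .size()
               .div(EXPECTED_PER_YEAR)
               .rename("completeness")
               .reset_index()
    )
    print(quality)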

    An Integrated Framework to Assess ‘Leanness’ Performance in Distribution Centres

    The theory behind lean philosophy is to create more value with less. Effective lean management enables organisations to exceed customer expectations while reducing costs. Although numerous practices and approaches are used in the process of implementing lean philosophy and reducing waste within supply chain systems, little effort has been directed into assessing the leanness level of distribution and its impact on overall performance. Given the vital role of distribution units within supply chains, this research aims to develop a comprehensive lean assessment framework that integrates a selected set of statistical, analytical, and mathematical techniques in order to assess the ‘leanness’ level in the distribution business. Due to the limited number of published articles in the area of lean distribution, there are no clear definitions of the underlying factors and practices. Therefore, the primary phase of the proposed framework addresses the identification of the lean distribution dimensional structure and practices. The other two phases of the framework discuss the development of a structured model for lean distribution and address the process of finding a quantitative lean index for benchmarking lean implementation in distribution centres. Integrating the three phases provides decision makers with an indicator of performance, subject to applying various lean practices. Incorporating the findings of a survey sent to 700 distribution businesses in Ireland, along with value stream mapping, modelling, simulation, and data envelopment analysis, has given the framework strength in the assessment of leanness. Research outcomes show that lean distribution consists of five key dimensions: workforce management, item replenishment, customers, transportation, and process quality. Lean practices associated with these dimensions mainly focus on enhancing communication channels with customers, simplifying the distribution network structure, involving people in problem solving and continuous improvement, and increasing the reliability and efficiency of distribution operations. The final output of the framework is two key leanness indices: one measures the tactical leanness level, while the second represents leanness at the operational level. Both indices can effectively be used in evaluating the lean implementation process and conducting benchmarking based on the leanness level.
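
    A minimal sketch of how the five dimensions named in the abstract could be aggregated into a single leanness index. The equal weights and the example scores are hypothetical and stand in for the framework's statistical and data-envelopment-analysis machinery.

    # Minimal sketch: aggregate dimension scores into one leanness index for a
    # distribution centre. Weights and scores (0-1 scale) are hypothetical.

    dimension_weights = {
        "workforce_management": 0.20,
        "item_replenishment":   0.20,
        "customers":            0.20,
        "transportation":       0.20,
        "process_quality":      0.20,
    }

    # Hypothetical assessment scores for one distribution centre.
    scores = {
        "workforce_management": 0.7,
        "item_replenishment":   0.6,
        "customers":            0.8,
        "transportation":       0.5,
        "process_quality":      0.9,
    }

    leanness_index = sum(dimension_weights[d] * scores[d] for d in dimension_weights)
    print(f"Leanness index: {leanness_index:.2f}")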

    Energy and Carbon Dioxide Impacts from Lean Logistics and Retailing Systems: A Discrete-event Simulation Approach for the Consumer Goods Industry

    Consumer goods supply chains have gradually incorporated lean manufacturing principles to identify and reduce non-value-added activities. Companies implementing lean practices have experienced improvements in cost, quality, and demand responsiveness. However, certain elements of these practices, especially those related to transportation and distribution, may have a detrimental impact on the environment. This study asks: What impact do current best practices in lean logistics and retailing have on environmental performance? The research hypothesis of this dissertation establishes that lean distribution of durable and consumable goods can result in an increased amount of carbon dioxide emissions, leading to climate change and natural resource depletion impacts, while lean retailing operations can reduce carbon emissions. The distribution and retailing phases of the life cycle are characterized in a two-echelon supply chain discrete-event simulation modeled after current operations of leading organizations based in the U.S. Southwest. By conducting an overview of critical sustainability issues and their relationship with consumer products, it is possible to address the environmental implications of lean logistics and retailing operations. Given the waste-reduction focus of lean manufacturing, four lean best practices are examined in detail in order to formulate specific research propositions. These propositions are integrated into an experimental design linking annual carbon dioxide equivalent emissions to: (1) shipment frequency between supply chain partners, (2) proximity between the decoupling point of products and final customers, (3) inventory turns at the warehousing level, and (4) degree of supplier integration. All propositions are tested through the use of the simulation model. Results confirmed the four research propositions. Furthermore, they suggest synergy between product shipment frequency among supply chain partners and product management due to lean retailing practices. In addition, the study confirms prior research speculations about the potential carbon intensity of transportation operations subject to lean principles.
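
    A toy sketch of the kind of relationship the first proposition examines: how shipment frequency between supply chain partners drives annual transport emissions. All parameters (demand, emission factor per trip, simulation length) are hypothetical and much simpler than the dissertation's two-echelon discrete-event model.

    # Toy simulation sketch: relate shipment frequency between a warehouse and a
    # store to annual transport CO2-equivalent emissions.

    import random

    def simulate_emissions(shipments_per_week, days=365, seed=42):
        random.seed(seed)
        kg_co2e_per_trip = 120.0                 # hypothetical emissions per truck trip
        ship_interval = 7 // shipments_per_week or 1
        pending_units, trips = 0, 0
        for day in range(days):
            pending_units += random.randint(80, 120)   # daily store demand
            if day % ship_interval == 0:
                trips += 1                       # one consolidated shipment covers demand
                pending_units = 0
        return trips * kg_co2e_per_trip

    for freq in (1, 3, 7):                       # shipments per week
        print(f"{freq}/week -> {simulate_emissions(freq):,.0f} kg CO2e per year")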

    Adding semantic modules to improve goal-oriented analysis of data warehouses using I-star

    The success rate of data warehouse (DW) development is improved by performing a requirements elicitation stage in which the users’ needs are modeled. Currently, among the different proposals for modeling requirements, there is a special focus on goal-oriented models, and in particular on the i* framework. In order to adapt this framework for DW development, we previously developed a UML profile for DWs. However, like the general i* framework, the proposal lacks modularity. This has an especially negative impact on DW development, since DW requirement models tend to include a huge number of elements with crossed relationships between them. In turn, the readability of the models is decreased, harming their utility and increasing the error rate and development time. In this paper, we propose an extension of our i* profile for DWs considering the modularization of goals. We provide a set of guidelines in order to correctly apply our proposal. Furthermore, we have performed an experiment in order to assess the validity of our proposal. The benefits of our proposal are an increase in the modularity and scalability of the models which, in turn, increases the error correction capability and makes complex models easier to understand by DW developers and non-expert users. This work has been partially supported by the ProS-Req (TIN2010-19130-C02-01) and by the MESOLAP (TIN2010-14860) and SERENIDAD (PEII-11-0327-7035) projects from the Spanish Ministry of Education and the Junta de Comunidades de Castilla La Mancha respectively. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).

    Dimensional enrichment of statistical linked open data

    On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new and publicly available data on the web that are to be analyzed in the same way. The use of semantic web technologies in this context is especially encouraged by the Linked Data initiative. There is already a considerable amount of statistical linked open data sets published using the RDF Data Cube Vocabulary (QB), which is designed for these purposes. However, QB lacks some essential schema constructs (e.g., dimension levels) needed to support OLAP. Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and be fully compliant with OLAP. In this paper, we focus on the enrichment of an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits of QB4OLAP. Then, we propose a series of steps to automate the enrichment of QB data sets with specific QB4OLAP semantics, the most important being the definition of aggregate functions and the detection of new concepts in the dimension hierarchy construction. The proposed steps form a semi-automatic enrichment method, which is implemented in a tool that enables the enrichment in an interactive and iterative fashion. The user can enrich the QB data set with QB4OLAP concepts (e.g., full-fledged dimension hierarchies) by choosing among the candidate concepts automatically discovered with the proposed steps. Finally, we conduct experiments with 25 users and use three real-world QB data sets to evaluate our approach. The evaluation demonstrates the feasibility of our approach and shows that, in practice, our tool facilitates, speeds up, and guarantees the correct results of the enrichment process.
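
    A possible first step before any QB4OLAP enrichment is simply listing the dimensions declared in a QB data structure definition, which the sketch below does with rdflib and a standard SPARQL query over the qb: vocabulary. The input file name is hypothetical; this is not the paper's tool.

    # Sketch: list the dimensions declared in a QB data structure definition, as a
    # starting point for attaching QB4OLAP levels and hierarchies.

    from rdflib import Graph

    g = Graph()
    g.parse("statistical_dataset.ttl", format="turtle")   # hypothetical QB data set

    query = """
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT DISTINCT ?dsd ?dimension WHERE {
      ?dsd a qb:DataStructureDefinition ;
           qb:component ?c .
      ?c qb:dimension ?dimension .
    }
    """
    for dsd, dimension in g.query(query):
        print(f"{dsd} uses dimension {dimension}")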

    Managing Warehouse Utilization: An Analysis of Key Warehouse Resources

    The warehousing industry is extremely important to businesses and the economy as a whole, and while there is a great deal of literature exploring individual operations within warehouses, such as warehouse layout and design, order picking, etc., there is very little literature exploring warehouse operations from a systems approach. This study uses the Theory of Constraints (TOC) to develop a focused resource management approach to increasing warehouse capacity and throughput, and thus overall warehouse performance, in an environment of limited warehouse resources. While TOC was originally developed for reducing operational bottlenecks in manufacturing, it has allowed companies in other industries, such as banking, health care, and the military, to save millions of dollars. However, the use of TOC has been limited to case studies and individual situations, which typically are not generalizable. Since the basic steps of TOC are iterative in nature and were not designed for survey research, modifications to the original theory are necessary in order to provide insight into industry-wide problems. This study further develops TOC's logistics paradigm and modifies it for use with survey data, which was collected from a sample of warehouse managers. Additionally, it provides a process for identifying potentially constrained key warehouse resources, which served as a foundation of this study. The findings confirm that TOC's methods of focused resource capacity management and of coordinating goods flow scheduling with supply chain partners can be an important approach for warehouse managers to use in overcoming resource capacity constraints and increasing warehouse performance.
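
    As a simple illustration of the TOC-style first step (identify the constraint), the sketch below compares the utilisation of a few key warehouse resources and flags the most heavily loaded one. Resource names and figures are hypothetical, not the study's survey data.

    # Sketch: flag the most utilised warehouse resource as the likely constraint.

    resources = {
        "dock_doors":    {"demand_hours": 310, "available_hours": 320},
        "forklifts":     {"demand_hours": 450, "available_hours": 400},
        "pickers":       {"demand_hours": 780, "available_hours": 800},
        "storage_space": {"demand_hours": 500, "available_hours": 600},
    }

    utilisation = {name: r["demand_hours"] / r["available_hours"]
                   for name, r in resources.items()}
    constraint = max(utilisation, key=utilisation.get)

    for name, u in sorted(utilisation.items(), key=lambda kv: -kv[1]):
        flag = "  <-- constraint" if name == constraint else ""
        print(f"{name}: {u:.0%}{flag}")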

    Synthesis of Optimization and Simulation for Multi-Period Supply Chain Planning with Consideration of Risks

    Solutions to deterministic optimizing models for supply chains can be very sensitive to the formulation of the objective function and the choice of planning horizon. We illustrate how multi-period optimizing models may be counterproductive if traditional accounting of revenue and costs is performed and planning occurs with too short a planning horizon. We propose a “value added” complement to traditional financial accounting that allows planning to occur with shorter horizons than previously thought necessary. This dissertation presents a simulation model with an embedded optimizer that can help organizations develop strategies that minimize expected costs or maximize expected contributions to profit while maintaining a designated level of service. Plans are developed with a deterministic optimizing model, and each of the decisions for the first period in the planning horizon is implemented within the simulator. Random deviations in demands and in upstream and downstream shipping times are imposed, and the state of the system is updated at the end of each simulated period of activity. This process continues iteratively for a chosen number of periods (90 days for this research). Multiple replications are performed using unique random number seeds for each replication. The simulation model generates detailed event logs for each period of simulated activity that are used to analyze supply-chain performance and supply-chain risk. Supply-chain performance is measured with eleven key performance indicators that reveal system behavior at the overall supply-chain level, as well as performance related to individual plants, warehouses, and products. There are three key findings from this research. First, a value-added complement in an optimization model’s objective function can allow planning to occur effectively with a significantly shorter horizon than required when traditional accounting of costs and revenues is employed. Second, solutions with the value-added complement are robust for situations where supply-chain disruptions cause unexpected depletions in inventories at production facilities and warehouses. Third, ceteris paribus, the hybrid multi-period planning approach generates solutions with higher service levels for products with greater revenue per average production-minute, shorter average upstream lead times, and lower coefficients of variation for daily demand.
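
    A skeletal sketch of the rolling-horizon loop described above: optimise a plan, implement only the first period's decision, impose random deviations, update the system state, and repeat. The "optimiser" here is a trivial order-up-to stub and every number is hypothetical; the dissertation's model is far richer.

    # Rolling-horizon simulate-then-optimise loop (toy version).

    import random

    random.seed(1)
    TARGET, HORIZON, PERIODS = 500, 14, 90   # order-up-to level, plan length, sim length
    inventory, logs = 400, []

    def plan(inv, horizon):
        """Stand-in for the deterministic optimizer: order up to TARGET each period."""
        return [max(TARGET - inv, 0)] + [0] * (horizon - 1)

    for day in range(PERIODS):
        decision = plan(inventory, HORIZON)[0]     # implement first-period decision only
        demand = random.gauss(100, 20)             # random deviation in demand
        inventory = inventory + decision - demand  # update system state
        logs.append({"day": day, "order": decision, "demand": demand, "on_hand": inventory})

    service = sum(1 for rec in logs if rec["on_hand"] >= 0) / PERIODS
    print(f"Service level over {PERIODS} days: {service:.1%}")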