20 research outputs found

    A Technique For Timeliness Measurement In Information Manufacturing System (IMS)

    Timeliness is one of the major dimensions in the field of data quality: the timeliness dimension determines whether data is fresh or obsolete. Generally, timeliness is calculated from currency and volatility. Currency is calculated from age, delivery time and input time, while volatility is the duration for which the data remains valid. In an IMS, currency and volatility depend on factors such as the refreshment period, the waiting period of data in the system, the expiry time of the data and the query response time for query requests. The purpose of this paper is therefore to develop a technique for measuring the timeliness of data in an IMS.
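
    As a point of reference, the sketch below implements the widely used currency/volatility formulation of timeliness (currency = age + delivery time - input time; timeliness = max(0, 1 - currency/volatility)^s). It illustrates the quantities named in the abstract rather than the technique proposed in the paper, and the field names and sample values are assumptions.

    from dataclasses import dataclass

    @dataclass
    class DataUnit:
        age: float            # age of the source data when it entered the IMS (hours)
        input_time: float     # time the data entered the IMS (hours since a reference point)
        delivery_time: float  # time the data is delivered to the user (same reference)
        volatility: float     # how long the data remains valid (hours)

    def currency(u: DataUnit) -> float:
        # Currency = age at input plus the time the data spent inside the system.
        return u.age + (u.delivery_time - u.input_time)

    def timeliness(u: DataUnit, s: float = 1.0) -> float:
        # Timeliness in [0, 1]: 1 = perfectly fresh, 0 = expired.
        # The exponent s controls how quickly value decays towards expiry.
        return max(0.0, 1.0 - currency(u) / u.volatility) ** s

    # Example: data captured 2 h before loading, delivered 6 h after loading,
    # valid for 24 h  ->  timeliness = 1 - 8/24 ≈ 0.67
    unit = DataUnit(age=2.0, input_time=0.0, delivery_time=6.0, volatility=24.0)
    print(round(timeliness(unit), 2))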

    Extending ETL framework using service oriented architecture

    Extraction, Transformation and Loading (ETL) represent a large portion of a data warehouse project. The difficulty of extending components is a main problem in the ETL area, because components are tightly coupled to each other in the current ETL framework. This missing extensibility makes it hard to add new components to the current framework in order to meet special business needs. This paper shows how to restructure the current ETL framework based on Service Oriented Architecture (SOA) so that it becomes easier to extend. The restructuring distributes ETL into interoperable components: SOA makes it possible to distribute the Extraction, Transformation and Loading components while keeping interoperability amongst them. As a proof of the extensibility concept, a Classified-Fragmentation component that enhances report generation speed is added to the new framework. The result of this work is an extensible ETL framework that includes the Classified-Fragmentation component as an extension.
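
    The sketch below illustrates the extensibility idea in the abstract: when every ETL component implements one loosely coupled service contract, a new step such as a Classified-Fragmentation component can be slotted into the pipeline without touching the existing ones. The class names, orchestration loop and fragmentation logic are illustrative assumptions, not the paper's design.

    from abc import ABC, abstractmethod
    from typing import Iterable

    class ETLService(ABC):
        """Common contract every ETL component exposes, keeping components loosely coupled."""
        @abstractmethod
        def process(self, records: Iterable[dict]) -> Iterable[dict]: ...

    class Extract(ETLService):
        def process(self, records):
            return records                      # e.g. read from the source systems

    class Transform(ETLService):
        def process(self, records):
            return ({**r, "amount": float(r["amount"])} for r in records)

    class ClassifiedFragmentation(ETLService):
        """Illustrative stand-in for the extension component: tags each record with a
        fragment key so that reports can read only the fragment they need."""
        def process(self, records):
            return ({**r, "fragment": r["region"]} for r in records)

    class Load(ETLService):
        def process(self, records):
            for r in records:
                print("loading", r)             # e.g. write to the warehouse
            return []

    def run_pipeline(services, records):
        for service in services:                # the orchestrator only knows the interface,
            records = service.process(records)  # so new services can be inserted freely

    run_pipeline([Extract(), Transform(), ClassifiedFragmentation(), Load()],
                 [{"region": "north", "amount": "10.5"}])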

    Framework for Interoperable and Distributed Extraction-Transformation-Loading (ETL) Based on Service Oriented Architecture

    Extraction, Transformation and Loading (ETL) are the major functionalities in data warehouse (DW) solutions. Lack of component distribution and interoperability is a gap that leads to many problems in the ETL domain, caused by the tightly coupled components in the current ETL framework. This research discusses how to distribute the Extraction, Transformation and Loading components so as to achieve distribution and interoperability of these ETL components, and shows how the ETL framework can be extended. To achieve that, Service Oriented Architecture (SOA) is adopted to address the missing features of distribution and interoperability by restructuring the current ETL framework. This research contributes to the field of ETL by adding the distribution and interoperability concepts to the ETL framework, which in turn contributes to data warehousing and business intelligence, because ETL is a core concept in that area. The Design Science Approach (DSA) and Scrum methodologies were adopted for achieving the research goals; their integration provides suitable methods for meeting the research objectives. The new ETL framework is realized by developing and testing a prototype based on it. This prototype is successfully evaluated using three case studies conducted with the data and tools of three different organizations, which use data warehouse solutions to generate statistical reports that help their top management take decisions. Results of the case studies show that distribution and interoperability can be achieved by using the new ETL framework.
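
    To make the distribution and interoperability idea concrete, the sketch below runs three ETL steps as separate JSON-over-HTTP services and has a coordinator call them in sequence; because the steps share only a plain JSON contract, they could be deployed, replaced or scaled independently. This is a minimal stand-in for the SOA-based framework, not the evaluated prototype, and the ports, payloads and component logic are assumptions.

    import json, threading, urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def make_service(step, port):
        """Wrap one ETL step as a small JSON-over-HTTP service, so it can live on its own host."""
        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                records = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
                body = json.dumps(step(records)).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
            def log_message(self, *args):       # keep the demo output quiet
                pass
        server = HTTPServer(("localhost", port), Handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        return f"http://localhost:{port}/"

    def call(url, records):
        req = urllib.request.Request(url, json.dumps(records).encode(),
                                     {"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    # Three independently hosted components that interoperate through the JSON contract.
    extract_url   = make_service(lambda _: [{"sale": "10"}, {"sale": "25"}], 8101)
    transform_url = make_service(lambda rs: [{"sale": int(r["sale"])} for r in rs], 8102)
    load_url      = make_service(lambda rs: [{"loaded": len(rs)}], 8103)

    records = call(extract_url, [])
    records = call(transform_url, records)
    print(call(load_url, records))              # -> [{'loaded': 2}]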

    An ETL Metadata Model for Data Warehousing

    Metadata is essential for understanding information stored in data warehouses. It helps increase levels of adoption and usage of data warehouse data by knowledge workers and decision makers. A metadata model is important to the implementation of a data warehouse; the lack of one can lead to quality concerns about the data warehouse. A highly successful data warehouse implementation depends on consistent metadata. This article proposes the adoption of an ETL (extract-transform-load) metadata model for the data warehouse that makes subject area refreshes metadata-driven, loads observation timestamps and other useful parameters, and minimizes consumption of database system resources. The ETL metadata model provides developers with a set of ETL development tools and delivers a user-friendly batch cycle refresh monitoring tool for the production support team.
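
    A minimal sketch of a metadata-driven subject-area refresh is given below: a metadata table lists each subject area, its load procedure and its last observation timestamp, and the batch cycle is driven entirely by that table, so adding a subject area becomes a metadata insert rather than a code change. The table layout, column names and procedures are illustrative assumptions, not the model proposed in the article.

    import sqlite3
    from datetime import datetime, timezone

    # Stripped-down metadata table: one row per subject area describing how it is refreshed.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE etl_metadata (
        subject_area     TEXT PRIMARY KEY,
        load_procedure   TEXT NOT NULL,   -- routine that refreshes this subject area
        load_frequency   TEXT NOT NULL,   -- e.g. 'daily', 'hourly'
        last_observation TEXT             -- timestamp of the newest loaded data
    );
    INSERT INTO etl_metadata VALUES
        ('sales',     'refresh_sales',     'daily',  NULL),
        ('inventory', 'refresh_inventory', 'hourly', NULL);
    """)

    def refresh_sales():     print("refreshing sales subject area")
    def refresh_inventory(): print("refreshing inventory subject area")
    PROCEDURES = {"refresh_sales": refresh_sales, "refresh_inventory": refresh_inventory}

    def run_batch_cycle():
        """Metadata-driven batch cycle: the loop reads the metadata, not a hard-coded job list."""
        for area, proc in conn.execute(
                "SELECT subject_area, load_procedure FROM etl_metadata").fetchall():
            PROCEDURES[proc]()
            conn.execute("UPDATE etl_metadata SET last_observation = ? WHERE subject_area = ?",
                         (datetime.now(timezone.utc).isoformat(), area))
        conn.commit()

    run_batch_cycle()
    print(conn.execute("SELECT subject_area, last_observation FROM etl_metadata").fetchall())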

    Feature Influence Based ETL for Efficient Big Data Management

    The increased volume of big data introduces various challenges for its maintenance and analysis. Various approaches to the problem exist, but they fail to achieve the expected results. To improve big data management performance, an efficient real-time feature influence analysis based Extraction, Transformation and Loading (ETL) framework is presented in this article. The model fetches the big data and analyses the features to find noisy records by preprocessing the data set. Further, the method performs feature extraction and applies feature influence analysis to the various data nodes and the data present in them. The method estimates Feature Specific Informative Influence (FSII) and Feature Specific Supportive Influence (FSSI); both are measured with the support of a data dictionary and a class ontology covering the various classes of data. The value of FSII is measured according to the presence of a concrete feature in a tuple towards any data node, whereas the value of FSSI is measured based on the appearance of supportive features at any data point towards the data node. Using these measures, the method computes the Node Centric Transformation Score (NCTS), based on which it performs map reduction and merging of data nodes. The NCTS_FIA method achieves higher performance in the ETL process. By adopting feature influence analysis in big data management, ETL performance is improved with lower time complexity.
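
    The abstract does not give the FSII, FSSI or NCTS formulas, so the sketch below only mirrors the described flow with placeholder scoring: an informative-influence score and a supportive-influence score are computed per data node, combined into a node-centric score, and low-scoring nodes become merge candidates. Every formula, threshold and data value here is an assumption made for illustration.

    # Placeholder scoring that mirrors the described flow (FSII, FSSI -> NCTS -> merge decision).

    def fsii(node, concrete_feature):
        # Placeholder: share of the node's tuples that carry the concrete feature.
        return sum(1 for t in node if concrete_feature in t) / len(node)

    def fssi(node, supportive_features):
        # Placeholder: average share of supportive features appearing in each tuple.
        return sum(len(supportive_features & t.keys()) / len(supportive_features)
                   for t in node) / len(node)

    def ncts(node, concrete_feature, supportive_features):
        # Placeholder combination of the two influence scores.
        return 0.5 * fsii(node, concrete_feature) + 0.5 * fssi(node, supportive_features)

    nodes = {
        "node_a": [{"temp": 21, "humidity": 40}, {"temp": 22}],
        "node_b": [{"pressure": 1013}],
    }
    scores = {name: ncts(tuples, "temp", {"temp", "humidity"}) for name, tuples in nodes.items()}
    merge_candidates = [name for name, score in scores.items() if score < 0.5]
    print(scores, merge_candidates)             # low-scoring nodes are candidates for merging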

    An economic order quantity stochastic dynamic optimization model in a logistic 4.0 environment

    This paper proposes a dynamic stock sizing optimization in the Logistic 4.0 environment. The safety stock is conceived to absorb demand variability, providing continuous stock availability. Logistic 4.0 and smart factory topics are considered. The paper focuses on vertical integration to implement flexible and reconfigurable smart production systems, using information system integration to optimize material flow in a 4.0 full-service approach. The proposed methodology aims to reduce stock-out events through a link between the wear-out rate of items and the downstream logistic demand. The item failure rate trend is obtained through life-cycle state detection using a curve fitting technique. The optimal safety stock size is then calculated and validated by an auto-tuning iterative modified algorithm. In this study, the reorder time is also optimized. The case study refers to the material management of a very high-speed train.
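
    As a simplified illustration of linking a fitted wear-out rate to stock sizing, the sketch below derives the expected replacement demand from a hypothetical fitted failure rate and applies the classic safety stock and reorder point formulas. It is not the paper's auto-tuning algorithm, and every numerical value is an assumption.

    import math
    from statistics import NormalDist

    def safety_stock(z_service, demand_std, lead_time):
        # Classic buffer formula: z * sigma_d * sqrt(lead time).
        return z_service * demand_std * math.sqrt(lead_time)

    def reorder_point(mean_demand, lead_time, ss):
        return mean_demand * lead_time + ss

    # Hypothetical wear-out driven demand: a fitted failure rate (failures per item per week)
    # times the installed fleet gives the expected weekly replacement demand.
    fitted_failure_rate = 0.002
    fleet_size = 5000
    mean_weekly_demand = fitted_failure_rate * fleet_size   # 10 items per week
    demand_std = 3.0                                        # from historical demand data
    z = NormalDist().inv_cdf(0.95)                          # 95 % service level
    lead_time_weeks = 2

    ss = safety_stock(z, demand_std, lead_time_weeks)
    print(round(ss, 1), round(reorder_point(mean_weekly_demand, lead_time_weeks, ss), 1))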

    Maintaining Internal Consistency of Report for Real-time OLAP with Layer-based View

    Maintaining the internal consistency of reports is an important aspect in the field of real-time data warehouses. OLAP and query tools were initially designed to operate on top of unchanging, static historical data. In a real-time environment, however, the results they produce are usually negatively influenced by data changes concurrent with query execution, which may lead to internal report inconsistency. In this paper, we propose a new method, called the layer-based view approach, to appropriately and effectively maintain report data consistency. The core idea is to prevent the data involved in an OLAP query from being changed by using a lock mechanism, and to avoid conflicts between read and write operations with the help of a layer mechanism. Our approach can effectively deal with the report consistency issue while avoiding contention between read and write operations in a real-time OLAP environment.
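
    The sketch below is a strongly simplified illustration of the lock-plus-layer idea: writers commit their changes as new immutable layers under a short lock, and a query pins the set of layers visible at its start, so concurrent writes cannot alter its result while the aggregation itself runs without holding the lock. The data model and locking granularity are assumptions, not the paper's implementation.

    import threading

    class LayeredStore:
        """Each committed batch of changes becomes a new immutable layer; a query only
        reads the layers that existed when it started, keeping its report consistent."""
        def __init__(self):
            self._layers = [[]]                 # committed, immutable layers
            self._lock = threading.Lock()

        def write(self, rows):
            with self._lock:                    # writers append a fresh layer
                self._layers.append(list(rows))

        def query_sum(self, column):
            with self._lock:
                visible = len(self._layers)     # pin the layers visible to this query
            # Aggregation runs without the lock; layers committed later are ignored.
            return sum(row[column] for layer in self._layers[:visible] for row in layer)

    store = LayeredStore()
    store.write([{"sales": 100}, {"sales": 250}])
    print(store.query_sum("sales"))             # 350, even if a write arrives mid-query
    store.write([{"sales": 40}])
    print(store.query_sum("sales"))             # 390 once the new layer is committed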

    Improvement of data quality with timeliness in information manufacturing system (IMS)

    Nowadays in the digital world, organizations and enterprises such as banks, hospitals, telecommunication companies and retail shops have an information manufacturing system (IMS) for storing the organization's data in digital format. Every day, a large quantity of data is manipulated (inserted, deleted and updated) in the information manufacturing systems of those enterprises or organizations. To be successful, the IMS must maintain the data and transform it into useful information for decision makers or users. Much of the value rests in the quality of the data, which may be divided into two classes: objective and time-related. Maintaining quality in both classes may require the completeness, accuracy and consistency of the data as well as the timeliness of information generation. As a further complication, the objective data quality class may not be independent; it can depend on the timeliness of the time-related class. The main purpose of this research is the improvement of data quality with timeliness in an IMS. This starts with observing the reasons for the change of objective data quality over time, using both theoretical and experimental data quality measurements. Novel approaches to ensuring the best possible information quality are developed and evaluated by observing how objective data quality changes with timeliness in a purpose-built IMS.

    A distributed tree data structure for real-time OLAP on cloud architectures

    In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud-based Real-time OLAP system built on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as 'report the total sales in all stores located in California and New York during the months February-May of all years'. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
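
    As a toy illustration of the query pattern (not of the PDCR tree itself), the sketch below hash-partitions a small fact table across m workers, lets each worker aggregate its own partition for a query that filters on a state/month hierarchy, and sums the partial results at a coordinator. The data, the partitioning scheme and the linear scans are assumptions; the real system replaces the scans with the distributed index.

    from concurrent.futures import ThreadPoolExecutor

    facts = [
        {"state": "California", "month": 2, "sales": 120},
        {"state": "New York",   "month": 3, "sales":  80},
        {"state": "Texas",      "month": 4, "sales":  60},
        {"state": "California", "month": 5, "sales":  90},
    ]
    m = 2
    partitions = [facts[i::m] for i in range(m)]            # one partition per worker

    def worker_aggregate(partition, states, months):
        # Each worker answers the query over its own partition only.
        return sum(f["sales"] for f in partition
                   if f["state"] in states and f["month"] in months)

    def olap_query(states, months):
        with ThreadPoolExecutor(max_workers=m) as pool:
            partials = pool.map(lambda p: worker_aggregate(p, states, months), partitions)
        return sum(partials)                                # coordinator merges partial results

    # "Total sales in all stores in California and New York during February-May"
    print(olap_query({"California", "New York"}, set(range(2, 6))))   # -> 290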