6 research outputs found

    A distributed tree data structure for real-time OLAP on cloud architectures

    In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a cloud-based real-time OLAP system built on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies that are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as 'report the total sales in all stores located in California and New York during the months February-May of all years'. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
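
    To make the example query in this abstract concrete, here is a minimal sketch of what such an aggregate computes. It is a plain linear scan over an in-memory list, not the distributed PDCR tree, which the abstract does not specify in detail. The `Sale` record, the `total_sales` helper, and the sample rows are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """One fact row; field names are illustrative assumptions."""
    state: str     # coarser level of a store dimension hierarchy
    store: str     # finer level of the same hierarchy
    year: int      # coarser level of a date dimension hierarchy
    month: int     # finer level of the same hierarchy (1-12)
    amount: float  # the measure being aggregated


def total_sales(facts, states, first_month, last_month):
    """Sum amounts for stores in `states` within the month range, over all years."""
    return sum(
        f.amount
        for f in facts
        if f.state in states and first_month <= f.month <= last_month
    )


# Hypothetical sample data, not taken from TPC-DS.
FACTS = [
    Sale("California", "SF-01", 2012, 2, 100.0),
    Sale("California", "LA-07", 2013, 5, 50.0),
    Sale("New York", "NYC-03", 2012, 3, 25.0),
    Sale("New York", "NYC-03", 2012, 7, 999.0),  # excluded: July
    Sale("Texas", "AUS-02", 2013, 4, 999.0),     # excluded: other state
]

# 'Total sales in California and New York during February-May of all years'
result = total_sales(FACTS, {"California", "New York"}, 2, 5)
print(result)  # 175.0
```

    A scan like this touches every row, which is the cost that an index over the dimension hierarchies is meant to reduce when a query covers a large share of the data.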

    CloudTree: A Library to Extend Cloud Services for Trees

    In this work, we propose a library that enables a cloud client to create and manage tree data structures in a cloud. As a proof of concept, we implement a new cloud service, CloudTree. With CloudTree, users are able to organize big data into tree data structures of their choice that are physically stored in a cloud. We use caching, prefetching, and aggregation techniques in the design and implementation of CloudTree to enhance performance. We have implemented Binary Search Tree (BST) and Prefix Tree services as the current members of CloudTree and have benchmarked their performance using the Amazon Cloud. The ideas and techniques in the design and implementation of the BST and prefix tree are generic and thus can also be used for other types of trees, such as B-trees, and for other link-based data structures, such as linked lists and graphs. Preliminary experimental results show that CloudTree is useful and efficient for various big data applications.

    The challenges of extract, transform and load (ETL) for data integration in near real-time environment

    For organizations with considerable investment in data warehousing, the influx of varied data types and forms requires data preparation methods and a staging platform that deliver fast, efficient, and volatile data to target audiences or users with different business needs. Extract, Transform and Load (ETL) systems have proved to be a standard choice for managing and sustaining the movement and transactional processing of valued big data assets. However, traditional ETL systems can no longer accommodate and effectively handle streaming or near real-time data in demanding environments that require high availability, low latency, and horizontal scalability. This paper identifies the challenges of implementing ETL systems for streaming or near real-time data, which need to evolve and adapt to these different requirements. Current efforts and solution approaches to address the challenges are presented. The ETL system challenges are classified by near real-time environment features and ETL stages to encourage different perspectives for future research.

    A SYSTEMATIC REVIEW IDENTIFIES A LACK OF STANDARDIZATION IN OLAP QUERIES ON CLOUD COMPUTING

    Owing to the storage and processing capacity of cloud computing, OLAP (OnLine Analytical Processing) systems have found a suitable environment for minimizing data storage and processing performance issues. Objective: This paper aims to identify which guidelines have standardized the implementation of OLAP query languages on cloud computing. Method: A systematic review was conducted by three researchers over the Internet, using a method comprising a research question, search strategy, inclusion and exclusion criteria, selection process, data extraction, and synthesis. Result: The results point to the use of different OLAP query syntaxes and mechanisms, proprietary or not, indicating a lack of standardization. Conclusion: Considering the importance of standardization in any technology field, which is promoted by several international organizations, this paper identifies the need for work that proposes a standard to guide the building of OLAP query languages on cloud computing.

    Large-Scale Indexing, Discovery, and Ranking for the Internet of Things (IoT)

    Network-enabled sensing and actuation devices are key enablers to connect real-world objects to the cyber world. The Internet of Things (IoT) consists of the network-enabled devices and communication technologies that allow connectivity and integration of physical objects (Things) into the digital world (Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data are usually multi-variant streams that are heterogeneous, sporadic, multi-modal, and spatio-temporal. IoT data can be disseminated with different granularities and have diverse structures, types, and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes new challenges on indexing, discovery, and ranking mechanisms that will allow building applications that require on-line access and retrieval of ad-hoc IoT data. However, the existing IoT data indexing and discovery approaches are complex or centralised, which hinders their scalability. The primary objective of this article is to provide a holistic overview of the state-of-the-art on indexing, discovery, and ranking of IoT data. The article aims to pave the way for researchers to design, develop, implement, and evaluate techniques and approaches for on-line large-scale distributed IoT applications and services