A distributed tree data structure for real-time OLAP on cloud architectures
In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies that are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as 'report the total sales in all stores located in California and New York during the months February-May of all years'. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response
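To make the example query above concrete, here is a minimal single-machine sketch of answering it from a dimension hierarchy. It is not the PDCR tree, whose internals the abstract does not describe; it only assumes a simple location hierarchy (All > state > store) in which each node keeps per-month sales totals. The class name `HierarchyNode`, the sample facts, and the store names are all hypothetical.

```python
class HierarchyNode:
    """One node of a hypothetical location hierarchy (All > state > store).

    Each node caches sales totals per calendar month, summed over all years,
    so a query on a state never has to visit individual stores.
    """

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.sales_by_month = {}  # month number (1-12) -> total sales

    def insert(self, path, month, amount):
        # Update the cached aggregate at every node along the path.
        self.sales_by_month[month] = self.sales_by_month.get(month, 0.0) + amount
        if path:
            child = self.children.setdefault(path[0], HierarchyNode(path[0]))
            child.insert(path[1:], month, amount)

    def total(self, months):
        # Sum the cached per-month totals for the requested months.
        return sum(self.sales_by_month.get(m, 0.0) for m in months)


# Hypothetical fact rows: (location path, month, sales amount).
facts = [
    (("California", "Store 1"), 2, 100.0),
    (("California", "Store 2"), 7, 50.0),   # July: outside February-May
    (("New York", "Store 3"), 5, 25.0),
    (("Texas", "Store 4"), 3, 999.0),       # state not in the query
]

root = HierarchyNode("All")
for path, month, amount in facts:
    root.insert(path, month, amount)

# "Total sales in all stores located in California and New York
# during the months February-May of all years."
months = range(2, 6)
result = sum(root.children[s].total(months) for s in ("California", "New York"))
print(result)
```

The query touches only the two state nodes, regardless of how many stores or fact rows lie beneath them, which is the general reason hierarchy-aware indexes help with queries that aggregate large portions of a warehouse.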
CloudTree: A Library to Extend Cloud Services for Trees
In this work, we propose a library that enables on a cloud the creation and
management of tree data structures from a cloud client. As a proof of concept,
we implement a new cloud service CloudTree. With CloudTree, users are able to
organize big data into tree data structures of their choice that are physically
stored in a cloud. We use caching, prefetching, and aggregation techniques in
the design and implementation of CloudTree to enhance performance. We have
implemented the services of Binary Search Trees (BST) and Prefix Trees as
current members in CloudTree and have benchmarked their performance using the
Amazon Cloud. The ideas and techniques in the design and implementation of a BST
and prefix tree are generic and thus can also be used for other types of trees
such as B-tree, and other link-based data structures such as linked lists and
graphs. Preliminary experimental results show that CloudTree is useful and
efficient for various big data applications
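As a rough illustration of how client-side caching can reduce round trips to a tree stored in a cloud, the sketch below keeps BST nodes in a simulated remote store and caches them on the client. This is not CloudTree's API, which the abstract does not show; `FakeCloudStore`, `CachedBSTClient`, and the node layout are assumptions made for the example, and the "cloud" is just an in-memory dictionary that counts fetches.

```python
class FakeCloudStore:
    """Stand-in for remote storage (an assumption); counts read round trips."""

    def __init__(self):
        self.nodes = {}   # node_id -> (key, left_id, right_id)
        self.fetches = 0

    def get(self, node_id):
        self.fetches += 1
        return self.nodes[node_id]

    def put(self, node_id, node):
        self.nodes[node_id] = node


class CachedBSTClient:
    """Hypothetical client that walks a remotely stored BST through a local cache."""

    def __init__(self, store, root_id=None):
        self.store = store
        self.cache = {}
        self.root_id = root_id
        self.next_id = 0

    def _load(self, node_id):
        # Only go to the remote store on a cache miss.
        if node_id not in self.cache:
            self.cache[node_id] = self.store.get(node_id)
        return self.cache[node_id]

    def _save(self, node_id, node):
        # Write through: update the cache and the remote store together.
        self.cache[node_id] = node
        self.store.put(node_id, node)

    def insert(self, key):
        new_id = self.next_id
        self.next_id += 1
        self._save(new_id, (key, None, None))
        if self.root_id is None:
            self.root_id = new_id
            return
        cur = self.root_id
        while True:
            k, left, right = self._load(cur)
            if key < k:
                if left is None:
                    self._save(cur, (k, new_id, right))
                    return
                cur = left
            else:
                if right is None:
                    self._save(cur, (k, left, new_id))
                    return
                cur = right

    def contains(self, key):
        cur = self.root_id
        while cur is not None:
            k, left, right = self._load(cur)
            if key == k:
                return True
            cur = left if key < k else right
        return False


store = FakeCloudStore()
writer = CachedBSTClient(store)
for key in (5, 3, 8, 1, 4):
    writer.insert(key)

# A second client with a cold cache reads the same tree.
reader = CachedBSTClient(store, root_id=writer.root_id)
first_hit = reader.contains(4)        # walks 5 -> 3 -> 4
fetches_after_first = store.fetches
second_hit = reader.contains(4)       # same path, served from the cache
fetches_after_second = store.fetches
miss = reader.contains(7)             # 5 is cached, only 8 is fetched
fetches_after_miss = store.fetches
```

Repeated lookups along a shared path cost no further remote reads once the nodes are cached; a real service would also need cache invalidation when other clients modify the tree, which this sketch leaves out.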
The challenges of extract, transform and load (ETL) for data integration in near real-time environment
For organizations with considerable investment in data warehousing, the influx of various data types and forms requires ways of preparing data and a staging platform that support fast, efficient delivery of volatile data to its target audiences or users with different business needs. Extract, Transform and Load (ETL) systems have proved to be a standard choice for managing and sustaining the movement and transactional processing of valued big data assets. However, traditional ETL systems can no longer accommodate and effectively handle streaming or near real-time data and demanding environments that require high availability, low latency and horizontal scalability. This paper identifies the challenges of implementing ETL systems for streaming or near real-time data, which need to evolve and streamline themselves to meet the different requirements. Current efforts and solution approaches to address the challenges are presented. The classification of ETL system challenges is based on near real-time environment features and ETL stages, to encourage different perspectives for future research
A SYSTEMATIC REVIEW IDENTIFIES A LACK OF STANDARDIZATION IN OLAP QUERIES ON CLOUD COMPUTING
Owing to the storage and processing capacity of cloud computing, OLAP (OnLine Analytical Processing) systems have found a suitable environment in which to minimize issues of data storage and processing performance. Objective: This paper aims to identify which guideline has been standardizing the implementation of OLAP query languages on cloud computing. Method: A systematic review was conducted by three researchers over the Internet; its method comprises a research question, search strategy, inclusion and exclusion criteria, selection process, data extraction and synthesis. Result: The results point to the use of different OLAP query syntaxes and mechanisms, proprietary or not, indicating a lack of standardization. Conclusion: Considering the importance of standardization in any technology field, which is promoted by several international organizations worldwide, this paper identifies the need for work that proposes a standard to guide the building of OLAP query languages on cloud computing
Large-Scale Indexing, Discovery, and Ranking for the Internet of Things (IoT)
Network-enabled sensing and actuation devices are key enablers to connect real-world objects to the cyber world. The Internet of Things (IoT) consists of the network-enabled devices and communication technologies that allow connectivity and integration of physical objects (Things) into the digital world (Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data are usually multi-variant streams that are heterogeneous, sporadic, multi-modal, and spatio-temporal. IoT data can be disseminated with different granularities and have diverse structures, types, and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes new challenges on indexing, discovery, and ranking mechanisms that will allow building applications that require on-line access and retrieval of ad-hoc IoT data. However, the existing IoT data indexing and discovery approaches are complex or centralised, which hinders their scalability. The primary objective of this article is to provide a holistic overview of the state-of-the-art on indexing, discovery, and ranking of IoT data. The article aims to pave the way for researchers to design, develop, implement, and evaluate techniques and approaches for on-line large-scale distributed IoT applications and services