Search CORE

12,396 research outputs found

Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

Author: Buyya Rajkumar
Islam Muhammed Tawfiqul
Publication date: 30/12/2018

This chapter presents the software architectures of big data processing platforms. It provides in-depth knowledge of the resource management techniques involved in deploying big data processing systems in cloud environments. It starts from the very basics and gradually introduces the core components of resource management, which we have divided into multiple layers. It covers state-of-the-art practices and research in SLA-based resource management, with a specific focus on job scheduling mechanisms.
Comment: 27 pages, 9 figures

arXiv.org e-Print Archive
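As a rough illustration of SLA-aware job scheduling of the kind this chapter surveys, the sketch below orders jobs by Earliest Deadline First (EDF) on a single resource and reports which jobs miss their deadline. It is a generic textbook policy, not the chapter's own mechanism; the job names, runtimes, and deadlines are hypothetical, and all jobs are assumed to be released at time 0 and to run without preemption.

```python
def edf_schedule(jobs):
    """Run jobs on one resource in Earliest-Deadline-First order.

    jobs: list of (job_id, runtime, deadline); all jobs are assumed released at t=0.
    Returns the completion time of each job and the jobs that violate their deadline.
    """
    clock = 0
    completions, violations = [], []
    for job_id, runtime, deadline in sorted(jobs, key=lambda job: job[2]):
        clock += runtime                      # job runs to completion, no preemption
        completions.append((job_id, clock))
        if clock > deadline:                  # finished after its SLA deadline
            violations.append(job_id)
    return completions, violations

# Hypothetical jobs: (job_id, runtime, deadline)
jobs = [("etl", 4, 10), ("report", 2, 3), ("train", 5, 8)]
completions, violations = edf_schedule(jobs)
print(completions)  # [('report', 2), ('train', 7), ('etl', 11)]
print(violations)   # ['etl']
```

For jobs that are all available at time 0 on one machine, ordering by earliest deadline minimizes the maximum lateness (Jackson's rule), so if EDF reports a violation, every other ordering also misses at least one deadline; meeting the SLA would then require more resources or a relaxed deadline.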

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Author: Dolev Shlomi
Florissi Patricia
Gudes Ehud
Sharma Shantanu
Singer Ido
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/07/2017

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many companies, e.g., Google, Facebook, and Amazon, to solve a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems share a major drawback: their computations are distributed only locally, which prevents them from implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industry and academia to rethink current big-data processing systems. Novel frameworks, going beyond the state-of-the-art architectures and technologies of current systems, are expected to process geographically distributed data at their locations without moving entire raw datasets to a single location. In this paper, we investigate and discuss the challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study geo-distributed batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing frameworks, models, and algorithms, along with their overhead issues.
Comment: IEEE Transactions on Big Data; accepted June 2017. 20 pages

arXiv.org e-Print Archive
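The core idea of processing geo-distributed data at its location can be sketched with a word count: each site runs the map and a local reduce over its own raw records, and only the small per-site aggregates are shipped to one place for the final merge. This is a minimal single-process illustration under assumed inputs (the two site names and their records are hypothetical); it is not one of the frameworks classified in the survey.

```python
from collections import Counter

def local_word_count(records):
    """Map + local reduce, executed at the site that holds the raw records."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts

def global_merge(partials):
    """Global reduce: merges the per-site aggregates in one location."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical raw datasets; each list stays in its own data centre.
sites = {
    "site-eu": ["spark hadoop spark", "hadoop"],
    "site-us": ["spark mapreduce", "spark"],
}

partials = [local_word_count(records) for records in sites.values()]
result = global_merge(partials)
print(dict(result))  # {'spark': 4, 'hadoop': 2, 'mapreduce': 1}
```

In this design only the per-site Counter objects, not the raw lines, would cross the wide-area network; that works because word counts are associative and can be merged in any order.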

IoT Stream Processing and Analytics in The Fog

Author: Yang Shusen
Publication date: 16/05/2017

The emerging Fog paradigm has been attracting increasing interest from both academia and industry, due to the low-latency, resilient, and cost-effective services it can provide. Many Fog applications, such as video mining and event monitoring, rely on data stream processing and analytics, which are very popular in the Cloud but have not been comprehensively investigated in the context of Fog architecture. In this article, we present the general models and architecture of Fog data streaming by analyzing the common properties of several typical applications. We also analyze the design space of Fog streaming along four essential dimensions (system, data, human, and optimization), investigating both new design challenges and the issues that arise from leveraging existing techniques such as Cloud stream processing, computer networks, and mobile computing.

arXiv.org e-Print Archive

iFogSim: A Toolkit for Modeling and Simulation of Resource Management Techniques in Internet of Things, Edge and Fog Computing Environments

Author: Buyya Rajkumar
Dastjerdi Amir Vahid
Ghosh Soumya K.
Gupta Harshit
Publication date: 06/06/2016

The Internet of Things (IoT) aims to bring every object (e.g., smart cameras, wearables, environmental sensors, home appliances, and vehicles) online, hence generating massive amounts of data that can overwhelm storage systems and data analytics applications. Cloud computing offers services at the infrastructure level that can scale to IoT storage and processing requirements. However, applications such as health monitoring and emergency response require low latency, and the delay caused by transferring data to the cloud and then back to the application can seriously impact their performance. To overcome this limitation, the Fog computing paradigm has been proposed, in which cloud services are extended to the edge of the network to decrease latency and network congestion. To realize the full potential of the Fog and IoT paradigms for real-time analytics, several challenges need to be addressed. The first and most critical problem is designing resource management techniques that determine which modules of analytics applications are pushed to each edge device to minimize latency and maximize throughput. To this end, we need an evaluation platform that enables the performance of resource management policies on an IoT or Fog computing infrastructure to be quantified in a repeatable manner. In this paper, we propose a simulator, called iFogSim, to model IoT and Fog environments and measure the impact of resource management techniques in terms of latency, network congestion, energy consumption, and cost. We describe two case studies to demonstrate the modeling of an IoT environment and the comparison of resource management policies. Moreover, the scalability of the simulation toolkit in terms of RAM consumption and execution time is verified under different circumstances.
Comment: Cloud Computing and Distributed Systems Laboratory, The University of Melbourne, June 6, 201

arXiv.org e-Print Archive
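To make the module-placement problem concrete, here is a hypothetical greedy policy that pushes each analytics module to the lowest-latency device that still has spare capacity, falling back towards the cloud. iFogSim itself is a Java toolkit; this Python sketch does not use its API, and the device names, MIPS capacities, latencies, and module demands are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    capacity_mips: int   # hypothetical processing capacity
    latency_ms: float    # hypothetical latency from the data source to this device
    used_mips: int = 0

    def fits(self, demand):
        return self.used_mips + demand <= self.capacity_mips

def place_modules(modules, devices):
    """Greedy edge-first placement: each module goes to the lowest-latency
    device that still has enough spare capacity for it."""
    placement = {}
    by_latency = sorted(devices, key=lambda d: d.latency_ms)
    for module, demand in modules:
        for device in by_latency:
            if device.fits(demand):
                device.used_mips += demand
                placement[module] = device.name
                break
        else:  # no device, not even the cloud, had room
            raise RuntimeError(f"no device can host {module}")
    return placement

devices = [
    Device("cloud", capacity_mips=100_000, latency_ms=100.0),
    Device("gateway", capacity_mips=2_000, latency_ms=4.0),
    Device("camera", capacity_mips=500, latency_ms=1.0),
]
# Hypothetical analytics modules: (name, MIPS demand)
modules = [("motion_detector", 400), ("object_tracker", 1_500), ("video_archiver", 3_000)]
result = place_modules(modules, devices)
print(result)
# {'motion_detector': 'camera', 'object_tracker': 'gateway', 'video_archiver': 'cloud'}
```

A simulator such as iFogSim is meant for exactly this kind of comparison: a heuristic like the one above can be evaluated against other policies on latency, network congestion, energy consumption, and cost in a repeatable way.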

Towards Media Intercloud Standardization Evaluating Impact of Cloud Storage Heterogeneity

Author: Aazam Mohammad
Huh EuiNam
StHilaire Marc
Publication venue: Springer Science and Business Media LLC
Publication date: 19/02/2016

The volume of digital media has been increasing very rapidly, which has made cloud computing increasingly popular. Cloud computing makes large amounts of data and resources easy to manage. With many devices communicating over the Internet and user demands increasing rapidly, solitary clouds have to communicate with other clouds to fulfill the demands and discover services elsewhere. This scenario is called intercloud computing or cloud federation. Intercloud computing still lacks a standard architecture. Prior works discuss some architectural blueprints, but none of them highlights the key issues involved and their impact, so that a valid and reliable architecture could be envisioned. In this paper, we discuss the importance of intercloud computing and present its architectural components in detail. We also discuss the key issues that intercloud computing involves and present the impact of storage heterogeneity. We have evaluated some of the most noteworthy cloud storage services, namely Dropbox, Amazon CloudDrive, GoogleDrive, Microsoft OneDrive (formerly SkyDrive), Box, and SugarSync, in terms of Quality of Experience (QoE), Quality of Service (QoS), and storage space efficiency. The discussion of the results shows the acceptability level of these storage services and the shortcomings in their design.
Comment: 13 pages, 14 figures, Springer Journal of Grid Computing, 201

arXiv.org e-Print Archive

All One Needs to Know about Fog Computing and Related Edge Computing Paradigms: A Complete Survey

Author: Fung Caleb
Jalali Fatemeh
Jue Jason P.
Kadiyala Krishna
Kong Jian
Nguyen Tam
Niakanlahiji Amirreza
Yousefpour Ashkan
Publication venue: Elsevier BV
Publication date: 13/02/2019

With the Internet of Things (IoT) becoming part of our daily life and our environment, we expect rapid growth in the number of connected devices. IoT is expected to connect billions of devices and humans, bringing promising advantages. With this growth, fog computing, along with its related edge computing paradigms such as multi-access edge computing (MEC) and cloudlets, is seen as a promising solution for handling the large volume of security-critical and time-sensitive data being produced by the IoT. In this paper, we first provide a tutorial on fog computing and its related computing paradigms, including their similarities and differences. Next, we provide a taxonomy of research topics in fog computing, and through a comprehensive survey, we summarize and categorize the efforts on fog computing and its related computing paradigms. Finally, we provide challenges and future directions for research in fog computing.
Comment: 48 pages, 7 tables, 11 figures, 450 references. The data (categories and features/objectives of the papers) of this survey are now publicly available. Accepted by Elsevier Journal of Systems Architecture

arXiv.org e-Print Archive

Internet of Things: An Overview

Author: Buyya Rajkumar
Dastjerdi Amir Vahid
Khodadadi Farzad
Publication date: 19/03/2017

As technology advances and the number of smart devices continues to grow substantially, the need for ubiquitous context-aware platforms that support an interconnected, heterogeneous, and distributed network of devices has given rise to what is today referred to as the Internet of Things. However, paving the way for the aforementioned objectives and making the IoT paradigm more tangible requires the integration and convergence of different knowledge and research domains, covering aspects from identification and communication to resource discovery and service integration. In this chapter, we aim to highlight research on topics including proposed architectures, security and privacy, and network communication means and protocols, and we conclude by providing future directions and open challenges facing IoT development.
Comment: Keywords: Internet of Things; IoT; Web of Things; Cloud of Things

arXiv.org e-Print Archive

Scheduling in distributed systems: A cloud computing perspective

Author: Bittencourt Luiz F.
da Fonseca Nelson L. S.
Goldman Alfredo
Madeira Edmundo R. M.
Sakellariou Rizos
Publication venue: Elsevier BV
Publication date: 10/01/2019

Scheduling is essentially a decision-making process that enables resource sharing among a number of activities by determining their execution order on the set of available resources. The emergence of distributed systems brought new challenges to scheduling in computer systems, including clusters, grids, and more recently clouds. On the other hand, the plethora of research makes it hard for newcomer researchers to understand the relationships among the different scheduling problems and strategies proposed in the literature, which hampers the identification of new and relevant research avenues. In this paper, we introduce a classification of the scheduling problem in distributed systems by presenting a taxonomy that incorporates recent developments, especially those in cloud computing. We review the scheduling literature to corroborate the taxonomy and analyze the interest in different branches of the proposed taxonomy. Finally, we identify relevant future directions in scheduling for distributed systems.

arXiv.org e-Print Archive
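Scheduling as a decision about execution order on the available resources can be illustrated with Longest Processing Time first (LPT), a classic list-scheduling heuristic for independent tasks on identical machines. It is chosen here only as a familiar example and is not taken from the paper's taxonomy; the task names and run times are hypothetical.

```python
import heapq

def lpt_schedule(tasks, num_machines):
    """Longest Processing Time first (LPT) list scheduling on identical machines.

    tasks: dict of task name -> run time. Returns (assignment, makespan).
    """
    # Min-heap of (current load, machine id): the least-loaded machine is on top.
    heap = [(0, m) for m in range(num_machines)]
    assignment = {m: [] for m in range(num_machines)}
    # Decide the execution order: longest tasks first (ties keep input order).
    for name, runtime in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        load, machine = heapq.heappop(heap)
        assignment[machine].append(name)
        heapq.heappush(heap, (load + runtime, machine))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

tasks = {"t1": 7, "t2": 5, "t3": 4, "t4": 3, "t5": 3}
assignment, makespan = lpt_schedule(tasks, num_machines=2)
print(assignment, makespan)  # {0: ['t1', 't4'], 1: ['t2', 't3', 't5']} 12
```

With these inputs LPT finishes at time 12, while the best possible makespan is 11 (t1 and t3 on one machine, t2, t4, and t5 on the other), a reminder that such greedy rules are heuristics rather than optimal schedulers.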

Big Data Computing Using Cloud-Based Technologies, Challenges and Future Perspectives

Author: Alam Mansaf
Khan Samiya
Shakil Kashish Ara
Publication date: 24/11/2017

The excessive amounts of data generated on a regular basis by devices and Internet-based sources constitute big data. This data can be processed and analyzed to develop useful applications for specific domains. Several mathematical and data analytics techniques have found use in this sphere, giving rise to the development of computing models and tools for big data computing. However, the storage and processing requirements are overwhelming for traditional systems and technologies. Therefore, there is a need for infrastructures that can adjust their storage and processing capability in accordance with changing data dimensions. Cloud computing serves as a potential solution to this problem. However, big data computing in the cloud has its own set of challenges and research issues. This chapter surveys the big data concept, discusses the mathematical and data analytics techniques that can be used for big data, and gives a taxonomy of the existing tools, frameworks, and platforms available for different big data computing models. Besides this, it also evaluates the viability of cloud-based big data computing, examines existing challenges and opportunities, and provides future research directions in this field.

arXiv.org e-Print Archive

GPU PaaS Computation Model in Aneka Cloud Computing Environment

Author: Buyya Rajkumar
Ilager Shashikant
Kune Raghavendra
Wankar Rajeev
Publication date: 20/08/2018

Due to the surge in the volume of data generated and rapid advances in Artificial Intelligence (AI) techniques such as machine learning and deep learning, existing traditional computing models have become inadequate for processing enormous volumes of data and the complex application logic needed to extract intrinsic information. Computing accelerators such as Graphics Processing Units (GPUs) have become the de facto SIMD computing systems for many big data and machine learning applications. Meanwhile, computing has gradually shifted from the conventional ownership-based model to the subscription-based cloud computing model. However, the lack of programming models and frameworks for seamlessly developing cloud-native applications that use both CPU and GPU resources in the cloud has become a bottleneck for rapid application development. To support this demand for simultaneous use of heterogeneous resources, programming models and new frameworks are needed to manage the underlying resources effectively. Aneka has emerged as a popular PaaS computing model for developing Cloud applications with multiple programming models, such as Thread, Task, and MapReduce, in a single-container .NET platform. Since Aneka addresses MIMD application development on CPU-based resources, whereas GPU programming such as CUDA is designed for SIMD application development, this chapter discusses a GPU PaaS computing model for Aneka Clouds that enables rapid cloud application development on .NET platforms. Popular open-source GPU libraries are integrated into the existing Aneka task programming model, and the scheduling policies are extended to automatically identify GPU machines and schedule the respective tasks accordingly. A case study on image processing, built using the Aneka PaaS SDKs and the CUDA library, is discussed to demonstrate the system.
Comment: Submitted as a book chapter, under processing, 32 pages

arXiv.org e-Print Archive
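The extended scheduling policy described in this chapter's abstract (identify the GPU machines, then route GPU tasks to them) can be sketched as follows. This is not Aneka's .NET API or its actual policy; it is a hypothetical Python illustration in which each node simply advertises a has-GPU flag and tasks are spread round-robin within the matching pool. The node names and image-processing task names are made up.

```python
from itertools import cycle

def schedule(tasks, nodes):
    """Map each task to a node.

    tasks: list of (task_id, needs_gpu) pairs.
    nodes: dict of node name -> True if a GPU was detected on that node.
    GPU tasks are spread round-robin over GPU nodes; CPU tasks over CPU-only
    nodes, or over all nodes when every node has a GPU.
    """
    if not nodes:
        raise ValueError("no nodes available")
    gpu_nodes = [name for name, has_gpu in nodes.items() if has_gpu]
    cpu_nodes = [name for name, has_gpu in nodes.items() if not has_gpu] or list(nodes)
    if not gpu_nodes and any(needs_gpu for _, needs_gpu in tasks):
        raise ValueError("GPU task submitted but no GPU machine was identified")
    gpu_pool = cycle(gpu_nodes)
    cpu_pool = cycle(cpu_nodes)
    return {task_id: next(gpu_pool) if needs_gpu else next(cpu_pool)
            for task_id, needs_gpu in tasks}

# Hypothetical cluster and image-processing workload
nodes = {"node-a": True, "node-b": False, "node-c": True, "node-d": False}
tasks = [("resize-1", True), ("log-parse", False), ("resize-2", True),
         ("resize-3", True), ("report", False)]
placement = schedule(tasks, nodes)
print(placement)
# {'resize-1': 'node-a', 'log-parse': 'node-b', 'resize-2': 'node-c',
#  'resize-3': 'node-a', 'report': 'node-d'}
```

Keeping CPU-only work off the GPU machines whenever CPU-only nodes exist leaves the scarcer accelerators free for the SIMD-style tasks that need them.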