Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments
This chapter presents the software architectures of big data processing
platforms. It provides in-depth knowledge of the resource management
techniques involved in deploying big data processing systems in cloud
environments. It starts from the very basics and gradually introduces the core
components of resource management, which we have divided into multiple layers.
It covers state-of-the-art practices and research in SLA-based resource
management, with a specific focus on job scheduling mechanisms.
Comment: 27 pages, 9 figures
A Survey on Geographically Distributed Big-Data Processing using MapReduce
Hadoop and Spark are widely used distributed processing frameworks for
large-scale data processing in an efficient and fault-tolerant manner on
private or public clouds. These big-data processing systems are extensively
used by many industries, e.g., Google, Facebook, and Amazon, for solving a
large class of problems, e.g., search, clustering, log analysis, different
types of join operations, matrix multiplication, pattern matching, and social
network analysis. However, all these popular systems have a major drawback in
terms of locally distributed computations, which prevents them from supporting
geographically distributed data processing. The increasing amount of
geographically distributed massive data is pushing industries and academia to
rethink the current big-data processing systems. Novel frameworks, going
beyond the state-of-the-art architectures and technologies of the
current systems, are expected to process geographically distributed data at
its locations without moving entire raw datasets to a single location. In
this paper, we investigate and discuss challenges and requirements in designing
geographically distributed data processing frameworks and protocols. We
classify and study batch processing (MapReduce-based systems), stream
processing (Spark-based systems), and SQL-style processing geo-distributed
frameworks, models, and algorithms, together with their overhead issues.
Comment: IEEE Transactions on Big Data; Accepted June 2017. 20 pages
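The map/shuffle/reduce phases that these batch frameworks build on can be illustrated with a toy word-count in plain Python. This is only a conceptual sketch of the programming model, not the Hadoop or Spark API; the function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) pairs for each word in one input split
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big clouds", "data moves clouds"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
result = reduce_phase(shuffle(pairs))
print(result)  # {'big': 2, 'data': 2, 'clouds': 2, 'moves': 1}
```

In a geo-distributed setting, the costly step is the shuffle: grouping by key across sites is exactly the raw-data movement these frameworks try to avoid.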
IoT Stream Processing and Analytics in The Fog
The emerging Fog paradigm has been attracting increasing interest from both
academia and industry, due to the low-latency, resilient, and cost-effective
services it can provide. Many Fog applications, such as video mining and event
monitoring, rely on data stream processing and analytics, which are very
popular in the Cloud but have not been comprehensively investigated in the
context of the Fog architecture. In this article, we present the general models
and architecture of Fog data streaming by analyzing the common properties of
several typical applications. We also analyze the design space of Fog streaming
along four essential dimensions (system, data, human, and optimization),
investigating both the new design challenges and the issues that arise from
leveraging existing techniques, such as Cloud stream processing, computer
networks, and mobile computing.
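A core pattern in Fog stream processing is aggregating raw sensor readings near the source so only summaries travel upstream. The following is a minimal sketch of a tumbling-window average, with illustrative (not application-specific) data:

```python
def tumbling_window_average(stream, window_size):
    # Aggregate raw readings at the Fog node; only the per-window
    # summaries would be forwarded upstream to the Cloud.
    window = []
    for reading in stream:
        window.append(reading)
        if len(window) == window_size:
            yield sum(window) / window_size
            window = []

readings = [20.0, 22.0, 21.0, 30.0, 28.0, 32.0]
print(list(tumbling_window_average(readings, 3)))  # [21.0, 30.0]
```

With a window of 3, six raw readings collapse into two summary values, a 3x reduction in upstream traffic at the cost of temporal resolution.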
iFogSim: A Toolkit for Modeling and Simulation of Resource Management Techniques in Internet of Things, Edge and Fog Computing Environments
Internet of Things (IoT) aims to bring every object (e.g., smart cameras,
wearables, environmental sensors, home appliances, and vehicles) online, hence
generating massive amounts of data that can overwhelm storage systems and data
analytics applications. Cloud computing offers services at the infrastructure
level that can scale to IoT storage and processing requirements. However, there
are applications, such as health monitoring and emergency response, that
require low latency, and the delay caused by transferring data to the cloud and
then back to the application can seriously impact their performance. To
overcome this limitation, the Fog computing paradigm has been proposed, in
which cloud services are extended to the edge of the network to decrease
latency and network congestion. To realize the full potential of the Fog and
IoT paradigms for real-time analytics, several challenges need to be addressed.
The first and most critical problem is designing resource management techniques
that determine which modules of analytics applications are pushed to each edge
device to minimize latency and maximize throughput. To this end, we need an
evaluation platform that enables the quantification of the performance of
resource management policies on an IoT or Fog computing infrastructure in a
repeatable manner. In this paper, we propose a simulator, called iFogSim, to
model IoT and Fog environments and measure the impact of resource management
techniques in terms of latency, network congestion, energy consumption, and
cost. We describe two case studies to demonstrate modeling of an IoT
environment and comparison of resource management policies. Moreover, the
scalability of the simulation toolkit in terms of RAM consumption and execution
time is verified under different circumstances.
Comment: Cloud Computing and Distributed Systems Laboratory, The University of
Melbourne, June 6, 201
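The edge-versus-cloud placement trade-off that such simulators quantify can be sketched with a back-of-envelope latency model. This is not the iFogSim API (iFogSim is Java-based); the function and the millisecond figures below are illustrative assumptions only:

```python
def end_to_end_latency(placement, proc_time_edge, proc_time_cloud,
                       edge_rtt, cloud_rtt):
    # End-to-end latency = round-trip network delay to the chosen tier
    # plus the processing time on that tier (all in milliseconds).
    if placement == "edge":
        return edge_rtt + proc_time_edge
    return cloud_rtt + proc_time_cloud

# Illustrative numbers: the edge is slower to compute but far closer.
for placement in ("edge", "cloud"):
    latency = end_to_end_latency(placement,
                                 proc_time_edge=15.0, proc_time_cloud=5.0,
                                 edge_rtt=2.0, cloud_rtt=100.0)
    print(placement, latency)  # edge 17.0, cloud 105.0
```

Even with a 3x compute penalty, the edge placement wins here because network round-trip dominates, which is the intuition behind pushing latency-critical modules toward edge devices.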
Towards Media Intercloud Standardization Evaluating Impact of Cloud Storage Heterogeneity
Digital media has been growing very rapidly, contributing to the popularity of
cloud computing. Cloud computing provides ease of management of large amounts
of data and resources. With many devices communicating over the Internet and
with rapidly increasing user demands, solitary clouds have to communicate with
other clouds to fulfill demands and discover services elsewhere. This scenario
is called intercloud computing or cloud federation. Intercloud computing still
lacks a standard architecture. Prior works discuss some architectural
blueprints, but none of them highlight the key issues involved and their
impact, so that a valid and reliable architecture could be envisioned. In this
paper, we discuss the importance of intercloud computing and present its
architectural components in detail. We also discuss the key issues involved
and present the impact of storage heterogeneity. We have evaluated some of the
most noteworthy cloud storage services, namely Dropbox, Amazon CloudDrive,
GoogleDrive, Microsoft OneDrive (formerly SkyDrive), Box, and SugarSync, in
terms of Quality of Experience (QoE), Quality of Service (QoS), and storage
space efficiency. Discussion of the results shows the acceptability level of
these storage services and the shortcomings in their design.
Comment: 13 pages, 14 figures, Springer Journal of Grid Computing, 201
All One Needs to Know about Fog Computing and Related Edge Computing Paradigms: A Complete Survey
With the Internet of Things (IoT) becoming part of our daily life and our
environment, we expect rapid growth in the number of connected devices. IoT is
expected to connect billions of devices and humans to bring promising
advantages for us. With this growth, fog computing, along with its related edge
computing paradigms, such as multi-access edge computing (MEC) and cloudlet,
are seen as promising solutions for handling the large volume of
security-critical and time-sensitive data that is being produced by the IoT. In
this paper, we first provide a tutorial on fog computing and its related
computing paradigms, including their similarities and differences. Next, we
provide a taxonomy of research topics in fog computing, and through a
comprehensive survey, we summarize and categorize the efforts on fog computing
and its related computing paradigms. Finally, we provide challenges and future
directions for research in fog computing.
Comment: 48 pages, 7 tables, 11 figures, 450 references. The data (categories
and features/objectives of the papers) of this survey are now publicly
available. Accepted by Elsevier Journal of Systems Architecture
Internet of Things: An Overview
As technology advances and the number of smart devices continues to grow
substantially, the need for ubiquitous, context-aware platforms that support an
interconnected, heterogeneous, and distributed network of devices has given
rise to what is referred to today as the Internet of Things. However, paving
the path for achieving the aforementioned objectives and making the IoT
paradigm more tangible requires the integration and convergence of different
knowledge and research domains, covering aspects from identification and
communication to resource discovery and service integration. Through this
chapter, we aim to highlight research on topics including proposed
architectures, security and privacy, and network communication means and
protocols, and eventually conclude by providing future directions and open
challenges facing IoT development.
Comment: Keywords: Internet of Things; IoT; Web of Things; Cloud of Things
Scheduling in distributed systems: A cloud computing perspective
Scheduling is essentially a decision-making process that enables resource
sharing among a number of activities by determining their execution order on
the set of available resources. The emergence of distributed systems brought
new challenges to scheduling in computer systems, including clusters, grids,
and more recently clouds. On the other hand, the plethora of research makes it
hard for both new and experienced researchers to understand the relationships
among the different scheduling problems and strategies proposed in the
literature, which hampers the identification of new and relevant research
avenues. In this paper, we introduce a classification of the scheduling problem
in distributed systems by presenting a taxonomy that incorporates recent
developments, especially those in cloud computing. We review the scheduling
literature to corroborate the taxonomy and analyze the interest in different
branches of the proposed taxonomy. Finally, we identify relevant future
directions in scheduling for distributed systems.
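The decision process described above can be made concrete with one classic heuristic: greedy list scheduling, which assigns each task to the resource that currently finishes earliest. This is a generic textbook sketch, not a strategy from any one surveyed paper:

```python
import heapq

def greedy_schedule(task_lengths, n_resources):
    # Greedy list scheduling (min-completion-time heuristic):
    # always place the next task on the resource that frees up first.
    heap = [(0.0, r) for r in range(n_resources)]  # (finish_time, resource)
    assignment = {}
    for task, length in enumerate(task_lengths):
        finish, resource = heapq.heappop(heap)
        assignment[task] = resource
        heapq.heappush(heap, (finish + length, resource))
    makespan = max(finish for finish, _ in heap)
    return assignment, makespan

assignment, makespan = greedy_schedule([4, 3, 2, 2, 1], n_resources=2)
print(makespan)  # 6
```

Here 12 units of work land evenly on 2 resources, giving a makespan of 6; in general this heuristic is within a factor of 2 of optimal, which is why so many cloud schedulers start from variations of it.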
Big Data Computing Using Cloud-Based Technologies, Challenges and Future Perspectives
The excessive amounts of data generated by devices and Internet-based sources
on a regular basis constitute big data. This data can be processed and
analyzed to develop useful applications for specific domains. Several
mathematical and data analytics techniques have found use in this sphere. This
has given rise to the development of computing models and tools for big data
computing. However, the storage and processing requirements are overwhelming
for traditional systems and technologies. Therefore, there is a need for
infrastructures that can adjust their storage and processing capability in
accordance with changing data dimensions. Cloud computing serves as a
potential solution to this problem. However, big data computing in the cloud
has its own set of challenges and research issues. This chapter surveys the big
data concept, discusses the mathematical and data analytics techniques that can
be used for big data, and gives a taxonomy of the existing tools, frameworks,
and platforms available for different big data computing models. Besides this,
it also evaluates the viability of cloud-based big data computing, examines
existing challenges and opportunities, and provides future research directions
in this field.
GPU PaaS Computation Model in Aneka Cloud Computing Environment
Due to the surge in the volume of data generated and rapid advancements in
Artificial Intelligence (AI) techniques like machine learning and deep
learning, existing traditional computing models have become inadequate to
process enormous volumes of data and the complex application logic for
extracting intrinsic information. Computing accelerators such as Graphics
Processing Units (GPUs) have become the de facto SIMD computing systems for
many big data and machine learning applications. On the other hand, the
traditional computing model has gradually switched from conventional
ownership-based computing to the subscription-based cloud computing model.
However, the lack of programming models and frameworks to seamlessly develop
cloud-native applications that utilize both CPU and GPU resources in the cloud
has become a bottleneck for rapid application development. To support this
application demand for simultaneous heterogeneous resource usage, new
programming models and frameworks are needed to manage the underlying resources
effectively. Aneka has emerged as a popular PaaS computing model for the
development of Cloud applications using multiple programming models, such as
Thread, Task, and MapReduce, in a single container on the .NET platform. Since
Aneka addresses MIMD application development that uses CPU-based resources,
while GPU programming models like CUDA are designed for SIMD application
development, this chapter discusses a GPU PaaS computing model for Aneka Clouds
for rapid cloud application development on .NET platforms. Popular open-source
GPU libraries are utilized and integrated into the existing Aneka task
programming model. The scheduling policies are extended to automatically
identify GPU machines and schedule the respective tasks accordingly. A case
study on image processing is discussed to demonstrate the system, which has
been built using the PaaS Aneka SDKs and CUDA library.
Comment: Submitted as book chapter, under processing, 32 pages
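The essence of the extended scheduling policy, routing GPU-tagged tasks only to machines that report a GPU, can be sketched as follows. This is a language-agnostic illustration in Python, not the actual .NET Aneka scheduler; all names and the round-robin tie-breaking are assumptions:

```python
def dispatch(tasks, machines):
    # Route tasks flagged as GPU work to machines reporting a GPU;
    # everything else goes to the CPU pool (round-robin within a pool).
    gpu_machines = [m for m in machines if m["has_gpu"]]
    cpu_machines = [m for m in machines if not m["has_gpu"]] or machines
    plan = {}
    for i, task in enumerate(tasks):
        pool = gpu_machines if task["needs_gpu"] else cpu_machines
        plan[task["name"]] = pool[i % len(pool)]["name"]
    return plan

machines = [{"name": "m1", "has_gpu": True}, {"name": "m2", "has_gpu": False}]
tasks = [{"name": "filter_image", "needs_gpu": True},
         {"name": "log_parse", "needs_gpu": False}]
print(dispatch(tasks, machines))  # {'filter_image': 'm1', 'log_parse': 'm2'}
```

The key design point mirrored from the chapter is capability-aware placement: the scheduler inspects a machine attribute (here `has_gpu`) rather than treating all workers as interchangeable.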