Hadoop Based Data Intensive Computation on IAAS Cloud Platforms
Cloud computing is a relatively new form of computing that uses virtualized resources. It is dynamically scalable and is often provided as a pay-per-use service over the Internet, an intranet, or both. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data-intensive applications are those that involve high CPU usage and process large volumes of data, typically hundreds of gigabytes, terabytes, or petabytes in size. The research in this thesis focuses on Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR), evaluated using HiBench, a Hadoop benchmark suite for performing and evaluating Hadoop-based data-intensive computation on both cloud platforms. Both quantitative and qualitative comparisons of Amazon EC2 and Amazon EMR are presented, along with their pricing models and suggestions for future research.
Elastic Dataflow Processing on the Cloud
Clouds have become an attractive platform for the large-scale processing of
modern applications on Big Data, especially due to the concept of elasticity,
which characterizes them: resources can be leased on demand and used for as
much time as needed, offering the ability to create virtual infrastructures
that change dynamically over time. Such applications often require processing
of complex queries that are expressed in a high-level language and are
typically transformed into data processing flows (dataflows). A logical
question that arises is whether elasticity affects dataflow execution and in
which way. It seems reasonable that the execution is faster when more resources
are used, but the monetary cost is then higher. This gives rise to the concept of
eco-elasticity, an additional kind of elasticity that comes from economics, and
captures the trade-offs between the response time of the system and the amount
of money we pay for it as influenced by the use of different amounts of
resources.
In this thesis, we approach the elasticity of clouds in a unified way that
combines both the traditional notion and eco-elasticity. This unified
elasticity concept is essential for the development of auto-tuned systems in
cloud environments. First, we demonstrate that eco-elasticity exists in several
common tasks that appear in practice and that it can be discovered using a simple,
yet highly scalable and efficient algorithm. Next, we present two cases of
auto-tuned algorithms that use the unified model of elasticity in order to
adapt to the query workload: 1) processing analytical queries in the form of
tree execution plans in order to maximize profit and 2) automated index
management taking into account the monetary cost of compute and storage resources. Finally, we
describe EXAREME, a system for elastic data processing on the cloud that has
been used and extended in this work. The system offers declarative languages
that are based on SQL with user-defined functions (UDFs) extended with
parallelism primitives. EXAREME exploits both elasticities of clouds by
dynamically allocating and deallocating compute resources in order to adapt to
the query workload.
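The time-versus-money trade-off that the abstract calls eco-elasticity can be illustrated with a small sketch. This is not the thesis's algorithm: the cost model (a fixed serial part plus a perfectly parallel part, billed per whole time quantum per container), all parameter values, and the function names are assumptions chosen only for illustration.

```python
import math

def time_and_cost(n, serial_s=60.0, parallel_s=3600.0,
                  quantum_s=600.0, price_per_quantum_cents=10):
    """Return (elapsed seconds, cost in cents) on n leased containers.

    Hypothetical model: the parallel part speeds up linearly with n, and
    every container is billed for each started time quantum.
    """
    elapsed = serial_s + parallel_s / n
    quanta = math.ceil(elapsed / quantum_s)
    return elapsed, n * quanta * price_per_quantum_cents

def pareto_front(points):
    """Keep only (n, time, cost) points that no other point beats on both axes."""
    front = []
    best_cost = math.inf
    # Scan from fastest to slowest; a slower point survives only if it is cheaper
    # than every faster point seen so far.
    for n, t, c in sorted(points, key=lambda p: (p[1], p[2])):
        if c < best_cost:
            front.append((n, t, c))
            best_cost = c
    return front

options = [(n, *time_and_cost(n)) for n in range(1, 17)]
front = pareto_front(options)
for n, t, c in front:
    print(f"{n:2d} containers: {t:7.1f} s, {c:4d} cents")
```

Under these assumed numbers, a single container is not on the front: seven containers finish within one billing quantum for the same total price, so adding resources can reduce time without raising cost.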
Elastic Resource Management in Distributed Clouds
The ubiquitous nature of computing devices and their increasing reliance on remote resources have driven and shaped public cloud platforms into large-scale, distributed data centers of unprecedented size. Concurrently, a plethora of cloud-based applications are experiencing multi-dimensional workload dynamics: workload volumes that vary along both time and space axes, and with higher frequency.
The interplay of diverse workload characteristics and distributed clouds raises several key challenges for efficiently and dynamically managing server resources. First, current cloud platforms impose certain restrictions that might hinder some resource management tasks. Second, an application-agnostic approach might not capture appropriate performance goals, so numerous application-specific methods may be required. Third, provisioning resources outside the LAN boundary might incur large delays, which would impact the desired agility.
In this dissertation, I investigate the above challenges and present the design of automated systems that manage resources for various applications in distributed clouds. The intermediate goal of these automated systems is to fully exploit potential benefits such as reduced network latency offered by increasingly distributed server resources. The ultimate goal is to improve end-to-end user response time with novel resource management approaches, within a certain cost budget.
Centered around these two goals, I first investigate how to optimize the location and performance of virtual machines in distributed clouds. I use virtual desktops, mostly serving a single user, as an example use case for developing a black-box approach that ranks virtual machines based on their dynamic latency requirements. Those with high latency sensitivity have a higher priority for being placed in, or migrated to, the cloud location closest to their users. Next, I relax the assumption of well-provisioned virtual machines and look at how to provision enough resources for applications that exhibit both temporal and spatial workload fluctuations. I propose an application-agnostic queueing model that captures resource utilization and server response time. Building upon this model, I present a geo-elastic provisioning approach, referred to as geo-elasticity, for replicable multi-tier applications that can spin up an appropriate amount of server resources in any cloud location. Last, I explore the benefits of providing geo-elasticity for database clouds, a popular platform for hosting application backends. Performing geo-elastic provisioning for backend database servers entails several challenges that are specific to database workloads and therefore requires tailored solutions. In addition, cloud platforms offer resources at various prices for different locations. To this end, I propose a cost-aware geo-elasticity approach that combines a regression-based workload model and a queueing network capacity model for database clouds.
In summary, hosting a diverse set of applications in an increasingly distributed cloud makes it interesting and necessary to develop new, efficient and dynamic resource management approaches.
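As a rough illustration of how a queueing model can drive per-location provisioning, the sketch below sizes each cloud location independently. It is not the dissertation's model: it assumes each server behaves as an M/M/1 queue with evenly split load, and the location names, request rates, and function name are hypothetical.

```python
import math

def servers_needed(arrival_rate, service_rate, target_response_s):
    """Smallest server count that keeps mean response time within the target.

    Assumes each server is an M/M/1 queue receiving arrival_rate / n requests
    per second, so mean response time is 1 / (service_rate - arrival_rate / n).
    """
    if target_response_s <= 1.0 / service_rate:
        raise ValueError("target is not above the bare service time")
    # Highest per-server arrival rate that still meets the target.
    per_server_capacity = service_rate - 1.0 / target_response_s
    return max(1, math.ceil(arrival_rate / per_server_capacity))

# Hypothetical per-location demand in requests per second.
demand = {"us-east": 64.0, "eu-west": 65.0, "ap-south": 10.0}

# Each server handles 20 requests/s; the target mean response time is 0.25 s.
plan = {loc: servers_needed(rate, 20.0, 0.25) for loc, rate in demand.items()}
print(plan)
```

Rerunning the plan whenever the per-location demand estimate changes gives a simple form of provisioning that follows both temporal and spatial workload shifts; a cost-aware variant would additionally weigh per-location prices.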
InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services
Cloud computing providers have set up several data centers at different
geographical locations over the Internet in order to optimally serve the needs of
their customers around the world. However, existing systems do not support
mechanisms and policies for dynamically coordinating load distribution among
different Cloud-based data centers in order to determine the optimal location for
hosting application services to achieve reasonable QoS levels. Further, the
Cloud computing providers are unable to predict the geographic distribution of
users consuming their services, hence the load coordination must happen
automatically, and the distribution of services must change in response to changes
in the load. To counter this problem, we advocate the creation of a federated Cloud
computing environment (InterCloud) that facilitates just-in-time,
opportunistic, and scalable provisioning of application services, consistently
achieving QoS targets under variable workload, resource and network conditions.
The overall goal is to create a computing environment that supports dynamic
expansion or contraction of capabilities (VMs, services, storage, and database)
for handling sudden variations in service demands.
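The idea of automatically choosing a hosting location across federated data centers can be sketched as a simple placement rule. This is an illustrative assumption, not InterCloud's actual policy or the CloudSim API: the `DataCenter` fields, the latency bound, and the tie-breaking rule are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    capacity: int      # total VMs the site can host (hypothetical figure)
    used: int          # VMs currently allocated
    latency_ms: float  # estimated network latency to the requesting users

def place(sites, vms_needed, max_latency_ms):
    """Pick a site that has room and meets the latency bound, or return None.

    Among feasible sites, prefer the least utilized one and break ties by
    lower latency; the chosen site's allocation is updated in place.
    """
    feasible = [s for s in sites
                if s.capacity - s.used >= vms_needed
                and s.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    best = min(feasible, key=lambda s: (s.used / s.capacity, s.latency_ms))
    best.used += vms_needed
    return best.name

sites = [
    DataCenter("A", capacity=10, used=9, latency_ms=20.0),
    DataCenter("B", capacity=10, used=2, latency_ms=40.0),
    DataCenter("C", capacity=10, used=0, latency_ms=200.0),
]
first = place(sites, vms_needed=2, max_latency_ms=100.0)
second = place(sites, vms_needed=8, max_latency_ms=100.0)
print(first, second)
```

Returning `None` when no site qualifies marks the point where a federation would need to expand capacity or relax the QoS bound.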
This paper presents the vision, challenges, and architectural elements of
InterCloud for utility-oriented federation of Cloud computing environments. The
proposed InterCloud environment supports scaling of applications across
multiple vendor clouds. We have validated our approach by conducting a set of
rigorous performance evaluation studies using the CloudSim toolkit. The results
demonstrate that the federated Cloud computing model has immense potential, as it
offers significant performance gains with regard to response time and cost
savings under dynamic workload scenarios.
Comment: 20 pages, 4 figures, 3 tables, conference paper
ACTiCLOUD: Enabling the Next Generation of Cloud Applications
Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting the applicability of cloud systems to large classes of critical applications that pose non-moderate resource demands. Eliminating the current technological barriers to actual fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone for the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it makes advancements in the cloud resource management stacks by extending state-of-the-art hypervisor technology beyond the physical server boundary and localized cloud management system to provide holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding, and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings.