361 research outputs found

    Hadoop Based Data Intensive Computation on IAAS Cloud Platforms

    Get PDF
    Cloud computing is a relatively new form of computing which uses virtualized resources. It is dynamically scalable and is often provided as pay for use service over the Internet or Intranet or both. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data intensive applications are those which involve high CPU usage, processing large volumes of data typically in size of hundreds of gigabytes, terabytes or petabytes. The research in this thesis is focused on the Amazon’s Elastic Cloud Compute (EC2) and Amazon Elastic Map Reduce (EMR) using HiBench Hadoop Benchmark suite. HiBench is a Hadoop benchmark suite and is used for performing and evaluating Hadoop based data intensive computation on both these cloud platforms. Both quantitative and qualitative comparisons of Amazon EC2 and Amazon EMR are presented. Also presented are their pricing models and suggestions for future research

    Elastic Dataflow Processing on the Cloud

    Get PDF
    Τα νεφη εχουν μετατραπει σε μια ελκυστικη πλατφορμα για την πολυπλοκη επεξεργασια δεδομενων μεγαλης κλιμακας, ειδικα εξαιτιας της εννοιας της ελαστικοτητας, η οποια και τα χαρακτηριζει: οι υπολογιστικοι ποροι μπορουν να εκμισθωθουν δυναμικα και να χρησιμοποιουνται για οσο χρονο ειναι απαραιτητο. Αυτο δινει την δυνατοτητα να δημιουργηθει μια εικονικη υποδομη η οποια μπορει να αλλαζει δυναμικα στο χρονο. Οι συγχρονες εφαρμογες απαιτουν την εκτελεση πολυπλοκων ερωτηματων σε Μεγαλα Δεδομενα για την εξορυξη γνωσης και την υποστηριξη επιχειρησιακων αποφασεων. Τα πολυπλοκα αυτα ερωτηματα, εκφραζονται σε γλωσσες υψηλου επιπεδου και τυπικα μεταφραζονται σε ροες επεξεργασιας δεδομενων, η απλα ροες δεδομενων. Ενα λογικο ερωτημα που τιθεται ειναι κατα ποσον η ελαστικοτητα επηρεαζει την εκτελεση των ροων δεδομενων και με πιο τροπο. Ειναι λογικο οτι η εκτελεση να ειναι πιθανον γρηγοροτερη αν χρησιμοποιηθουν περισ- σοτεροι υπολογιστικοι ποροι, αλλα το κοστος θα ειναι υψηλοτερο. Αυτο δημιουργει την εννοια της οικο-ελαστικοτητας, ενος επιπλεον τυπου ελαστικοτητας ο οποιος προερχεται απο την οικονο- μικη θεωρια, και συλλαμβανει τις εναλλακτικες μεταξυ του χρονου εκτελεσης και του χρηματικου κοστους οπως προκυπτει απο την χρηση των πορων. Στα πλαισια αυτης της διδακτορικης διατριβης, προσεγγιζουμε την ελαστικοτητα με ενα ενοποιημενο μοντελο που περιλαμβανει και τις δυο ειδων ελαστικοτητες που υπαρχουν στα υπολογιστικα νεφη. Αυτη η ενοποιημενη προσεγγιση της ελαστικοτητας ειναι πολυ σημαντικη στην σχεδιαση συστηματων που ρυθμιζονται αυτοματα (auto-tuned) σε περιβαλλοντα νεφους. Αρχικα δειχνουμε οτι η οικο-ελαστικοτητα υπαρχει σε αρκετους τυπους υπολογισμου που εμφανιζονται συχνα στην πραξη και οτι μπορει να βρεθει χρησιμοποιωντας εναν απλο, αλλα ταυτοχρονα αποδοτικο και ε- πεκτασιμο αλγοριθμο. Επειτα, παρουσιαζουμε δυο εφαρμογες που χρησιμοποιουν αλγοριθμους οι οποιοι χρησιμοποιουν το ενοποιημενο μοντελο ελαστικοτητας που προτεινουμε για να μπορουν να προσαρμοζουν δυναμικα το συστημα στα ερωτηματα της εισοδου: 1) την ελαστικη επεξεργασια αναλυτικων ερωτηματων τα οποια εχουν πλανα εκτελεσης με μορφη δεντρων με σκοπο την μεγι- στοποιηση του κερδους και 2) την αυτοματη διαχειριση χρησιμων ευρετηριων λαμβανοντας υποψη το χρηματικο κοστος των υπολογιστικων και των αποθηκευτικων πορων. Τελος, παρουσιαζουμε το EXAREME, ενα συστημα για την ελαστικη επεξεργασια μεγαλου ογκου δεδομενων στο νεφος το οποιο εχει χρησιμοποιηθει και επεκταθει σε αυτην την δουλεια. Το συστημα προσφερει δηλωτικες γλωσσες που βασιζονται στην SQL επεκταμενη με συναρτησεις οι οποιες μπορει να οριστουν απο χρηστες (User-Defined Functions, UDFs). Επιπλεον, το συντακτικο της γλωσσας εχει επεκταθει με στοιχεια παραλληλισμου. Το EXAREME εχει σχεδιαστει για να εκμεταλλευεται τις ελαστικοτη- τες που προσφερουν τα νεφη, δεσμευοντας και αποδεσμευοντας υπολογιστικους πορους δυναμικα με σκοπο την προσαρμογη στα ερωτηματα.Clouds have become an attractive platform for the large-scale processing of modern applications on Big Data, especially due to the concept of elasticity, which characterizes them: resources can be leased on demand and used for as much time as needed, offering the ability to create virtual infrastructures that change dynamically over time. Such applications often require processing of complex queries that are expressed in a high-level language and are typically transformed into data processing flows (dataflows). A logical question that arises is whether elasticity affects dataflow execution and in which way. It seems reasonable that the execution is faster when more resources are used, however the monetary cost is higher. This gives rise to the concept eco-elasticity, an additional kind of elasticity that comes from economics, and captures the trade-offs between the response time of the system and the amount of money we pay for it as influenced by the use of different amounts of resources. In this thesis, we approach the elasticity of clouds in a unified way that combines both the traditional notion and eco-elasticity. This unified elasticity concept is essential for the development of auto-tuned systems in cloud environments. First, we demonstrate that eco-elasticity exists in several common tasks that appear in practice and that can be discovered using a simple, yet highly scalable and efficient algorithm. Next, we present two cases of auto-tuned algorithms that use the unified model of elasticity in order to adapt to the query workload: 1) processing analytical queries in the form of tree execution plans in order to maximize profit and 2) automated index management taking into account compute and storage re- sources. Finally, we describe EXAREME, a system for elastic data processing on the cloud that has been used and extended in this work. The system offers declarative languages that are based on SQL with user-defined functions (UDFs) extended with parallelism primi- tives. EXAREME exploits both elasticities of clouds by dynamically allocating and deallocating compute resources in order to adapt to the query workload

    InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services

    Full text link
    Cloud computing providers have setup several data centers at different geographical locations over the Internet in order to optimally serve needs of their customers around the world. However, existing systems do not support mechanisms and policies for dynamically coordinating load distribution among different Cloud-based data centers in order to determine optimal location for hosting application services to achieve reasonable QoS levels. Further, the Cloud computing providers are unable to predict geographic distribution of users consuming their services, hence the load coordination must happen automatically, and distribution of services must change in response to changes in the load. To counter this problem, we advocate creation of federated Cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource and network conditions. The overall goal is to create a computing environment that supports dynamic expansion or contraction of capabilities (VMs, services, storage, and database) for handling sudden variations in service demands. This paper presents vision, challenges, and architectural elements of InterCloud for utility-oriented federation of Cloud computing environments. The proposed InterCloud environment supports scaling of applications across multiple vendor clouds. We have validated our approach by conducting a set of rigorous performance evaluation study using the CloudSim toolkit. The results demonstrate that federated Cloud computing model has immense potential as it offers significant performance gains as regards to response time and cost saving under dynamic workload scenarios.Comment: 20 pages, 4 figures, 3 tables, conference pape

    ACTiCLOUD: Enabling the Next Generation of Cloud Applications

    Get PDF
    Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting cloud systems' applicability to large classes of critical applications that pose non-moderate resource demands. Eliminating current technological barriers of actual fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone for the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it makes advancements in the cloud resource management stacks by extending state-of-the-art hypervisor technology beyond the physical server boundary and localized cloud management system to provide a holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding, and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings
    corecore