
    Autoscaling Hadoop Clusters

    Cloud computing has been a much-discussed topic in recent years. Opinions range from the view that it is nothing more than virtualization under a nicer name to the view that the future belongs to cloud computing. For four years already, virtual servers, data stores, databases and other infrastructure elements have been available as web services. In this work we build a scalable MapReduce platform based on the open-source Apache Hadoop project. The platform scales itself: depending on the load on its servers, it launches new servers to speed up the computation.
    Cloud computing, specifically the Infrastructure as a Service model, provides the facilities to provision new servers at will and to increase the computing power of a cluster almost in real time. This provisioning and deprovisioning of servers can happen automatically, based on performance metrics of the cluster. We introduce a framework for autoscaling clusters in the private and public cloud ecosystem using the Eucalyptus and AWS software stacks, with MapReduce as the service provided by the cluster.
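The load-based provisioning described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual policy, so the thresholds, the one-node step size, and the names `ScalingPolicy` and `desired_node_count` are assumptions made for illustration only, not the authors' framework.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are hypothetical; the abstract does not state thresholds.
    scale_up_load: float = 0.8    # average load above which a node is added
    scale_down_load: float = 0.3  # average load below which a node is removed
    min_nodes: int = 1
    max_nodes: int = 10


def desired_node_count(current_nodes: int, avg_load: float,
                       policy: ScalingPolicy) -> int:
    """Return the target cluster size for one evaluation of the policy."""
    if avg_load > policy.scale_up_load:
        target = current_nodes + 1   # provision one more server
    elif avg_load < policy.scale_down_load:
        target = current_nodes - 1   # deprovision one server
    else:
        target = current_nodes       # load is in the acceptable band
    # Never leave the configured cluster size limits.
    return max(policy.min_nodes, min(policy.max_nodes, target))
```

In a real deployment the returned count would be passed to the cloud provider's API to start or terminate instances; that step is deliberately left out here because the abstract does not describe it.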

    Large Scale Data Analysis Using Apache Pig

    This master's thesis describes the use of Apache Pig, a software framework for parallel data processing. A concrete data analysis task is presented, and the framework was used to solve it. The aim of the work is to show the usefulness of Pig for large-scale data analysis. Pig is designed to work with Hadoop, an infrastructure for parallel computation that implements the MapReduce programming model. Pig acts as an additional layer of abstraction on top of MapReduce, presenting data as relational tables and letting programmers write queries in the Pig Latin query language. To test Pig, a data analysis task was set. One part of the task was to identify, day by day, the most common words in news stories collected from RSS web feeds. The second part was to find, for an arbitrary set of words, how the number of co-occurrences of those words in the collected news changed from day to day. In addition, a text search over the collected news using regular expressions had to be implemented in Pig. As the solution, a number of Pig Latin scripts were written to process and analyze the collected data. To tie the functionality together, a wrapper program was written in Java that runs the different Pig scripts according to user input. A separate application was created for data collection and was used to download the news feed files at regular intervals. The resulting application was used to analyze the collected data, and some of the analysis results are presented in the thesis. The results show how the frequencies of certain words and word combinations change as the events those words describe become more or less topical.
    This work describes Apache Pig, a software framework designed for parallel data processing. An example data analysis problem is presented and solved using the framework. The objective of the work is to demonstrate the usefulness of Pig for large-scale data analysis. Pig is built to work with the parallel computing framework Hadoop, which implements the MapReduce programming model. Pig acts as a layer of abstraction on top of MapReduce, presenting data as relational tables and allowing data manipulation and queries in the Pig Latin query language. The data analysis problem used to test Pig involved collecting news stories from online RSS web feeds and identifying trends in the topics covered. As the solution, a number of Pig scripts were created to perform the necessary tasks, and a Java application was implemented as a user interface wrapper for the Pig scripts.
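The two analysis tasks named in this abstract (most common words per day, and per-day co-occurrence counts for a chosen word set) can be sketched in a few lines. The thesis implements them as Pig Latin scripts on Hadoop; the single-machine Python below is only an illustration of the logic, and the input shape (pairs of day and story text), the tokenizing pattern, and the function names are assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def top_words_per_day(stories, n=3):
    """Map each day to its n most frequent words.

    `stories` is an iterable of (day, text) pairs.
    """
    counts = defaultdict(Counter)
    for day, text in stories:
        counts[day].update(TOKEN.findall(text.lower()))
    return {day: [word for word, _ in c.most_common(n)]
            for day, c in counts.items()}


def cooccurrences_per_day(stories, words):
    """Count, per day, the stories that contain every word in `words`.

    Days with no matching story are absent from the result.
    """
    wanted = {w.lower() for w in words}
    result = Counter()
    for day, text in stories:
        if wanted <= set(TOKEN.findall(text.lower())):
            result[day] += 1
    return dict(result)
```

Grouping by day and then counting corresponds roughly to a group-and-aggregate step in a relational query language, which is the style of processing the abstract attributes to Pig Latin.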

    Building a Cloud Storage Service System

    Cloud storage services are attracting growing attention because they promise elastic capacity and high reliability at low cost. With such a service, users can store most of their files in an authenticated cloud storage service center without worrying about storage space being inadequate or wasted, because dynamically adjustable capacity is the most important feature of cloud storage. In this paper, we present a solution for building a Cloud Storage Service System based on an open-source distributed database. It follows a layered design comprising a Web service front end, a transformation processing layer and a data storage layer. End users can access their own data in this system through three Web service interfaces. Moreover, a complete prototype system based on this architecture is demonstrated.

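    Analisis Kerentanan Dan Kehandalan Layanan Jaringan Cloud Berbasis Platform Eucalyptus (Vulnerability and Reliability Analysis of Cloud Network Services Based on the Eucalyptus Platform)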

    Cloud computing is a computing paradigm that evolved from existing technologies such as grid computing, virtualization and the Internet. It provides an illusion of unlimited computing resources that can be accessed from anywhere at any time. Despite the potential gains from cloud computing, the security of the model is still questionable, which has hindered adoption. The security problem becomes more complicated under the cloud model because new dimensions enter the problem scope: the model architecture, multi-tenancy, elasticity, and the dependency stack of layers. Eucalyptus-based cloud network services are widely deployed as private cloud infrastructure. The experiments in this paper focus on finding potential denial-of-service (DoS) vulnerabilities and their impact on the ability to provide services during an attack. We observe an increase in response time of up to 2863.22% during an attack on the web-based management service. Implementing rate control and rate limiting on the cloud controller reduces the average system load to an acceptable level and helps prevent disruption of the service.
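The abstract names rate limiting on the cloud controller as the mitigation but does not say which mechanism was used. As one common way to limit request rates, a token bucket is sketched below; the class name, its parameters and the explicit `now` argument are assumptions for illustration, not the configuration used in the paper.

```python
class TokenBucket:
    """Token-bucket limiter: allows bursts up to `capacity` requests and a
    sustained rate of `rate` requests per second (illustrative values only)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of stored tokens
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # timestamp of the previous check

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1      # spend one token for this request
            return True
        return False              # over the limit: reject or delay
```

Passing the timestamp in explicitly keeps the sketch deterministic; a caller in a live service would typically supply a monotonic clock reading instead.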