16 research outputs found

    Building a “Log File” Analysis Program Using the “MapReduce” System to Identify Attacks on Web Applications

    Query injection and cross-site scripting attacks on web applications can be identified from web server log files. Sites with large numbers of visitors face the problem of rapidly growing log files, so processing them takes a long time. Parallel processing can help handle large data sets and shorten execution time, but the machinery that manages parallel execution, such as distributing data and computation, controlling the system, and handling hardware failures, is complicated to implement. MapReduce is a library that automatically handles data and computation distribution, system control, and hardware failure handling, so writing programs that run in parallel becomes much simpler. The goal of this research is to build a log file analysis program that identifies attacks on web applications by implementing the MapReduce system. The MapReduce framework used here is Hadoop. The execution time of the resulting application is compared with that of existing log file analysis programs, Webalizer and AWStats. The program built in this work runs in parallel on four interconnected computers, while the comparison applications run on a single computer. The input is web server log files of various sizes, and the output is an analysis of attacks on the web application. The comparison of execution times shows that MapReduce needs longer than Webalizer and AWStats. This is because the MapReduce system is better suited to a large number of computers and to files too large to be stored and processed on a single machine.
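    The job the abstract describes can be sketched as a mapper that matches attack signatures against each log line and a reducer that counts hits per attack class. The Hadoop job below is a minimal illustration under that assumption; the regular expressions, class names, and output keys are hypothetical, not the author's actual detection rules.

```java
// Hypothetical sketch of a Hadoop MapReduce job that flags suspicious
// requests in web server access logs; the patterns are illustrative only.
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogAttackAnalyzer {

    public static class AttackMapper extends Mapper<Object, Text, Text, IntWritable> {
        // Simplified signatures for SQL injection and XSS in request lines.
        private static final Pattern SQLI = Pattern.compile("(?i)(union\\s+select|or\\s+1=1|';--)");
        private static final Pattern XSS  = Pattern.compile("(?i)(<script|%3Cscript|onerror=)");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (SQLI.matcher(line).find()) ctx.write(new Text("SQL_INJECTION"), ONE);
            if (XSS.matcher(line).find())  ctx.write(new Text("XSS"), ONE);
        }
    }

    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));  // total hits per attack class
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log attack analysis");
        job.setJarByClass(LogAttackAnalyzer.class);
        job.setMapperClass(AttackMapper.class);
        job.setCombinerClass(CountReducer.class);
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // log files on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // analysis results
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

    Such a job would be launched in the usual Hadoop way, e.g. hadoop jar analyzer.jar LogAttackAnalyzer <log dir> <output dir>, producing one count per detected attack class.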

    Introducing Cloud Computing Topics in Curricula

    The demand for graduates with exposure to Cloud Computing is on the rise. For many educational institutions, the challenge is to decide how to incorporate appropriate cloud-based technologies into their curricula. In this paper, we describe our design and experiences of integrating Cloud Computing components into seven third/fourth-year undergraduate-level information systems, computer science, and general science courses that are related to large-scale data processing and analysis at the University of Queensland, Australia. For each course, we aimed at finding the best-available and cost-effective cloud technologies that fit well into the existing curriculum. The cloud-related technologies discussed in this paper include open-source distributed computing tools such as Hadoop, Mahout, and Hive, as well as cloud services such as Windows Azure and Amazon Elastic Compute Cloud (EC2). We anticipate that our experiences will prove useful and of interest to fellow academics wanting to introduce Cloud Computing modules to existing courses.

    A semi-automatic parallelization tool for Java based on fork-join synchronization patterns

    Because of the increasing availability of multi-core machines, clusters, Grids, and combinations of these environments, there is now plenty of computational power available for executing compute-intensive applications. However, because of the overwhelming and rapid advances in distributed and parallel hardware and environments, today's programmers are not fully prepared to exploit distribution and parallelism. In this sense, the Java language has helped in handling the heterogeneity of such environments, but there is a lack of facilities and tools for easily distributing and parallelizing applications. One solution to mitigate this problem and make progress towards producing general tools seems to be the synthesis of semi-automatic parallelism and Parallelism as a Concern (PaaC), which allows applications to be parallelized with as few modifications to the sequential code as possible. In this paper, we discuss a new approach that aims at overcoming the drawbacks of current Java-based parallel and distributed development tools by exploiting precisely these concepts.
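    The fork-join synchronization pattern the tool targets is the one exposed by java.util.concurrent; the sketch below shows the kind of divide-and-conquer code such a tool would parallelize or generate. It is a generic illustration of the pattern, not the tool's own API.

```java
// Illustrative fork-join parallel sum using the standard Java API;
// this is not the tool described in the paper, only the pattern it targets.
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this, compute sequentially
    private final long[] data;
    private final int from, to;

    public ParallelSum(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                        // run the left half asynchronously
        long rightResult = right.compute(); // compute the right half in this thread
        return left.join() + rightResult;   // synchronize on the forked subtask
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long total = ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
        System.out.println("sum = " + total);
    }
}
```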

    The impact of I/O-intensive applications on job scheduling in non-dedicated clusters

    With the increased capacity of processing nodes relative to their computing power, more and more data-intensive applications, such as those from bioinformatics, will be run on non-dedicated clusters. Non-dedicated clusters are characterized by their ability to combine the execution of local users' applications with scientific or commercial applications executed in parallel. Knowing what effect I/O-intensive applications have when mixed with other workload types (batch, interactive, SRT, etc.) in non-dedicated environments enables the development of more efficient scheduling policies. Some I/O-intensive applications are based on the MapReduce paradigm; the environments that run them, such as Hadoop, take care of data locality and load balancing automatically and work with distributed file systems. Hadoop's performance can be improved without increasing hardware costs by tuning several key configuration parameters to the cluster's specifications, the size of the input data, and the complexity of the processing. Tuning these parameters can be too complex for the user and/or administrator, but it aims to guarantee more adequate performance. This work proposes evaluating the impact of I/O-intensive applications on job scheduling in non-dedicated clusters under the MPI and MapReduce paradigms.
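    The key configuration parameters mentioned above can be set per job in Hadoop. The snippet below illustrates a few commonly tuned properties (split size, task heap, sort buffer, intermediate compression); the specific names and values are generic examples, not the parameters evaluated in this work.

```java
// Illustrative tuning of a few widely used Hadoop configuration knobs;
// these property values are generic examples, not the ones studied in the thesis.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TunedJobFactory {
    public static Job createJob() throws Exception {
        Configuration conf = new Configuration();
        // Match the split size to the HDFS block size so each map task reads one block.
        conf.setLong("mapreduce.input.fileinputformat.split.minsize", 128L * 1024 * 1024);
        // Give map/reduce JVMs more heap for I/O-heavy workloads.
        conf.set("mapreduce.map.java.opts", "-Xmx2048m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx4096m");
        // A larger sort buffer reduces spills to disk during the shuffle.
        conf.setInt("mapreduce.task.io.sort.mb", 512);
        // Compress intermediate map output to trade CPU for less disk and network I/O.
        conf.setBoolean("mapreduce.map.output.compress", true);
        return Job.getInstance(conf, "tuned I/O-intensive job");
    }
}
```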

    Running parallel applications on a heterogeneous environment with accessible development practices and automatic scalability

    Grid computing makes it possible to gather large quantities of resources to work on a problem. In order to exploit this potential, a framework that presents the resources to the user programmer in a form that maintains productivity is necessary. The framework must not only provide accessible development, but it must make efficient use of the resources. The Seeds framework is proposed. It uses current Grid and distributed computing middleware to provide a parallel programming environment to a wider community of programmers. The framework was used to investigate the feasibility of scaling skeleton/pattern parallel programming into Grid computing. The research accomplished two goals: it made parallel programming on the Grid more accessible to domain-specific programmers, and it made parallel programs scale on a heterogeneous resource environment. Programming is made easier for the programmer by using skeleton and pattern-based programming approaches that effectively isolate the program from the environment. To extend the pattern approach, the pattern adder operator is proposed, implemented, and tested. The results show the pattern operator can reduce the number of lines of code when compared with an MPJ-Express implementation of a stencil algorithm while having an overhead of at most ten microseconds per iteration. The research in scalability involved adapting existing load-balancing techniques to skeletons and patterns, requiring little additional configuration on the part of the programmer. The hierarchical dependency concept is proposed as well, which uses a streamed data flow programming model. The concept introduces data flow computation hibernation and dependencies that can split to accommodate additional processors. The results from implementing skeletons/patterns on hierarchical dependencies show that an 18.23% increase in code is necessary to enable automatic scalability. The concept can increase speedup depending on the algorithm and grain size.
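    The separation the skeleton/pattern approach provides, where the domain programmer supplies only the problem-specific logic and the framework decides where it runs, can be illustrated generically as below. This is not the Seeds API; it only mirrors the division of responsibilities described in the abstract, with a trivial local runner standing in for the Grid middleware.

```java
// Generic illustration of skeleton-based programming: the user supplies only
// the split/compute/merge logic and the framework decides where the pieces run.
// This is NOT the Seeds API; it only mirrors the separation of concerns it provides.
import java.util.List;
import java.util.stream.Collectors;

interface DivideConquerSkeleton<P, R> {
    boolean isSmallEnough(P problem);     // grain-size test
    R solveDirectly(P problem);           // sequential base case
    List<P> split(P problem);             // divide the work
    R merge(List<R> partials);            // combine partial results
}

final class LocalRunner {
    // A trivial "environment": recursion plus parallel streams. A real framework
    // would instead ship subproblems to Grid or cluster nodes transparently.
    static <P, R> R run(DivideConquerSkeleton<P, R> s, P problem) {
        if (s.isSmallEnough(problem)) return s.solveDirectly(problem);
        List<R> partials = s.split(problem).parallelStream()
                .map(p -> run(s, p))
                .collect(Collectors.toList());
        return s.merge(partials);
    }
}
```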