20 research outputs found

    Volunteer computing

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 205-216).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.This thesis presents the idea of volunteer computing, which allows high-performance parallel computing networks to be formed easily, quickly, and inexpensively by enabling ordinary Internet users to share their computers' idle processing power without needing expert help. In recent years, projects such as SETI@home have demonstrated the great potential power of volunteer computing. In this thesis, we identify volunteer computing's further potentials, and show how these can be achieved. We present the Bayanihan system for web-based volunteer computing. Using Java applets, Bayanihan enables users to volunteer their computers by simply visiting a web page. This makes it possible to set up parallel computing networks in a matter of minutes compared to the hours, days, or weeks required by traditional NOW and metacomputing systems. At the same time, Bayanihan provides a flexible object-oriented software framework that makes it easy for programmers to write various applications, and for researchers to address issues such as adaptive parallelism, fault-tolerance, and scalability. Using Bayanihan, we develop a general-purpose runtime system and APIs, and show how volunteer computing's usefulness extends beyond solving esoteric mathematical problems to other, more practical, master-worker applications such as image rendering, distributed web-crawling, genetic algorithms, parametric analysis, and Monte Carlo simulations. By presenting a new API using the bulk synchronous parallel (BSP) model, we further show that contrary to popular belief and practice, volunteer computing need not be limited to master-worker applications, but can be used for coarse-grain message-passing programs as well. Finally, we address the new problem of maintaining reliability in the presence of malicious volunteers. We present and analyze traditional techniques such as voting, and new ones such as spot-checking, encrypted computation, and periodic obfuscation. Then, we show how these can be integrated in a new idea called credibility-based fault-tolerance, which uses probability estimates to limit and direct the use of redundancy. We validate this new idea with parallel Monte Carlo simulations, and show how it can achieve error rates several orders-of-magnitude smaller than traditional voting for the same slowdown.by Luis F.G. Sarmenta.Ph.D

    Designing a scalable dynamic load -balancing algorithm for pipelined single program multiple data applications on a non-dedicated heterogeneous network of workstations

    Get PDF
    Dynamic load balancing strategies have been shown to be the most critical part of an efficient implementation of various applications on large distributed computing systems. The need for dynamic load balancing strategies increases when the underlying hardware is a non-dedicated heterogeneous network of workstations (HNOW). This research focuses on the single program multiple data (SPMD) programming model as it has been extensively used in parallel programming for its simplicity and scalability in terms of computational power and memory size.;This dissertation formally defines and addresses the problem of designing a scalable dynamic load-balancing algorithm for pipelined SPMD applications on non-dedicated HNOW. During this process, the HNOW parameters, SPMD application characteristics, and load-balancing performance parameters are identified.;The dissertation presents a taxonomy that categorizes general load balancing algorithms and a methodology that facilitates creating new algorithms that can harness the HNOW computing power and still preserve the scalability of the SPMD application.;The dissertation devises a new algorithm, DLAH (Dynamic Load-balancing Algorithm for HNOW). DLAH is based on a modified diffusion technique, which incorporates the HNOW parameters. Analytical performance bound for the worst-case scenario of the diffusion technique has been derived.;The dissertation develops and utilizes an HNOW simulation model to conduct extensive simulations. These simulations were used to validate DLAH and compare its performance to related dynamic algorithms. The simulations results show that DLAH algorithm is scalable and performs well for both homogeneous and heterogeneous networks. Detailed sensitivity analysis was conducted to study the effects of key parameters on performance

    Graphics Processing Units: Abstract Modelling and Applications in Bioinformatics

    Get PDF
    The Graphical Processing Unit is a specialised piece of hardware that contains many low powered cores, available on both the consumer and industrial market. The original Graphical Processing Units were designed for processing high quality graphical images, for presentation to the screen, and were therefore marketed to the computer games market segment. More recently, frameworks such as CUDA and OpenCL allowed the specialised highly parallel architecture of the Graphical Processing Unit to be used for not just graphical operations, but for general computation. This is known as General Purpose Programming on Graphical Processing Units, and it has attracted interest from the scientific community, looking for ways to exploit this highly parallel environment, which was cheaper and more accessible than the traditional High Performance Computing platforms, such as the supercomputer. This interest in developing algorithms that exploit the parallel architecture of the Graphical Processing Unit has highlighted the need for scientists to be able to analyse proposed algorithms, just as happens for proposed sequential algorithms. In this thesis, we study the abstract modelling of computation on the Graphical Processing Unit, and the application of Graphical Processing Unit-based algorithms in the field of bioinformatics, the field of using computational algorithms to solve biological problems. We show that existing abstract models for analysing parallel algorithms on the Graphical Processing Unit are not able to sufficiently and accurately model all that is required. We propose a new abstract model, called the Abstract Transferring Graphical Processing Unit Model, which is able to provide analysis of Graphical Processing Unit-based algorithms that is more accurate than existing abstract models. It does this by capturing the data transfer between the Central Processing Unit and the Graphical Processing Unit. We demonstrate the accuracy and applicability of our model with several computational problems, showing that our model provides greater accuracy than the existing models, verifying these claims using experiments. We also contribute novel Graphics Processing Unit-base solutions to two bioinformatics problems: DNA sequence alignment, and Protein spectral identification, demonstrating promising levels of improvement against the sequential Central Processing Unit experiments

    An optimal scheduling scheme for tiling in distributed systems

    Full text link

    An analysis of key generation efficiency of RSA cryptosystem in distributed environments

    Get PDF
    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2005Includes bibliographical references (leaves: 68)Text in English Abstract: Turkish and Englishix, 74 leavesAs the size of the communication through networks and especially through Internet grew, there became a huge need for securing these connections. The symmetric and asymmetric cryptosystems formed a good complementary approach for providing this security. While the asymmetric cryptosystems were a perfect solution for the distribution of the keys used by the communicating parties, they were very slow for the actual encryption and decryption of the data flowing between them. Therefore, the symmetric cryptosystems perfectly filled this space and were used for the encryption and decryption process once the session keys had been exchanged securely. Parallelism is a hot research topic area in many different fields and being used to deal with problems whose solutions take a considerable amount of time. Cryptography is no exception and, computer scientists have discovered that parallelism could certainly be used for making the algorithms for asymmetric cryptosystems go faster and the experimental results have shown a good promise so far. This thesis is based on the parallelization of a famous public-key algorithm, namely RSA

    Uma abordagem de reserva antecipada de recursos em ambientes oportunistas

    Get PDF
    Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2013.A grade computacional é muito utilizada quando se deseja alto desempenho para resolução de problemas que requerem alto poder de processamento. As grades oportunistas, um tipo de grade computacional, possuem o diferencial de utilizar recursos computacionais ociosos de máquinas pessoais para a resolução destes problemas, o que torna esse ambiente mais barato e, consequentemente, mais interessante, principalmente para a comunidade acadêmica. No entanto, nos ambientes oportunistas a disputa por recursos torna-se maior devido à instabilidade e constante uso de seus recursos. Problemas como o excesso de solicitações de alocação de recursos em um mesmo período podem ser recorrentes, o que pode tanto prejudicar o desempenho do sistema quanto tornar o processo de solicitação de recursos trabalhoso para o usuário, uma vez que este terá que repetir o processo até que haja recursos disponíveis para a sua execução. Uma maneira eficiente de resolver tal problema é com a utilização da reserva antecipada de recursos. Este mecanismo permite que o usuário selecione um conjunto de recursos para que sejam utilizados em um período no futuro, considerando oportunisticamente os recursos disponíveis. Diante disso, esta dissertação propôs a utilização do mecanismo de reserva antecipada em um ambiente de grade oportunista. O objetivo foi melhorar a vazão do uso de recursos oportunistas, de modo a oferecer a possibilidade de alocação dos recursos durante um longo período de tempo, e não apenas no momento da solicitação. Estudos de caso foram realizados para ilustrar o comportamento de um ambiente oportunista com a abordagem proposta, bem como para comparar os ambientes que utilizam e não utilizam reserva antecipada. Os resultados mostraram a eficiência e validade da utilização de tal abordagem em um ambiente distribuído.Abstract : Grid computing is widely used when high performance is desired to resolve problems that require high processing power. Opportunistic grids, a type of grid computing system, have the differential to use idle computing resources of personal machines to solving these problems, making the solution cheaper and consequently more interesting, specially for the academic community. However, opportunistic environments have more competition for resources due to instability and constant use of their resources. Problems such as excessive requests of resources allocation can be recurrent and can both degrade the performance of the system and make the process of resource requests harder to user, once the user will have to repeat this process until there are available resources to the execution. An efficient way to solve this problem is through the use of advanced reservation of resources. This mechanism allows user to select a set of resources to be used in the future opportunistically considering available resources. Therefore, this dissertation proposed the use of the advanced reservation mechanism in an opportunistic grid environment. The goal was to improve the use flow of the opportunistic resources in order to offer the resources allocation possibility for a long period of time and not only at the time of request. Case studies were carried out to illustrate the behaviour of an opportunistic environment with the proposed approach, as well as to compare environments that use with others that do not use advanced reservation. The results show the efficiency and validity of the approach in a distributed environment

    ATCOM: Automatically tuned collective communication system for SMP clusters.

    Get PDF
    Conventional implementations of collective communications are based on point-to-point communications, and their optimizations have been focused on efficiency of those communication algorithms. However, point-to-point communications are not the optimal choice for modern computing clusters of SMPs due to their two-level communication structure. In recent years, a few research efforts have investigated efficient collective communications for SMP clusters. This dissertation is focused on platform-independent algorithms and implementations in this area;There are two main approaches to implementing efficient collective communications for clusters of SMPs: using shared memory operations for intra-node communications, and over-lapping inter-node/intra-node communications. The former fully utilizes the hardware based shared memory of an SMP, and the latter takes advantage of the inherent hierarchy of the communications within a cluster of SMPs. Previous studies focused on clusters of SMP from certain vendors. However, the previously proposed methods are not portable to other systems. Because the performance optimization issue is very complicated and the developing process is very time consuming, it is highly desired to have self-tuning, platform-independent implementations. As proven in this dissertation, such an implementation can significantly outperform the other point-to-point based portable implementations and some platform-specific implementations;The dissertation describes in detail the architecture of the platform-independent implementation. There are four system components: shared memory-based collective communications, overlapping mechanisms for inter-node and intra-node communications, a prediction-based tuning module and a micro-benchmark based tuning module. Each component is carefully designed with the goal of automatic tuning in mind

    Factores de rendimiento asociados a SPMD

    Get PDF
    Actualmente existen muchas aplicaciones paralelas/distribuidas en las cuales SPMD es el paradigma más usado. Obtener un buen rendimiento en una aplicación paralela de este tipo es uno de los principales desafíos dada la gran cantidad de aplicaciones existentes. Este objetivo no es fácil de resolver ya que existe una gran variedad de configuraciones de hardware, y también la naturaleza de los problemas pueden ser variados así como la forma de implementarlos. En consecuencia, si no se considera adecuadamente la combinación "software/hardware" pueden aparecer problemas inherentes a una aplicación iterativa sin una jerarquía de control definida de acuerdo a este paradigma. En SPMD todos los procesos ejecutan el mismo código pero computan una sección diferente de los datos de entrada. Una solución a un posible problema del rendimiento es proponer una estrategia de balance de carga para homogeneizar el cómputo entre los diferentes procesos. En este trabajo analizamos el benchmark CG con cargas heterogéneas con la finalidad de detectar los posibles problemas de rendimiento en una aplicación real. Un factor que determina el rendimiento en esta aplicación es la cantidad de elementos nonzero contenida en la sección de matriz asignada a cada proceso. Determinamos que es posible definir una estrategia de balance de carga que puede ser implementada de forma dinámica y demostramos experimentalmente que el rendimiento de la aplicación puede mejorarse de forma significativa con dicha estrategia.There currently are many 'parallel/distributed' applications that use the SPMD paradigm. Getting a good performance in a parallel application of this type is a major challenge because of the large number of existing applications. This objective is not easily achieved because there are many hardware configurations possible, and also the nature of the problems can be varied as well as its implementation. Consequently, if not adequately consider the combination 'software/hardware' inherent problems can occur without an iterative application defined control hierarchy according to this paradigm. In SPMD all processes execute the same code but they compute a different section of the input data. In this paper we analyze the benchmark CG with heterogeneous loads in order to detect possible performance problems in a real application. One factor that determines the performance in this application is the number of elements nonzero contained in the array section assigned to each process. We determined that it is possible to define a load balancing strategy, which can be implemented dynamically, and we demonstrate experimentally that the application performance can be significantly improved with this approach.Actualment existeixen moltes aplicacions paral·leles/distribuïdes en les quals SPMD és el paradigma més emprat. Obtenir un bon rendiment en una aplicació paral·lela d'aquest tipus és un dels principals reptes donada la gran quantitat d'aplicacions existents. Aquest objectiu no és fàcil de resoldre donat que existeixen una gran varietat de configuracions de hardware, i també la naturalesa dels problemes pot ser variada així com la forma d'implementar-los. En conseqüència, si no es considera adequadament la combinació "software/hardware" poden aparèixer problemes inherents a una aplicació iterativa sense una jerarquia de control definida d'acord a aquest paradigma. En SPMD tots els processos executen el mateix codi però computen una secció diferent de les dades d'entrada. Una solució a un possible problema de rendiment es proposar una estratègia de balanceig de càrrega per homogeneïtzar el còmput entre els diferents processos. En aquest treball analitzem el benchmark CG amb càrregues heterogènies amb la finalitat de detectar els possibles problemes de rendiment en una aplicació real. Un factor que determina el rendiment en aquesta aplicació és la quantitat d'elements nonzero continguda en la secció de la matriu assignada a cada procés. Es determina que és possible definir una estratègia de balanceig de càrrega que pot ser implementada de forma dinàmica i es demostra de forma experimental que el rendiment de la aplicació pot millorar-se de forma significativa amb aquesta estratègia

    Ant Colony Optimization

    Get PDF
    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced a tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics are presented

    ATCOM: Automatically Tuned Collective Communication System for SMP Clusters

    Full text link
    corecore