16 research outputs found

    Communication Architectures for Parallel-Programming Systems

    Bal, H.E. [Promotor]; Tanenbaum, A.S. [Promotor]

    IT Management in Practice. Seminar, Winter Semester 2004/05

    Langley Aerospace Research Summer Scholars

    The Langley Aerospace Research Summer Scholars (LARSS) Program was established by Dr. Samuel E. Massenberg in 1986. The program has grown from 20 participants in 1986 to 114 participants in 1995. The program is unique to LaRC and is administered by Hampton University. It was established for the benefit of undergraduate juniors and seniors and first-year graduate students who are pursuing degrees in aeronautical engineering, mechanical engineering, electrical engineering, material science, computer science, atmospheric science, astrophysics, physics, and chemistry. The two primary elements of the LARSS Program are: (1) a research project to be completed by each participant under the supervision of a researcher who assumes the role of mentor for the summer, and (2) technical lectures by prominent engineers and scientists. Additional elements of this program include tours of LaRC wind tunnels, computational facilities, and laboratories. Library and computer facilities will be available for use by the participants.

    Efficient Passive Clustering and Gateway Selection in MANETs

    Passive clustering does not employ control packets to collect topological information in ad hoc networks. In our proposal, we avoid frequent changes in the cluster architecture caused by repeated election and re-election of cluster heads and gateways. Our primary objective has been to make passive clustering more practical by employing an optimal number of gateways and reducing the number of rebroadcast packets.
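
    As a rough illustration of the idea, the following Python sketch renders passive clustering as a per-node state machine, assuming a "first declaration wins" rule for cluster heads and at most one gateway per pair of cluster heads. The role names, the piggybacked fields, and the capping rule are illustrative assumptions, not the exact protocol of the paper.

        # Hypothetical sketch: cluster roles are inferred solely from state
        # piggybacked on overheard data packets; no control packets are sent.
        INITIAL, ORDINARY, CLUSTER_HEAD, GATEWAY = range(4)

        class Node:
            def __init__(self, node_id):
                self.node_id = node_id
                self.state = INITIAL
                self.heads_heard = set()    # cluster heads overheard so far
                self.bridged_pairs = set()  # head pairs already served by a gateway

            def on_packet(self, sender_id, sender_state, sender_pair=None):
                if sender_state == CLUSTER_HEAD:
                    self.heads_heard.add(sender_id)
                elif sender_state == GATEWAY and sender_pair is not None:
                    self.bridged_pairs.add(sender_pair)   # pair already covered
                if self.state == INITIAL:
                    # "First declaration wins": become head only if none heard.
                    self.state = ORDINARY if self.heads_heard else CLUSTER_HEAD
                elif self.state == ORDINARY and len(self.heads_heard) >= 2:
                    pair = tuple(sorted(self.heads_heard)[:2])
                    if pair not in self.bridged_pairs:
                        self.state = GATEWAY              # bridge this head pair
                        self.bridged_pairs.add(pair)

    Declining the gateway role once a head pair is already covered is the step that bounds the number of gateways and, with it, the number of redundant rebroadcasts.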

    Communication Architectures for Scalable GPU-centric Computing Systems

    In recent years, power consumption has become the main concern in High Performance Computing (HPC). This has led to heterogeneous computing systems in which Central Processing Units (CPUs) are supported by accelerators, such as Graphics Processing Units (GPUs). While GPUs used to be seen as slave devices to which the main processor offloads computation, today's systems tend to deploy more GPUs than CPUs. Eventually, the GPU will become a first-class processor, bearing increasing responsibilities. Promoting the GPU to a first-class processor comes with many challenges, such as progress guarantees, dynamic memory management, and scheduling. However, one of the main challenges is the GPU's inability to orchestrate communication, which is currently handled entirely by the CPU. This work addresses that issue and presents solutions that allow GPUs to source and sink network traffic independently. Many important aspects are addressed, ranging from the application level down to how networking hardware is accessed. First, important large-scale exascale applications are studied to better understand their communication behavior and requirements. Several metrics are presented, including time spent on communication, message sizes, and the lengths of the queues that are required to match messages with receive requests. One aspect the analysis revealed is that messages become smaller at scale, which makes the matching of messages and receive requests an important problem to address. The next part analyzes how the GPU can directly access the network, with various communication models presented and benchmarked. It is shown that a flat address space across distributed GPU memories achieves higher bandwidth than put/get communication or CPU-controlled message passing, although less communication can be overlapped with computation. Overall, GPU-controlled communication is always superior, in terms of both time-to-solution and energy consumption. The final part addresses communication management on GPUs, which is required to provide high-level communication abstractions. Besides other fundamental building blocks, an algorithm for message matching is presented that yields performance similar to CPUs. However, it is also shown that the messaging protocol can be relaxed to improve performance significantly, leveraging the massive parallelism provided by the GPU's architecture.
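
    The message matching referred to here is, structurally, the classic two-queue scheme of MPI-style message passing: a posted-receive queue and an unexpected-message queue. The Python sketch below shows that generic scheme under assumed field names (src, tag); it is not the thesis's GPU algorithm, which presumably differs mainly in probing the queues with many threads in parallel.

        from collections import deque

        ANY = object()   # wildcard for source or tag

        def matches(recv, msg):
            return (recv["src"] in (ANY, msg["src"]) and
                    recv["tag"] in (ANY, msg["tag"]))

        posted_recvs = deque()   # receive requests still waiting for a message
        unexpected = deque()     # messages that arrived before their receive

        def on_message(msg):
            for r in posted_recvs:          # search in posted order
                if matches(r, msg):
                    posted_recvs.remove(r)
                    return r, msg           # deliver msg into r's buffer
            unexpected.append(msg)          # no matching receive posted yet

        def on_receive(recv):
            for m in unexpected:
                if matches(recv, m):
                    unexpected.remove(m)
                    return recv, m
            posted_recvs.append(recv)

    The abstract's observation that messages shrink at scale means these queues grow long, so the linear search above comes to dominate, which is why matching performance matters.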

    Distributed Simulation of High-Level Algebraic Petri Nets

    In the field of Petri nets, simulation is an essential tool to validate and evaluate models. Conventional simulation techniques, designed for use on sequential computers, are too slow if the system to simulate is large or complex. The aim of this work is to find techniques that accelerate simulations by exploiting the parallelism available in current commercial multicomputers, and to use these techniques to study a class of Petri nets called high-level algebraic nets. These nets exploit the rich theory of algebraic specifications for high-level Petri nets: Petri nets gain a great deal of modelling power by representing dynamically changing items as structured tokens, whereas algebraic specifications turn out to be an adequate and flexible instrument for handling structured items. In this work we focus on ECATNets (Extended Concurrent Algebraic Term Nets), whose most distinctive feature is their semantics, which is defined in terms of rewriting logic. Nevertheless, ECATNets have two drawbacks: they obscure the notion of time, and they poorly exploit the parallelism inherent in the models. Three distributed simulation techniques have been considered: asynchronous conservative, asynchronous optimistic, and synchronous. These algorithms have been implemented in a multicomputer environment: a network of workstations. The influence that factors such as the characteristics of the simulated models, the organisation of the simulators, and the characteristics of the target multicomputer have on the performance of the simulations has been measured and characterised. It is concluded that synchronous distributed simulation techniques are not suitable for the considered kind of models, although they may provide good performance in other environments. Conservative and optimistic distributed simulation techniques perform well, especially if the model to simulate is complex or large - precisely the worst case for traditional sequential simulators. In this way, studies previously considered unrealisable due to their exceedingly high computational cost can be performed in reasonable time. Additionally, the spectrum of uses of multicomputers can be broadened beyond purely numeric applications.
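
    For reference, the sketch below shows a textbook conservative event loop (Chandy-Misra-Bryant style, with null messages) for a single logical process, assuming a fixed lookahead derived from the model; the names and the scheduling interface are illustrative, and the thesis's actual algorithms may differ. The optimistic variant would instead execute events speculatively and roll back when a straggler message arrives.

        import heapq
        from itertools import count

        class LogicalProcess:
            _tie = count()   # tie-breaker keeps equal-time events comparable

            def __init__(self, in_channels, out_channels, lookahead):
                self.events = []     # local future-event list: (time, seq, action)
                self.clock = 0.0
                self.channel_time = {c: 0.0 for c in in_channels}
                self.out_channels = out_channels
                self.lookahead = lookahead   # minimum delay promised to neighbours

            def schedule(self, t, action):
                heapq.heappush(self.events, (t, next(self._tie), action))

            def safe_time(self):
                # Events at or below the minimum incoming channel clock can
                # never be invalidated by a message still in transit.
                return min(self.channel_time.values())

            def step(self, send):
                bound = self.safe_time()
                while self.events and self.events[0][0] <= bound:
                    t, _, action = heapq.heappop(self.events)
                    self.clock = t
                    action()    # e.g. fire an enabled transition
                for c in self.out_channels:
                    # Null message: no content, only a promise that nothing
                    # earlier than clock + lookahead will arrive on c.
                    send(c, self.clock + self.lookahead, None)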

    ClusterRAID: Architecture and Prototype of a Distributed Fault-Tolerant Mass Storage System for Clusters

    During the past few years, clusters built from commodity off-the-shelf (COTS) components have emerged as the predominant supercomputer architecture. Typically comprising a collection of standard PCs or workstations and an interconnection network, they have replaced the traditionally used integrated systems due to their better price/performance ratio. As paradigms shift from merely compute-intensive to I/O-intensive applications, mass storage solutions for cluster installations become an increasingly crucial aspect of these systems. The inherent unreliability of the underlying components is one of the reasons why no system has yet been established as a standard storage solution for clusters. This thesis sets out the architecture and prototype implementation of a novel distributed mass storage system for commodity off-the-shelf clusters and addresses the issue of the unreliable constituent components. The key concept of the presented system is the conversion of the local hard disk drive of a cluster node into a reliable device while preserving the block device interface. By deploying sophisticated erasure-correcting codes, the system allows the adjustment of the number of tolerable failures and thus the overall reliability. In addition, the applied data layout considers the access behaviour of a broad range of applications and minimizes the number of required network transactions. Extensive measurements and functionality tests of the prototype, both stand-alone and in conjunction with local or distributed file systems, show the validity of the concept.
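
    The "adjustable number of tolerable failures" maps to the number of redundancy blocks in an erasure code. The minimal sketch below shows the degenerate case of a single XOR parity block, which tolerates exactly one failure; the ClusterRAID prototype relies on more sophisticated erasure-correcting codes precisely because plain XOR cannot go beyond one.

        def xor_blocks(blocks):
            # XOR equal-sized blocks byte by byte.
            out = bytearray(len(blocks[0]))
            for block in blocks:
                for i, byte in enumerate(block):
                    out[i] ^= byte
            return bytes(out)

        data = [b"node0-blk", b"node1-blk", b"node2-blk"]   # blocks on 3 nodes
        parity = xor_blocks(data)                           # 1 redundancy block

        # Recover node 1 after its disk fails: XOR of all surviving blocks.
        recovered = xor_blocks([data[0], data[2], parity])
        assert recovered == data[1]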

    Massively parallel declarative computational models

    Current computer architectures are parallel, with an increasing number of processors. Parallel programming is an error-prone task, and declarative models such as those based on constraints relieve the programmer of some of its difficult aspects, because they abstract control away. In this work we study and develop techniques for declarative computational models based on constraints using GPI, aiming at large-scale parallel execution. The main contributions of this work are: a GPI implementation of a scalable dynamic load-balancing scheme based on work stealing, suitable for tree-shaped computations and effective for systems with thousands of threads; a parallel constraint solver, MaCS, implemented to take advantage of the GPI programming model, whose experimental evaluation shows very good scalability on systems with hundreds of cores; and a GPI parallel version of the Adaptive Search algorithm, including different variants. The study of different problems advances the understanding of scalability issues known to exist with large numbers of cores. ### SUMMARY: Computer architectures today are parallel, with a growing number of processors. Parallel programming is an error-prone task, and declarative models based on constraints relieve the programmer of difficult aspects since they abstract control away. In this work we study and develop techniques for declarative computational models based on constraints using GPI, a recent tool and programming model, with large-scale parallel execution as the goal. The contributions of this work are the following: an implementation, based on GPI, of a dynamic scheme for balancing the computation, suitable for tree-shaped computations and effective in systems composed of thousands of computation units; an approach to parallel constraint solving named MaCS, which takes advantage of the GPI programming model and whose experimental evaluation revealed good scalability on a system with hundreds of processors; and a GPI-based parallel version of the Adaptive Search algorithm, including different variants. The study of several problems advances the understanding of scalability aspects present in the execution of this type of algorithm on a large number of processors.
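
    The load-balancing scheme described first is classic work stealing. The sketch below shows the usual deque discipline in plain Python, with local deques standing in for GPI's one-sided remote memory operations; the Worker class, run_one, and the task-as-callable convention are illustrative assumptions, not the thesis's API.

        import random
        from collections import deque

        class Worker:
            def __init__(self):
                self.tasks = deque()   # local deque; a task() returns child tasks

            def run_one(self, workers):
                if self.tasks:
                    task = self.tasks.pop()        # owner pops newest (LIFO)
                else:
                    victim = random.choice(workers)
                    if not victim.tasks:
                        return False               # steal failed; stay idle
                    task = victim.tasks.popleft()  # thief steals oldest (FIFO)
                for child in task():               # expand one tree node
                    self.tasks.append(child)
                return True

    Owners pop the newest task, keeping the depth-first working set local, while thieves take the oldest task, which in a tree-shaped computation tends to be the root of a large unexplored subtree, so a single steal can feed a thief for a long time.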

    East Lancashire Research 2008

    East Lancashire Research 2008