50 research outputs found
An empirical evaluation of techniques for parallel simulation of message passing networks
209 p.[EN]In the field of computer design, simulation is an essential tool to validate and evaluate architectural proposals. Conventional simulation techniques, designed for their use in sequential computers, are too slow if the system to simulate is large or complex. The aim of this work is to search for techniques to accelerate simulations exploiting the parallelism available in current, commercial multicomputers, and to use these techniques to study a model of a message router. This router has been designed to constitute the communication infrastructure of a (hypothetical) massively parallel computer.
Three parallel simulation techniques have been considered: synchronous, asynchronous-conservative and asynchronous-optimistic. These algorithms have been implemented in three multicomputers: a transputer-based Supernode, an Intel Paragon and a network of workstations. The influence that factors such as the characteristics of the simulated models, the organization of the simulators and the characteristics of the target multicomputers have in the performance of the simulations has been measured and characterized.
It is concluded that optimistic parallel simulation techniques are not suitable for the considered kind of models, although they may provide good performance in other environments. A network of workstations is not the right platform for our experiments, because the communication demands of the parallel simulators surpass the abilities of local area networks—the granularity is too fine. Synchronous and conservative parallel simulation techniques perform very well in the Supernode and in the Paragon, specially if the model to simulate is complex or large—precisely the worst case for traditional, sequential simulators. This way, studies previously considered as unrealizable, due to their exceedingly high computational cost, can be performed in reasonable times. Additionally, the spectrum of possibilities of using multicomputers can be broadened to execute more than numeric applications.[ES]En el ámbito del diseño de computadores, la simulación es una herramienta imprescindible para la validación y evaluación de cualquier propuesta arquitectónica. Las ténicas convencionales de simulación, diseñadas para su utilización en computadores secuenciales, son demasiado lentas si el sistema a simular es grande o complejo. El objetivo de esta tesis es buscar técnicas para acelerar estas simulaciones, aprovechando el paralelismo disponible en multicomputadores comerciales, y usar esas técnicas para el estudio de un modelo de encaminador de mensajes. Este encaminador está diseñado para formar infraestructura de comunicaciones de un hipotético computador masivamente paralelo.
En este trabajo se consideran tres técnicas de simulación paralela: síncrona, asíncrona-conservadora y asíncrona-optimista. Estos algoritmos se han implementado en tres multicomputadores: un Supernode basado en Transputers, un Intel Paragon y una red de estaciones de trabajo. Se caracteriza la influencia que tienen en las prestaciones de los simuladores aspectos tales como los parámetros del modelo simulado, la organización del simulador y las características del multicomputador utilizado.
Se concluye que las técnicas de simulación paralela optimista no resultan adecuadas para trabajar con el modelo considerado, aunque pueden ofrecer un buen rendimiento en otros entornos. La red de estaciones de trabajo no resulta una plataforma apropiada para estas simulaciones, ya que una red local no reúne condiciones para la ejecución de aplicaciones paralelas de grano fino. Las técnicas de simulación paralela síncrona y conservadora dan muy buenos resultados en el Supernode y en el Paragon, especialmente si el modelo a simular es complejo o grande—precisamente el peor caso para los algoritmos secuenciales. De esta forma, estudios previamente considerados inviables, por ser demasiado costosos computacionalmente, pueden realizarse en tiempos razonables. Además, se amplía el espectro de posibilidades de los multicomputadores, utilizándolos para algo más que aplicaciones numéricas.Este trabajo ha sido parcialmente subvencionado por la Comisión Interministerial de Ciencia y Tecnología, bajo contrato TIC95-037
Improving the Scalability of High Performance Computer Systems
Improving the performance of future computing systems will be based upon the ability of increasing the scalability of current technology. New paths need to be explored, as operating principles that were applied up to now are becoming irrelevant for upcoming computer architectures. It appears that scaling the number of cores, processors and nodes within an system represents the only feasible alternative to achieve Exascale performance. To accomplish this goal, we propose three novel techniques addressing different layers of computer systems. The Tightly Coupled Cluster technique significantly improves the communication for inter node communication within compute clusters. By improving the latency by an order of magnitude over existing solutions the cost of communication is considerably reduced. This enables to exploit fine grain parallelism within applications, thereby, extending the scalability considerably. The mechanism virtually moves the network interconnect into the processor, bypassing the latency of the I/O interface and rendering protocol conversions unnecessary. The technique is implemented entirely through firmware and kernel layer software utilizing off-the-shelf AMD processors. We present a proof-of-concept implementation and real world benchmarks to demonstrate the superior performance of our technique. In particular, our approach achieves a software-to-software communication latency of 240 ns between two remote compute nodes. The second part of the dissertation introduces a new framework for scalable Networks-on-Chip. A novel rapid prototyping methodology is proposed, that accelerates the design and implementation substantially. Due to its flexibility and modularity a large application space is covered ranging from Systems-on-chip, to high performance many-core processors. The Network-on-Chip compiler enables to generate complex networks in the form of synthesizable register transfer level code from an abstract design description. Our engine supports different target technologies including Field Programmable Gate Arrays and Application Specific Integrated Circuits. The framework enables to build large designs while minimizing development and verification efforts. Many topologies and routing algorithms are supported by partitioning the tasks into several layers and by the introduction of a protocol agnostic architecture. We provide a thorough evaluation of the design that shows excellent results regarding performance and scalability. The third part of the dissertation addresses the Processor-Memory Interface within computer architectures. The increasing compute power of many-core processors, leads to an equally growing demand for more memory bandwidth and capacity. Current processor designs exhibit physical limitations that restrict the scalability of main memory. To address this issue we propose a memory extension technique that attaches large amounts of DRAM memory to the processor via a low pin count interface using high speed serial transceivers. Our technique transparently integrates the extension memory into the system architecture by providing full cache coherency. Therefore, applications can utilize the memory extension by applying regular shared memory programming techniques. By supporting daisy chained memory extension devices and by introducing the asymmetric probing approach, the proposed mechanism ensures high scalability. We furthermore propose a DMA offloading technique to improve the performance of the processor memory interface. The design has been implemented in a Field Programmable Gate Array based prototype. Driver software and firmware modifications have been developed to bring up the prototype in a Linux based system. We show microbenchmarks that prove the feasibility of our design
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing
Network control for a multi-user transputer-based system.
A dissertation submitted to the Faculty of Engineering, University of the
Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of
Master of Science in EngineeringThe MC2/64 system is a configureable multi-user transputer- based system which was
designed using a modular approach. The MC2/64 consists of MC2 Clusters which are
connected using a modified Clos network. The MC2 Clusters were designed and
realised as completely configurable modules using and extending an algorithm based on
Eulerian cycles through a requested graph. This dissertation discusses the configuration
algorithm and the extensions made to the algorithm for the MC2 Clusters.
The total MC2/64 system is not completely configurable as a MC2 Cluster releases only
a limited number of links for inter-cluster connections. This dissertation analyses the
configurability of MC2/64, but also presents algorithms which enhance the usability of
the system from the user's point of view.
The design and the implementation of the network control software are also submitted
as topics in this dissertation. The network control software must allow multiple users to
use the system, but without them influencing each other's transputer domains.
This dissertation therefore seeks to give an overview of network control problems and
the solutions implemented in current MC2/64 systems. The results of the research
done for this dissertation will hopefully aid in the design of future MC2 systems which
will provide South Africa with much needed, low cost, high performance computing
power.Andrew Chakane 201
Simple and Efficient Local Codes for Distributed Stable Network Construction
In this work, we study protocols so that populations of distributed processes
can construct networks. In order to highlight the basic principles of
distributed network construction we keep the model minimal in all respects. In
particular, we assume finite-state processes that all begin from the same
initial state and all execute the same protocol (i.e. the system is
homogeneous). Moreover, we assume pairwise interactions between the processes
that are scheduled by an adversary. The only constraint on the adversary
scheduler is that it must be fair. In order to allow processes to construct
networks, we let them activate and deactivate their pairwise connections. When
two processes interact, the protocol takes as input the states of the processes
and the state of the their connection and updates all of them. Initially all
connections are inactive and the goal is for the processes, after interacting
and activating/deactivating connections for a while, to end up with a desired
stable network. We give protocols (optimal in some cases) and lower bounds for
several basic network construction problems such as spanning line, spanning
ring, spanning star, and regular network. We provide proofs of correctness for
all of our protocols and analyze the expected time to convergence of most of
them under a uniform random scheduler that selects the next pair of interacting
processes uniformly at random from all such pairs. Finally, we prove several
universality results by presenting generic protocols that are capable of
simulating a Turing Machine (TM) and exploiting it in order to construct a
large class of networks.Comment: 43 pages, 7 figure