232 research outputs found

    An empirical evaluation of techniques for parallel simulation of message passing networks

    Get PDF
    209 p.[EN]In the field of computer design, simulation is an essential tool to validate and evaluate architectural proposals. Conventional simulation techniques, designed for their use in sequential computers, are too slow if the system to simulate is large or complex. The aim of this work is to search for techniques to accelerate simulations exploiting the parallelism available in current, commercial multicomputers, and to use these techniques to study a model of a message router. This router has been designed to constitute the communication infrastructure of a (hypothetical) massively parallel computer. Three parallel simulation techniques have been considered: synchronous, asynchronous-conservative and asynchronous-optimistic. These algorithms have been implemented in three multicomputers: a transputer-based Supernode, an Intel Paragon and a network of workstations. The influence that factors such as the characteristics of the simulated models, the organization of the simulators and the characteristics of the target multicomputers have in the performance of the simulations has been measured and characterized. It is concluded that optimistic parallel simulation techniques are not suitable for the considered kind of models, although they may provide good performance in other environments. A network of workstations is not the right platform for our experiments, because the communication demands of the parallel simulators surpass the abilities of local area networks—the granularity is too fine. Synchronous and conservative parallel simulation techniques perform very well in the Supernode and in the Paragon, specially if the model to simulate is complex or large—precisely the worst case for traditional, sequential simulators. This way, studies previously considered as unrealizable, due to their exceedingly high computational cost, can be performed in reasonable times. Additionally, the spectrum of possibilities of using multicomputers can be broadened to execute more than numeric applications.[ES]En el ámbito del diseño de computadores, la simulación es una herramienta imprescindible para la validación y evaluación de cualquier propuesta arquitectónica. Las ténicas convencionales de simulación, diseñadas para su utilización en computadores secuenciales, son demasiado lentas si el sistema a simular es grande o complejo. El objetivo de esta tesis es buscar técnicas para acelerar estas simulaciones, aprovechando el paralelismo disponible en multicomputadores comerciales, y usar esas técnicas para el estudio de un modelo de encaminador de mensajes. Este encaminador está diseñado para formar infraestructura de comunicaciones de un hipotético computador masivamente paralelo. En este trabajo se consideran tres técnicas de simulación paralela: síncrona, asíncrona-conservadora y asíncrona-optimista. Estos algoritmos se han implementado en tres multicomputadores: un Supernode basado en Transputers, un Intel Paragon y una red de estaciones de trabajo. Se caracteriza la influencia que tienen en las prestaciones de los simuladores aspectos tales como los parámetros del modelo simulado, la organización del simulador y las características del multicomputador utilizado. Se concluye que las técnicas de simulación paralela optimista no resultan adecuadas para trabajar con el modelo considerado, aunque pueden ofrecer un buen rendimiento en otros entornos. La red de estaciones de trabajo no resulta una plataforma apropiada para estas simulaciones, ya que una red local no reúne condiciones para la ejecución de aplicaciones paralelas de grano fino. Las técnicas de simulación paralela síncrona y conservadora dan muy buenos resultados en el Supernode y en el Paragon, especialmente si el modelo a simular es complejo o grande—precisamente el peor caso para los algoritmos secuenciales. De esta forma, estudios previamente considerados inviables, por ser demasiado costosos computacionalmente, pueden realizarse en tiempos razonables. Además, se amplía el espectro de posibilidades de los multicomputadores, utilizándolos para algo más que aplicaciones numéricas.Este trabajo ha sido parcialmente subvencionado por la Comisión Interministerial de Ciencia y Tecnología, bajo contrato TIC95-037

    Performance visualization for parallel programs: task-based, object-oriented approach

    Get PDF
    Developing and analyzing the performance of concurrent programs on distributed memory concurrent systems is normally a challenging task. Recently, performance visualization gains its importance as a critical tool for programmers. Programmers can have an insight into the development of parallel programs through a performance visualization. Most of the visualization tool to date have been developed for ad-hoc environments in hardware and software, and therefore its lifetime is limited. Since, however, new architectures keep emerging and application domains for distributed memory concurrent computer systems keep growing, the visualization tool should be flexible enough to accommodate unknown future demands of users (eg. new performance perspectives, application-specific views and disparate trace record formats);The Concurrent Object-Oriented ParaGraph (COOPG) is a prototype, general-purpose performance visualization package developed using an object-oriented approach. An object-oriented approach, both in design and implementation, provides a mechanism to build a simple, flexible, effective, and extensible performance visualization tool. The salient features of the COOPG include its flexible adaptability to disparate trace record formats and the incremental extensibility for incorporating user\u27s special-purpose views

    Efficient processor management strategies for multicomputer systems

    Get PDF
    Multicomputers are cost-effective alternatives to the conventional supercomputers. Contemporary processor management schemes tend to underutilize the processors and leave many of the processors in the system idle while jobs are waiting for execution;Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to the processor management schemes and proposes several ways to enhance the multicomputer systems by means of processor management. The proposed schemes incorporate the concepts of size-reduction, non-contiguous allocation, as well as job migration. Job scheduling using a bypass-queue is also studied. All the proposed schemes are proven effective in improving the system performance via extensive simulations. Each proposed scheme has different implementation cost and constraints. In order to take advantage of these schemes, judicious selection of system parameters is important and is discussed

    Static allocation of computation to processors in multicomputers

    Get PDF

    Execution models for mapping programs onto distributed memory parallel computers

    Get PDF
    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program

    High Performance Air Quality Simulation in the European CrossGrid Project

    Get PDF
    This paper focuses on one of the applications involved into the CrossGrid project, the STEM-II air pollution model used to simulate the environment of As Pontes Power Plant in A Coruna (Spain). The CrossGrid project offers us a Grid environment oriented towards computation- and data-intensive applications that need interaction with an external user. The air pollution model needs the interaction of an expert in order to make decisions about modifications in the industrial process to fulfil the European standard on emissions and air quality. The benefits of using different CrossGrid components for running the application on a Grid infrastructure are shown in this paper, and some preliminary results on the CrossGrid testbed are displayed

    Reactive-Process Programming and Distributed Discrete-Event Simulation

    Get PDF
    The same forces that spurred the development of multicomputers - the demand for better performance and economy - are driving the evolution of multicomputers in the direction of more abundant and less expensive computing nodes - the direction of fine-grain multicomputers. This evolution in multicomputer architecture derives from advances in integrated circuit, packaging, and message-routing technologies, and carries far-reaching implications in programming and applications. This thesis pursues that trend with a balanced treatment of multicomputer programming and applications. First, a reactive- process programming system - Reactive-C - is investigated; then, a model application - discreteevent simulation - is developed; finally, a number of logic-circuit simulators written in the Reactive-C notation are evaluated. One difficulty in multicomputer applications is the inefficiency of many distributed algorithms compared to their sequential counterparts. When better formulations are developed, they often scale poorly with increasing numbers of nodes, and their beneficial effects eventually vanish when many nodes are used. However, rules for programming are quite different when nodes are plentiful and cheap: The primary concern is to utilize all of the concurrency available in an application, rather than to utilize all of the computing cycles available in a machine. We have shown in our research that it is possible to extract the maximum concurrency of a simulation subject, even one as difficult as a logic circuit, when one simulation element is assigned to each node. Despite the initial inefficiency of a straightforward algorithm, as the the number of nodes increases, the computation time decreases linearly until there are only a few elements in each node. We conclude by suggesting a, technique to further increase the available concurrency when there are many more nodes than simulation elements
    • …
    corecore