167 research outputs found

    The effect of real workloads and stochastic workloads on the performance of allocation and scheduling algorithms in 2D mesh multicomputers

    The performance of existing non-contiguous processor allocation strategies has traditionally been evaluated by simulation, using a stochastic workload model to generate the stream of incoming jobs. To validate these results, the algorithms also need to be evaluated against a real workload trace. In this paper, we evaluate several well-known processor allocation and job scheduling strategies using a real workload trace and compare the results against those obtained with a stochastic workload. Our results reveal that the conclusions reached on the relative performance merits of the allocation strategies with a real workload trace are in general compatible with those obtained with a stochastic workload.

    The Effect Of Hot Spots On The Performance Of Mesh-Based Networks

    Direct network performance is affected by several design parameters, including the number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hotspot location on the performance of mesh-based networks. Simulations are run on two network topologies, the mesh and the torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall, packet switching proved to be more efficient than wormhole switching in terms of throughput. Under uniform random traffic, chaotic and oblivious routing are shown to be indistinguishable. Networks with a low number of hotspots show better performance; as the number of hotspots increases, network latency tends to increase. When the hotspot factor increases, packet switching performs better than wormhole switching. The location of hotspots also affects network performance, particularly with the oblivious routers, whose achieved latencies proved more vulnerable to changes in hotspot location. The smaller the network, the earlier saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by the presence of hotspots, mostly because of the symmetric nature of tori.

    Contention and achieved performance in multicomputer wormhole routing networks


    On the performance of broadcast algorithms in interconnection networks

    Broadcast communication is among the most primitive collective capabilities of any message passing network. Broadcast algorithms for the mesh have been widely reported in the literature. However, most existing algorithms have been studied only under limited conditions, such as light traffic load and fixed network sizes. In other words, most of these algorithms have not been studied at different Quality of Service (QoS) levels. In contrast, this study examines the broadcast operation while taking into account scalability, parallelism, and a wide range of traffic loads generated by the propagation of broadcast messages. To the best of our knowledge, this study is the first to consider broadcast latency at both the network and node levels across different traffic loads. Results from a comparative analysis confirm that the coded-path based broadcast algorithms exhibit superior performance characteristics over some existing algorithms.

    An empirical evaluation of techniques for parallel simulation of message passing networks

    209 p. In the field of computer design, simulation is an essential tool for validating and evaluating architectural proposals. Conventional simulation techniques, designed for use on sequential computers, are too slow if the system to simulate is large or complex. The aim of this work is to search for techniques that accelerate simulations by exploiting the parallelism available in current commercial multicomputers, and to use these techniques to study a model of a message router. This router has been designed to constitute the communication infrastructure of a (hypothetical) massively parallel computer. Three parallel simulation techniques have been considered: synchronous, asynchronous-conservative and asynchronous-optimistic. These algorithms have been implemented on three multicomputers: a transputer-based Supernode, an Intel Paragon and a network of workstations. The influence of factors such as the characteristics of the simulated models, the organization of the simulators and the characteristics of the target multicomputers on the performance of the simulations has been measured and characterized. It is concluded that optimistic parallel simulation techniques are not suitable for the kind of models considered, although they may provide good performance in other environments. A network of workstations is not the right platform for our experiments, because the communication demands of the parallel simulators surpass the abilities of local area networks: the granularity is too fine. Synchronous and conservative parallel simulation techniques perform very well on the Supernode and on the Paragon, especially if the model to simulate is complex or large, which is precisely the worst case for traditional, sequential simulators. This way, studies previously considered unrealizable because of their exceedingly high computational cost can be performed in reasonable times. Additionally, the spectrum of uses for multicomputers can be broadened beyond numeric applications. This work has been partially funded by the Comisión Interministerial de Ciencia y Tecnología, under contract TIC95-037

    On the design and implementation of broadcast and global combine operations using the postal model

    A number of models have been proposed in recent years for message passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model, a parameter λ is used to model the communication latency of the message-passing system. During each round, each node can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between theoretical modeling and practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely the broadcast operation and the global combine operation. These practical issues include, for example, 1) techniques for measuring the value of λ on a given machine, 2) creating efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines where λ is not an integer. We propose solutions that address these practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves on the known implementation by more than 20%.
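The round-based rule in the abstract above (a message sent during round r arrives at round r + λ - 1) can be illustrated with a small counting sketch. It is not the algorithm from the listed paper: it assumes an integer λ ≥ 1, a source that holds the message at the end of round 0, and a greedy schedule in which every node that already holds the message sends it to a distinct uninformed node in every round. The function names `informed_after` and `rounds_to_broadcast` are hypothetical. Under these assumptions the number of informed nodes follows the recurrence I(t) = I(t-1) + I(t-λ).

```python
def informed_after(rounds: int, lam: int) -> int:
    """Nodes holding the message at the end of round `rounds`, source included.

    Assumes integer latency `lam` >= 1 and a greedy schedule (an assumption,
    not taken from the listed paper): each informed node sends to a new node
    every round, and a message sent in round r arrives in round r + lam - 1.
    """
    if lam < 1 or rounds < 0:
        raise ValueError("need lam >= 1 and rounds >= 0")
    # Before round `lam` no message has arrived, so only the source is informed.
    informed = [1] * (rounds + 1)
    for t in range(lam, rounds + 1):
        # Arrivals in round t were sent in round t - lam + 1 by the nodes
        # that were already informed at the end of round t - lam.
        informed[t] = informed[t - 1] + informed[t - lam]
    return informed[rounds]


def rounds_to_broadcast(n: int, lam: int) -> int:
    """Smallest number of rounds after which at least `n` nodes are informed."""
    if n < 1:
        raise ValueError("need n >= 1")
    t = 0
    while informed_after(t, lam) < n:
        t += 1
    return t


if __name__ == "__main__":
    for lam in (1, 2, 3):
        print(lam, [informed_after(t, lam) for t in range(8)])
```

With λ = 1 the count doubles every round, which is the familiar binomial-tree broadcast; with λ = 2 it follows the Fibonacci sequence. Larger λ slows the growth, which is why a broadcast tree tuned to the measured λ can differ from the binomial tree.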

    Visualization of program performance on concurrent computers

    A distributed memory concurrent computer (such as a hypercube computer) is inherently a complex system involving the collective and simultaneous interaction of many entities engaged in computation and communication activities. Program performance evaluation in concurrent computer systems requires methods and tools for observing, analyzing, and displaying system performance. This dissertation describes a methodology for collecting and displaying, via a unique graphical approach, performance measurement information from (possibly large) concurrent computer systems. Performance data are generated and collected via instrumentation. The data are then reduced via conventional cluster analysis techniques and converted into a pictorial form to highlight important aspects of program states during execution. Local and summary statistics are calculated. Included in the suite of defined metrics are measures for quantifying and comparing amounts of computation and communication. A novel kind of data plot is introduced to visually display both temporal and spatial information describing system activity. Phenomena such as hot spots of activity are easily observed, and in some cases, patterns inherent in the application algorithms being studied are highly visible. The approach also provides a framework for a visual solution to the problem of mapping a given parallel algorithm to an underlying parallel machine. A prototype implementation applied to several case studies is presented to demonstrate the feasibility and power of the approach
