12 research outputs found

    Effects of injection pressure on network throughput

    Get PDF
    漏2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.Recent parallel systems use multiple injection ports and various injection policies, but little is known about their impact on network performance. This paper evaluates the influence that these injection interfaces have on maximum sustained throughput in adaptive cut-through torus networks by modeling the number of injection queues (1 or 4), and the allocation of new packets to those queues. Network evaluations for medium to large size 2D tori show that designs with multiple injection ports do not improve performance under uniform traffic. On the contrary, they result in more pressure from the injection interface to acquire the scarce network resources of an already clogged system. Interestingly, for small networks, a single injection FIFO queue, with the HOLB it entails, indirectly provides the much needed injection control. For networks with thousands of nodes and multiple injection channels, as those being implemented in current massively parallel processors, this implicit form of congestion control is not enough. In such systems, restrictive injection policies are required to prevent routers from being flooded with new packets for loads beyond saturation.C. Izu, J. Miguel-Alonso, J.A. Gregori

    Performance evaluation of distributed crossbar switch hypermesh

    Get PDF
    The interconnection network is one of the most crucial components in any multicomputer as it greatly influences the overall system performance. Several recent studies have suggested that hypergraph networks, such as the Distributed Crossbar Switch Hypermesh (DCSH), exhibit superior topological and performance characteristics over many traditional graph networks, e.g. k-ary n-cubes. Previous work on the DCSH has focused on issues related to implementation and performance comparisons with existing networks. These comparisons have so far been confined to deterministic routing and unicast (one-to-one) communication. Using analytical models validated through simulation experiments, this thesis extends that analysis to include adaptive routing and broadcast communication. The study concentrates on wormhole switching, which has been widely adopted in practical multicomputers, thanks to its low buffering requirement and the reduced dependence of latency on distance under low traffic. Adaptive routing has recently been proposed as a means of improving network performance, but while the comparative evaluation of adaptive and deterministic routing has been widely reported in the literature, the focus has been on graph networks. The first part of this thesis deals with adaptive routing, developing an analytical model to measure latency in the DCSH, and which is used throughout the rest of the work for performance comparisons. Also, an investigation of different routing algorithms in this network is presented. Conventional k-ary n-cubes have been the underlying topology of contemporary multicomputers, but it is only recently that adaptive routing has been incorporated into such systems. The thesis studies the relative performance merits of the DCSH and k-ary n-cubes under adaptive routing strategy. The analysis takes into consideration real-world factors, such as router complexity and bandwidth constraints imposed by implementation technology. However, in any network, the routing of unicast messages is not the only factor in traffic control. In many situations (for example, parallel iterative algorithms, memory update and invalidation procedures in shared memory systems, global notification of network errors), there is a significant requirement for broadcast traffic. The DCSH, by virtue of its use of hypergraph links, can implement broadcast operations particularly efficiently. The second part of the thesis examines how the DCSH and k-ary n-cube performance is affected by the presence of a broadcast traffic component. In general, these studies demonstrate that because of their relatively high diameter, k-ary n-cubes perform poorly when message lengths are short. This is consistent with earlier more simplistic analyses which led to the proposal for the express-cube, an enhancement of the basic k-ary n-cube structure, which provides additional express channels, allowing messages to bypass groups of nodes along their paths. The final part of the thesis investigates whether this "partial bypassing" can compete with the "total bypassing" capability provided inherently by the DCSH topology

    High Performance and Power Efficient On-Chip Network Designs through Multiple Injection Ports

    Full text link
    Las redes dentro de un chip se est谩n convirtiendo en el elemento principal de los sistemas multiprocesador. A medida que aumenta la escala de integraci贸n, m谩s elementos de c贸mputo (procesadores) se incluyen en el mismo chip. Estos componentes se interconectan con una red dentro del chip que debe ofrecer latencias de transmisi贸n ultra bajas (orden de nanosegundos) y anchos de banda elevados. El dise帽o, pues, de una red eficiente dentro del chip juega un papel fundamental. En la presente tesis se analizan diferentes alternativas de dise帽o de las redes en el chip. En particular, se hace uso de la posibilidad de utilizar diferentes puertos de inyecci贸n desde los procesadores con el fin de obtener diferentes mejoras. En primer lugar, las prestaciones aumentan al tener procesadores con distintas alternativas de inyecci贸n de tr谩fico. En segundo lugar, adem谩s aumenta la tolerancia a fallos frente a defectos de fabricaci贸n (mas importantes conforme avanza la tecnolog铆a). Y en tercer lugar, permite una pol铆tica de apagado de componentes m谩s agresiva que nos permita un ahorro significativo de energ铆a. Hemos evaluado diferentes topolog铆as derivadas del mecanismo de inyecci贸n en t茅rminos de prestaciones, coste de implementaci贸n, y ahorro de consumo. Adem谩s, hemos desarrollado simuladores espec铆ficos para las distintas t茅cnicas utilizadas. Cada topolog铆a dise帽ada supone una mejora respecto a la anterior, y por supuesto, teniendo en cuenta las topolog铆as existentes. En resumen, nuestro esfuerzo se centra en conseguir un excelente compromiso entre prestaciones, consumo y tolerancia a fallos dentro de una red en chip. Para la primera propuesta (topolog铆a NR-Mesh), se alcanzan mejoras en prestaciones de un 7\% y hasta de un 75\% en reducci贸n de consumo de media, comparado con la malla 2D o malla de 2 dimensiones. Para la siguiente propuesta, la malla concentrada paralela (PC-Mesh), el beneficio en prestaciones que se obtiene es de hasta un 20\%, as铆 c贸mo de un 60\% en reducci贸n deCamacho Villanueva, J. (2012). High Performance and Power Efficient On-Chip Network Designs through Multiple Injection Ports [Tesis doctoral no publicada]. Universitat Polit猫cnica de Val猫ncia. https://doi.org/10.4995/Thesis/10251/18235Palanci

    DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS

    Full text link
    El crecimiento de los computadores paralelos basados en redes de altas prestaciones ha aumentado el inter茅s y esfuerzo de la comunidad investigadora en desarrollar nuevas t茅cnicas que permitan obtener el mejor rendimiento de estas redes. En particular, el desarrollo de nuevas t茅cnicas que permitan un encaminamiento eficiente y que reduzcan la latencia de los paquetes, aumentando as铆 la productividad de la red. Sin embargo, una alta tasa de utilizaci贸n de la red podr铆a conllevar el que se conoce como "congesti贸n de red", el cual puede causar una degradaci贸n del rendimiento. El control de la congesti贸n en redes multietapa es un problema importante que no est谩 completamente resuelto. Con el fin de evitar la degradaci贸n del rendimiento de la red cuando aparece congesti贸n, se han propuesto diferentes mecanismos para el control de la congesti贸n. Muchos de estos mecanismos est谩n basados en notificaci贸n expl铆cita de la congesti贸n. Para este prop贸sito, los switches detectan congesti贸n y dependiendo de la estrategia aplicada, los paquetes son marcados con la finalidad de advertir a los nodos origenes. Como respuesta, los nodos origenes aplican acciones correctivas para ajustar su tasa de inyecci贸n de paquetes. El prop贸sito de esta tesis es analizar las diferentes estrat茅gias de detecci贸n y correcci贸n de la congesti贸n en redes multietapa, y proponer nuevos mecanismos de control de la congesti贸n encaminados a este tipo de redes sin descarte de paquetes. Las nuevas propuestas est谩n basadas en una estrategia m谩s refinada de marcaje de paquetes en combinaci贸n con un conjunto de acciones correctivas justas que har谩n al mecanismo capaz de controlar la congesti贸n de manera efectiva con independencia del grado de congesti贸n y de las condiciones de tr谩fico.Ferrer P茅rez, JL. (2012). DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS [Tesis doctoral no publicada]. Universitat Polit猫cnica de Val猫ncia. https://doi.org/10.4995/Thesis/10251/18197Palanci

    Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-cube Systems

    No full text
    This paper identifies performance degradation in wormhole routed k-ary n-cube networks due to limited number of router-to-processor consumption channels at each node. Many recent research in wormhole routing have advocated the advantages of adaptive routing and virtual channel flow control schemes to deliver better network performance. However, most of these results are based on infinite message consumption capacity leading to unrealistic design guidelines. This paper indicates that the advantages associated with these schemes can not be realized with limited consumption capacity. To alleviate such performance bottleneck, a new solution using multiple consumption channels is proposed. It is shown that wormhole networks with higher routing adaptivity, dimensionality, degree of hot-spot traffic, and number of virtual networks have to take advantage of multiple consumption channels to deliver better performance. The interplay between system topology, routing algorithm, number of virtual c..

    System level modelling and design of hypergraph based wireless system area networks for multi-computer systems

    Get PDF
    This thesis deals with issues pertaining the wireless multicomputer interconnection networks namely topology and Medium Access Control (MAC). It argues that new channel assignment technique based on regular low-dimensional hypergraph networks, the dual radio wireless hypermesh, represents a promising alternative high-performance wireless interconnection network for the future multicomputers to shared communication medium networks and/or ordinary wireless mesh networks, which have been widely used in current wireless networks. The focus of this work is on improving the network throughput while maintaining a relatively low latency of a wireless network system. By means of a Carrier Sense Multiple Access (CSMA) based design of the MAC protocol and based on the desirable features of hypermesh network topology a relatively high performance network has been introduced. Compared to the CSMA shared communication channel model, which is currently the de facto MAC protocol for most of wireless networks, our design is shown to achieve a significant increase in network throughput with less average network latency for large number of communication nodes. SystemC model of the proposed wireless hypermesh, validated through mathematical models, are then introduced. The analysis has been incorporated in the proper SystemC design methodology which facilitates the integration of communication modelling into the design modelling at the early stages of the system development. Another important application of SystemC modelling techniques is to perform meaningful comparative studies of different protocols, or new implementations to determine which communication scenario performs better and the ability to modify models to test system sensitivity and tune performance. Effects of different design parameters (e.g., packet sizes, number of nodes) has been carried out throughout this work. The results shows that the proposed structure has out perform the existing shared medium network structure and it can support relatively high number of wireless connected computers than conventional networks

    PRODUCTIVELY SCALING HARDWARE DESIGNS OVER INCREASING RESOURCES USING A SYSTEMATIC DESIGN ANALYSIS APPROACH

    Get PDF
    As processor development shifts from strict single core frequency scaling to het- erogeneous resource scaling two important considerations require evaluation. First, how to design systems with an increasing amount of heterogeneous resources, and second, how to maintain a designer鈥檚 productivity as the number of possible con- figurations grows. Therefore, it is necessary to determine what useful information can be gathered from existing designs to help predict or identify a design鈥檚 potential scalability, as well as, identifying which routine tasks can be automated to improve a designer鈥檚 productivity. Moreover, once this information is collected, how can this information be conveyed to the designer such that it can be used to increase overall productivity when implementing the design over increasing amounts of resources? This research looks at various approaches to analyze designs and attempts to distribute an application efficiently across a heterogeneous cluster of computing re- sources through the use of a Systematic Design Analysis flow and an assortment of productivity tools. These tools provide the designer with projections on the amount of resources needed to scale an existing design to a specified performance, as well as, projecting the performance based on a specified amount of resources. This is accomplished through the combination of static HDL profiling, component synthesis resource utilization, and runtime performance monitoring. For evaluation, four case studies are presented to demonstrate the proposed flow鈥檚 scalability on a small scale cluster of FPGAs. The results are highly favorable, providing orders of magnitude speedup with minimal intervention from the designer
    corecore