72 research outputs found

    New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

    Get PDF
    Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms

    The Effect Of Hot Spots On The Performance Of Mesh--Based Networks

    Get PDF
    Direct network performance is affected by different design parameters which include number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hot spot location on the performance of mesh-based networks. Simulations are run on two network topologies, both the mesh and torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall packet switching proved to be more efficient in terms of throughput when compared to wormhole switching. In the case of uniform random traffic, it is shown that the differences between chaotic and oblivious routing are indistinguishable. Networks with low number of hotspots show better performance. As the number of hotspots increases network latency tends to increase. It is shown that when the hotspot factor increases, performance of packet switching is better than that of wormhole switching. It is also shown that the location of hotspots affects network performance particularly with the oblivious routers since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the size of the network the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori

    Control Plane for Embedded DSP

    Get PDF
    This project is sponsored by MITRE Corporation to develop a scalable and reusable control plane architecture for VLSI design. The main goal of this project is to develop a communication platform for a wide range of applications to reduce the development and testing time associated with the design of a interconnect system. Thorough research has been conducted in the area of network-on-chip designs that are suitable for these types of applications. The necessary components are built and verified in hardware description language. The deliverable components are packaged as reusable and parameterized SystemVerilog code

    Cost Effective Routing Implementations for On-chip Networks

    Full text link
    Arquitecturas de múltiples núcleos como multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) actuales se basan en la eficacia de las redes dentro del chip (NoC) para la comunicación entre los diversos núcleos. Un diseño eficiente de red dentro del chip debe ser escalable y al mismo tiempo obtener valores ajustados de área, latencia y consumo de energía. Para diseños de red dentro del chip de propósito general se suele usar topologías de malla 2D ya que se ajustan a la distribución del chip. Sin embargo, la aparición de nuevos retos debe ser abordada por los diseñadores. Una mayor probabilidad de defectos de fabricación, la necesidad de un uso optimizado de los recursos para aumentar el paralelismo a nivel de aplicación o la necesidad de técnicas eficaces de ahorro de energía, puede ocasionar patrones de irregularidad en las topologías. Además, el soporte para comunicación colectiva es una característica buscada para abordar con eficacia las necesidades de comunicación de los protocolos de coherencia de caché. En estas condiciones, un encaminamiento eficiente de los mensajes se convierte en un reto a superar. El objetivo de esta tesis es establecer las bases de una nueva arquitectura para encaminamiento distribuido basado en lógica que es capaz de adaptarse a cualquier topología irregular derivada de una estructura de malla 2D, proporcionando así una cobertura total para cualquier caso resultado de soportar los retos mencionados anteriormente. Para conseguirlo, en primer lugar, se parte desde una base, para luego analizar una evolución de varios mecanismos, y finalmente llegar a una implementación, que abarca varios módulos para alcanzar el objetivo mencionado anteriormente. De hecho, esta última implementación tiene por nombre eLBDR (effective Logic-Based Distributed Routing). Este trabajo cubre desde el primer mecanismo, LBDR, hasta el resto de mecanismos que han surgido progresivamente.Rodrigo Mocholí, S. (2010). Cost Effective Routing Implementations for On-chip Networks [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8962Palanci

    Exploring Adaptive Implementation of On-Chip Networks

    Get PDF
    As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.Siirretty Doriast

    Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters

    Full text link
    Actualmente, los clusters de PC son un alternativa rentable a los computadores paralelos. En estos sistemas, miles de componentes (procesadores y/o discos duros) se conectan a través de redes de interconexión de altas prestaciones. Entre las tecnologías de red actualmente disponibles para construir clusters, InfiniBand (IBA) ha emergido como un nuevo estándar de interconexión para clusters. De hecho, ha sido adoptado por muchos de los sistemas más potentes construidos actualmente (lista top500). A medida que el número de nodos aumenta en estos sistemas, la red de interconexión también crece. Junto con el aumento del número de componentes la probabilidad de averías aumenta dramáticamente, y así, la tolerancia a fallos en el sistema en general, y de la red de interconexión en particular, se convierte en una necesidad. Desafortunadamente, la mayor parte de las estrategias de encaminamiento tolerantes a fallos propuestas para los computadores masivamente paralelos no pueden ser aplicadas porque el encaminamiento y las transiciones de canal virtual son deterministas en IBA, lo que impide que los paquetes eviten los fallos. Por lo tanto, son necesarias nuevas estrategias para tolerar fallos. Por ello, esta tesis se centra en proporcionar los niveles adecuados de tolerancia a fallos a los clusters de PC, y en particular a las redes IBA. En esta tesis proponemos y evaluamos varios mecanismos adecuados para las redes de interconexión para clusters. El primer mecanismo para proporcionar tolerancia a fallos en IBA (al que nos referimos como encaminamiento tolerante a fallos basado en transiciones; TFTR) consiste en usar varias rutas disjuntas entre cada par de nodos origen-destino y seleccionar la ruta apropiada en el nodo fuente usando el mecanismo APM proporcionado por IBA. Consiste en migrar las rutas afectadas por el fallo a las rutas alternativas sin fallos. Sin embargo, con este fin, es necesario un algoritmo eficiente de encaminamiento capaz de proporcionar suficientesMontañana Aliaga, JM. (2008). Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2603Palanci

    Self-stabilizing wormhole routing in hypercubes

    Full text link
    Wormhole routing is an efficient technique used to communicate message packets between processors when they are not completely connected. To the best of our knowledge, this is the first attempt at designing a self-stabilizing wormhole routing algorithm for hypercubes. Our first algorithm handles all types of faults except for node/link failures. This algorithm achieves optimality in terms of routing path length by following only the preferred dimensions. In an n-dimensional hypercube, those dimensions in which source and destination address bits differ are called preferred dimensions. Our second algorithm handles topological changes. We propose an efficient scheme of rerouting flits in case of node/link failures. Similar to the first algorithm, this algorithm also tries to follow preferred dimensions if they are nonfaulty at the time of transmitting the flits. However, due to topological faults it is necessary to take non-preferred dimensions resulting in suboptimality of path selection. Formal proof of correctness for both solutions is given. (Abstract shortened by UMI.)

    Analysis of wormhole routings in cayley graphs of permutation groups.

    Get PDF
    Over a decade, a new class of switching technology, called wormhole routing, has been investigated in the multicomputer interconnection network field. Several classes of wormhole routing algorithms have been proposed. Most of the algorithms have been centered on the traditional binary hypercube, k-ary n-cube mesh, and torus networks. In the design of a wormhole routing algorithm, deadlock avoidance scheme is the main concern. Recently, new classes of networks called Cayley graphs of permutation groups are considered very promising alternatives. Although proposed Cayley networks have superior topological properties over the traditional network topologies, the design of the deadlock-free wormhole routing algorithm in these networks is not simple. In this dissertation, we investigate deadlock free wormhole routing algorithms in the several classes of Cayley networks, such as complete-transposition and star networks. We evaluate several classes of routing algorithms on these networks, and compare the performance of each algorithm to the simulation study. Also, the performances of these networks are compared to the traditional networks. Through extensive simulation we found that adaptive algorithm outperformed deterministic algorithm in general with more virtual channels. On the network performance comparison, the complete transposition network showed the best performance among the similar sized networks, and the binary hypercube performed better compared to the star graph
    corecore