9 research outputs found

    Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures

    Abstract: The growing system size of high-performance computers results in a steady decrease of the mean time between failures. Exchanging network components often requires whole-system downtime, which increases the cost of failures. In this work, we study a fail-in-place strategy where broken network elements remain untouched. We show that a fail-in-place strategy is feasible for today's networks and that the degradation is manageable, and we provide guidelines for the design. Our network failure simulation toolchain allows system designers to extrapolate the performance degradation based on expected failure rates, and it can be used to evaluate the current state of a system. In a case study of real-world HPC systems, we analyze the performance degradation throughout the systems' lifetime under the assumption that faulty network components are not repaired, which results in a recommendation to change the routing algorithm in use to improve both network performance and the fail-in-place characteristic. Keywords: network design, network simulations, network management, fail-in-place, routing protocols, fault tolerance, availability

    Networks on Chips: Structure and Design Methodologies


    Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

    Simulators are very important in computer architecture research because they enable the exploration of new architectures and provide detailed performance evaluation without building costly physical hardware. Simulation is even more critical for studying future many-core architectures, as it provides the opportunity to assess computer systems that do not yet exist. In this thesis, a multiprocessor simulator is presented, based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. First, a novel non-uniform fat-mesh network structure, similar in idea to the fat-tree, is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to connections with lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. Second, a hybrid network consisting of one packet switching plane and multiple circuit switching planes is constructed. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment.
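The claim that all-to-all traffic loads the center of a uniform mesh more heavily than the periphery can be checked with a small count. The sketch below is not the thesis's analytical model: it assumes dimension-ordered (XY) routing and exactly one packet per ordered source/destination pair, and the function names are made up for illustration.

```python
from collections import Counter


def xy_route(src, dst):
    """Return the directed links on a dimension-ordered (X first, then Y) path."""
    x, y = src
    dx, dy = dst
    links = []
    while x != dx:  # travel along the row first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:  # then along the column
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def all_to_all_link_load(k):
    """Count packets crossing each directed link of a k x k mesh
    when every node sends one packet to every other node."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    load = Counter()
    for s in nodes:
        for d in nodes:
            if s != d:
                load.update(xy_route(s, d))
    return load


if __name__ == "__main__":
    load = all_to_all_link_load(4)
    print(load[((0, 0), (1, 0))])  # link at the edge of a row: 12
    print(load[((1, 0), (2, 0))])  # link crossing the middle: 16
```

Under these assumptions, a link crossing the middle of a 4 x 4 mesh carries 16 packets while a link at the edge carries 12. That imbalance is what a fat-mesh would try to even out by adding links where the load is highest.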

    Designing Customizable Network-on-Chip with support for Embedded Private Memory for Multi-Processor System-on-Chips

    The computer industry's transition to multiprocessor system-on-chip (MPSoC) architectures is increasing the need for new scalable, high-bandwidth on-chip communication backbones. Network-on-Chip (NoC) interconnects are gaining interest as the on-chip communication infrastructure. The most important issues to consider in designing a NoC are topology, routing algorithm, flow control, and buffering, as well as the trade-offs between performance, power, and area. This research proposes a custom-designed NoC specifically for MPSoCs on FPGAs. The proposed design allows the communication infrastructure to scale seamlessly as the number of processors within the chip increases. The design adds a new level of abstraction to remote-access transactions. It also supports the partitioned global address space model, with optional local memories embedded in the network interface. The network was designed as a mesh topology to allow a reasonable communication capacity in two-dimensional space. The communication protocol between source and destination is AMBA AXI4, and communication between any two adjacent nodes uses a typical AXI-style valid/ready handshake. The nodes are distinguished by their user-specified address ranges. Each node is assigned a range of addresses, and in each transaction the routers decide the next node based on the destination address, until the transaction reaches its destination. The design has been implemented on a Xilinx Virtex7 FPGA; however, it does not depend on any particular FPGA brand or model. In the first chapter, we give an introduction to the work. In chapter 2, we cover the background of MPSoCs and interconnections, discuss the AXI protocol, and then review different Network-on-Chip projects. In chapter 3, we describe the design details of the different components, the high-level design of the system, and the implementation details. In chapter 4, we show the experimental results for both the verification phase and the analysis of the system. Finally, chapter 5 concludes the research.
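The address-range routing described above can be illustrated with a short sketch. It is not the thesis's implementation, which the abstract does not detail. The `AddressMap` class, the `next_hop` function, the example address layout, and the XY routing order are all assumptions made for illustration.

```python
from bisect import bisect_right


class AddressMap:
    """Hypothetical lookup table from address ranges to mesh coordinates."""

    def __init__(self, ranges):
        # ranges: non-overlapping (base, size, (x, y)) entries
        self._entries = sorted(ranges)
        self._bases = [base for base, _, _ in self._entries]

    def node_for(self, addr):
        """Return the (x, y) node whose range contains addr."""
        i = bisect_right(self._bases, addr) - 1
        if i >= 0:
            base, size, node = self._entries[i]
            if addr < base + size:
                return node
        raise ValueError(f"address {addr:#x} is not mapped to any node")


def next_hop(current, dest):
    """Assumed XY routing: correct X first, then Y; stay put on arrival."""
    (cx, cy), (dx, dy) = current, dest
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return current  # transaction is delivered locally


if __name__ == "__main__":
    # Assumed layout: a 2 x 2 mesh with 4 KiB of address space per node.
    amap = AddressMap([
        (0x0000, 0x1000, (0, 0)),
        (0x1000, 0x1000, (1, 0)),
        (0x2000, 0x1000, (0, 1)),
        (0x3000, 0x1000, (1, 1)),
    ])
    dest = amap.node_for(0x3004)
    hop = (0, 0)
    while hop != dest:
        hop = next_hop(hop, dest)
        print(hop)
```

Each router needs only the destination address and its own coordinates to pick an output port, so the table and the per-hop decision stay the same size as the mesh grows.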

    Arquitectura asimétrica multicore con procesador de Petri (Asymmetric Multicore Architecture with a Petri Processor)

    In a multi-core SMP architecture, we determined where to incorporate the PP or the HPP without altering the ISA of the remaining cores. We obtained a family of processors that execute Petri algorithms to solve reactive and concurrent systems, with solid formal verification that allows the processors to be programmed directly. To this end, the hardware of a PP and an HPP was built as an IP core on an FPGA and integrated into a multi-core SMP system that executes different types of Petri nets (RdP). This family of processors is configurable in several respects:
    - Processor size (number of places and transitions).
    - Timed processors and temporal processors.
    - Heterogeneous architecture, which allows the resources used to instantiate the processor to be distributed as required, yielding substantial savings.
    - The ability to configure the processor to meet the requirements while minimizing resources, which is highly valued in building embedded systems.
    In the systems with a high need for concurrency and synchronization where this processor was evaluated, it showed a significant performance improvement. The processor can resolve multiple firings simultaneously, as sets, which reduces query and decision times. In addition, the executed programs comply with the formalisms of extended and synchronized Petri nets, and the results of their execution are deterministic. The response time to determine a synchronization is two cycles per query (between the request for a firing and the response). Facultad de Informátic
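For readers unfamiliar with Petri nets, the firing rule that such a processor evaluates can be sketched in a few lines. This is a minimal software model of an ordinary place/transition net only. It does not model the extended, timed, or simultaneous multi-firing features mentioned in the abstract, and the function names and the mutual-exclusion example are assumptions made for illustration.

```python
def enabled(marking, pre, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(m >= w for m, w in zip(marking, pre[t]))


def fire(marking, pre, post, t):
    """Return the new marking after firing t: consume inputs, produce outputs."""
    if not enabled(marking, pre, t):
        raise ValueError(f"transition {t} is not enabled")
    return [m - i + o for m, i, o in zip(marking, pre[t], post[t])]


if __name__ == "__main__":
    # Assumed example: two processes sharing one lock.
    # Places: idle0, crit0, idle1, crit1, lock
    pre = {
        "enter0": [1, 0, 0, 0, 1],
        "exit0":  [0, 1, 0, 0, 0],
        "enter1": [0, 0, 1, 0, 1],
        "exit1":  [0, 0, 0, 1, 0],
    }
    post = {
        "enter0": [0, 1, 0, 0, 0],
        "exit0":  [1, 0, 0, 0, 1],
        "enter1": [0, 0, 0, 1, 0],
        "exit1":  [0, 0, 1, 0, 1],
    }
    marking = [1, 0, 1, 0, 1]
    marking = fire(marking, pre, post, "enter0")
    print(marking)                          # [0, 1, 1, 0, 0]
    print(enabled(marking, pre, "enter1"))  # False: the lock token is taken
```

A request to fire amounts to the `enabled` check followed by the marking update. In the hardware described above, that query is reported to take two cycles.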