135 research outputs found

    Deadlock-free routing in a faulty hypercube

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 41-42). By Eric Lehman. M.S.

    Compact Oblivious Routing

    Oblivious routing is an attractive paradigm for large distributed systems in which centralized control and frequent reconfigurations are infeasible or undesired (e.g., costly). Over almost the last 20 years, much progress has been made in devising oblivious routing schemes that guarantee close to optimal load, and algorithms for constructing such schemes efficiently have also been designed. However, a common drawback of existing oblivious routing schemes is that they are not compact: they require large routing tables (of polynomial size), which do not scale. This paper presents the first oblivious routing scheme that guarantees close to optimal load and is compact at the same time, requiring routing tables of only polylogarithmic size. Our algorithm maintains the polylogarithmic competitive ratio of existing algorithms, and is hence particularly well-suited for emerging large-scale networks.

    The Effect Of Hot Spots On The Performance Of Mesh--Based Networks

    Direct network performance is affected by several design parameters, including the number of virtual channels, number of ports, routing algorithm, switching technique, deadlock handling technique, packet size, and buffer size. Another factor that affects network performance is the traffic pattern. In this thesis, we study the effect of hotspot traffic on system performance. Specifically, we study the effect of hotspot factor, hotspot number, and hotspot location on the performance of mesh-based networks. Simulations are run on two network topologies, the mesh and the torus. We pay more attention to meshes because they are widely used in commercial machines. Comparisons between oblivious wormhole switching and chaotic packet switching are reported. Overall, packet switching proved to be more efficient in terms of throughput than wormhole switching. In the case of uniform random traffic, it is shown that the performance of chaotic and oblivious routing is indistinguishable. Networks with a low number of hotspots show better performance. As the number of hotspots increases, network latency tends to increase. It is shown that when the hotspot factor increases, packet switching performs better than wormhole switching. It is also shown that the location of hotspots affects network performance, particularly with the oblivious routers, since their achieved latencies proved to be more vulnerable to changes in the hotspot location. It is also shown that the smaller the network, the earlier network saturation occurs. Further, it is shown that the chaos router’s adaptivity is useful in this case. Finally, for tori, performance is not greatly affected by hotspot presence. This is mostly due to the symmetric nature of tori.
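The hotspot setup described in this abstract can be illustrated with a small sketch. It is not the thesis's simulator: it assumes that the oblivious router uses dimension-order (XY) routing, and that the "hotspot factor" can be modelled as the probability that a packet is addressed to a hotspot node. The names `xy_route`, `pick_destination`, `link_loads`, and `hotspot_fraction` are hypothetical, and switching, buffering, and virtual channels are ignored; only per-link packet counts are tallied.

```python
import random
from collections import Counter


def xy_route(src, dst):
    """Dimension-order (XY) path on a 2D mesh: correct x first, then y.

    Assumption: this stands in for the 'oblivious' router; the path
    depends only on source and destination, never on network load.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def pick_destination(src, size, hotspots, hotspot_fraction, rng):
    """Choose a destination different from src (requires size >= 2).

    With probability hotspot_fraction (a hypothetical stand-in for the
    hotspot factor) the packet goes to a hotspot, otherwise to a
    uniformly random node.
    """
    if hotspots and rng.random() < hotspot_fraction:
        candidates = [h for h in hotspots if h != src]
        if candidates:
            return rng.choice(candidates)
    while True:
        dst = (rng.randrange(size), rng.randrange(size))
        if dst != src:
            return dst


def link_loads(size, hotspots, hotspot_fraction, packets, seed=0):
    """Count how many packets cross each directed link of a size x size mesh."""
    rng = random.Random(seed)
    loads = Counter()
    for _ in range(packets):
        src = (rng.randrange(size), rng.randrange(size))
        dst = pick_destination(src, size, hotspots, hotspot_fraction, rng)
        path = xy_route(src, dst)
        for a, b in zip(path, path[1:]):
            loads[(a, b)] += 1
    return loads


if __name__ == "__main__":
    uniform = link_loads(8, [], 0.0, 10_000)
    hot = link_loads(8, [(3, 3)], 0.3, 10_000)
    print("busiest link, uniform traffic:", max(uniform.values()))
    print("busiest link, one hotspot:   ", max(hot.values()))
```

Moving the hotspot coordinates or changing `hotspot_fraction` gives a rough feel for how link load concentrates around a hotspot under a fixed-path router; an adaptive router such as the chaos router would need a different model.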

    The efficiency of greedy routing in hypercubes and butterflies

    Includes bibliographical references (p. 24-26). Cover title. "October 1990". Research supported by the ARO. DAAL03-86-K-0171. Research supported by the NSF. ECS-8552419. By George D. Stamoulis and John N. Tsitsiklis.

    An Efficient Routing Algorithm for Mesh-Hypercube (M-H) Networks

    Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'08, ISBN Set # 1-60132-084-1), Editors: Hamid R. Arabnia and Youngsong Mun, 2008. This paper presents an efficient routing algorithm for the Mesh-Hypercube (M-H) network. The M-H network is one of the new interconnection networking techniques used to build high performance parallel computers. The combination of M-H networks offers high connectivity among multiple nodes, fault-tolerance, and load scalability. However, the performance of M-H networks may degrade significantly in the presence of frequent link or node failures. When a link or node failure occurs, neither the hardware schemes nor point-to-point and multistage routing algorithms can be used without adding extra links. This paper presents an efficient single bit store and forward (SBSF) routing algorithm for the M-H network that is based on the round robin scheduling algorithm. Simulation and numerical results suggest that the proposed routing algorithm improves the overall performance of the M-H network by both reducing the transmission delay and increasing the total data throughput, even in the presence of faulty nodes. http://www.world-academy-of-science.org

    Improving Oblivious Reconfigurable Networks with High Probability

    Oblivious Reconfigurable Networks (ORNs) use rapidly reconfiguring switches to create a dynamic time-varying topology. Prior theoretical work on ORNs has focused on the tradeoff between maximum latency and guaranteed throughput. This work shows that by relaxing the notion of guaranteed throughput to an achievable rate with high probability, one can achieve a significant improvement in the latency/throughput tradeoff. For a fixed maximum latency, we show that almost twice the maximum possible guaranteed throughput rate can be achieved with high probability. Alternatively, for a fixed throughput value, relaxing to achievement with high probability decreases the maximum latency to almost the square root of the latency required to guarantee the throughput rate. We first give a lower bound on the best maximum latency possible given an achieved throughput rate with high probability. This is done using an LP duality style argument. We then give a family of ORN designs which achieves these tradeoffs. The connection schedule is based on the Vandermonde Basis Scheme of Amir, Wilson, Shrivastav, Weatherspoon, Kleinberg, and Agarwal, although the period and routing scheme differ significantly. We prove achievable throughput with high probability by interpreting the amount of flow on each edge as a sum of negatively associated variables, and applying a Chernoff bound. This gives us a design with maximum latency that is tight with our lower bound (up to a log factor) for almost all constant throughput values. Comment: 19 pages, 1 figure
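For reference, the concentration step mentioned in this abstract relies on the fact that Chernoff-type bounds remain valid for sums of negatively associated variables. A standard textbook form is sketched below; the symbols $X_i$, $\mu$, and $\delta$ are illustrative notation rather than the paper's, and the exact variant used there may differ.

```latex
% Assumption: X_1, ..., X_n take values in [0,1] and are negatively associated.
% Let X = \sum_{i=1}^{n} X_i and \mu = \mathbb{E}[X]. Then for 0 < \delta \le 1:
\[
  \Pr\bigl[\, X \ge (1+\delta)\,\mu \,\bigr] \;\le\; \exp\!\left(-\frac{\delta^{2}\mu}{3}\right).
\]
```

Read with $X$ as the flow on a single edge, a bound of this shape limits the chance that the edge is overloaded; a union bound over all edges is then the usual route to a "with high probability" statement.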

    High performance communication on reconfigurable clusters

    High Performance Computing (HPC) has matured to where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor that is limiting the performance of HPC, but can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interactions, and even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth and low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between computation kernel and network interface. Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware solution for communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels in the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications. We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA-cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. 
    For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness into router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm. This shows a significant advantage over a state-of-the-art collective routing algorithm. We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that our design is faster than CPUs and GPUs by at least one order of magnitude (achieving strong scaling for the target applications). Surprisingly, the FPGA cluster performance is similar to that of an ASIC-cluster. We also implement the 3D FFT on another multi-FPGA platform: the Microsoft Catapult II cloud. Its performance is also comparable or superior to CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics Simulation (MD). We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the µs/day range with a commodity cloud.