3,236 research outputs found

    Content addressable memory project

    Get PDF
    A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the computer aided manufacturing (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks

    Optimal Networks from Error Correcting Codes

    Full text link
    To address growth challenges facing large Data Centers and supercomputing clusters a new construction is presented for scalable, high throughput, low latency networks. The resulting networks require 1.5-5 times fewer switches, 2-6 times fewer cables, have 1.2-2 times lower latency and correspondingly lower congestion and packet losses than the best present or proposed networks providing the same number of ports at the same total bisection. These advantage ratios increase with network size. The key new ingredient is the exact equivalence discovered between the problem of maximizing network bisection for large classes of practically interesting Cayley graphs and the problem of maximizing codeword distance for linear error correcting codes. Resulting translation recipe converts existent optimal error correcting codes into optimal throughput networks.Comment: 14 pages, accepted at ANCS 2013 conferenc

    CLEX: Yet Another Supercomputer Architecture?

    Get PDF
    We propose the CLEX supercomputer topology and routing scheme. We prove that CLEX can utilize a constant fraction of the total bandwidth for point-to-point communication, at delays proportional to the sum of the number of intermediate hops and the maximum physical distance between any two nodes. Moreover, % applying an asymmetric bandwidth assignment to the links, all-to-all communication can be realized (1+o(1))(1+o(1))-optimally both with regard to bandwidth and delays. This is achieved at node degrees of nεn^{\varepsilon}, for an arbitrary small constant ε(0,1]\varepsilon\in (0,1]. In contrast, these results are impossible in any network featuring constant or polylogarithmic node degrees. Through simulation, we assess the benefits of an implementation of the proposed communication strategy. Our results indicate that, for a million processors, CLEX can increase bandwidth utilization and reduce average routing path length by at least factors 1010 respectively 55 in comparison to a torus network. Furthermore, the CLEX communication scheme features several other properties, such as deadlock-freedom, inherent fault-tolerance, and canonical partition into smaller subsystems

    Symmetric Interconnection Networks from Cubic Crystal Lattices

    Full text link
    Torus networks of moderate degree have been widely used in the supercomputer industry. Tori are superb when used for executing applications that require near-neighbor communications. Nevertheless, they are not so good when dealing with global communications. Hence, typical 3D implementations have evolved to 5D networks, among other reasons, to reduce network distances. Most of these big systems are mixed-radix tori which are not the best option for minimizing distances and efficiently using network resources. This paper is focused on improving the topological properties of these networks. By using integral matrices to deal with Cayley graphs over Abelian groups, we have been able to propose and analyze a family of high-dimensional grid-based interconnection networks. As they are built over nn-dimensional grids that induce a regular tiling of the space, these topologies have been denoted \textsl{lattice graphs}. We will focus on cubic crystal lattices for modeling symmetric 3D networks. Other higher dimensional networks can be composed over these graphs, as illustrated in this research. Easy network partitioning can also take advantage of this network composition operation. Minimal routing algorithms are also provided for these new topologies. Finally, some practical issues such as implementability and preliminary performance evaluations have been addressed

    Deadlock avoidance with virtual channels

    Get PDF
    High Performance Computing is a rapidly evolving area of computer science which attends to solve complicated computational problems with the combination of computational nodes connected through high speed networks. This work concentrates on the networks problems that appear in such networks and specially focuses on the Deadlock problem that can decrease the efficiency of the communication or even destroy the balance and paralyze the network. Goal of this work is the Deadlock avoidance with the use of virtual channels, in the switches of the network where the problem appears. The deadlock avoidance assures that will not be loss of data inside network, having as result the increased latency of the served packets, due to the extra calculation that the switches have to make to apply the policy.La computación de alto rendimiento es una zona de rápida evolución de la informática que busca resolver complicados problemas de cálculo con la combinación de los nodos de cómputo conectados a través de redes de alta velocidad. Este trabajo se centra en los problemas de las redes que aparecen en este tipo de sistemas y especialmente se centra en el problema del "deadlock" que puede disminuir la eficacia de la comunicación con la paralización de la red. El objetivo de este trabajo es la evitación de deadlock con el uso de canales virtuales, en los conmutadores de la red donde aparece el problema. Evitar el deadlock asegura que no se producirá la pérdida de datos en red, teniendo como resultado el aumento de la latencia de los paquetes, debido al overhead extra de cálculo que los conmutadores tienen que hacer para aplicar la política.La computació d'alt rendiment és una àrea de ràpida evolució de la informàtica que pretén resoldre complicats problemes de càlcul amb la combinació de nodes de còmput connectats a través de xarxes d'alta velocitat. Aquest treball se centra en els problemes de les xarxes que apareixen en aquest tipus de sistemes i especialment se centra en el problema del "deadlock" que pot disminuir l'eficàcia de la comunicació amb la paralització de la xarxa. L'objectiu d'aquest treball és l'evitació de deadlock amb l'ús de canals virtuals, en els commutadors de la xarxa on apareix el problema. Evitar deadlock assegura que no es produirà la pèrdua de dades en xarxa, tenint com a resultat l'augment de la latència dels paquets, degut al overhead extra de càlcul que els commutadors han de fer per aplicar la política

    Exploring Adaptive Implementation of On-Chip Networks

    Get PDF
    As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.Siirretty Doriast
    corecore