Content addressable memory project
A parameterized version of the tree processor was designed and tested by simulation. The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the content addressable memory (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks.
Optimal Networks from Error Correcting Codes
To address growth challenges facing large data centers and supercomputing
clusters, a new construction is presented for scalable, high throughput, low
latency networks. The resulting networks require 1.5-5 times fewer switches,
2-6 times fewer cables, have 1.2-2 times lower latency and correspondingly
lower congestion and packet losses than the best present or proposed networks
providing the same number of ports at the same total bisection. These advantage
ratios increase with network size. The key new ingredient is the exact
equivalence discovered between the problem of maximizing network bisection for
large classes of practically interesting Cayley graphs and the problem of
maximizing codeword distance for linear error correcting codes. The resulting
translation recipe converts existing optimal error correcting codes into
optimal throughput networks. Comment: 14 pages, accepted at ANCS 2013 conference
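The stated equivalence between network bisection and codeword distance can be illustrated on a toy case. The sketch below is not the paper's construction. It assumes a Cayley graph over Z_2^n whose generators are the columns of a binary generator matrix, and it only considers cuts along hyperplanes {v : <x,v> = 0}. Under those assumptions, the number of edges crossing the cut for x equals 2^(n-1) times the weight of the codeword that x produces. All function names are hypothetical.

```python
def parity_dot(a, b):
    """Inner product over GF(2) of two vectors encoded as bitmasks."""
    return bin(a & b).count("1") & 1

def cut_size(n, gens, x):
    """Count undirected edges of Cay(Z_2^n, gens) that cross the
    hyperplane cut {v: <x,v>=0} | {v: <x,v>=1}."""
    crossing = 0
    for v in range(1 << n):
        for g in gens:
            w = v ^ g  # neighbor of v along generator g
            # v < w counts each undirected edge exactly once
            if v < w and parity_dot(x, v) != parity_dot(x, w):
                crossing += 1
    return crossing

def codeword_weight(gens, x):
    """Weight of the codeword x*G, where the columns of G are the generators."""
    return sum(parity_dot(x, g) for g in gens)

def min_hyperplane_cut(n, gens):
    """Smallest hyperplane cut; every such cut splits the nodes in half."""
    return min(cut_size(n, gens, x) for x in range(1, 1 << n))

def min_distance(n, gens):
    """Minimum distance of the linear code (assumes the generators span Z_2^n)."""
    return min(codeword_weight(gens, x) for x in range(1, 1 << n))

# 3-cube: generators are the unit vectors, the code is all of GF(2)^3.
cube = [0b001, 0b010, 0b100]
# Adding the all-ones generator gives the [4,3,2] even-weight code.
augmented = [0b001, 0b010, 0b100, 0b111]

for gens in (cube, augmented):
    print(min_hyperplane_cut(3, gens), 2 ** (3 - 1) * min_distance(3, gens))
```

For both generator sets the two printed numbers agree. Whether hyperplane cuts attain the true bisection for a given graph family is the paper's claim and is not checked here.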
CLEX: Yet Another Supercomputer Architecture?
We propose the CLEX supercomputer topology and routing scheme. We prove that
CLEX can utilize a constant fraction of the total bandwidth for point-to-point
communication, at delays proportional to the sum of the number of intermediate
hops and the maximum physical distance between any two nodes. Moreover,
applying an asymmetric bandwidth assignment to the links, all-to-all
communication can be realized -optimally both with regard to
bandwidth and delays. This is achieved at node degrees of ,
for an arbitrary small constant . In contrast, these
results are impossible in any network featuring constant or polylogarithmic
node degrees. Through simulation, we assess the benefits of an implementation
of the proposed communication strategy. Our results indicate that, for a
million processors, CLEX can increase bandwidth utilization and reduce average
routing path length by at least factors respectively in comparison to
a torus network. Furthermore, the CLEX communication scheme features several
other properties, such as deadlock-freedom, inherent fault-tolerance, and
canonical partition into smaller subsystems.
Symmetric Interconnection Networks from Cubic Crystal Lattices
Torus networks of moderate degree have been widely used in the supercomputer
industry. Tori are superb when used for executing applications that require
near-neighbor communications. Nevertheless, they are not so good when dealing
with global communications. Hence, typical 3D implementations have evolved to
5D networks, among other reasons, to reduce network distances. Most of these
big systems are mixed-radix tori which are not the best option for minimizing
distances and efficiently using network resources. This paper is focused on
improving the topological properties of these networks.
By using integral matrices to deal with Cayley graphs over Abelian groups, we
have been able to propose and analyze a family of high-dimensional grid-based
interconnection networks. As they are built over -dimensional grids that
induce a regular tiling of the space, these topologies have been denoted
lattice graphs. We will focus on cubic crystal lattices for modeling
symmetric 3D networks. Other higher dimensional networks can be composed over
these graphs, as illustrated in this research. Easy network partitioning can
also take advantage of this network composition operation. Minimal routing
algorithms are also provided for these new topologies. Finally, some practical
issues such as implementability and preliminary performance evaluations have
been addressed.
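The remark that mixed-radix tori are not the best option for minimizing distances can be checked with a short calculation. The sketch below covers ordinary tori only, not the lattice graphs proposed in the paper. It compares a symmetric torus with a mixed-radix torus of the same node count, and the dimensions used are illustrative assumptions.

```python
def ring_distances(k):
    """Distances from node 0 to every node of a bidirectional ring of size k."""
    return [min(d, k - d) for d in range(k)]

def torus_diameter(dims):
    """Diameter of a torus: the per-dimension ring diameters add up."""
    return sum(k // 2 for k in dims)

def torus_average_distance(dims):
    """Average distance over all ordered node pairs (self pairs included).
    Distances are additive across dimensions, so the averages add as well."""
    return sum(sum(ring_distances(k)) / k for k in dims)

symmetric = (8, 8, 8)      # 512 nodes
mixed_radix = (16, 8, 4)   # also 512 nodes

for dims in (symmetric, mixed_radix):
    print(dims, torus_diameter(dims), torus_average_distance(dims))
```

With these sizes the symmetric torus has diameter 12 and average distance 6.0, and the mixed-radix torus has diameter 14 and average distance 7.0.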
Deadlock avoidance with virtual channels
High performance computing is a rapidly evolving area of computer science that aims to solve complicated computational problems by combining computational nodes connected through high-speed networks. This work concentrates on the network problems that appear in such systems and focuses in particular on deadlock, which can decrease communication efficiency or even paralyze the network. The goal of this work is deadlock avoidance using virtual channels in the network switches where the problem appears. Deadlock avoidance ensures that no data is lost inside the network, at the cost of increased packet latency caused by the extra computation overhead that the switches incur to apply the policy.
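The abstract does not say which virtual-channel policy is used. As an illustration only, the sketch below assumes the classic dateline scheme on a unidirectional ring: packets start on virtual channel 0 and switch to virtual channel 1 once they wrap past node 0. It builds the channel dependency graph and checks it for cycles, on the standard premise that an acyclic dependency graph rules out routing deadlock. All names are hypothetical.

```python
def ring_dependencies(n, use_dateline):
    """Channel dependencies on a unidirectional ring of n nodes.
    A channel is (node, vc): the link leaving `node` on virtual channel vc."""
    deps = set()
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            route, node, vc = [], src, 0
            while node != dst:
                route.append((node, vc))
                nxt = (node + 1) % n
                if use_dateline and nxt == 0:
                    vc = 1  # crossed the dateline: continue on VC 1
                node = nxt
            # a packet holds one channel while requesting the next one
            deps.update(zip(route, route[1:]))
    return deps

def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    graph = {}
    for a, b in deps:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(graph, WHITE)

    def visit(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in graph)

print(has_cycle(ring_dependencies(4, use_dateline=False)))  # single VC
print(has_cycle(ring_dependencies(4, use_dateline=True)))   # two VCs with dateline
```

With a single virtual channel the dependencies close a loop around the ring. With the dateline, packets on virtual channel 1 never reach the wrap-around link, so the loop is broken.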
Exploring Adaptive Implementation of On-Chip Networks
As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi-Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on-Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores.
Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies.
Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.
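The congestion-based output selection mentioned in this abstract can be pictured with a minimal sketch. This is not the thesis's algorithm. It assumes minimal routing on a 2D mesh, a hypothetical `congestion` table mapping each neighboring router's coordinates to its buffer occupancy, and it ignores the turn restrictions a real router would need for deadlock freedom.

```python
def select_output(cur, dst, congestion):
    """Pick the next hop among the productive (distance-reducing) neighbors,
    preferring the one that reports the lowest congestion.
    Returns None when the packet has arrived and is delivered locally."""
    x, y = cur
    dx, dy = dst
    candidates = []
    if dx > x:
        candidates.append((x + 1, y))
    elif dx < x:
        candidates.append((x - 1, y))
    if dy > y:
        candidates.append((x, y + 1))
    elif dy < y:
        candidates.append((x, y - 1))
    if not candidates:
        return None
    # neighbors missing from the table are treated as uncongested (assumption)
    return min(candidates, key=lambda nb: congestion.get(nb, 0))

# Router (1,1) forwarding toward (3,3): the x-neighbor is busier than the y-neighbor.
print(select_output((1, 1), (3, 3), {(2, 1): 5, (1, 2): 2}))
```

Here the packet is sent to (1, 2) because that neighbor reports the lower occupancy. On a tie, the x-direction neighbor is chosen first.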