160 research outputs found
OutFlank Routing: Increasing Throughput in Toroidal Interconnection Networks
We present a new, deadlock-free, routing scheme for toroidal interconnection
networks, called OutFlank Routing (OFR). OFR is an adaptive strategy which
exploits non-minimal links, both in the source and in the destination nodes.
When minimal links are congested, OFR deroutes packets to carefully chosen
intermediate destinations, in order to obtain travel paths which are only an
additive constant longer than the shortest ones. Since routing performance is
very sensitive to changes in the traffic model or in the router parameters, an
accurate discrete-event simulator of the toroidal network has been developed to
empirically validate OFR, by comparing it against other relevant routing
strategies, over a range of typical real-world traffic patterns. On the
16x16x16 (4096 nodes) simulated network OFR exhibits improvements of the
maximum sustained throughput between 14% and 114%, with respect to Adaptive
Bubble Routing.Comment: 9 pages, 5 figures, to be presented at ICPADS 201
-Routing on Plane Grids
The packet routing problem plays an essential role in communication networks. It involves how to transfer data from some origins to some destinations within a reasonable amount of time. In the -routing problem, each node can send at most packets and receive at most packets. Permutation routing is the particular case . In the -central routing problem, all nodes at distance at most from a fixed node want to send a packet to . In this article we study the permutation routing, the -central routing and the general -routing problems on plane grids, that is square grids, triangular grids and hexagonal grids. We use the \emph{store-and-forward} -port model, and we consider both full and half-duplex networks. The main contributions are the following: \begin{itemize} \item[1.] Tight permutation routing algorithms on full-duplex hexagonal grids, and half duplex triangular and hexagonal grids. \item[2.] Tight -central routing algorithms on triangular and hexagonal grids. \item[3.] Tight -routing algorithms on square, triangular and hexagonal grids. \item[4.] Good approximation algorithms (in terms of running time) for -routing on square, triangular and hexagonal grids, together with new lower bounds on the running time of any algorithm using shortest path routing. \end{itemize} \noindent All these algorithms are completely distributed, i.e. can be implemented independently at each node. Finally, we also formulate the -routing problem as a \textsc{Weighted Edge Coloring} problem on bipartite graphs
Multistage Packet-Switching Fabrics for Data Center Networks
Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume.
A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery.
For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity.
Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals.
The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ)
NoC fabric. The design merges assets of the output queuing, and
NoCs to provide high throughput, and smooth latency variations.
An approximate analytical model of the switch performance is also proposed.
To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC
(MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure
On-chip networks for manycore architecture
Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 109-116).Over the past decade, increasing the number of cores on a single processor has successfully enabled continued improvements of computer performance. Further scaling these designs to tens and hundreds of cores, however, still presents a number of hard problems, such as scalability, power efficiency and effective programming models. A key component of manycore systems is the on-chip network, which faces increasing efficiency demands as the number of cores grows. In this thesis, we present three techniques for improving the efficiency of on-chip interconnects. First, we present PROM (Path-based, Randomized, Oblivious, and Minimal routing) and BAN (Bandwidth Adaptive Networks), techniques that offer efficient intercore communication for bandwith-constrained networks. Next, we present ENC (Exclusive Native Context), the first deadlock-free, fine-grained thread migration protocol developed for on-chip networks. ENC demonstrates that a simple and elegant technique in the on-chip network can provide critical functional support for higher-level application and system layers. Finally, we provide a realistic context by sharing our hands-on experience in the physical implementation of the on-chip network for the Execution Migration Machine, an ENC-based 110-core processor fabricated in 45nm ASIC technology.by Myong Hyon Cho.Ph.D
Aspects of k-k-Routing in Meshes and OTIS Networks
Aspects of k-k Routing in Meshes and OTIS-Networks
Abstract
Efficient data transport in parallel computers build on
sparse interconnection networks is crucial for their
performance. A basic transport problem in such a computer
is the k-k routing problem. In this thesis,
aspects of the k-k routing problem on r-dimensional
meshes and OTIS-G networks are discussed. The first oblivious
routing algorithms for these networks are presented
that solve the k-k routing problem in an
asymptotically optimal running time and a constant
buffer size. Furthermore, other aspects of the k-k
routing problem for OTIS-G networks are analysed.
In particular, lower bounds for the problem based on the
diameter and bisection width of OTIS-G networks are
given, and the k-k sorting problem on the OTIS-Mesh
is considered. Based on OTIS-G networks, a new class
of networks, called Extended OTIS-G networks, is introduced,
which have smaller diameters than OTIS-G networks.Für die Leistungfähigkeit von Parallelrechnern, die über ein Verbindungsnetzwerk kommunizieren, ist ein effizienter Datentransport entscheidend. Ein grundlegendes Transportproblem in einem solchen Rechner ist das k-k Routing Problem. In dieser Arbeit werden Aspekte dieses Problems in r-dimensionalen Gittern und OTIS-G Netzwerken untersucht. Es wird der erste vergessliche (oblivious) Routing Algorithmus vorgestellt, der das k-k Routing Problem in diesen Netzwerken in einer asymptotisch optimalen Laufzeit bei konstanter Puffergröße löst. Für OTIS-G Netzwerke werden untere Laufzeitschranken für das untersuchte Problem angegeben, die auf dem Durchmesser und der Bisektionsweite der Netzwerke basieren. Weiterhin wird ein Algorithmus vorgestellt, der das k-k Sorting Problem mit einer Laufzeit löst, die nahe an der Bisektions- und Durchmesserschranke liegt. Basierend auf den OTIS-G Netzwerken, wird eine neue Klasse von Netzwerken eingeführt, die sogenannten Extended OTIS-G Netzwerke, die sich durch einen kleineren Durchmesser von OTIS-G Netzwerken unterscheiden
Multistage Packet-Switching Fabrics for Data Center Networks
Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume.
A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery.
For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity.
Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals.
The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ)
NoC fabric. The design merges assets of the output queuing, and
NoCs to provide high throughput, and smooth latency variations.
An approximate analytical model of the switch performance is also proposed.
To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC
(MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure
Submicron Systems Architecture Project: Semiannual Technical Report
No abstract available
Non-minimal adaptive routing for efficient interconnection networks
RESUMEN: La red de interconexiĂłn es un concepto clave de los sistemas de computaciĂłn paralelos. El primer aspecto que define una red de interconexiĂłn es su topologĂa. Habitualmente, las redes escalables y eficientes en tĂ©rminos de coste y consumo energĂ©tico tienen bajo diámetro y se basan en topologĂas que encaran el lĂmite de Moore y en las que no hay diversidad de caminos mĂnimos. Una vez definida la topologĂa, quedando implĂcitamente definidos los lĂmites de rendimiento de la red, es necesario diseñar un algoritmo de enrutamiento que se acerque lo máximo posible a esos lĂmites y debido a la ausencia de caminos mĂnimos, este además debe explotar los caminos no mĂnimos cuando el tráfico es adverso. Estos algoritmos de enrutamiento habitualmente seleccionan entre rutas mĂnimas y no mĂnimas en base a las condiciones de la red. Las rutas no mĂnimas habitualmente se basan en el algoritmo de balanceo de carga propuesto por Valiant, esto implica que doblan la longitud de las rutas mĂnimas y por lo tanto, la latencia soportada por los paquetes se incrementa. En cuanto a la tecnologĂa, desde su introducciĂłn en entornos HPC a principios de los años 2000, Ethernet ha sido usado en un porcentaje representativo de los sistemas.
Esta tesis introduce una implementaciĂłn realista y competitiva de una red escalable y sin pĂ©rdidas basada en dispositivos de red Ethernet commodity, considerando topologĂas de bajo diámetro y bajo consumo energĂ©tico y logrando un ahorro energĂ©tico de hasta un 54%. Además, propone un enrutamiento sobre la citada arquitectura, en adelante QCN-Switch, el cual selecciona entre rutas mĂnimas y no mĂnimas basado en notificaciones de congestiĂłn explĂcitas. Una vez implementada la decisiĂłn de enrutar siguiendo rutas no mĂnimas, se introduce un enrutamiento adaptativo en fuente capaz de adaptar el nĂşmero de saltos en las rutas no mĂnimas. Este enrutamiento, en adelante ACOR, es agnĂłstico de la topologĂa y mejora la latencia en hasta un 28%. Finalmente, se introduce un enrutamiento dependiente de la topologĂa, en adelante LIAN, que optimiza el nĂşmero de saltos de las rutas no mĂnimas basado en las condiciones de la red. Los resultados de su evaluaciĂłn muestran que obtiene una latencia cuasi Ăłptima y mejora el rendimiento de algoritmos de enrutamiento actuales reduciendo la latencia en hasta un 30% y obteniendo un rendimiento estable y equitativo.ABSTRACT: Interconnection network is a key concept of any parallel computing system. The first aspect to define an interconnection network is its topology. Typically, power and cost-efficient scalable networks with low diameter rely on topologies that approach the Moore bound in which there is no minimal path diversity. Once the topology is defined, the performance bounds of the network are determined consequently, so a suitable routing algorithm should be designed to accomplish as much as possible of those limits and, due to the lack of minimal path diversity, it must exploit non-minimal paths when the traffic pattern is adversarial. These routing algorithms usually select between minimal and non-minimal paths based on the network conditions, where the non-minimal paths are built according to Valiant load-balancing algorithm. This implies that these paths double the length of minimal ones and then the latency supported by packets increases. Regarding the technology, from its introduction in HPC systems in the early 2000s, Ethernet has been used in a significant fraction of the systems.
This dissertation introduces a realistic and competitive implementation of a scalable lossless Ethernet network for HPC environments considering low-diameter and low-power topologies. This allows for up to 54% power savings. Furthermore, it proposes a routing upon the cited architecture, hereon QCN-Switch, which selects between minimal and non-minimal paths per packet based on explicit congestion notifications instead of credits. Once the miss-routing decision is implemented, it introduces two mechanisms regarding the selection of the intermediate switch to develop a source adaptive routing algorithm capable of adapting the number of hops in the non-minimal paths. This routing, hereon ACOR, is topology-agnostic and improves average latency in all cases up to 28%. Finally, a topology-dependent routing, hereon LIAN, is introduced to optimize the number of hops in the non-minimal paths based on the network live conditions. Evaluations show that LIAN obtains almost-optimal latency and outperforms state-of-the-art adaptive routing algorithms, reducing latency by up to 30.0% and providing stable throughput and fairness.This work has been supported by the Spanish Ministry of Education, Culture and Sports
under grant FPU14/02253, the Spanish Ministry of Economy, Industry and Competitiveness
under contracts TIN2010-21291-C02-02, TIN2013-46957-C2-2-P, and TIN2013-46957-C2-2-P (AEI/FEDER, UE), the Spanish Research Agency under contract PID2019-105660RBC22/AEI/10.13039/501100011033, the European Union under agreements FP7-ICT-2011-
7-288777 (Mont-Blanc 1) and FP7-ICT-2013-10-610402 (Mont-Blanc 2), the University of
Cantabria under project PAR.30.P072.64004, and by the European HiPEAC Network of Excellence through an internship grant supported by the European Union’s Horizon 2020 research
and innovation program under grant agreement No. H2020-ICT-2015-687689
- …