A Scalable Multi-Stage Packet-Switch for Data Center Networks
Growing data center workloads over the last decades, including social networking, cloud-based applications, and storage technologies, have enabled many advances in networking. These changes imply a continuous demand for bandwidth to handle large amounts of packetized traffic. Cluster switches and routers form the switching fabric in a Data Center Network (DCN) and provide interconnectivity among elements within a DC and across DCs. To handle constantly varying loads, switches need to deliver outstanding throughput along with the resiliency and scalability that DCNs require. Conventional DCN switches adopt crossbars and/or memory blocks arranged in a multistage fashion (commonly two or three tiers). However, current multistage switches, with their space-memory variants, are either too complex to implement, perform poorly, or are not cost-effective. We propose a novel and highly scalable multistage switch based on Networks-on-Chip (NoC) fabrics for DCNs. In particular, we describe a three-stage Clos packet-switch with a Round Robin packet dispatching scheme in which each central-stage module is based on a Unidirectional NoC (UDN) instead of the conventional single-hop crossbar. The design, referred to as Clos-UDN, overcomes shortcomings of traditional multistage architectures as it (i) obviates the need for complex and costly input modules by using a few simple input FIFO queues; (ii) avoids the need for a complex, synchronized scheduling process over a large number of input-output modules and/or port pairs; and (iii) provides speedup, load balancing, and path diversity thanks to a dynamic dispatching scheme and the nature of the NoC-based fabric. Simulations show that Clos-UDN outperforms some common multistage switches under a range of input traffic patterns, making it highly appealing for ultra-high-capacity DC networks.
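The round-robin dispatching idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the Clos-UDN implementation: the `InputModule` class, its single FIFO, and the `next_cm` pointer are hypothetical names, and the sketch ignores backpressure, NoC routing inside the central modules, and output-stage behavior. It shows only how head-of-line packets can be spread across central modules in cyclic order regardless of destination, which is one way such a scheme can yield load balancing and path diversity without a central scheduler.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InputModule:
    """Hypothetical input module: one FIFO plus a round-robin pointer over central modules."""
    num_cms: int
    fifo: deque = field(default_factory=deque)
    next_cm: int = 0  # index of the central module that receives the next packet

    def dispatch(self):
        """Send the head-of-line packet to the next central module in cyclic order.

        Returns (cm_index, packet), or None if the FIFO is empty.
        """
        if not self.fifo:
            return None
        packet = self.fifo.popleft()
        cm = self.next_cm
        # Advance the pointer on every dispatch, independent of the packet's destination.
        self.next_cm = (self.next_cm + 1) % self.num_cms
        return cm, packet


im = InputModule(num_cms=3)
im.fifo.extend(["p0", "p1", "p2", "p3"])
assignments = [im.dispatch() for _ in range(5)]
print(assignments)
```

Because the pointer advances on every dispatch, consecutive packets from one input take different central modules; a complete design would also have to consider in-order delivery, which this sketch does not address.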
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to the characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial-boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
On the Application of PSpice for Localised Cloud Security
The work reported in this thesis commenced with a review of methods for creating random binary sequences that a client can use to encode data locally before storing it in the Cloud. The first method reviewed used evolutionary computing software to generate noise-producing functions from natural noise, a highly speculative and novel idea since noise is stochastic. Nevertheless, a function was created that generated noise to seed chaos oscillators, which produced random binary sequences, and this research led to a circuit-based one-time pad (OTP) key chaos encoder for encrypting data. Circuit-based delay chaos oscillators, initialised with sampled electronic noise, were simulated in the circuit simulator PSpice. Many simulation problems were encountered because of the nonlinear nature of chaos, but they were solved by creating new simulation parts, tools, and simulation paradigms. Simulation data from a range of chaos sources were exported and analysed using Lyapunov analysis, which identified two sources that produced one-time pad sequences with maximum entropy. This led to an encoding system that generates unlimited, unique random one-time pad encryption keys with infinitely long periods, matched to the length of the plaintext data. The keys were studied for maximum entropy and passed a suite of stringent, internationally accepted statistical tests for randomness. A prototype containing two delay chaos sources initialised by electronic noise was built on a double-sided printed circuit board and produced more than 200 Mbits of OTPs. According to Vladimir Kotelnikov in 1941 and Claude Shannon in 1945, one-time pad sequences are theoretically perfect and unbreakable, provided specific rules are adhered to. Two other techniques for generating random binary sequences were also researched: a new circuit element, memristance, was incorporated into a Chua chaos oscillator, and a fractional-order Lorenz chaos system with order less than three was investigated.
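As an illustration of what a Lyapunov analysis measures, the sketch below estimates the Lyapunov exponent of the logistic map x → r·x·(1−x). This is a swapped-in textbook example, not the delay chaos oscillators simulated in the thesis, and the function name `logistic_lyapunov` and its parameters are hypothetical. The estimate averages ln|f′(x)| along an orbit: a positive value indicates exponential divergence of nearby trajectories (chaos), a negative value indicates a stable fixed point or periodic orbit, and for r = 4 the exact value is known to be ln 2.

```python
import math


def logistic_lyapunov(r, x0=0.1234, n_transient=1_000, n_iter=100_000):
    """Estimate the Lyapunov exponent of x -> r*x*(1-x) by averaging ln|f'(x)|.

    Positive result: nearby orbits diverge exponentially (chaos).
    Negative result: the orbit settles onto a stable fixed point or cycle.
    """
    x = x0
    for _ in range(n_transient):  # discard the transient before measuring
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        deriv = abs(r * (1.0 - 2.0 * x))  # |f'(x)| for the logistic map
        total += math.log(max(deriv, 1e-300))  # guard against log(0) at x == 0.5
        x = r * x * (1.0 - x)
    return total / n_iter


print(logistic_lyapunov(4.0))  # theoretical value for r = 4 is ln 2
print(logistic_lyapunov(3.2))  # r = 3.2 has a stable period-2 orbit
```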
Quantum computing will present many problems for the security of cryptographic systems when existing systems are upgraded in the near future. The only existing encoding system that will resist cryptanalysis by quantum computers is unconditionally secure one-time pad encryption.
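The one-time pad operation itself is simple and can be sketched as a byte-wise XOR. In this minimal sketch, Python's `secrets.token_bytes` stands in for the thesis's noise-seeded chaos key generator, and `otp_encrypt`/`otp_decrypt` are hypothetical helper names. The rules the perfect-secrecy result depends on are that the key is truly random, at least as long as the message, kept secret, and never reused.

```python
import secrets


def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR each plaintext byte with one key byte; the key must be at least as long."""
    if len(key) < len(plaintext):
        raise ValueError("one-time pad key must be at least as long as the plaintext")
    return bytes(p ^ k for p, k in zip(plaintext, key))


# Decryption is the same XOR, because (p ^ k) ^ k == p.
otp_decrypt = otp_encrypt

message = b"store me in the cloud"
key = secrets.token_bytes(len(message))  # stand-in for a chaos-derived key; never reuse it
ciphertext = otp_encrypt(message, key)
assert otp_decrypt(ciphertext, key) == message
```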
Configurable data center switch architectures
In this thesis, we explore alternative architectures for implementing configurable data center switches, along with the advantages that such switches can provide. Our first contribution centers on determining switch architectures that can be implemented on Field Programmable Gate Arrays (FPGAs) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks for realistically evaluating the performance of switch architectures in data centers, and we contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small- to medium-sized switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation compared with an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate that our novel switch architecture and lightweight congestion control protocol can both accelerate the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x.
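One common way an in-network accelerator can speed up data-parallel training is by summing gradients inside the switch. The toy model below is a hypothetical sketch of that general idea (in-network aggregation), not the accelerator architecture proposed in the thesis; the class `AggregationSwitch`, the chunk identifiers, and the duplicate-filtering rule are all assumptions, and a real design must also cope with limited switch memory and packet loss.

```python
from collections import defaultdict


class AggregationSwitch:
    """Toy model of in-network gradient aggregation (hypothetical, not the thesis design).

    Each worker sends its gradient chunk to the switch; once all workers have
    contributed to a chunk, the switch returns the element-wise sum, so each
    worker receives one aggregated chunk instead of N-1 peer chunks.
    """

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.partial = {}             # chunk_id -> running element-wise sum
        self.seen = defaultdict(set)  # chunk_id -> workers that have contributed

    def receive(self, worker_id, chunk_id, values):
        """Accumulate one worker's chunk; return the sum when complete, else None."""
        if worker_id in self.seen[chunk_id]:
            return None  # ignore duplicates (e.g., retransmissions)
        self.seen[chunk_id].add(worker_id)
        acc = self.partial.get(chunk_id)
        self.partial[chunk_id] = (
            list(values) if acc is None else [a + v for a, v in zip(acc, values)]
        )
        if len(self.seen[chunk_id]) == self.num_workers:
            del self.seen[chunk_id]
            return self.partial.pop(chunk_id)  # would be multicast back to every worker
        return None


switch = AggregationSwitch(num_workers=3)
print(switch.receive(0, chunk_id=7, values=[1.0, 2.0]))
print(switch.receive(1, chunk_id=7, values=[0.5, 0.5]))
print(switch.receive(2, chunk_id=7, values=[1.5, -1.0]))
```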
As our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.
Network Structures, Concurrency, and Interpretability: Lessons from the Development of an AI Enabled Graph Database System
This thesis describes the development of the SmartGraph, an AI-enabled graph database. The need for such a system has been independently recognized in the isolated fields of graph databases, graph computing, and computational-graph deep learning systems such as TensorFlow. Although prior work has investigated some relationships between these fields, we believe that the SmartGraph is the first system designed from conception to incorporate the most significant and useful characteristics of each. Examples include the ability to store graph-structured data, run analytics natively on this data, and run gradient descent algorithms. The synergistic aspects of combining these fields provide the most novel results presented in this dissertation. Key among them is how the notion of “graph querying” as used in graph databases can be used to solve a problem that has plagued deep learning systems since their inception: rather than attempting to embed graph-structured datasets into restrictive vector spaces, we instead allow the deep learning functionality of the system to natively perform graph querying in memory during optimization as a way of interpreting (and learning) the graph. This results in natural and interpretable processing of graph-structured data.
Graph computing systems have traditionally used distributed computing across multiple compute nodes (e.g., separate machines connected via Ethernet or the Internet) to deal with large-scale datasets while working sequentially on problems over entire datasets. In this dissertation, we outline a distributed graph computing methodology that facilitates all the above capabilities (even in an environment consisting of a single physical machine) while allowing for a workflow more typical of a graph database than of a graph computing system: massive concurrent access allowing for arbitrarily asynchronous execution of queries and analytics across the entire system. Further, we demonstrate how this methodology is key to the artificial intelligence capabilities of the system.
Implementation of packet processing functions in high-capacity Internet routers
The Internet is one of the most important foundations of modern society and participates in all aspects of everyday life: business, social, entertainment, education, etc. The Internet achieved global success thanks to its robustness and its ability to interconnect different technologies into a single network. Routers form the basis of the Internet architecture and enable the global connectivity of all parts of the network. Because routers are the basic building blocks of the Internet, their performance and capabilities have a great impact on the quality of Internet service.
The number of Internet users continuously grows. New applications and services that demand ever-higher throughput are constantly being developed, and as a consequence, higher-capacity links are being installed. Internet traffic therefore grows continuously, so Internet routers are increasingly loaded, especially in the Internet core, where traffic is most intense. Internet routers must therefore be continuously upgraded to process large amounts of traffic at high speed. The need for Quality of Service (QoS) mechanisms and the increasingly popular multicast traffic represent additional difficulties for router development.
Many researchers and scientists are involved in the router development process, which includes developing new solutions and algorithms that make routers more efficient. However, the main problem in this process is the closed architecture of commercial routers: newly developed solutions are typically tested in isolation, without complete integration with the rest of the router functions. Such testing is incomplete because it does not give full insight into how well a new solution performs in a real environment. To avoid these problems, a research team led by Dr Aleksandra Smiljanić started the development of an Internet router prototype within the project "System integration of the Internet router", supported by the Ministry of Science and Technological Development of the Republic of Serbia. The main goal of the project was the development of a commercial router. Another very important goal was to provide an open-source platform for researchers and students, to be used for education as well as research, on which they could study the internal structure and architecture of routers and develop and test new solutions in a real environment.
Internet routers contain two planes: the data plane and the control plane. The data plane is implemented in hardware and is responsible for fast IP packet processing. The control plane is implemented in software and is responsible for communication with the router's environment (neighbor routers, administrators, etc.). In this PhD thesis, IP packet processors, which represent the most important part of the data plane, are developed and implemented.
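A core data-plane task of an IP packet processor is longest-prefix-match (LPM) route lookup. The abstract does not state which lookup structure the thesis's processors use, so the following is only a minimal software sketch of a one-bit-per-level binary trie built on Python's standard `ipaddress` module; `BinaryTrie` and the next-hop labels are hypothetical. Hardware implementations typically rely on faster structures such as multibit tries or TCAMs.

```python
import ipaddress


class BinaryTrie:
    """Longest-prefix-match lookup over IPv4 prefixes, one address bit per trie level."""

    def __init__(self):
        self.root = {}  # children keyed by "0"/"1"; "hop" stores a next hop

    def insert(self, prefix: str, next_hop: str):
        net = ipaddress.IPv4Network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["hop"] = next_hop

    def lookup(self, address: str):
        bits = format(int(ipaddress.IPv4Address(address)), "032b")
        node = self.root
        best = node.get("hop")  # default route, if 0.0.0.0/0 was inserted
        for b in bits:
            node = node.get(b)
            if node is None:
                break
            best = node.get("hop", best)  # remember the longest match seen so far
        return best


table = BinaryTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
for addr in ("10.1.2.3", "10.2.0.1", "192.168.1.1"):
    print(addr, table.lookup(addr))
```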
The ingenuity of common workmen: and the invention of the computer
Since World War II, state support for scientific research has been assumed to be crucial to technological and economic progress, and governments accordingly spent tremendous sums to that end. Nothing epitomizes the alleged fruits of that involvement better than the electronic digital computer. The first such computer has been widely reputed to be the ENIAC, financed by the U.S. Army for the war but finished afterwards. Vastly improved computers followed, initially paid for in good part by the Federal Government of the United States, but the private sector then came to dominate both their development and their use, and computers are now of major significance.
Despite the supposed success of publicly supported science, the evidence is that computers would have evolved in much the same way without it, but at less expense. Indeed, the foundations of modern computer theory and technology were articulated before World War II, both as a tool of applied mathematics and for information processing, and the computer itself was on the cusp of reality. Contrary to popular understanding, the ENIAC actually represented a step backwards and a dead end.
Rather, modern computation derived more directly from, for example, the prewar work of John Vincent Atanasoff and Clifford Berry, respectively a physics professor and a graduate student at Iowa State College (now University) in Ames, Iowa. They built the Atanasoff Berry Computer (ABC), which, although special-purpose and inexpensive, heralded the efficient and elegant design of modern computers. Moreover, while no one foresaw the commercialization of computers based on the ungainly and costly ENIAC, the commercial possibilities of the ABC were immediately evident, although they went unrealized because of the war.
Evidence indicates, furthermore, that the private sector was willing and able to develop computers beyond the ABC, all the way to the most sophisticated machines, and could have done so more effectively than government.
A full and inclusive history of computers suggests that Adam Smith, the eighteenth-century Scottish philosopher, had it right. He believed that minimal and aloof government best served society, and that the inherent genius of citizens was itself enough to ensure general prosperity.
Multistage Packet-Switching Fabrics for Data Center Networks
Recent applications have imposed stringent requirements on Data Center Network (DCN) switches in terms of scalability, throughput, and latency. In this thesis, the architectural design of packet switches is tackled in different ways to enable expansion in both the number of connected endpoints and the traffic volume.
A cost-effective Clos-network switch with partially buffered units is proposed, and two packet scheduling algorithms are described. The first algorithm adopts many simple, distributed arbiters, while the second relies on a central arbiter to guarantee in-order packet delivery.
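The abstract does not detail the first scheduling algorithm, but a simple distributed arbiter of the kind it mentions is often a round-robin arbiter. The sketch below is a generic, hypothetical example (`RoundRobinArbiter` is an illustrative name), not the thesis's algorithm: after each grant, the priority pointer moves just past the granted requester, so every persistently requesting input is served within N decisions.

```python
class RoundRobinArbiter:
    """Grant one of n requesters, starting the search just after the last grant."""

    def __init__(self, n):
        self.n = n
        self.pointer = 0  # highest-priority requester for the next decision

    def arbitrate(self, requests):
        """requests: list of n booleans. Returns the granted index, or None."""
        for offset in range(self.n):
            i = (self.pointer + offset) % self.n
            if requests[i]:
                # The granted requester becomes the lowest priority next time.
                self.pointer = (i + 1) % self.n
                return i
        return None


arb = RoundRobinArbiter(4)
reqs = [True, False, True, True]
print([arb.arbitrate(reqs) for _ in range(4)])
```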
For improved scalability, the Clos switch is built using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture, made of Input-Queued (IQ) Uni-Directional NoC modules (UDNs), simplifies the input line cards and obviates the need for costly Virtual Output Queues (VOQs). It also avoids the need for complex, synchronized scheduling processes, and it offers speedup, load balancing, and good path diversity.
Under skewed traffic, reliable micro load balancing helps boost overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture uses a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules and enhances switch performance under critical packet arrival patterns.
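The congestion-aware routing decision can be illustrated by a simple rule: send the next packet toward the central module whose path currently has the least buffer occupancy. This is a hypothetical sketch of the general idea, not the thesis's algorithm; the `occupancy` metric, the function name `choose_central_module`, and the rotating tie-break are assumptions.

```python
def choose_central_module(occupancy, last_choice):
    """Pick the least-occupied central module (CM); break ties by rotating from last_choice.

    occupancy: list of buffer occupancies (e.g., queued flits) seen toward each CM.
    last_choice: index of the CM chosen for the previous packet.
    """
    n = len(occupancy)
    # Visit CMs starting just after the previous choice, so equal loads are shared evenly.
    order = [(last_choice + 1 + k) % n for k in range(n)]
    # min() returns the first minimum it encounters, i.e., the earliest CM in rotation order.
    return min(order, key=lambda cm: occupancy[cm])


print(choose_central_module([3, 1, 1, 4], last_choice=1))
print(choose_central_module([3, 1, 1, 4], last_choice=2))
```

Rotating the tie-break order avoids always favoring the lowest-numbered module when several paths are equally lightly loaded.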
Small on-chip buffers are readily implementable with current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design combines the advantages of output queuing and NoCs to provide high throughput and smooth latency variations. An approximate analytical model of the switch performance is also proposed.
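The thesis's approximate model is not reproduced here, but a useful baseline for any output-queued design is the classical result for an ideal N×N OQ switch under uniform Bernoulli traffic (Karol, Hluchyj and Morgan): at load p, the mean queue length at an output is (N−1)/N · p²/(2(1−p)). The sketch below simply evaluates that formula; the function names are hypothetical, and the model ignores finite buffers and the internal NoC hops of the proposed fabric.

```python
def oq_mean_queue_length(n_ports: int, load: float) -> float:
    """Mean queue length at one output of an ideal N x N output-queued switch.

    Assumes slotted time, independent Bernoulli arrivals with probability `load`
    at every input, and uniformly random destinations, so arrivals per slot at a
    given output are Binomial(N, load/N) and one packet departs per slot.
    """
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1) for a stable queue")
    return (n_ports - 1) / n_ports * load ** 2 / (2.0 * (1.0 - load))


def oq_mean_delay(n_ports: int, load: float) -> float:
    """Mean waiting time in slots, via Little's law (W = Q / arrival rate)."""
    return oq_mean_queue_length(n_ports, load) / load if load > 0 else 0.0


print(oq_mean_queue_length(16, 0.8))
print(oq_mean_delay(16, 0.8))
```

For example, a 16-port switch at 80% load gives (15/16) · 0.64/0.4 = 1.5 packets on average and, via Little's law, 1.875 slots of mean waiting time.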
To further exploit the potential of NoC fabrics and their modularity, a high-capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch, and it scales better and faster in port count and traffic load. The results achieved in this thesis demonstrate the high performance, expandability, and programmability of the proposed packet switches, which make them promising candidates for next-generation data center networking infrastructure.
The Sixth Copper Mountain Conference on Multigrid Methods, part 1
The Sixth Copper Mountain Conference on Multigrid Methods was held on 4-9 April 1993 at Copper Mountain, CO. This book is a collection of many of the papers presented at the conference and, as such, represents the conference proceedings. NASA LaRC graciously provided printing of this document so that all of the papers could be presented in a single forum. Each paper was reviewed by a member of the conference organizing committee under the coordination of the editors. The multigrid discipline continues to expand and mature, as is evident from these proceedings. The vibrancy of this field is amply expressed in these important papers, and the collection clearly shows its rapid trend toward further diversity and depth.