    A Scalable Multi-Stage Packet-Switch for Data Center Networks

    The growth of data centers over the last decades, driven by social networking, cloud-based applications and storage technologies, has enabled many advances in networking. These changes imply a continuous demand for bandwidth to manage the large amount of packetized traffic. Cluster switches and routers form the switching fabric of a Data Center Network (DCN) and provide interconnectivity between elements of the same DC and between DCs. To handle constantly varying loads, switches need to deliver outstanding throughput along with the resiliency and scalability that DCNs require. Conventional DCN switches adopt crossbars and/or blocks of memories mounted in a multistage fashion (commonly 2-tier or 3-tier). However, current multistage switches, with their space-memory variants, are either too complex to implement, perform poorly, or are not cost-effective. We propose a novel and highly scalable multistage switch based on Networks-on-Chip (NoC) fabrics for DCNs. In particular, we describe a three-stage Clos packet-switch with a Round Robin packet dispatching scheme where each central stage module is based on a Unidirectional NoC (UDN) instead of the conventional single-hop crossbar. The design, referred to as Clos-UDN, overcomes shortcomings of traditional multistage architectures as it (i) obviates the need for complex and costly input modules by means of a few simple input FIFO queues; (ii) avoids the need for a complex and synchronized scheduling process over a high number of input-output modules and/or port pairs; and (iii) provides speedup, load balancing and path diversity thanks to a dynamic dispatching scheme as well as the NoC-based nature of the fabric. Simulations show that the Clos-UDN outperforms some common multistage switches under a range of input traffic patterns, making it highly appealing for ultra-high capacity DC networks.

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

    On the Application of PSpice for Localised Cloud Security

    The work reported in this thesis commenced with a review of methods for creating random binary sequences for encoding data locally by the client before storing it in the Cloud. The first method reviewed used evolutionary computing software to generate noise-producing functions from natural noise, a highly speculative novel idea since noise is stochastic. Nevertheless, a function was created which generated noise to seed chaos oscillators which produced random binary sequences, and this research led to a circuit-based one-time pad key chaos encoder for encrypting data. Circuit-based delay chaos oscillators, initialised with sampled electronic noise, were simulated in a linear circuit simulator called PSpice. Many simulation problems were encountered because of the nonlinear nature of chaos, but these were solved by creating new simulation parts, tools and simulation paradigms. Simulation data from a range of chaos sources were exported and analysed using Lyapunov analysis, which identified two sources that produced one-time pad sequences with maximum entropy. This led to an encoding system which generated unlimited, infinitely-long period, unique random one-time pad encryption keys matched to the plaintext data length. The keys were studied for maximum entropy and passed a suite of stringent internationally-accepted statistical tests for randomness. A prototype containing two delay chaos sources initialised by electronic noise was produced on a double-sided printed circuit board and produced more than 200 Mbits of OTPs. According to Vladimir Kotelnikov in 1941 and Claude Shannon in 1945, one-time pad sequences are theoretically perfect and unbreakable, provided specific rules are adhered to. Two other techniques for generating random binary sequences were researched: a Chua chaos oscillator incorporating a new circuit element, memristance, and a fractional-order Lorenz chaos system with order less than three. Quantum computing will present many problems to cryptographic system security when existing systems are upgraded in the near future. The only existing encoding system that will resist cryptanalysis by quantum computing is unconditionally secure one-time pad encryption.

    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable Data Center Switches along with the advantages that can be provided by such switches. Our first contribution centers around determining switch architectures that can be implemented on a Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small-to-medium range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation when compared to an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and lightweight congestion control protocol to both accelerate the training of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x. As for our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.

    Implementation of packet processing functions in high capacity internet routers.

    The Internet is one of the most important foundations of modern society and participates in all aspects of everyday life: business, social, entertainment, education, etc. The Internet achieved global success thanks to its robustness and its ability to interconnect various technologies into a single network. Routers enable the Internet's global connectivity and thus represent its foundation. As routers are the main building blocks of the Internet, their performance and capabilities have a great impact on the quality of Internet operation. The number of Internet users grows continuously. New applications and services that demand ever higher throughput are constantly being developed, and as a consequence higher capacity links are installed. Internet traffic therefore grows continuously, so Internet routers are increasingly loaded, especially in the Internet core, where traffic is most intensive. Internet routers must be continually improved to support high-speed processing of large amounts of traffic. QoS mechanisms and increasingly popular multicast traffic pose additional difficulties for packet processing in routers. Many researchers and scientists work on improving router functionality and on developing new solutions and algorithms that enable more efficient router operation. However, a major problem in developing new solutions and improving existing functions is the closed architecture of commercial routers, so newly developed solutions are typically tested in isolation, without complete integration with the rest of the router functions. Such testing is incomplete because it does not give full insight into how a new solution performs in a real environment. To avoid these problems, a research team led by Dr. Aleksandra Smiljanić started the development of an Internet router prototype within the project "System integration of the Internet router", supported by the Ministry of Science and Technological Development of the Republic of Serbia. The final goal of the project was the development of a commercial product. Another very important goal was to provide an open platform on which researchers and students could study the internal structure and architecture of routers and develop and test new solutions in a real environment. Internet routers contain two planes: the data plane and the control plane. The data plane is implemented in hardware and is responsible for fast IP packet processing. The control plane is implemented in software and is responsible for communication with the router's environment (neighboring routers, administrators, etc.). In this PhD thesis, IP packet processors are developed and implemented. IP packet processors represent the most important part of the data plane.

    The ingenuity of common workmen: and the invention of the computer

    Since World War II, state support for scientific research has been assumed crucial to technological and economic progress. Governments accordingly spent tremendous sums to that end. Nothing epitomizes the alleged fruits of that involvement better than the electronic digital computer. The first such computer has been widely reputed to be the ENIAC, financed by the U.S. Army for the war but finished afterwards. Vastly improved computers followed, initially paid for in good share by the Federal Government of the United States, but with the private sector then dominating, both in development and use, and computers are of major significance. Despite the supposed success of public-supported science, the evidence is that computers would have evolved much the same without it but at less expense. Indeed, the foundations of modern computer theory and technology were articulated before World War II, both as a tool of applied mathematics and for information processing, and the computer was itself on the cusp of reality. Contrary to popular understanding, the ENIAC actually represented a movement backwards and a dead end. Rather, modern computation derived more directly, for example, from the prewar work of John Vincent Atanasoff and Clifford Berry, a physics professor and graduate student, respectively, at Iowa State College (now University) in Ames, Iowa. They built the Atanasoff Berry Computer (ABC), which, although special purpose and inexpensive, heralded the efficient and elegant design of modern computers. Moreover, while no one foresaw commercialization of computers based on the ungainly and costly ENIAC, the commercial possibilities of the ABC were immediately evident, although unrealized due to war. Evidence indicates, furthermore, that the private sector was willing and able to develop computers beyond the ABC and could have done so more effectively than government, to the most sophisticated machines. A full and inclusive history of computers suggests that Adam Smith, the eighteenth-century Scottish philosopher, had it right. He believed that minimal and aloof government best served society, and that the inherent genius of citizens was itself enough to ensure the general prosperity.

    Multistage Packet-Switching Fabrics for Data Center Networks

    Recent applications have imposed stringent requirements on Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of packet-switches is tackled in different ways to enable expansion in both the number of connected endpoints and the traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second relies on a central arbiter to guarantee ordered packet delivery. For improved scalability, the Clos switch is built using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture, made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs), simplifies the input line cards and obviates the need for costly Virtual Output Queues (VOQs). It also avoids the need for complex and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules and enhances the switch performance under critical packet arrivals. Current technology has made the implementation of small on-chip buffers perfectly feasible. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges the assets of output queuing and NoCs to provide high throughput and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of NoC fabrics and their modularity, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability of the proposed packet-switches, which make them promising candidates for the next-generation data center networking infrastructure.

    The Sixth Copper Mountain Conference on Multigrid Methods, part 1

    The Sixth Copper Mountain Conference on Multigrid Methods was held on 4-9 Apr. 1993, at Copper Mountain, CO. This book is a collection of many of the papers presented at the conference and as such represents the conference proceedings. NASA LaRC graciously provided printing of this document so that all of the papers could be presented in a single forum. Each paper was reviewed by a member of the conference organizing committee under the coordination of the editors. The multigrid discipline continues to expand and mature, as is evident from these proceedings. The vibrancy in this field is amply expressed in these important papers, and the collection clearly shows its rapid trend toward further diversity and depth.