    Distributed Software Router Management

    With the stunning success of the Internet, information and communication technologies have spread widely, attracting ever more users to the Internet, which in turn accelerates traffic growth. This growth rate does not seem likely to slow down in the near future. Networking devices support this traffic growth by offering ever-increasing transmission and switching speeds, mostly owing to the advances in microelectronics predicted by Moore's Law. However, the comparable growth rates of Internet traffic and of electronic devices suggest that system capacity will become a crucial factor in the years ahead. Besides the challenge of keeping pace with traffic growth, networking devices have always been characterized by proprietary architectures. This has produced incompatible equipment and architectures, especially in terms of configuration and management procedures. The major drawback of this industrial practice is that the devices lack flexibility and programmability, which is one of the sources of ossification in today's Internet. Scaling or modifying networking devices, particularly routers, for a desired function therefore requires flexible and programmable devices. Software routers (SRs) based on personal computers (PCs) are among the devices that satisfy these flexibility and programmability criteria. Furthermore, the wide availability of open-source networking software for both the data plane and the control plane, together with the low cost of PCs driven by PC-market economies of scale, makes software routers an appealing alternative to expensive proprietary networking devices. In short, software routers are flexible, programmable and low cost, whereas proprietary networking equipment is usually expensive and difficult to extend, program or otherwise experiment with, because it relies on specialized, closed hardware and software.
Despite their advantages, however, software routers are not without limitations. The objections to software routers include limited performance, scalability problems and a lack of advanced functionality. These limitations arise because a single server, limited by its PCI bus width and CPU, is responsible for processing a large volume of packets. Offloading some of the packet-processing tasks performed by the CPU to other processors, such as GPUs in the same PC or external CPUs, is a viable approach to overcoming some of these limitations. Along these lines, a distributed Multi-Stage Software Router (MSSR) architecture has been proposed to overcome both the performance and the scalability issues of single-PC software routers. The architecture has three stages: i) front-end layer-2 load balancers (LBs), based on open software or open hardware, that act as interfaces to the external networks and distribute IP packets to ii) back-end personal computers (BEPCs), also called back-end routers in this thesis, that provide IP routing functionality, and iii) an interconnection network, based on Ethernet switches, that connects the two stages. Performance scaling is achieved by adding redundancy to the routing stage, where multiple servers are given the coordinated task of routing packets. The scalability problem related to the number of interfaces per PC is also tackled in MSSR by bundling the interfaces of two or more PCs through a switch at the front-end stage. The overall architecture is controlled and managed through a DIST protocol by a control entity named the Virtual Control Processor (virtualCP), which runs on a selected back-end router. This entity is also responsible for hiding the internal details of the multistage software router, so that the whole architecture appears to external network devices as a single device.
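The abstract does not say how the load balancers choose a back-end router for each packet. Purely as an illustration, the sketch below assumes a common approach: hash each packet's flow 5-tuple so that every packet of a flow reaches the same BEPC. The function name, flow-key layout and router names are hypothetical, not taken from the thesis.

```python
import zlib
from typing import List, Tuple

# Hypothetical flow key: (src_ip, dst_ip, protocol, src_port, dst_port)
FlowKey = Tuple[str, str, int, int, int]

def pick_back_end(flow: FlowKey, back_ends: List[str]) -> str:
    """Map a flow to one back-end router by hashing its 5-tuple.

    All packets of the same flow hash to the same back-end router, which
    avoids reordering packets within a flow.
    """
    if not back_ends:
        raise ValueError("no active back-end routers")
    key = "|".join(str(field) for field in flow).encode()
    return back_ends[zlib.crc32(key) % len(back_ends)]
```

For example, `pick_back_end(("10.0.0.1", "10.0.1.2", 6, 40000, 80), ["bepc1", "bepc2", "bepc3"])` always returns the same router for that flow. Note that with plain modulo hashing, changing the list of active back-ends (for instance when one is put to sleep) remaps many flows; consistent hashing is one way to limit that churn.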
However, building a flexible, scalable, high-performance MSSR architecture requires a large number of internal components that run independently but in coordination. As the number of internal devices increases, so does the complexity of controlling and managing the architecture. In addition, the redundant components used to scale performance waste power at low loads. These challenges must be addressed to make the multistage software router a functional and capable network device. The contribution of this thesis is therefore an MSSR centralized management system that deals with these challenges. The management system comprises two broad sub-systems: I) power management, a module that addresses the energy inefficiency of the multistage software router architecture; and II) unified information management, a module that creates a unified management information base so that, from a management-information perspective, the distributed multistage router appears to the external network as a single device. The power management module tries to minimize the energy consumption of the architecture by resizing it to the traffic demand. During low-load periods only a few components, especially in the routing stage, are needed to provide service. It is therefore sensible to devise a mechanism that puts idle components into a low-power mode to save energy during these periods. This thesis proposes an optimal algorithm and two heuristic algorithms, one on-line and one off-line, to adapt the architecture to the input load. We show that the optimal algorithm, besides having scalability issues, is an off-line approach that introduces service disruption and delay while the architecture is being reconfigured. To solve these issues, heuristic solutions are proposed and their performance is measured against the optimal solution.
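The optimal and heuristic algorithms themselves are not described in this abstract. The sketch below only illustrates the general idea of sizing the routing stage to the load, under assumed inputs: every back-end router has the same capacity, each should run at no more than a target utilization (0.8 here, an arbitrary choice), and the router names are hypothetical. To echo the "less disruptive" goal of the on-line approach, it keeps routers that are already on whenever possible.

```python
import math
from typing import List, Set

def resize(active: Set[str], routers: List[str], load_gbps: float,
           capacity_gbps: float, max_util: float = 0.8) -> Set[str]:
    """Return the back-end routers to keep powered on for the given load.

    Assumes every router has the same forwarding capacity (capacity_gbps)
    and should run at no more than max_util of it.
    """
    needed = math.ceil(load_gbps / (capacity_gbps * max_util))
    needed = min(max(needed, 1), len(routers))  # keep at least one router on
    # List routers that are already on first, so they are kept in preference
    # to waking sleeping ones; this limits how many components change state.
    ordered = ([r for r in routers if r in active]
               + [r for r in routers if r not in active])
    return set(ordered[:needed])
```

With four routers `["r1", "r2", "r3", "r4"]` all on, 10 Gb/s per router and 5 Gb/s of offered load, only `r1` stays on; the other three can be put into a low-power mode.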
Results show that the algorithms approximate the optimal solution reasonably well and that using them saves up to 57.44% of the total architecture energy consumption during low-load periods. Among the heuristic solutions, the on-line algorithm is superior, as it is less disruptive and introduces minimal service delay. Furthermore, the thesis shows that the proposed algorithms are more efficient if the architecture is designed with energy as one of the design parameters. To this end, three different approaches to designing an MSSR architecture are proposed, and their energy-saving efficiency is evaluated both against the optimal solution and against other similar cluster design approaches. The multistage software router differs from a single device in that it is composed of independently running components. Its management information is therefore distributed across the architecture, since each component registers its own management information. However, the MSSR internal devices work cooperatively to appear to the external network as a single network device. As a single device, the MSSR architecture therefore requires its own management information base, built from the management information bases dispersed among its internal components. This thesis proposes a mechanism to collect and organize this distributed management information and to create a single management information base representing the whole architecture. Accordingly, the existing SNMP management communication model has been modified to fit the distributed multistage router architecture, and a possible management architecture is proposed. In compiling the management information, different schemes have been adopted for different SNMP management information variables. A scalability analysis shows that the proposed management system scales well and does not threaten the scalability of the overall architecture.
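The abstract says different schemes were used for different SNMP variables but does not list them. The sketch below shows what per-variable aggregation could look like under assumed rules: traffic counters such as `ifInOctets` and `ifOutOctets` are summed across components, while a configuration value such as `ipDefaultTTL` is taken from the component hosting the virtualCP. The scheme table, component names and function are hypothetical, and counter wrap-around is ignored.

```python
from typing import Dict

Snapshot = Dict[str, int]  # SNMP object name -> value for one component

# Assumed aggregation rule per variable (not the thesis's actual table).
SCHEMES = {
    "ifInOctets": "sum",     # traffic counters add up across components
    "ifOutOctets": "sum",
    "ipDefaultTTL": "vcp",   # configuration value owned by the virtualCP host
}

def aggregate(snapshots: Dict[str, Snapshot], vcp: str) -> Snapshot:
    """Merge per-component snapshots into one MIB view of the whole router."""
    merged: Snapshot = {}
    for name, scheme in SCHEMES.items():
        if scheme == "sum":
            values = [snap[name] for snap in snapshots.values() if name in snap]
            if values:
                merged[name] = sum(values)
        elif name in snapshots.get(vcp, {}):
            merged[name] = snapshots[vcp][name]
    return merged
```

For two back-end routers reporting 100 and 200 input octets, the merged view reports 300, as a single device would.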

    Improving the Scalability of High Performance Computer Systems

    Improving the performance of future computing systems will depend on the ability to increase the scalability of current technology. New paths need to be explored, as the operating principles applied up to now are becoming irrelevant for upcoming computer architectures. Scaling the number of cores, processors and nodes within a system appears to be the only feasible way to achieve Exascale performance. To accomplish this goal, we propose three novel techniques addressing different layers of computer systems. The Tightly Coupled Cluster technique significantly improves inter-node communication within compute clusters. By reducing latency by an order of magnitude compared with existing solutions, it considerably lowers the cost of communication. This makes it possible to exploit fine-grained parallelism within applications, thereby considerably extending their scalability. The mechanism virtually moves the network interconnect into the processor, bypassing the latency of the I/O interface and making protocol conversions unnecessary. The technique is implemented entirely in firmware and kernel-level software on off-the-shelf AMD processors. We present a proof-of-concept implementation and real-world benchmarks that demonstrate the superior performance of our technique. In particular, our approach achieves a software-to-software communication latency of 240 ns between two remote compute nodes. The second part of the dissertation introduces a new framework for scalable Networks-on-Chip. A novel rapid-prototyping methodology is proposed that substantially accelerates design and implementation. Thanks to its flexibility and modularity, it covers a large application space, ranging from Systems-on-Chip to high-performance many-core processors. The Network-on-Chip compiler generates complex networks in the form of synthesizable register-transfer-level code from an abstract design description.
Our engine supports different target technologies, including Field Programmable Gate Arrays and Application Specific Integrated Circuits. The framework makes it possible to build large designs while minimizing development and verification effort. Many topologies and routing algorithms are supported by partitioning the tasks into several layers and by introducing a protocol-agnostic architecture. We provide a thorough evaluation of the design that shows excellent results in terms of performance and scalability. The third part of the dissertation addresses the processor-memory interface within computer architectures. The increasing compute power of many-core processors leads to an equally growing demand for memory bandwidth and capacity. Current processor designs exhibit physical limitations that restrict the scalability of main memory. To address this issue, we propose a memory extension technique that attaches large amounts of DRAM to the processor through a low-pin-count interface using high-speed serial transceivers. Our technique transparently integrates the extension memory into the system architecture by providing full cache coherency. Applications can therefore use the memory extension with regular shared-memory programming techniques. By supporting daisy-chained memory extension devices and by introducing the asymmetric probing approach, the proposed mechanism ensures high scalability. We furthermore propose a DMA offloading technique to improve the performance of the processor-memory interface. The design has been implemented in a prototype based on a Field Programmable Gate Array. Driver software and firmware modifications have been developed to bring up the prototype in a Linux-based system. We present microbenchmarks that demonstrate the feasibility of our design.
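The framework is said to support many topologies and routing algorithms, without naming them. As an illustration of one routing algorithm a NoC generator commonly has to emit, the sketch below implements dimension-order (XY) routing on a 2D mesh. It is not code from the described compiler; the coordinate convention is an assumption.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh

def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) routing: travel along X first, then along Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

For instance, `xy_route((0, 0), (2, 1))` visits `(0, 0)`, `(1, 0)`, `(2, 0)` and then `(2, 1)`. In hardware, the same decision reduces to comparing the destination coordinates with the local router's coordinates at each hop.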

    Circuit design and analysis for on-FPGA communication systems

    On-chip communication has emerged as a prominent subject in Very-Large-Scale-Integration (VLSI) design, as technology scaling favours logic over interconnect. Interconnect often dictates system performance, so research into new methodologies and system architectures that deliver high-performance communication services across the chip is essential. The interconnect challenge is exacerbated in Field-Programmable Gate Arrays (FPGAs), integrated circuits whose hardware can be programmed after fabrication. Communication across an FPGA will deteriorate as a result of interconnect scaling. The programmable fabric, switches and the specific routing architecture also introduce additional latency and bandwidth degradation, further hindering intra-chip communication performance. Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs; communication over programmable interconnect received little attention and remains poorly understood. This thesis is among the first to study on-chip communication systems built on top of programmable fabrics, and it proposes methodologies to maximize interconnect throughput. The thesis makes three major contributions: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels because of routing congestion in reconfigurable architectures; (ii) a new analogue wave-signalling scheme that significantly improves interconnect throughput by exploiting the fundamental electrical characteristics of reconfigurable interconnect structures, and which can potentially mitigate the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP)-network that provides adaptive routing in network-on-chip (NoC) systems.
The DP-network architecture performs run-time optimization for route planning and dynamic routing, which effectively utilizes the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, proposing new methodologies and concepts to enhance on-FPGA communication throughput, which is of vital importance in new technology processes.
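The DP-network is a hardware architecture whose internals are not given here. To show the dynamic-programming idea behind adaptive route planning, the sketch below computes each node's next hop toward a destination from the Bellman cost-to-go recurrence, with link costs standing in for congestion. It is a software illustration under assumed inputs (node names and cost values are hypothetical), not the thesis's circuit.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # directed link (from_node, to_node)

def dp_next_hops(nodes: List[str], cost: Dict[Link, float], dest: str) -> Dict[str, str]:
    """Return each node's next hop toward dest.

    Cost-to-go recurrence (Bellman equation):
        J(dest) = 0
        J(u)    = min over links (u, v) of cost[(u, v)] + J(v)
    Relaxed Bellman-Ford style until no value changes.
    """
    inf = float("inf")
    J = {n: inf for n in nodes}
    J[dest] = 0.0
    next_hop: Dict[str, str] = {}
    for _ in range(len(nodes) - 1):  # at most |V|-1 relaxation rounds are needed
        changed = False
        for (u, v), c in cost.items():
            if J[v] + c < J[u]:
                J[u] = J[v] + c
                next_hop[u] = v
                changed = True
        if not changed:
            break
    return next_hop
```

If the cost of link B to D rises because of congestion, recomputing the table moves A's next hop from B to C, which is the adaptive behaviour the abstract refers to.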

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    The sustained demand for faster, more powerful chips has been met by chip manufacturing processes that allow increasing numbers of computation units to be integrated onto a single die. The result, especially in the embedded domain, has often been called a System-on-Chip (SoC) or Multi-Processor System-on-Chip (MPSoC). MPSoC design brings a large number of challenges to the foreground, one of the most prominent being the design of the chip interconnect. With the number of on-chip blocks presently in the tens and quickly approaching the hundreds, the question of how best to provide on-chip communication resources is pressing. Networks-on-Chip (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they provide a structured answer to present and future communication requirements. The point-to-point connections and packet-switching paradigm they involve also help minimize wiring overhead and physical routing issues. However, as with any recently introduced technology, NoC design is still an evolving discipline. Several main areas require deep investigation before NoCs can become viable solutions:
• The NoC architecture must strike the best trade-off among performance, features, and the tight area and power constraints of the on-chip domain.
• Simulation and verification infrastructure must be put in place to explore, validate and optimize NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
• Given their global, distributed nature, it is even more essential to evaluate the physical implementation of NoCs, to assess their suitability for next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures in order to point out the trade-offs associated with the design of each individual network building block and with the design of the overall network topology. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics, both against one another and against early network-on-chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures, and the challenges that lie ahead in making this new interconnect technology a reality. Among these, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy, in particular for leakage power dissipation and for the containment of process variations and their effects. These objectives were achieved by means of a NoC simulation environment for cycle-accurate modelling and simulation and of a back-end facility for studying NoC physical implementation effects. All the results provided by this work have been validated on actual silicon layout.
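The dissertation's exploration relies on cycle-accurate simulation and physical layout. As a much simpler, first-order complement, the sketch below computes one common analytic figure of merit for comparing topologies: the average minimal hop count between distinct routers, assuming uniform traffic and minimal routing. The two topologies compared are illustrative choices, not necessarily ones studied in the dissertation.

```python
from itertools import product

def avg_hops_mesh(k: int) -> float:
    """Mean minimal hop count between distinct routers of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = pairs = 0
    for (ax, ay), (bx, by) in product(nodes, repeat=2):
        if (ax, ay) != (bx, by):
            total += abs(ax - bx) + abs(ay - by)  # Manhattan distance
            pairs += 1
    return total / pairs

def avg_hops_ring(n: int) -> float:
    """Mean minimal hop count between distinct routers of an n-node bidirectional ring."""
    total = pairs = 0
    for a, b in product(range(n), repeat=2):
        if a != b:
            d = abs(a - b)
            total += min(d, n - d)  # go whichever way round is shorter
            pairs += 1
    return total / pairs
```

For 16 routers, `avg_hops_mesh(4)` gives 8/3 (about 2.67 hops) while `avg_hops_ring(16)` gives 64/15 (about 4.27 hops). Hop count ignores router area, link length and power, which is why the full exploration needs simulation and layout.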

    Machine Learning for Multi-Layer Open and Disaggregated Optical Networks

    The abstract is provided in the attachment.

    Advances in Optical Amplifiers

    Optical amplifiers play a central role in all categories of fibre communication systems and networks. By compensating for the losses imposed by the transmission medium and by the components through which the signals pass, they reduce the need for expensive and slow optical-electrical-optical conversion. The photonic gain media, normally based on glass or semiconductor waveguides, can amplify many high-speed wavelength-division-multiplexed channels simultaneously. Recent research has also concentrated on wavelength conversion, switching, time-domain demultiplexing and other enhanced functions. Advances in Optical Amplifiers presents up-to-date results on amplifier performance, along with explanations of their relevance, from leading researchers in the field. Its chapters cover amplifiers based on rare-earth-doped fibres and waveguides, stimulated Raman scattering, nonlinear parametric processes and semiconductor media. Wavelength conversion and other enhanced signal-processing functions are also considered in depth. This book is aimed at research, development and design engineers in manufacturing industry, academia and telecommunications service operators.
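To make "compensating for the losses" concrete, the sketch below works through a simple in-line amplifier gain budget: the gain needed to restore launch power equals the span's fibre attenuation plus its lumped component losses, all in dB. The span length (80 km), attenuation (0.2 dB/km, a typical order of magnitude for standard single-mode fibre near 1550 nm) and 1 dB component loss are assumed example values, not figures from the book.

```python
def span_loss_db(length_km: float, atten_db_per_km: float,
                 component_loss_db: float = 0.0) -> float:
    """Total loss of one span: fibre attenuation plus lumped component losses (dB)."""
    return length_km * atten_db_per_km + component_loss_db

def db_to_linear(db: float) -> float:
    """Convert a power ratio from dB to a linear factor."""
    return 10 ** (db / 10)

# Assumed example: 80 km of fibre at 0.2 dB/km plus 1 dB of connector/component loss.
gain_db = span_loss_db(80, 0.2, 1.0)  # amplifier gain that restores launch power
```

Here `gain_db` is 17 dB, a linear power gain of roughly 50, applied to all WDM channels at once rather than regenerating each one electronically.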

    Automatic synthesis and optimization of chip multiprocessors

    Microprocessor technology has grown enormously over recent decades. The rapid downscaling of CMOS technology has led to higher operating frequencies and performance densities, raising the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm for improving the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits of CMPs with hundreds of cores.
CMP architects face many complex design decisions. A few of them are:
- What should the ratio between core and cache area on a chip be?
- Which core architectures should be selected?
- How many cache levels should the memory subsystem have?
- Which interconnect topologies provide efficient on-chip communication?
These and many other aspects create a complex multidimensional space for architectural exploration. Design automation tools are essential to make this exploration feasible under hard time-to-market constraints. The exploration methods must be efficient and scalable enough to handle future on-chip architectures with hundreds or thousands of cores.
Furthermore, once a CMP has been fabricated, the many-core processor must be deployed efficiently. Intelligent task mapping and scheduling techniques are necessary to realize the full benefits of many-core technology. These techniques must take into account the peculiarities of modern architectures, such as the availability of enhanced power-saving techniques and the presence of complex memory hierarchies.
This thesis has several objectives. The first is to develop methods for efficient analytical modeling and architectural design space exploration of CMPs.
Efficiency is achieved by using analytical models instead of simulation and by replacing exhaustive exploration with an intelligent search strategy. These methods also incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5.
The second objective of this work is to propose a scalable algorithm for mapping tasks onto general-purpose CMPs with power management techniques, enabling efficient deployment of many-core systems. This contribution is explained in Chapter 6.
Finally, the third objective of this thesis is to address on-chip interconnect design and exploration by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The methodology can be applied to various classes of on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.
The presented methods have been thoroughly tested experimentally, and the results are described in this dissertation. The document concludes by proposing several possible directions for future research.
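The abstract does not give the formulation of the topology and deadlock-free routing model. For context, the sketch below applies the classic sufficient condition due to Dally and Seitz: a routing function is deadlock-free if its channel dependency graph (CDG) is acyclic. It builds the CDG for routes on a k x k mesh and tests it with Kahn's algorithm; the mesh, the XY/YX routing functions and all names are illustrative assumptions, not the thesis's model.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Coord = Tuple[int, int]          # (x, y) router position in a k x k mesh
Channel = Tuple[Coord, Coord]    # directed link between neighbouring routers
Route = Callable[[Coord, Coord], List[Coord]]

def xy_path(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order routing: move along X first, then along Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def yx_path(src: Coord, dst: Coord) -> List[Coord]:
    """Y first, then X: XY routing on swapped coordinates."""
    swapped = xy_path((src[1], src[0]), (dst[1], dst[0]))
    return [(x, y) for (y, x) in swapped]

def channel_dependencies(k: int, routes: List[Route]) -> Dict[Channel, Set[Channel]]:
    """Edge c1 -> c2 whenever some route uses channel c2 right after c1."""
    deps: Dict[Channel, Set[Channel]] = {}
    nodes = list(product(range(k), repeat=2))
    for route in routes:
        for src, dst in product(nodes, repeat=2):
            path = route(src, dst)
            channels = list(zip(path, path[1:]))
            for c in channels:
                deps.setdefault(c, set())
            for c1, c2 in zip(channels, channels[1:]):
                deps[c1].add(c2)
    return deps

def is_acyclic(graph: Dict[Channel, Set[Channel]]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {c: 0 for c in graph}
    for succs in graph.values():
        for c in succs:
            indegree[c] += 1
    ready = [c for c, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for nxt in graph[c]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return removed == len(graph)
```

Pure XY routing yields an acyclic CDG, so it passes the test. Allowing both XY and YX routes on even a 2 x 2 mesh creates a cycle of dependencies, so this check cannot certify that combination as deadlock-free. A model that customizes the topology must keep such cycles out of whatever routing it selects.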