938 research outputs found

    On the design of a high-performance adaptive router for CC-NUMA multiprocessors

    Copyright © 2003 IEEE. This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which yields fully adaptive routing with only three virtual channels. In addition, we selectively use output buffers to implement the most heavily used virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between network performance and hardware cost. The outcome of this research is a High-Performance Adaptive Router (HPAR), which balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation in which HPAR is compared with other adaptive routers using FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation considers both the VLSI cost of each router and its performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among the routers tested. The throughput gains ranged from 10 percent to 40 percent with respect to its most direct rival, which employs more hardware resources. Other results showed that HPAR achieves up to 83 percent of its theoretical maximum throughput under random traffic and up to 70 percent when running real applications. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered a suitable candidate for implementing packet interchange in future generations of CC-NUMA multiprocessors. Valentín Puente, José-Ángel Gregorio, Ramón Beivide, and Cruz Iz

    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable Data Center Switches along with the advantages that such switches can provide. Our first contribution centers on determining switch architectures that can be implemented on a Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small to medium range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation compared with an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate that our novel switch architecture and lightweight congestion control protocol can both reduce the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x.
As our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.

    On the Effectiveness of Source Throttling for Networks-on-Chip in Chip Multiprocessor Designs

    In modern chip-multiprocessor (CMP) designs, traffic between cores keeps increasing as the number of cores grows. Consequently, on-chip interconnection networks face increasingly large communication bandwidth demands. This thesis focuses on Quality-of-Service (QoS) in Networks-on-Chip (NoC). NoC is considered a scalable interconnection approach compared with conventional bus-based architectures. Like Ethernet, NoC faces common QoS issues such as bandwidth utilization and fairness. This thesis studies the effectiveness of source throttling for NoC, covering fairness and overall performance measures such as program run time and packet latency. Source throttling is a well-known technique for traffic regulation. Previous studies have shown it to be effective for bufferless NoC. Because traffic behaviors and characteristics differ, however, it is not obvious whether source throttling is effective for general buffered NoC. The first part of this research is a set of network simulations on various synthetic traffic cases. The results indicate that source throttling can reduce application runtime when (1) the network is congested, (2) there are dependencies among communication requests, and (3) the width of the dependence graph is sufficiently large. The second part consists of full-system simulations on public benchmark suites. Source throttling brings no benefit in these relatively realistic cases. Further experiments reveal that the aforementioned conditions are not satisfied. This explains why source throttling is of little use for general buffered NoC in CMP designs

    On packet switch design

    Distributed control architecture for multiservice networks

    The research focuses on devising a decentralised and distributed control system architecture for the management of internetworking systems to provide improved service delivery and network control. The theoretical basis, simulation results and an implementation in a real network are presented. It is demonstrated that a value-based control system can achieve better performance, utilisation and fairness for network customers as well as network and service operators. A decentralised control system framework for analysing networked and shared resources is developed and demonstrated. This fits in with the fundamental principles of the Internet. It is demonstrated that multiple distributed control loops can be run on shared resources and achieve proportional fairness in their allocation without central control. Some of the specific characteristic behaviours of the service and network layers are identified. The network and service layers are isolated so that each layer can evolve independently to fulfil its function better. A common architecture pattern is devised to serve the different layers independently. The decision processes require no co-ordination between peers, which improves the scalability of the solution. The proposed architecture can readily fit into a clearinghouse mechanism for integration with business logic. This architecture can provide improved QoS and better revenue from both reservation-less and reservation-based networks. The limits on resource usage for different types of flows are analysed. A method that can sense and modify user utilities and support dynamic price offers is devised. An optimal control system (within the given conditions), automated provisioning, a packet scheduler to enforce the control and a measurement system are among the components developed.
The model can be extended to enhance the autonomicity of computer communication networks in both client-server and P2P settings and can be introduced on the Internet in an incremental fashion. The ideas presented in the model, built with the model-view-controller and electronic enterprise architecture frameworks, are now being developed independently elsewhere into common service delivery platforms for converged networks. Four US/EU patents were granted based on the work carried out for this thesis, for the cross-layer architecture, multi-layer scheme, measurement system and scheduler. Four conference papers were published and presented
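As a rough illustration of the proportional fairness target mentioned in this abstract, the sketch below computes the weighted proportionally fair split of a single shared resource in closed form. It is not the thesis's distributed control loop, which the abstract does not specify. The function name, the `weights` parameter (standing in for per-user value) and the single-resource setting are all assumptions made for illustration.

```python
from typing import List, Sequence


def proportional_fair_shares(weights: Sequence[float], capacity: float) -> List[float]:
    """Weighted proportionally fair split of one shared resource.

    Hypothetical helper: maximises sum(w_i * log(x_i)) subject to
    sum(x_i) <= capacity. Setting the derivative w_i / x_i equal to a
    common multiplier for every user gives shares proportional to w_i.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    return [capacity * w / total for w in weights]


if __name__ == "__main__":
    # Three users with weights 1, 2 and 1 sharing 8 units of capacity.
    print(proportional_fair_shares([1.0, 2.0, 1.0], 8.0))
```

A decentralised scheme of the kind the abstract describes would aim to reach such an allocation through local price and rate updates rather than by computing it centrally.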

    Reconfiguration in an Optical Multiring Interconnection Network - Masters Thesis, December 2002

    The advent of optical technology that can feasibly support extremely high bandwidth chip-to-chip communication raises a host of architectural questions in the design of digital systems. Terabit per second (and higher) bandwidths have not been previously available at the chip level. In this thesis, we examine the use of this technology in two different scenarios, viz., as the interconnection network in a multiprocessor system and as a switch fabric in network routers. Specifically, we examine the performance gains associated with utilizing the bandwidth reconfiguration capabilities of a system based on this technology

    Efficient QoS support for high-performance interconnects

    Interconnection networks are a key component in a large number of systems. Quality of service (QoS) mechanisms are responsible for ensuring that a certain level of performance is achieved in the network. Traditional solutions for providing QoS in high-performance interconnection networks are usually based on complex architectures. The main objective of this thesis is to investigate whether efficient QoS mechanisms can be offered. Our aim is to achieve full QoS support with minimal resources. To that end, redundancies in the proposed QoS mechanisms are identified and eliminated without affecting performance. This thesis consists of three parts. In the first, we begin with traditional QoS proposals at the traffic-class level. In the second part, we propose how to adapt deadline-based QoS mechanisms to high-performance interconnection networks. Finally, we also investigate the interaction of QoS mechanisms with congestion control

    Design of traffic shaper / scheduler for packet switches and DiffServ networks : algorithms and architectures

    The convergence of communications, information, commerce and computing is creating significant demand and opportunity for multimedia and multi-class communication services. In such environments, controlling the network behavior and guaranteeing the user's quality of service are required. To meet this requirement, a flexible hierarchical sorting architecture is presented that can function as either a traffic shaper or a scheduler, depending on the traffic load. The core structure can be implemented as a hierarchical traffic shaper that supports a large number of connections with a wide variety of rates and burstiness without losing granularity in the cells' conforming departure times. By using two stages of timing queues, the hierarchical traffic shaper can implement the exact sorting scheme with substantially reduced memory size and complexity, without introducing any sorting inaccuracy. By setting a suitable threshold on the length of the departure queue and using a lookahead algorithm, the core structure can be converted to a hierarchical rate-adaptive scheduler. Based on the traffic load, it can work as an exact sorting traffic shaper or a Generic Cell Rate Algorithm (GCRA) scheduler. Such a rate-adaptive scheduler can greatly reduce the Cell Transfer Delay and the Maximum Memory Occupancy while keeping the fairness in bandwidth assignment that is an inherent characteristic of GCRA. By introducing a best-effort queue to accommodate best-effort traffic, the hierarchical sorting architecture can be changed to a near work-conserving scheduler. It assigns the remaining bandwidth to best-effort traffic, which improves the utilization of the outgoing link while still meeting the requirements of services that need quality of service guarantees.
The inherent flexibility of the hierarchical sorting architecture, combined with intelligent algorithms, underlies its multiple functions. Its implementation not only manages buffer and bandwidth resources effectively but also requires no more than off-the-shelf hardware technology. The correlation between the extra shaping delay and the rate of the connections is revealed, and an improved fair traffic shaping algorithm, the Departure Event Driven plus Completing Service Time Resorting algorithm, is presented. The proposed algorithm introduces a resorting process into the Departure Event Driven Traffic Shaping Algorithm to resolve contention among multiple cells that are all eligible for transmission in the traffic shaper. By basing the resorting process on each connection's rate, it gives better fairness and flexibility in bandwidth assignment for connections with a wide range of rates. A Dual Level Leaky Bucket Traffic Shaper (DLLBTS) architecture is proposed for implementation at the edge nodes of Differentiated Services networks in order to facilitate the quality of service management process. The proposed architecture can guarantee not only the class-based Service Level Agreement but also fair resource sharing among flows belonging to the same class. A simplified DLLBTS architecture is also given, which achieves the goals of DLLBTS while maintaining a very low implementation complexity, so that it can be implemented with current VLSI technology. In summary, shaping and scheduling algorithms for high-speed packet switches and DiffServ networks are studied, and intelligent implementation schemes are proposed for them
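For readers unfamiliar with the Generic Cell Rate Algorithm named in this abstract, the following is a minimal sketch of its standard virtual scheduling form, which decides whether each arriving cell conforms to a contracted rate. It does not reproduce the hierarchical sorting architecture or the DLLBTS design from the thesis. The class name `Gcra` and its field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gcra:
    """Virtual scheduling form of the Generic Cell Rate Algorithm.

    increment: nominal spacing between cells (1 / contracted rate).
    limit: tolerance allowing cells to arrive slightly early.
    Names are illustrative, not taken from the thesis.
    """
    increment: float
    limit: float
    tat: Optional[float] = None  # theoretical arrival time of the next cell

    def conforms(self, arrival: float) -> bool:
        """Return True if a cell arriving at `arrival` is conforming."""
        if self.tat is None:
            # The first cell defines the initial theoretical arrival time.
            self.tat = arrival
        if arrival < self.tat - self.limit:
            # Too early: non-conforming, state is left unchanged.
            return False
        # Conforming: schedule the next theoretical arrival time.
        self.tat = max(arrival, self.tat) + self.increment
        return True


if __name__ == "__main__":
    gcra = Gcra(increment=10.0, limit=2.0)
    for t in (0.0, 9.0, 15.0, 30.0):
        print(t, gcra.conforms(t))
```

A shaper would delay a non-conforming cell until it becomes eligible instead of merely flagging it, which is where the sorting of departure times described in the abstract comes in.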

    A comparative survey of scheduling mechanisms in the internet

    As the Internet grows rapidly and its popularity increases, users tend to adopt creative, time-saving, entertaining and economical technologies. Real-time applications such as online gaming, voice and video are becoming more popular. Research on improving scheduling mechanisms in routers currently receives less attention from network researchers. The trend lags even further in industrial implementations and standards institutions. This paper attempts to compare developments in this subject from the academic, standards and industry points of view. The results show that there is an enormous difference between academic research and the standards and market domains in terms of the evolution of scheduling mechanis