
    Scalable parallel communications

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on practical issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., the effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiples of 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low-cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space-division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, while other TCPs running in parallel provide high-bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism), also with near-linear speed-ups.
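    The striping idea behind this approach can be made concrete with a small sketch. The following is a minimal illustration (with an assumed host, port layout, and chunk size; it is not the authors' simulated system) of how one application's data might be striped round-robin across several parallel TCP connections, one per protocol processor and physical channel; a matching receiver would reassemble the stripes in order.

```python
# Minimal sketch of coarse-grain parallel sending: stripe one message across
# N_CHANNELS parallel TCP connections (one per protocol processor/channel).
# HOST, BASE_PORT, N_CHANNELS and CHUNK are illustrative assumptions.
import socket
from concurrent.futures import ThreadPoolExecutor

HOST, BASE_PORT, N_CHANNELS, CHUNK = "10.0.0.2", 5000, 4, 64 * 1024

def send_stripes(channel_id, stripes):
    """Send this channel's stripes over its own TCP connection."""
    sent = 0
    with socket.create_connection((HOST, BASE_PORT + channel_id)) as sock:
        for stripe in stripes:
            sock.sendall(stripe)
            sent += len(stripe)
    return sent

def parallel_send(data: bytes) -> int:
    # Cut the message into chunks and deal them round-robin to the channels.
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    per_channel = [chunks[c::N_CHANNELS] for c in range(N_CHANNELS)]
    with ThreadPoolExecutor(max_workers=N_CHANNELS) as pool:
        return sum(pool.map(send_stripes, range(N_CHANNELS), per_channel))
```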

    On the Stability of Isolated and Interconnected Input-Queued Switches under Multiclass Traffic

    In this correspondence, we discuss the stability of scheduling algorithms for input-queued (IQ) and combined input/output queued (CIOQ) packet switches. First, we show that a wide class of IQ schedulers operating on multiple traffic classes can achieve 100% throughput. Then, we address the problem of the maximum throughput achievable in a network of interconnected IQ and CIOQ switches loaded by multiclass traffic, and we devise some simple scheduling policies that guarantee 100% throughput. Both the Lyapunov function methodology and the fluid modeling approach are used to obtain our results.
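    The drift argument underlying such 100% throughput proofs can be stated compactly. The notation below is illustrative rather than taken from the correspondence: with a quadratic Lyapunov function of the queue lengths, a negative expected drift whenever the queues are large implies positive recurrence, and hence stability, for every admissible multiclass arrival rate.

```latex
% Quadratic (Foster-Lyapunov) drift condition; Q_i(t) are the queue lengths.
V(Q(t)) = \sum_i Q_i(t)^2, \qquad
\mathbb{E}\big[\, V(Q(t+1)) - V(Q(t)) \,\big|\, Q(t) \,\big]
  \;\le\; B - \epsilon \sum_i Q_i(t)
\quad \text{for some } B < \infty,\ \epsilon > 0 .
```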

    Towards Terabit Carrier Ethernet and Energy Efficient Optical Transport Networks


    A High Speed Hardware Scheduler for 1000-port Optical Packet Switches to Enable Scalable Data Centers

    Meeting the exponential increase in the global demand for bandwidth has become a major concern for today's data centers. The scalability of any data center is defined by the maximum capacity and port count of the switching devices it employs, limited by the total pin bandwidth of current electronic switch ASICs. Optical switches can provide higher capacity and port counts, and hence can be used to transform data center scalability. We have recently demonstrated a 1000-port star-coupler based wavelength division multiplexed (WDM) and time division multiplexed (TDM) optical switch architecture offering a bandwidth of 32 Tbit/s with the use of fast wavelength-tunable transmitters and high-sensitivity coherent receivers. However, the major challenge in deploying such an optical switch to replace current electronic switches lies in designing and implementing a scalable scheduler capable of operating on packet timescales. In this paper, we present a pipelined and highly parallel electronic scheduler that configures the high-radix (1000-port) optical packet switch. The scheduler can process requests from 1000 nodes and allocate timeslots across 320 wavelength channels and 4000 wavelength-tunable transceivers within a time constraint of 1 µs. Using the Opencell NanGate 45 nm standard cell library, we show that the complete 1000-port parallel scheduler occupies a circuit area of 52.7 mm², 4-8x smaller than that of a high-performance switch ASIC, with a clock period of less than 8 ns, enabling 138 scheduling iterations to be performed in 1 µs. The performance of the scheduling algorithm is evaluated in comparison to maximal matching from graph theory and conventional software-based wavelength allocation heuristics. The parallel hardware scheduler is shown to achieve similar matching performance and network throughput while being orders of magnitude faster.
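    As a point of reference for the maximal-matching comparison above, the sketch below shows one request-grant-accept iteration of an iSLIP-style iterative maximal matching between input nodes and output resources. It is only an algorithmic skeleton under assumed data structures, not the paper's pipelined hardware scheduler, which performs many such iterations in parallel within the 1 µs budget.

```python
# One request/grant/accept iteration of an iSLIP-style maximal matching.
# requests[i]   : set of outputs requested by input i
# grant_ptr[o]  : round-robin pointer of output o (over inputs)
# accept_ptr[i] : round-robin pointer of input i (over outputs)
# matched_in/out: inputs/outputs already matched in earlier iterations
def match_iteration(requests, grant_ptr, accept_ptr, matched_in, matched_out):
    n_in, n_out = len(requests), len(grant_ptr)
    grants = {}                                   # input -> outputs that granted it
    for out in range(n_out):                      # grant phase
        if out in matched_out:
            continue
        wanting = [i for i in range(n_in)
                   if out in requests[i] and i not in matched_in]
        if wanting:
            wanting.sort(key=lambda i: (i - grant_ptr[out]) % n_in)
            grants.setdefault(wanting[0], []).append(out)
    for inp, outs in grants.items():              # accept phase
        outs.sort(key=lambda o: (o - accept_ptr[inp]) % n_out)
        chosen = outs[0]
        matched_in.add(inp)
        matched_out.add(chosen)
        grant_ptr[chosen] = (inp + 1) % n_in      # advance round-robin pointers
        accept_ptr[inp] = (chosen + 1) % n_out
```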

    Ethernet - a survey on its fields of application

    During the last decades, Ethernet has progressively become the most widely used local area network (LAN) technology. Apart from LAN installations, Ethernet has also become attractive for many other fields of application, ranging from industry to avionics, telecommunication, and multimedia. The expanded application of this technology is mainly due to its significant assets, such as reduced cost, backward compatibility, flexibility, and expandability. However, this new trend raises some problems concerning the services of the protocol and the requirements of each application. Therefore, specific adaptations prove essential to integrate this communication technology into each field of application. Our primary objective is to show how Ethernet has been enhanced to comply with the specific requirements of several application fields, particularly in transport, embedded, and multimedia contexts. The paper first describes the common Ethernet LAN technology and highlights its main features. It then reviews the most important specific Ethernet versions with respect to each application field's requirements. Finally, we compare these different fields of application, focusing in particular on the fundamental concepts and the quality-of-service capabilities of each proposal.

    A hardware scheduler based on task queues for FPGA-based embedded real-time systems

    A hardware scheduler is developed to improve the real-time performance of soft-core processor based computing systems. A hardware scheduler typically accelerates system performance at the cost of increased hardware resources, inflexibility, and integration difficulty. However, the reprogrammability of FPGA-based systems removes the problems of inflexibility and integration difficulty. This paper introduces a new task-queue architecture to better support practical task controls and maintain good resource scaling. The scheduler can be configured to support various algorithms such as time-sliced priority scheduling, Earliest Deadline First, and Least Slack Time. The hardware scheduler reduces scheduling overhead by more than 1,000 clock cycles and raises the system utilization bound by a maximum of 19.2 percent. Scheduling jitter is reduced from hundreds of clock cycles in software to just two or three cycles for most operations. The additional resource cost is no more than 17 percent of a typical soft-core system for a small-scale embedded application.
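    A behavioural sketch of the ordering such a task queue maintains (here for Earliest Deadline First) is given below. This is a software model using a binary heap purely for illustration; the paper's contribution is the FPGA task-queue circuit that keeps this ordering in hardware with constant, low-jitter operation latency.

```python
# Behavioural model of an EDF ready queue (illustration only, not the RTL).
import heapq

class EDFReadyQueue:
    def __init__(self):
        self._heap = []   # entries are (absolute_deadline, task_id)

    def release(self, task_id, deadline):
        """Insert a released task, ordered by its absolute deadline."""
        heapq.heappush(self._heap, (deadline, task_id))

    def pick_next(self):
        """Task with the earliest deadline, or None if the queue is empty."""
        return self._heap[0][1] if self._heap else None

    def complete(self, task_id):
        """Remove a finished task from the queue."""
        self._heap = [(d, t) for d, t in self._heap if t != task_id]
        heapq.heapify(self._heap)
```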

    Efficient QoS support for high-performance interconnects

    Interconnection networks are a key component of a large number of systems. Quality-of-service (QoS) mechanisms are responsible for ensuring that a certain level of performance is achieved in the network. Traditional solutions for offering QoS in high-performance interconnection networks are usually based on complex architectures. The main objective of this thesis is to investigate whether we can offer efficient QoS mechanisms. Our aim is to achieve full QoS support with a minimum of resources. To this end, redundancies in proposed QoS mechanisms are identified and removed without affecting performance. This thesis consists of three parts. In the first, we start from the traditional proposals for QoS at the traffic-class level. In the second part, we propose how to adapt deadline-based QoS mechanisms to high-performance interconnection networks. Finally, we also investigate the interaction of QoS mechanisms with congestion control.

    Control Plane Hardware Design for Optical Packet Switched Data Centre Networks

    Optical packet switching for intra-data centre networks is key to addressing traffic requirements. Photonic integration and wavelength division multiplexing (WDM) can overcome bandwidth limits in switching systems. A promising technology for building a nanosecond-reconfigurable, WDM-compatible photonic-integrated switch is the semiconductor optical amplifier (SOA). SOAs are typically used as gating elements in a broadcast-and-select (B&S) configuration to build an optical crossbar switch. For larger-size switching, a three-stage Clos network based on crossbar nodes is a viable architecture. However, the design of the switch control plane is one of the barriers to packet switching; it must run on packet timescales, which becomes increasingly challenging as line rates get higher. The scheduler, used for the allocation of switch paths, limits the control clock speed. To this end, the research contribution was the design of highly parallel hardware schedulers for crossbar and Clos network switches. On a field-programmable gate array (FPGA), the minimum scheduler clock period achieved was 5.0 ns and 5.4 ns for a 32-port crossbar and Clos switch, respectively. By using parallel path allocation modules, one per Clos node, a minimum clock period of 7.0 ns was achieved for a 256-port switch. For scheduler application-specific integrated circuit (ASIC) synthesis, this reduces to 2.0 ns; a record result enabling scalable packet switching. Furthermore, the control plane was demonstrated experimentally. Moreover, a cycle-accurate network emulator was developed to evaluate switch performance. Results showed switch saturation throughput at a traffic load of 60% of capacity, with sub-microsecond packet latency, for a 256-port Clos switch, outperforming state-of-the-art optical packet switches.
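    The per-node path allocation carried out by the parallel modules can be illustrated with a small model. The sketch below (an assumption-laden software analogue, not the thesis RTL) greedily assigns a middle-stage switch to each matched (input switch, output switch) pair of a three-stage Clos network; in hardware, each allocation module would hold the two free-middle-switch bitmaps and pick the first common free entry with a priority encoder.

```python
# Greedy middle-stage allocation for a three-stage Clos network (sketch only).
def allocate_paths(connections, n_middle):
    """connections: iterable of (input_switch, output_switch) pairs to route."""
    in_free, out_free, paths = {}, {}, {}
    for (i, o) in connections:
        free_i = in_free.setdefault(i, set(range(n_middle)))
        free_o = out_free.setdefault(o, set(range(n_middle)))
        common = free_i & free_o      # middle switches free towards both sides
        if not common:
            continue                  # blocked in this scheduling cycle
        m = min(common)               # hardware analogue: priority encoder
        paths[(i, o)] = m
        free_i.discard(m)
        free_o.discard(m)
    return paths
```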