163 research outputs found

    Towards Terabit Carrier Ethernet and Energy Efficient Optical Transport Networks

    Multi-Terabit/s IP Switching with Guaranteed Service for Streaming Traffic

    As traffic on the Internet continues to grow exponentially, there is a real need to solve transmission and switching scalability. Moreover, future Internet traffic will be dominated by streaming media flows, such as video-telephony, video-conferencing, 3D video, virtual reality, and many more. Consequently, network solutions will need to offer quality of service and traffic engineering together with the above-mentioned scalability, i.e., over-provisioning is not likely to be a viable solution to accommodate streaming media traffic. This paper describes the architecture of an ultra-scalable IP switch and the first experiments with a prototype implementation. The switch's scalability is a consequence of its use of pipeline forwarding of packets, which also results in quality of service guarantees for UDP-based streaming applications, while preserving elastic TCP-based traffic as is, i.e., without affecting any existing applications based on "best-effort" services. Moreover, the prototype demonstrates the low complexity of pipeline forwarding implementation, as the deployed network gear was realized from off-the-shelf components in only nine months through the design, implementation, and testing efforts of the authors.

    Distributed Router Architecture for Packet-Routed Optical Networks

    An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures

    By the middle of this decade, uniprocessor architecture performance had hit a roadblock due to a combination of factors, such as excessive power dissipation due to high operating frequencies, growing memory access latencies, diminishing returns on deeper instruction pipelines, and a saturation of available instruction level parallelism in applications. An attractive and viable alternative embraced by all the processor vendors was multi-core architectures, where throughput is improved by using micro-architectural features such as multiple processor cores, interconnects and low latency shared caches integrated on a single chip. The individual cores are often simpler than uniprocessor counterparts, use hardware multi-threading to exploit thread-level parallelism and latency hiding, and typically achieve better performance-power figures. The overwhelming success of multi-core microprocessors in both high performance and embedded computing platforms motivated chip architects to dramatically scale multi-core processors to many-cores, which will include hundreds of cores on-chip to further improve throughput. With such complex large scale architectures, however, several key design issues need to be addressed. First, a wide range of micro-architectural parameters such as L1 caches, load/store queues, shared cache structures and interconnection topologies, and the non-linear interactions between them, define a vast non-linear multi-variate micro-architectural design space of many-core processors; the traditional method of using extensive in-loop simulation to explore the design space is simply not practical. Second, to accurately evaluate the performance (measured in terms of cycles per instruction (CPI)) of a candidate design, the contention at the shared cache must be accounted for in addition to the cycle-by-cycle behavior of the large number of cores, which superlinearly increases the number of simulation cycles per iteration of the design exploration.
Third, single thread performance does not scale linearly with the number of hardware threads per core and the number of cores, due to the memory wall effect. This means that at every step of the design process designers must ensure that single thread performance is not unacceptably slowed down while increasing overall throughput. While all these factors affect design decisions in both high performance and embedded many-core processors, the design of embedded processors required for complex embedded applications such as networking, smart power grids, battlefield decision-making, consumer electronics and biomedical devices, to name a few, is fundamentally different from its high performance counterpart because of the need to consider (i) low power and (ii) real-time operations. This implies the design objective for embedded many-core processors cannot be to simply maximize performance, but to improve it in such a way that overall power dissipation is minimized and all real-time constraints are met. This necessitates additional power estimation models right at the design stage to accurately measure the cost and reliability of all the candidate designs during the exploration phase. In this dissertation, a statistical machine learning (SML) based design exploration framework is presented which employs an execution-driven cycle-accurate simulator to accurately measure power and performance of embedded many-core processors. The embedded many-core processor domain is Network Processors (NePs), used to process IP network packets. Future generation NePs, required to operate at terabits per second network speeds, capture all the aspects of a complex embedded application consisting of shared data structures, a large volume of compute-intensive and data-intensive real-time bound tasks, and a high level of task (packet) level parallelism.
Statistical machine learning (SML) is used to efficiently model performance and power of candidate designs in terms of wide ranges of micro-architectural parameters. The method inherently minimizes the number of in-loop simulations in the exploration framework and also efficiently captures the non-linear interactions between the micro-architectural design parameters. To ensure scalability, the design space is partitioned into (i) core-level micro-architectural parameters, to optimize single core architectures subject to the real-time constraints, and (ii) shared memory level micro-architectural parameters, to explore the shared interconnection network and shared cache memory architectures and achieve overall optimality. The cost function of our exploration algorithm is the total power dissipation, which is minimized subject to the constraints of real-time throughput (as determined from the terabit optical network router line-speed) required in the IP packet processing embedded application.
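
The constrained search described in this abstract (minimize predicted power subject to a throughput requirement) can be illustrated with a minimal Python sketch. It is not the dissertation's framework: the parameter grid, the two toy prediction functions, and the name `min_power_design` are all hypothetical stand-ins for learned models, and an exhaustive scan replaces the actual exploration algorithm.

```python
from itertools import product

# Hypothetical design space (assumed values, not from the source):
# number of cores, hardware threads per core, shared-cache size in KB.
CORES = [8, 16, 32]
THREADS = [1, 2, 4]
CACHE_KB = [256, 512, 1024]


def predict_power_w(cores, threads, cache_kb):
    """Toy stand-in for a learned power model (watts)."""
    return 0.5 * cores + 0.1 * cores * threads + 0.002 * cache_kb


def predict_throughput_gbps(cores, threads, cache_kb):
    """Toy stand-in for a learned throughput model.

    The square root mimics diminishing returns from extra threads per core.
    """
    return cores * (threads ** 0.5) * (1.0 + cache_kb / 2048.0)


def min_power_design(required_gbps):
    """Return the lowest-power design meeting the throughput constraint.

    Returns None when no design in the grid is feasible.
    """
    feasible = [
        d for d in product(CORES, THREADS, CACHE_KB)
        if predict_throughput_gbps(*d) >= required_gbps
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda d: predict_power_w(*d))
```

Calling `min_power_design(40)` returns a `(cores, threads, cache_kb)` tuple from the grid. In a realistic setting the two prediction functions would be regression models fitted to a small number of simulator runs, which is what keeps the number of in-loop simulations low.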

    Enhancing QoS provisioning and granularity in next generation internet

    Next Generation IP technology has the potential to prevail, both in the access and in the core networks, as we are moving towards a multi-service, multimedia and high-speed networking environment. Many new applications, including multimedia applications, have been developed and deployed, and demand Quality of Service (QoS) support from the Internet, in addition to the current best effort service. Therefore, QoS provisioning techniques in the Internet to guarantee some specific QoS parameters are more a requirement than a desire. Due to the large number of data flows and the bandwidth demand, as well as the various QoS requirements, scalability and fine granularity in QoS provisioning are required. In this dissertation, end-to-end QoS provisioning mechanisms are mainly studied, in order to provide scalable services with fine granularity to the users, so that both users and network service providers can achieve more benefits from the QoS provisioned in the network. To provide the end-to-end QoS guarantee, single-node QoS provisioning schemes have to be deployed at each router, and therefore, in this dissertation, such schemes are studied prior to the study of the end-to-end QoS provisioning mechanisms. Specifically, the effective sharing of the output bandwidth among the large number of data flows is studied, so that fairness in the bandwidth allocation among the flows can be achieved in a scalable fashion. A dual-rate grouping architecture is proposed in this dissertation, in which the granularity in rate allocation can be enhanced, while the scalability of the one-rate grouping architecture is still maintained. It is demonstrated that the dual-rate grouping architecture approximates the ideal per-flow based PFQ architecture better than the one-rate grouping architecture, and provides better immunity capability.
For end-to-end QoS provisioning, a new Endpoint Admission Control scheme for Diffserv networks, referred to as Explicit Endpoint Admission Control (EEAC), is proposed, in which the admission control decision is made by the end hosts based on the end-to-end performance of the network. A novel concept, namely the service vector, is introduced, by which an end host can choose different services at different routers along its data path. Thus, the proposed service provisioning paradigm decouples the end-to-end QoS provisioning from the service provisioning at each router, and the end-to-end QoS granularity in Diffserv networks can be enhanced, while the implementation complexity of the Diffserv model is maintained. Furthermore, several aspects of the implementation of the EEAC and service vector paradigm, referred to as EEAC-SV, in the Diffserv architecture are also investigated. The performance analysis and simulation results demonstrate that the proposed EEAC-SV scheme not only increases the benefit to the service users, but also enhances the benefit to the network service provider in terms of network resource utilization. The study also indicates that the proposed EEAC-SV scheme can provide a compatible and friendly networking environment to conventional TCP flows, and that the scheme can be deployed in the current Internet in an incremental and gradual fashion.
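
The service vector idea, in which an end host picks a possibly different service class at each router on its path, can be illustrated with a small Python sketch. The selection rule used here (the cheapest vector whose summed per-hop delay meets an end-to-end bound), the class names, and the delay and price figures are all assumptions made for illustration; the abstract does not specify how EEAC-SV scores or selects vectors.

```python
from itertools import product

# Hypothetical per-router service classes (assumed values):
# class name -> (per-hop delay in ms, price per hop).
CLASSES = {"EF": (2.0, 5.0), "AF": (5.0, 2.0), "BE": (12.0, 0.5)}


def cheapest_service_vector(num_routers, delay_bound_ms):
    """Brute-force the cheapest per-router class assignment.

    Returns (vector, price), or (None, inf) if no assignment
    satisfies the end-to-end delay bound.
    """
    best_vector, best_price = None, float("inf")
    for vector in product(CLASSES, repeat=num_routers):
        delay = sum(CLASSES[c][0] for c in vector)
        price = sum(CLASSES[c][1] for c in vector)
        if delay <= delay_bound_ms and price < best_price:
            best_vector, best_price = vector, price
    return best_vector, best_price
```

With a three-router path and a 14 ms bound, no single class used at every hop is both feasible and cheapest under these numbers, so the search returns a mixed vector. That is the finer granularity a per-router choice allows over picking one class end to end. The exhaustive search grows exponentially with path length and is only meant to make the concept concrete.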

    Node design in optical packet switched networks
