I. INTRODUCTION

DESPITE the recent slowdown in the telecom equipment market, current estimates and measurements predict that Internet traffic will continue to grow for many years to come. Driving this growth is the fact that the Internet has moved from a convenience to a mission-critical platform for conducting and succeeding in business. In addition, the provision of broadband services to end users will sustain this growth. As a result, there is great demand for gigabit/terabit electronic routers and switches (IP routers, ATM switches, Ethernet switches) that knit together the constituent networks of the global Internet, creating the illusion of a unified whole. These switches/routers must not only have an aggregate capacity of gigabits/terabits coupled with forwarding rates of billions of packets per second, but they must also deal with nontrivial issues such as scheduling support for differentiated services, a wide variety of interface types, scalability in terms of capacity and port density, and backward compatibility with a wide range of packet formats and routing protocols.
This special issue is a collection of high-quality papers, presenting state-of-the-art design and analysis of high-performance electronic packet switches and routers.
II. SCALABILITY
In the invited paper, "Scalable Electronic Packet Switches," Chiussi and Francini provide an overview of the current state of the art in practical large packet switches and routers and discuss the issues affecting their scalability. They focus on three major scalability aspects: implementation, support of quality of service (QoS), and multicasting. The impact of these aspects is shown on the most popular switch architectures.
III. IP LOOKUP
The use of classless interdomain routing (CIDR) allows arbitrary aggregation of network addresses and reduces routing table size, but it turns IP address lookup into a longest-prefix-matching problem. Most previous lookup schemes depend on memory access technology, which limits their performance. Desai et al. present a fundamentally different approach in the paper, "Reconfigurable Finite-State Machine Based IP Lookup Engine for High-Speed Router." The IP address lookup problem is cast as a large finite-state machine (FSM), which is then decomposed and implemented in reconfigurable hardware blocks. The proposed architecture is not bound by the memory bandwidth limitation and, in principle, scales with very large scale integration (VLSI) technology.
In the paper, "High-Speed IP Routing With Binary Decision Diagrams Based Hardware Address Lookup Engine," Sangireddy and Somani also address the memory limitation problem. Their solution is optimized combinational logic based on binary decision diagrams (BDDs), which can be implemented using reconfigurable hardware. The BDD scheme becomes more beneficial as the number of physical ports in a router continues to increase.
Another practical problem in IP lookup is the high cost of content-addressable memories (CAMs). In the paper, "Scalable IP Lookup for Internet Routers," Taylor et al. present a fast Internet protocol lookup (FIPL) architecture that uses the tree bitmap algorithm. The architecture uses only a fraction of a reconfigurable logic device and a single commodity SRAM, offering an attractive alternative to expensive CAM-based commercial solutions.
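To make the lookup problem concrete, the following is a minimal sketch of longest-prefix matching over a binary trie, a common software baseline in which the number of memory accesses per lookup grows with the prefix length. It is an illustration only and is not the FSM, BDD, or FIPL design from the papers above. The class name, the example prefixes, and the next-hop labels are hypothetical, and only IPv4 is handled.

```python
import ipaddress


class PrefixTrie:
    """Binary trie keyed on address bits (hypothetical illustration, IPv4 only)."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        # Keep only the first `prefixlen` bits of the 32-bit network address.
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address):
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("hop")  # a default route, if one was inserted
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            # Remember the most specific prefix seen so far.
            best = node.get("hop", best)
        return best


table = PrefixTrie()
table.insert("10.0.0.0/8", "A")   # hypothetical next hops
table.insert("10.1.0.0/16", "B")
print(table.lookup("10.1.2.3"))   # matches both prefixes; the /16 wins
print(table.lookup("10.2.0.1"))   # matches only the /8
```

Each step down the trie is one dependent memory access, which is the kind of cost that hardware lookup engines aim to avoid.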
IV. CROSSBAR SCHEDULING
The majority of crossbar scheduling algorithms can be classified as maximum weight/size matching algorithms or their iterative pointer-based approximations, driven by either optimal performance or ease of implementation. The next two papers depart from these legacy scheduling schemes but still target the provision of efficient crossbar matchings for high-performance switches.
In the paper, "DISA: A Robust Scheduling Algorithm for Scalable Crosspoint-Based Switch Fabrics," Elhanany and Sadot present a non-pointer-based approach. It performs a synchronized output reservation whereby each input selects a designated output while taking into consideration both local transmission requests and the availability of global resources. The robustness of the scheme under admissible traffic, without the need for speedup, is shown through analysis and computer simulations.
0733-8716/03$17.00 © 2003 IEEE
In the paper, "Randomized Scheduling Algorithms for High-Aggregate Bandwidth Switches," Giaccone et al. exploit hardware parallelism and randomization to yield a set of scheduling algorithms: APSARA, LAURA, and SERENA. Noting that queue lengths change slowly over successive time slots, the authors use memory to simplify the implementation and a novel MERGE operation to ensure nondecreasing matching weight. The proposed algorithms are stable under any admissible arrival process. They are simpler than maximum weight matching (MWM) algorithms and achieve comparable delay performance.
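As a rough illustration of the terms used in this section, the sketch below computes a maximum weight matching by brute force for a small switch and shows the simplest way memory can help: keep the previous matching unless a new candidate is heavier under the current queue lengths. This is a simplified sketch and is not APSARA, LAURA, SERENA, or the MERGE operation. The function names and the queue-length matrix are hypothetical.

```python
from itertools import permutations


def matching_weight(queues, matching):
    """Sum of queue lengths served when input i is connected to output matching[i]."""
    return sum(queues[i][j] for i, j in enumerate(matching))


def max_weight_matching(queues):
    """Brute-force MWM; tries all n! matchings, so it is usable only for tiny n."""
    n = len(queues)
    return max(permutations(range(n)), key=lambda m: matching_weight(queues, m))


def keep_heavier(queues, previous, candidate):
    """Use memory: retain the previous matching unless the candidate weighs more."""
    if matching_weight(queues, candidate) > matching_weight(queues, previous):
        return candidate
    return previous


# queues[i][j] = cells waiting at input i for output j (hypothetical values)
queues = [[5, 0, 1],
          [0, 2, 7],
          [3, 4, 0]]
best = max_weight_matching(queues)
print(best, matching_weight(queues, best))
```

The factorial cost of the exhaustive search is the reason practical schedulers rely on approximations of this kind.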
V. QoS GUARANTEE
As link rate increases, more and more applications require the switching system to provide QoS guarantees. The following four papers involve different aspects of providing QoS, including packet classification, queue management, bandwidth management, and delay bound analysis.
Emerging Internet applications demand advanced packet classifiers. In the paper, "Fast and Scalable Packet Classification," van Lunteren and Engbersen propose a new multifield two-phase classification scheme, parallel packet classification. The scheme uses a novel encoding of the intermediate result vectors, which significantly reduces the storage requirements and minimizes the dependencies within the search structures, thus enabling fast incremental updates. It also involves several encoding styles that can be applied simultaneously and allow the storage efficiency and update dynamics to be tuned at the granularity of individual rules.
Active queue management (AQM) schemes aim to regulate transmission control protocol (TCP) traffic in an efficient and fair way. Management decisions are mainly based on the number of flows in the buffer and the source data rate of each flow. In the paper, "An Active Queue Management Scheme Based on a Capture-Recapture Model," Chan and Hamdi estimate these parameters by randomly capturing and recapturing incoming packets. This approach can be implemented with low time/space complexity, and experiments show that it closely approximates the "ideal" case where full state information is provided.
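The idea behind capture-recapture estimation can be sketched with the classical Lincoln-Petersen estimator, which infers a population size from the overlap between two random samples. Using this particular estimator is an assumption made for illustration, and the model in the paper may differ. The function name and the flow identifiers are hypothetical, and the estimate assumes every flow is equally likely to be captured.

```python
def estimate_flow_count(first_sample, second_sample):
    """Lincoln-Petersen estimate N = n1 * n2 / m of the number of active flows.

    n1 and n2 are the distinct flows seen in each sample, and m is the number
    of flows seen in both. Returns None when there is no overlap.
    """
    marked = set(first_sample)
    second = set(second_sample)
    recaptured = marked & second
    if not recaptured:
        return None
    return len(marked) * len(second) / len(recaptured)


# Flow IDs of packets captured in two sampling windows (hypothetical values)
first = ["f1", "f2", "f3", "f4"]
second = ["f3", "f4", "f5", "f6"]
estimate = estimate_flow_count(first, second)
print(estimate)
```

Only the sampled packets need to be remembered, which is consistent with the low time/space complexity noted above.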
Applying fair queuing on output ports may be ineffective when most packets are waiting in the input buffers. Zhang and Bhuyan recognize this and address the problem of fair scheduling of packets in input-queued switch architectures. In the paper, "Deficit Round-Robin Scheduling for Input-Queued Switches," they propose a flow-based fair scheduling algorithm that allocates the switch bandwidth in proportion to each flow's reservation. The scheme is shown to achieve fair scheduling while providing high throughput and low latency. A practical version of the flow-based scheme, based on switch ports, is also described.
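For readers unfamiliar with deficit round-robin, the following minimal sketch shows the classical single-link form of the discipline, in which each flow earns a quantum of credit per round and sends packets while its credit lasts. It is not the input-queued algorithm proposed by Zhang and Bhuyan. The function name, flow names, packet sizes, and quanta are hypothetical.

```python
from collections import deque


def deficit_round_robin(flows, quanta, rounds):
    """Serve per-flow packet queues for a number of rounds; return the send order.

    flows maps a flow name to its packet sizes in bytes, and quanta maps a flow
    name to the credit in bytes it earns each round.
    """
    queues = {name: deque(sizes) for name, sizes in flows.items()}
    deficit = {name: 0 for name in flows}
    sent = []
    for _ in range(rounds):
        for name, queue in queues.items():
            if not queue:
                continue
            deficit[name] += quanta[name]
            # Send head-of-line packets while the accumulated credit covers them.
            while queue and queue[0] <= deficit[name]:
                size = queue.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not queue:
                deficit[name] = 0  # an idle flow does not bank credit
    return sent


flows = {"a": [300] * 4, "b": [300] * 4}
quanta = {"a": 600, "b": 300}  # flow "a" reserves twice the bandwidth of "b"
sent = deficit_round_robin(flows, quanta, rounds=2)
bytes_sent = {name: sum(s for n, s in sent if n == name) for name in flows}
print(bytes_sent)
```

With these values, the bytes served over the two rounds follow the 2:1 ratio of the quanta.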
Recent attention has been paid to the problem of minimizing the worst-case packet delay. In the paper, "Scheduling Reserved Traffic in Input-Queued Switches: New Delay Bounds via Probabilistic Techniques," Andrews and Vojnović derive delay bounds for decomposition-based algorithms. They show that probabilistic techniques tighten the worst-case delay bounds in many scenarios.
VI. OQ EMULATION
The output-queued (OQ) architecture is known to offer optimal performance among all queuing schemes. However, its memory bandwidth requirement makes it impractical for large switch sizes. Recent research focuses on how to emulate the performance of OQ while using more practical approaches.
In the paper, "Output-Queued Switch Emulation by Fabrics With Limited Memory," Magill et al. present a switch architecture with input queuing, fabric queuing, flow control between the limited fabric buffers and the inputs, and output queuing. This combined input/fabric/output queued (CIFOQ) switch with a speedup of two is shown to emulate a broad class of scheduling algorithms operating on an OQ switch. The use of a limited amount of fabric buffering enables distributed scheduling and significantly reduces the scheduling complexity when compared with the memoryless combined input/output queued (CIOQ) architecture.
Rather than emulating output queuing exactly at the expense of complex algorithms or extra memory, Lee and Seo focus on matching the performance of an output-queued switch statistically using implementable schemes. In their paper, "A Practical Approach for Statistical Matching of Output Queuing," a novel multiple input/output-queued (MIOQ) switch architecture that requires no speedup is proposed. A multitoken-based round-robin arbiter and a virtual FIFO queuing scheme cooperate with the architecture, providing a high operation rate and a cell order guarantee. Additionally, the proposed switch naturally provides asymmetric bandwidth for inputs and outputs.
VII. MISCELLANEOUS
Small cell-based IP routers normally handle multicast traffic by attaching a bitmap local multicast label (LML) to each cell. Marsan et al. point out that this approach would induce intolerable overhead for switches with 128 ports or more. Their paper, "Compression of Multicast Labels in Large IP Routers," discusses static, adaptive, and hybrid lossy compression algorithms that reduce the size of the LML. Analytical and simulation models are used to investigate the performance of the different compression approaches.
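The scale of the label overhead follows from simple arithmetic, since an uncompressed bitmap needs one bit per output port. The sketch below works this out for a 128-port switch. The 64-byte cell payload is an assumed figure chosen only for illustration, and the function name is hypothetical.

```python
import math


def bitmap_label_overhead(ports, cell_payload_bytes):
    """Return the bitmap label size in bytes and its ratio to the cell payload."""
    label_bytes = math.ceil(ports / 8)  # one bit per output port
    return label_bytes, label_bytes / cell_payload_bytes


# 128 ports is the threshold named above; 64 bytes is an assumed payload size.
label_bytes, overhead = bitmap_label_overhead(128, 64)
print(label_bytes, overhead)
```

Under this assumed payload size, the label adds a quarter of the payload to every cell, and the ratio grows linearly with the port count.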
In scheduling a single IQ/CIOQ switch, maximum weight matching (MWM) is regarded as optimal because it achieves 100% throughput under admissible traffic with satisfactory delay performance. However, a usual MWM policy turns out to be unstable in networks of interconnected IQ/CIOQ switches. Stability in such networks is addressed in the paper "On the Stability of Local Scheduling Policies in Networks of Packet Switches With Input Queues." Marsan et al. analyze two scheduling policies, one based on Birkhoff-von Neumann decomposition and one based on MWM with modified weights, using fluid model methodology. They show that these policies require no coordination among switches and guarantee 100% throughput in a network of IQ/CIOQ switches.
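For context, the classical Birkhoff-von Neumann decomposition expresses a reservation matrix with equal row and column sums as a weighted set of permutations, each of which is a conflict-free crossbar configuration. The sketch below is a minimal brute-force version for small switches and is not the policy analyzed in the paper. The function name and the reservation matrix are hypothetical.

```python
from itertools import permutations


def birkhoff_von_neumann(rates):
    """Decompose an integer matrix with equal row and column sums.

    rates[i][j] is the number of slots per frame reserved from input i to
    output j. Returns (slots, permutation) pairs, where permutation[i] is the
    output connected to input i for that many slots.
    """
    n = len(rates)
    remaining = [row[:] for row in rates]
    schedule = []
    while any(any(row) for row in remaining):
        # Find a permutation whose entries are all still positive.
        for perm in permutations(range(n)):
            if all(remaining[i][perm[i]] > 0 for i in range(n)):
                break
        else:
            raise ValueError("row and column sums must be equal")
        slots = min(remaining[i][perm[i]] for i in range(n))
        for i in range(n):
            remaining[i][perm[i]] -= slots
        schedule.append((slots, perm))
    return schedule


# Hypothetical reservations: every row and every column sums to 3 slots.
rates = [[2, 1, 0],
         [0, 2, 1],
         [1, 0, 2]]
schedule = birkhoff_von_neumann(rates)
print(schedule)
```

Because the schedule depends only on the local reservation matrix, each switch can compute it without exchanging state with its neighbors.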
In the paper, "The Sliding-Window Packet Switch: A Shared-Memory Switch Architecture With Plural Memory Modules and Decentralized Control," Kumar proposes a new class of shared-memory packet switches. The architecture has multiple separate memory modules that are logically shared among all the ports of the switch, and the control is decentralized. Decentralized switching functions enable the sliding-window switch to operate in a pipelined fashion and enhance scalability and switching capacity.
ACKNOWLEDGMENT
The Guest Editors would first like to acknowledge all authors for having chosen this special issue to disseminate their research results. We also appreciate the reviewers' time and effort in controlling the quality by providing detailed and constructive feedback. Last but not least, we express our sincere gratitude for the remarkable efforts of all the editorial staff of JSAC.
MOUNIR HAMDI, Guest Editor-in-Chief
(1997, 1998, 1999, and 2000) and the Conference on Lasers and Electrooptics CLEO (1999 and 2000). He is also a Member of the Lasers and Electrooptic Society and the Optical Society of America.
H. Jonathan Chao (F'01) received the B.S. and M.S. degrees in electrical engineering from National Chiao Tung University, Taiwan, and the Ph.D. degree in electrical engineering from Ohio State University, Columbus.
He is a Professor of electrical and computer engineering at Polytechnic University, NY, which he joined in January 1992. His research covers terabit switches/routers, QoS control, optical networking, and network security. He holds more than 20 patents and has published over 100 journal and conference papers in these areas. He has also served as a consultant for various companies, such as Lucent Technologies, NEC, and Telcordia. For over a decade, he has taught short courses on SONET/ATM/IP/MPLS networks to industry professionals.
