Abstract-Signaling protocols, primarily used to set up and tear down connections, are essential in connection-oriented networks. Up to now, signaling protocols have mostly been implemented in software for two reasons: complexity and the requirement for flexibility. The price paid is in performance. Software implementations of signaling protocols are rarely capable of handling over 1000 calls/s. Corresponding call setup delays per switch are in the order of milliseconds. To improve performance for high-speed networks, we implemented a subset of the resource reservation protocol-traffic engineering (RSVP-TE) signaling protocol in reconfigurable field programmable gate array (FPGA) hardware. Our implementation demonstrates the feasibility of 1000x speedup vis-à-vis software implementations. The impact of this work is far-reaching in that it enables connection-oriented networks to support new applications that require rate guarantees but have short call holding times.
signaling protocols in software, the price paid is in performance. Signaling protocol implementations in software are rarely capable of handling over 1000 calls/s. Correspondingly, call setup delays per switch are in the order of milliseconds [7] . This control-plane overhead will be increasingly significant as data link rates increase since data-plane transmission delays will correspondingly shrink.
Toward improving performance, we undertook a hardware implementation of a signaling protocol, RSVP-TE with extensions for GMPLS, specifically for SONET switches. We used reconfigurable hardware, i.e., field programmable gate arrays (FPGAs) to meet the requirement for flexibility. These devices are a compromise between general-purpose processors used in software implementations at one end of the flexibility-performance spectrum, and application specific integrated circuits (ASICs) at the opposite end of this spectrum. FPGAs can be reprogrammed with updated versions as signaling protocols evolve. As for the challenge posed by the complexity of signaling protocols, our approach is to implement only the time-critical operations of the signaling protocol in hardware, and relegate the nontime-critical operations to software.
In order to demonstrate the feasibility of hardware-accelerated signaling, we modeled RSVP-TE in VHDL and mapped it onto a Xilinx XC2V3000 FPGA (with 21% resource utilization). From the timing simulations, we determined that the control-plane aspects of a call can be handled in 7.2 μs assuming a 50 MHz clock (this includes the processing time for the three signaling messages involved in the setup and teardown of a connection: Path, Resv, and PathTear/ResvTear). Using a pipelined architecture, a call handling capacity of 400 000 calls/s can be achieved. This is a 100x-1000x speedup vis-à-vis its software counterpart.
We have designed a prototype board equipped with the RSVP-TE hardware signaling accelerator, as shown in Fig. 1(a) . The Gigabit Ethernet (GbE) interface works as the signaling channel. RSVP-TE signaling messages are carried on this interface. The TCAM and SRAM0 are used to store data tables. The SRAM1 buffers the transmitted but as-yet unacknowledged messages. The first-in-first-out (FIFO) is the interface between the hardware and software signaling modules. Fig. 1(b) shows a possible configuration of a circuit-switching system using this board. Besides the signaling module board, the system needs to be equipped with multiple line cards to carry user traffic, a CPU module to run the software signaling process, a switch fabric module, and a power module.
The impact of this work can be far-reaching. By decreasing call processing delays and increasing call handling capacities, it becomes conceivable to set up and tear down connections more often. This allows for a finer granularity of resource sharing and, hence, better utilization. This work will enable the use of rate-guaranteed connections for shorter-duration calls than currently possible. For example, many research efforts are underway to provide scientists rate-guaranteed connections for transfers of their large files [8] . As data rates increase to 10 Gb/s and beyond, file transmission delays will decrease, making any overhead associated with connection setup more and more significant. Therefore, given that the signaling message processing needed to establish connections is clearly an overhead, any effort to decrease this overhead will increase the potential uses of connection-oriented networks. Our work is motivated by this observation.
Section II presents background on connection setup and teardown procedures and surveys prior work on this topic. Section III describes the subset of RSVP-TE that we defined for hardware acceleration. Section IV describes our hardware implementation, while Section V summarizes our conclusions.
II. BACKGROUND AND PRIOR WORK
In this section, we will provide a brief review of connection setup and teardown. We will also describe related prior work.
A. Background
Connection setup and teardown procedures, along with the associated message exchanges, constitute a signaling protocol. Setting up a connection at a switch consists of five steps.
Step 1) Determining the next-hop switch toward which the connection should be routed.
Step 2) Checking for the availability of and reserving required resources (link capacity and optionally buffer space).
Step 3) Assigning "labels" for the connection. The exact form of the "label" depends on the type of connection-oriented network. For example, in synchronous optical network/synchronous digital hierarchy (SONET/SDH) switches, a "label" identifies time slot(s) on the input and output switch interfaces.
Step 4) Programming the switch fabric to map incoming labels to outgoing labels.
Step 5) Updating the state information associated with the connection.
In a typical connection setup procedure, as illustrated in Fig. 2 , a signaling message requesting the setup of a connection (e.g., Path message in RSVP-TE) progresses from the calling end device toward the called end device hop-by-hop, and a response signaling message (e.g., Resv message in RSVP-TE) travels in the reverse direction, again hop-by-hop. The first two steps should be performed in the forward direction so that the connection is routed along a path on which resources are available. The last step, updating the state information, should be performed in both directions. Steps 3) and 4) could be performed as signaling messages proceed in either direction. For example, in [4] and [6] , labels are assigned in the reverse direction and carried in Resv messages. However, in [6] , a switch can also assign labels in the forward direction and include these as "suggested" labels in Path messages. The connection release procedure follows a similar hop-by-hop approach. In RSVP-TE, this procedure may be initiated by either end with a PathTear/ResvTear message. Switches processing the PathTear/ResvTear messages free up resources and label assignments.
Fig. 3 illustrates the unfolded view of a switch in connection-oriented networks. The user plane hardware consists of a switch fabric and line cards that terminate input/output interfaces carrying user traffic. The control plane unit consists of a signaling protocol engine, which could have a hardware accelerator as we are proposing, or be completely implemented in software. Input and output signaling interfaces carry signaling messages. These are "logical" and could be realized as multiplexed channels on the user plane interfaces.
B. Prior Work
There are many signaling protocol standards, as listed in Section I. In addition, many other signaling protocols have also been proposed in the literature [9] - [14] . Some of these protocols such as fast reservation schemes [9] , [10] , YESSIR [11] , and PCC [12] have been designed to achieve low call setup delays by improving the signaling protocols themselves. Fast reservation protocol (FRP) [13] is the only signaling protocol that has been implemented in ASIC hardware. Such an ASIC implementation is inflexible because upgrading the signaling protocol implementation entails a complete redesign of the ASIC. Recently, a simple signaling protocol called "JumpStart," designed for hardware implementation, was proposed in [14] for burst-switched networks.
In [15] , we proposed a performance-oriented signaling protocol called optical circuit signaling protocol (OCSP), which we defined for SONET networks. Our primary goal in designing OCSP was to achieve high performance. As a consequence, OCSP was designed to be simple enough for hardware implementation. Our challenge now is to take a generic flexible signaling protocol such as RSVP-TE, which was not defined for high performance, and yet demonstrate a hardware-accelerated implementation.
Other comparable protocols implemented in hardware include transmission control protocol (TCP). In [16] , Benz proposed to implement the "normal" TCP functionalities in hardware and handle complex functionalities, such as congestion control and error control, in software. He implemented his approach on the Myrinet platform. A similar concept, TCP offload engine (TOE) [17] , is gaining some popularity in today's market. These solutions offload part of the TCP functionalities from the CPU to a coprocessor located on the network interface card (NIC) or Host Bus Adapter (HBA, the NIC equivalent in storage area networks). Molinero-Fernandez and McKeown [18] proposed to implement a technique called TCP switching, in which the TCP SYNchronize segment is used to trigger connection setup and the TCP FINish segment is used to trigger connection release. By processing these segments inside switches, the TCP SYN/FIN procedures become comparable to a signaling protocol for connection setup/release. The authors planned to implement this technique in FPGAs.

III. SUBSET OF RSVP-TE FOR HARDWARE IMPLEMENTATION
As a generalized signaling protocol targeting different types of connection-oriented networks, RSVP-TE with extensions for GMPLS is complex. But most of the signaling functions are nontime-critical. It is not only impractical but also unnecessary to implement the complete RSVP-TE signaling protocol in hardware. Our approach is to extract a subset of RSVP-TE functions for hardware acceleration and relegate the remaining functions to software. The former should be large enough to handle time-critical signaling functions and yet small enough to make hardware implementation feasible.
The following six aspects of GMPLS signaling protocols make hardware implementation difficult [15] .
1) The implementation must handle a large number of object (parameter) types and values for fields within objects that have been defined to support a variety of switches.
2) There is a need to generate new messages different from the received messages and/or automatically initiate messages.
3) The signaling engine must maintain state information associated with each connection.
4) Support for timers is required.
5) The signaling engine must parse out data from the flexibly encoded parameters structured in type-length-value (TLV) format.
6) Additional difficulties arise because the GMPLS protocol was designed without high-performance implementation as an objective. For example, consider the connection reference parameter used to identify a connection within a switch. Since the connection reference parameter in the GMPLS protocol has a global significance, it is large. If this parameter is designed to have only a local significance, the size of this parameter can be reduced significantly, making data table lookups simpler.
Some of these difficulties can be overcome by defining a subset of the GMPLS RSVP-TE signaling protocol for hardware implementation, while others require innovative implementation techniques. In this section, we address those difficulties (specifically, the first three) that can be overcome by limiting the features supported in the subset. In Section IV, we will discuss implementation techniques.
We start with the specification of GMPLS RSVP-TE with extensions for SONET/SDH networks, and define a subset to include the following functionality.
• The common-case scenario handling of messages related to the setup and release of point-to-point unidirectional SONET circuits at a transit SONET switch that operates at a cross-connect rate of STS-1.
• IPv4 addresses are used to identify sources and destinations of the SONET circuits. The switch itself and its neighboring switches are assumed to have only one Internet protocol (IP) address per node. Interface numbers are used to identify interfaces of a switch. Separation of control plane from the user plane is supported by this subset.
• We allow for the presence of logical links on end-to-end paths. This means switches that are not physically connected by direct links can still be RSVP-TE neighbors.
A. RSVP-TE Signaling Messages
RSVP defined seven signaling messages, Path, Resv, PathErr, ResvErr, PathTear, ResvTear, and ResvConf. RSVP-TE added the Hello message for node failure detection. When RSVP-TE was extended for GMPLS, the Notify message was introduced to support fast failure notification. Among these messages, Path and Resv messages are used to set up a connection, while PathTear/ResvTear messages are used to tear down a connection. We selected these four messages for hardware acceleration because Path and Resv messages are needed for connection setup, the procedure targeted for speedup, and PathTear/ResvTear because resources need to be released at a pace corresponding to the fast setups. The remaining messages are relegated to software.
B. Objects and Fields Within Objects
Each RSVP-TE message begins with a common header followed by a variable number of variable-length "objects." As noted earlier, there are a large number of objects and fields within objects. Consider the SESSION object as an example. In RSVP, a C-Type of 1 was defined for the SESSION object. The fields include destination address, protocol ID, and destination port number. However, with the extension of RSVP to RSVP-TE, a new C-Type 7 was defined with the fields redefined to include destination address, tunnel ID, and extended tunnel ID. This means the signaling engine should first read the C-Type field, and then based on its value parse the remainder of the object. This example illustrates why a hardware implementation for handling these objects can become quite complex. Our approach to this problem is to define C-Type of 7 in the subset and delegate the handling of all messages carrying a SESSION object with a different C-Type to the software signaling process.
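The C-Type check described above can be sketched in software as follows. This is a minimal illustration, not the VHDL design: the function name `parse_session` and the returned dictionary are hypothetical, and the field layout assumed here is the standard LSP_TUNNEL_IPv4 SESSION object (4-byte object header, 4-byte tunnel end point address, 2 reserved bytes, 2-byte tunnel ID, 4-byte extended tunnel ID). Returning `None` stands in for handing the message to the software signaling process.

```python
import struct

SESSION_CLASS_NUM = 1   # Class-Num of the SESSION object
SUPPORTED_C_TYPE = 7    # the only SESSION C-Type accepted by the subset
SESSION_C7_LENGTH = 16  # assumed total object length in bytes, header included


def parse_session(obj: bytes):
    """Return the SESSION fields, or None to mean 'delegate to software'."""
    if len(obj) < 4:
        return None
    # Object header: Length (16 bits), Class-Num (8 bits), C-Type (8 bits).
    length, class_num, c_type = struct.unpack("!HBB", obj[:4])
    if class_num != SESSION_CLASS_NUM or c_type != SUPPORTED_C_TYPE:
        return None
    if length != SESSION_C7_LENGTH or len(obj) < SESSION_C7_LENGTH:
        return None
    dest, _reserved, tunnel_id, ext_tunnel_id = struct.unpack("!4sHH4s", obj[4:16])
    return {
        "destination": ".".join(str(b) for b in dest),
        "tunnel_id": tunnel_id,
        "extended_tunnel_id": ext_tunnel_id.hex(),
    }
```

Because only one C-Type is accepted, the remaining fields sit at fixed offsets once the header has been checked, which is the property that keeps the hardware parser simple.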
C. State Transition Diagram for a Connection at a Switch
Unlike in stateless connectionless networks, in connection-oriented networks, a switch needs to maintain state information for each connection. We defined four states for a connection, as shown in Fig. 4 . Before a Path message arrives, the connection does not exist and is, hence, in the NULL state. After a Path message is received, a next-hop IP address is determined, and resources are reserved, the connection enters the RESERVED state. When a Resv message arrives, the switch fabric is programmed and the connection enters the ESTABLISHED state. When a PathTear/ResvTear message is received, the allocated resources on the switch are released; the state of the connection returns to NULL. The connection enters the SOFTWARE_CTRL state when the hardware accelerator cannot handle a received message, in which case it passes control of this connection's state table entry to the software module.
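The four-state behavior described above can be written as a small transition table. The sketch below rests on assumptions that the text does not settle, since Fig. 4 is not reproduced here: tear messages are accepted from both RESERVED and ESTABLISHED, any other (state, message) pair is treated as a message the accelerator cannot handle, and a connection stays in SOFTWARE_CTRL once it gets there. The names `State`, `TRANSITIONS`, and `next_state` are illustrative.

```python
from enum import Enum


class State(Enum):
    NULL = 0
    RESERVED = 1
    ESTABLISHED = 2
    SOFTWARE_CTRL = 3


# (current state, received message) -> next state.
# Tear transitions from both RESERVED and ESTABLISHED are an assumption.
TRANSITIONS = {
    (State.NULL, "Path"): State.RESERVED,
    (State.RESERVED, "Resv"): State.ESTABLISHED,
    (State.RESERVED, "PathTear"): State.NULL,
    (State.RESERVED, "ResvTear"): State.NULL,
    (State.ESTABLISHED, "PathTear"): State.NULL,
    (State.ESTABLISHED, "ResvTear"): State.NULL,
}


def next_state(state: State, message: str) -> State:
    """Advance one connection's state on receipt of a signaling message."""
    if state is State.SOFTWARE_CTRL:
        # Assumption: the software module keeps the entry until it is done.
        return State.SOFTWARE_CTRL
    # Anything the table does not cover is handed to software.
    return TRANSITIONS.get((state, message), State.SOFTWARE_CTRL)
```

For example, feeding Path, Resv, and PathTear in that order walks a connection from NULL through RESERVED and ESTABLISHED back to NULL.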
D. Optional Objects
Besides all the mandatory objects in Path, Resv, PathTear, and ResvTear messages, we support three optional objects, MESSAGE_ID, MESSAGE_ID_ACK, and SUGGESTED_LABEL [6] , [19] .
RSVP supports the notion of soft-state and periodic refresh messages. If a refresh is not received before the timeout interval expires, connections are released. Since packet forwarding is based on the IP routing data table, as routing data changes, the resource reservations need to follow. Hence, RSVP [20] included the use of refresh messages. A second reason for refresh messages is that since RSVP uses unreliable IP service, the occasional loss of an RSVP message is handled through refreshes. However, if the refresh interval is small, the overhead spent in processing refresh messages can become excessive, while if the refresh interval is large, it takes longer to detect the loss of an RSVP message. RFC 2961 [19] makes the case for not using refresh messages in GMPLS networks, where once a circuit is established, the routing data table is not consulted for data forwarding. To handle the reliability issue, it introduces new objects MESSAGE_ID and MESSAGE_ID_ACK, along with the concept of retransmission timers and exponential backoffs to ensure reliable message transport. Since refresh timeout values are mandatory in RSVP, the hardware signaling accelerator accepts these values, but processing of refresh messages is relegated to the software signaling process. We expect that refresh messages are not likely to be used in GMPLS networks, and even though the extensions proposed in [19] are currently optional, we expect these to become widely adopted in implementations of RSVP-TE for GMPLS networks. Hence, we support these objects in our hardware-accelerated subset.
In RSVP, the downstream switch selects the label. RSVP-TE with extensions for GMPLS [6] allows for the Path message to carry a SUGGESTED_LABEL object, though it is optional. We note that there is a potential conflict in SONET networks if the SUGGESTED_LABEL object in the Path message is left as optional. Consider the following scenario. An output interface (STS-12) has four available time slots, 000 011 111 111 (one bit for each time slot, "0" stands for "available" and "1" stands for "occupied"). A first call requests an STS-3c; the upstream switch tentatively reserves the first three time slots. A second call requesting an STS-1 arrives next, and needs to be routed on the same output interface. The switch will then make a tentative reservation of the remaining time slot. If the Resv message for the second call returns first, and the downstream switch assigns a time slot different from the one tentatively reserved in the forward direction, the first call for which a tentative reservation was made can no longer be accommodated because it requires a concatenated assignment. Hence, we recommend that the SUGGESTED_LABEL object be mandatory in Path messages to force the downstream switch to use the label selected by the upstream switch. This conflict is the result of RSVP-TE growing out of RSVP, which was developed as a protocol for receiver-initiated additions to a multicast tree. In GMPLS networks, where hard resource reservations of time slots and wavelengths are necessary, a reservation and corresponding timeslot/wavelength selection needs to be made in the forward direction of call setup.
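The scenario above can be replayed on the 12-bit availability map. This is a simplified sketch: it treats a concatenated STS-3c as any run of three adjacent free slots and ignores further SONET alignment rules, and the helper names `find_contiguous` and `commit` are hypothetical.

```python
def find_contiguous(slots, n):
    """Index of the first run of n free (0) slots, or None if there is none."""
    run = 0
    for i, busy in enumerate(slots):
        run = 0 if busy else run + 1
        if run == n:
            return i - n + 1
    return None


def commit(slots, start, n=1):
    """Return a copy of the map with slots start..start+n-1 marked occupied."""
    out = list(slots)
    for i in range(start, start + n):
        out[i] = 1
    return out


# "0" = available, "1" = occupied, as in the text.
free_map = [int(c) for c in "000011111111"]

# Call 1 (STS-3c) tentatively holds slots 0-2; call 2 (STS-1) tentatively holds slot 3.
# Case A: the downstream switch ignores the tentative choice and assigns slot 1 to call 2.
downstream_picks_other = commit(free_map, 1)
# Case B: SUGGESTED_LABEL is honored, so call 2 gets slot 3.
suggested_label_honored = commit(free_map, 3)

print(find_contiguous(downstream_picks_other, 3))   # no room left for the STS-3c
print(find_contiguous(suggested_label_honored, 3))  # slots 0-2 are still free
```

In case A the three free slots that remain are no longer adjacent, so the STS-3c cannot be placed even though enough capacity is nominally available.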
IV. HARDWARE IMPLEMENTATION OF THE RSVP-TE SUBSET
To demonstrate the feasibility of hardware-accelerated RSVP-TE signaling, we implemented the RSVP-TE subset described in Section III on a Xilinx Virtex II FPGA, which we call the hardware signaling accelerator [recall the FPGA in Fig. 1(a) ]. Fig. 5 illustrates the architecture of the hardware signaling accelerator. It has three stages: object dispatching, object processing, and object assembling. In the object dispatching stage, signaling messages are buffered in an internal static random access memory (SRAM) for checksum verification. Meanwhile, objects and fields are checked and dispatched to different registers in the register bank. In the object processing stage, message objects are processed in parallel by the object processors. Finally, in the object assembling stage, appropriate objects are reassembled into a new message, which is then sent to the next switch. These three stages are fully pipelined to achieve a high throughput.
Object processing will often require the reading and/or writing of data stored in external TCAM and SRAM devices located on the signaling module board along with the FPGA, as illustrated in Fig. 1(a) . The TCAM and SRAM interfaces within the FPGA (shown in Fig. 5 ) control access to these external memory devices. The GbE Interface, together with external GbE MAC device [ Fig. 1(a) ], provides a 1 Gb/s channel for signaling messages. The FIFO Interface on the FPGA controls the data flow from internal message buffer to external message buffer (FIFO). The hardware signaling accelerator configures external switch fabric devices, which will be located on a switch fabric module board as illustrated in Fig. 1(b) , through the switch fabric interface. Finally, the whole signaling module board communicates with other boards through the high-speed backplane interface.
A. Implementation of the Object Dispatcher
Instead of defining objects at fixed locations within messages, RSVP-TE uses a flexible TLV structure. Each object is a self-contained element composed of a type field (Class-Num and C-Type), a length field (Length), and a variable-length value field (Object contents). The objects can appear in any order (with some constraints). The TLV structure was designed for flexibility, allowing protocol designers to add parameters in arbitrary order. But this construct makes parameter extraction in hardware a complex task. This difficulty (aspect 5 in the list provided in Section III), however, cannot be overcome by limiting the subset definition. It requires an innovative implementation, which is described in this section.
To see why, compare with IP. When processing an IP packet header, the hardware can always extract the fifth word from the IP header to obtain the destination IP address. But in RSVP-TE messages, since the SESSION object carrying the destination IP address can occur anywhere in the message, it is hard to find the object and extract its fields.
In order to solve this challenge, we design a scheme with two-level dispatching. The message dispatcher first delimits a message and objects based on message length and object length. The delimited objects are then sent to all object dispatchers. But only the object dispatcher matching the object type is triggered and the fields in the object are dispatched into corresponding registers. The unknown object processor captures all unsupported objects. If a message contains such an object (i.e., one outside the set of objects defined in the RSVP-TE subset), the message will be passed to the software signaling process. This scheme is flexible.
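The two-level scheme can be mimicked in software as shown below. This is a sketch under stated assumptions rather than the hardware design: a dictionary lookup stands in for broadcasting each object to all object dispatchers in parallel, the two handlers and the `registers` dictionary are hypothetical, the checksum is not verified, and `build_message` exists only to produce test input. The 8-byte common header with the message length in its last two bytes, and the 4-byte object header, follow the standard RSVP encoding.

```python
import struct

COMMON_HEADER_LEN = 8  # RSVP common header; bytes 6-7 hold the message length
OBJECT_HEADER_LEN = 4  # Length (16 bits), Class-Num (8 bits), C-Type (8 bits)


def dispatch_session(body, registers):
    registers["SESSION"] = body


def dispatch_rsvp_hop(body, registers):
    registers["RSVP_HOP"] = body


# Hypothetical set of supported objects, keyed by Class-Num.
OBJECT_DISPATCHERS = {1: dispatch_session, 3: dispatch_rsvp_hop}


def dispatch_message(msg: bytes):
    """Level 1: delimit objects by length. Level 2: trigger the matching dispatcher."""
    if len(msg) < COMMON_HEADER_LEN:
        return "software", None
    (rsvp_length,) = struct.unpack("!H", msg[6:8])
    if rsvp_length != len(msg):
        return "software", None
    registers = {}
    offset = COMMON_HEADER_LEN
    while offset < rsvp_length:
        if offset + OBJECT_HEADER_LEN > rsvp_length:
            return "software", None
        length, class_num, _c_type = struct.unpack("!HBB", msg[offset:offset + 4])
        if length < OBJECT_HEADER_LEN or length % 4 or offset + length > rsvp_length:
            return "software", None
        handler = OBJECT_DISPATCHERS.get(class_num)
        if handler is None:
            # Plays the role of the unknown object processor.
            return "software", None
        handler(msg[offset + 4:offset + length], registers)
        offset += length
    return "hardware", registers


def build_message(objects):
    """Test helper: objects is a list of (class_num, c_type, value_bytes)."""
    body = b"".join(
        struct.pack("!HBB", OBJECT_HEADER_LEN + len(data), class_num, c_type) + data
        for class_num, c_type, data in objects
    )
    # Version 1, no flags, message type 1 (Path), checksum left as zero, TTL 64.
    header = struct.pack("!BBHBBH", 0x10, 1, 0, 64, 0, COMMON_HEADER_LEN + len(body))
    return header + body
```

A message made only of supported objects comes back tagged "hardware" with its fields in the register dictionary, while a single unsupported Class-Num sends the whole message to software, matching the behavior described above.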
As objects are dispatched, different fields of the objects are verified against the preset values supported by the subset we defined. These values could either be "hardwired" or set in initialization registers. For example, our hardware implementation only supports the SESSION object with C-Type 7, as described in Section III. This C-Type value (7) is "hardwired." An example of an object whose field values are handled by storing data in initialization registers is the LABEL_REQUEST object. This object specifies the type of switch supported by this signaling engine. Since we are defining this subset for a SONET switch, the field LSP encoding type within this object can have only one value, i.e., 5 (for SONET/SDH). The field switching type within this object can have one value, 100 (for TDM). We place these two numbers, 5 and 100, in corresponding registers during initialization time and design the hardware circuitry to compare the values in an incoming message carrying the LABEL_REQUEST object with the values stored in these registers. This makes our hardware implementation more flexible, allowing the same implementation to be used even if the switch is, for example, a lambda switch, in which case these two values would be reset to 8 and 150, respectively. The tradeoff between performance and flexibility was considered when we made our choices of which values to hardwire and which values to store in initialization registers.
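The difference between a hardwired value and an initialization register can be illustrated with a constant versus a configurable record. The class and function names below are hypothetical; the numeric values are the ones quoted in the text.

```python
from dataclasses import dataclass

SESSION_C_TYPE = 7  # "hardwired": changing it means changing the design


@dataclass
class InitRegisters:
    """Values loaded at initialization time; defaults describe a SONET switch."""
    lsp_encoding_type: int = 5   # SONET/SDH
    switching_type: int = 100    # TDM


def label_request_ok(lsp_encoding_type, switching_type, regs):
    """Compare the LABEL_REQUEST fields of an incoming message with the registers."""
    return (lsp_encoding_type == regs.lsp_encoding_type
            and switching_type == regs.switching_type)


sonet_switch = InitRegisters()
lambda_switch = InitRegisters(lsp_encoding_type=8, switching_type=150)
```

The same comparison logic then serves both switch types: only the register contents differ.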
B. Implementation of Data Tables
We defined six data tables to support our implementation of the RSVP-TE subset defined in Section III: routing table, incoming connectivity table, outgoing connectivity table, outgoing connection admission control (CAC) table, user/control mapping table, and state table. These data tables reside in the SRAM and TCAM devices on the signaling module board [ Fig. 1(a) ] or in the SRAM block within the FPGA. The mapping of the data tables into TCAM, SRAM, and FPGA memory is shown in Fig. 6 .
The routing table is used to determine the next-hop switch. Traditionally a routing table is implemented in software using Trie data structures [21] . Trie-based routing tables are often large, and lookups are slow. Many schemes have been proposed to compress the size of the routing table and improve its performance [22] , [23] . However, these schemes need complex user logic. We use a TCAM and a SRAM to store the routing table. The index part, a 32-bit destination IP address, is stored in a TCAM, and the return value, a 32-bit next-hop address, is stored in an associated SRAM. TCAMs have fast lookup speeds, are flexible, and allow for simple user logic. The drawback of TCAMs is scalability, which we currently ignore because of the prototype nature of this implementation.
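The TCAM-plus-SRAM arrangement can be modeled as two parallel arrays: the TCAM holds (value, mask) pairs and returns the index of the first match, and the SRAM holds the next-hop address at the same index. The sketch below makes assumptions the text does not state: entries are prefixes, and they are kept ordered longest prefix first so that the first hit is the most specific one. The class name `TcamRoutingTable` is illustrative, and a real TCAM compares all entries in parallel rather than in a loop.

```python
import ipaddress


class TcamRoutingTable:
    def __init__(self):
        self.tcam = []  # (value, mask) pairs; list position models match priority
        self.sram = []  # 32-bit next-hop addresses, same index as the TCAM entry

    def add(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        value, mask = int(net.network_address), int(net.netmask)
        # Keep longer prefixes (numerically larger masks) ahead of shorter ones.
        pos = 0
        while pos < len(self.tcam) and self.tcam[pos][1] >= mask:
            pos += 1
        self.tcam.insert(pos, (value, mask))
        self.sram.insert(pos, int(ipaddress.ip_address(next_hop)))

    def lookup(self, destination):
        key = int(ipaddress.ip_address(destination))
        for index, (value, mask) in enumerate(self.tcam):
            if key & mask == value:          # ternary match: masked bits are "don't care"
                return str(ipaddress.ip_address(self.sram[index]))
        return None
```

A lookup is then a single masked comparison per entry with no tree traversal, which is the simplicity of user logic that the text attributes to TCAMs.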
The incoming connectivity table determines how the interface ID used by a neighbor maps to a local input interface. The index into the table, stored in the TCAM, is the combination of a 32-bit previous-hop IP address and a 6-bit output interface number (the target switch fabric device we chose, VSC9182, has 64 output interfaces; each can be identified by a 6-bit interface number). With the 38-bit index, the TCAM lookup yields the matched address, and the least significant 6 bits can be used to identify an input interface. Similarly, the outgoing connectivity table maintains data on the local output interfaces leading to neighboring switches. The outgoing CAC in the order of thousands. For example, the VSC9182 device can support 768 simultaneous connections. Even the VSC9187 device, with its lower cross-connect rate of VT1.5, supports a maximum of 3,024 simultaneous connections. We implement a state table with 1 K entries in the TCAM and SRAM. The 128-bit five-tuple used to identify a connection is stored in the TCAM and the associated state information is stored in the SRAM, as shown in Fig. 6 . Since the width of the TCAM is 72 bits, each 128-bit five-tuple is split into two 64-bit words.
C. Retransmission Management
Timers are required to support the solutions proposed in [19] for reliable message transmission. All signaling messages carry a MESSAGE_ID object, which is acknowledged by a MESSAGE_ID_ACK object carried in an Ack message or piggybacked on a message in the reverse direction. If the MESSAGE_ID_ACK is not received before a retransmission timer times out at the sender, the message is retransmitted. A second timer, which we call the piggyback timer, is used to hold MESSAGE_ID_ACK objects awaiting a message to be sent in the reverse direction to avoid unnecessary Ack messages. An Ack message is generated only if this timer expires.
Fig. 7 illustrates the proposed retransmission management scheme. The hardware signaling accelerator maintains a system timer, which provides system timing for all other timers. On the transmitting side, when a signaling message is sent out, the message, together with a time tag marking the transmission instant (the system timer value at the moment the message is transmitted), is copied to the unacknowledged message buffer. The buffer is organized as a FIFO. Therefore, the head of the queue always contains the oldest message. The time tag of the head message is copied to the retransmission timer. Given an initial time-out value, the retransmission timer times out once that interval has elapsed since the time tag; the message is then retransmitted and copied to the buffer for the first retransmission. The first retransmission buffer is organized in a similar way, with its own retransmission timer and a longer time-out value (exponential backoff). On the receiving side, for each neighbor, there is a piggyback timer and a buffer organized as a FIFO. Each entry in the buffer contains a MESSAGE_ID_ACK object destined for that neighbor and the associated time when the corresponding MESSAGE_ID was received. Entries in each buffer are time ordered given the FIFO nature. The piggyback timer expires when the head entry expires. If there is an RSVP-TE message destined for the neighbor before its piggyback timer expires, all pending MESSAGE_ID_ACK objects for that neighbor can be piggybacked onto the RSVP-TE message. Otherwise, a separate Ack message will be generated to carry these MESSAGE_ID_ACK objects.
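The FIFO-based timers described in this subsection can be sketched as two small classes. Several details are assumptions made for illustration: time is an integer tick count supplied by the caller, messages are identified by their MESSAGE_ID value alone, an acknowledged message is simply removed from the buffer, and the second-stage time-out used in the example is double the first. All class and method names are hypothetical.

```python
from collections import deque


class RetransmitQueue:
    """FIFO of unacknowledged messages; only the head entry is ever timed."""

    def __init__(self, timeout, next_queue=None):
        self.timeout = timeout
        self.next_queue = next_queue   # buffer for the following retransmission
        self.buffer = deque()          # (time_tag, message_id), oldest first

    def send(self, now, message_id):
        self.buffer.append((now, message_id))

    def ack(self, message_id):
        self.buffer = deque(e for e in self.buffer if e[1] != message_id)

    def tick(self, now):
        """Return the messages whose timer expired; they are retransmitted."""
        resent = []
        while self.buffer and now - self.buffer[0][0] >= self.timeout:
            _, message_id = self.buffer.popleft()
            resent.append(message_id)
            if self.next_queue is not None:
                self.next_queue.send(now, message_id)
        return resent


class PiggybackBuffer:
    """Per-neighbor FIFO of pending MESSAGE_ID_ACK values."""

    def __init__(self, hold_time):
        self.hold_time = hold_time
        self.pending = deque()         # (arrival_time, message_id)

    def add_ack(self, now, message_id):
        self.pending.append((now, message_id))

    def take_all(self):
        """Called when a message to this neighbor is about to be sent."""
        acks = [message_id for _, message_id in self.pending]
        self.pending.clear()
        return acks

    def tick(self, now):
        """If the head entry expired, return all ACKs for a standalone Ack message."""
        if self.pending and now - self.pending[0][0] >= self.hold_time:
            return self.take_all()
        return []
```

Because each buffer is time ordered, checking the head entry is enough to decide whether anything has expired, so one timer per buffer suffices instead of one per message.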
D. Simulation
We developed a prototype VHDL model for the signaling hardware accelerator, and used Synplify for synthesis and Xilinx ISE for place and route. The implementation uses only 14% of the FPGA resources (Xilinx XC2V3000).
We performed timing simulations of the signaling hardware accelerator using the ModelSim simulator. Processing the Path message, which involves the access and updating of the data tables, takes 37 clock cycles. The time to receive a Path message is 40 clock cycles, as is the time to transmit the outgoing Path message. Receiving, processing, and transmitting all other messages takes no more than 40 clock cycles each. Since the receiving, processing, and transmitting stages are fully pipelined (lockstep operation) to achieve high throughput, idle cycles are inserted if a stage is not ready. As a worst-case estimate, the total time for receiving, processing, and transmitting a single message is 120 clock cycles, which is 2.4 μs with a 50 MHz clock. Since connection setup/release requires the handling of three signaling messages, we require a total of 7.2 μs/call. The call handling capacity is as high as 400 000 calls/s because of pipelining, which allows the system to accept a new message every 40 clock cycles.
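The timing figures above follow from simple arithmetic, reproduced here as a check. The calculation assumes, as the text does, three 40-cycle stages per message, three messages per call, and one new message accepted every 40 cycles; the variable names are illustrative.

```python
CLOCK_HZ = 50_000_000
CYCLES_PER_STAGE = 40     # worst case for receive, process, and transmit
STAGES = 3
MESSAGES_PER_CALL = 3     # Path, Resv, and PathTear/ResvTear

cycle_ns = 1_000_000_000 // CLOCK_HZ                       # 20 ns per cycle
per_message_ns = STAGES * CYCLES_PER_STAGE * cycle_ns      # latency of one message
per_call_ns = MESSAGES_PER_CALL * per_message_ns           # control-plane time per call

# With pipelining, a new message enters every CYCLES_PER_STAGE cycles.
messages_per_second = CLOCK_HZ // CYCLES_PER_STAGE
calls_per_second = messages_per_second // MESSAGES_PER_CALL

print(per_message_ns / 1000, "us per message")
print(per_call_ns / 1000, "us per call")
print(calls_per_second, "calls/s")
```

Under these assumptions the pipeline accepts 1.25 million messages per second, or roughly 416 000 calls per second, which the text rounds to 400 000 calls/s.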
V. CONCLUSION
Implementation of signaling protocols in hardware poses a considerably larger number of problems than implementing user plane protocols such as IP, ATM, etc. Our implementation has demonstrated the hardware handling of functions such as parsing out various fields of TLV-encoded messages, maintaining state information, writing resource availability tables, etc., all of which are operations not encountered when processing IP headers or ATM headers. We also demonstrated the significant performance gains of hardware-accelerated implementation of RSVP-TE, i.e., call handling within a few microseconds. Overall, this prototype implementation of RSVP-TE in FPGA hardware has demonstrated the potential for 100x-1000x speedup vis-à-vis software implementations on state-of-the-art processors. Currently we are building the prototype board and developing related software to set up a demonstration system.
