Abstract - A novel configuration-based general-purpose protocol processor is proposed. It can process protocols much faster than general-purpose processors. Because it is configuration based, it can be configured for different protocols and different applications. This configurability makes compatibility possible while protocols are still processed very fast, on the fly. The proposed architecture can be used as a platform or an accelerator for network-based applications.
1. Multiple ports and multiple gigabits per second of real-time framing and de-framing.
2. Pre-processing of as many protocol jobs as possible before a memory access.
3. A general, simple, fast, and flexible architecture for different kinds of protocols.
4. A built-in protocol recognition and automatic configuration capability.
5. A low-power, high-speed, and memory-efficient (in size and access) architecture.
Two kinds of protocol processors are available on the market today. One is the specific, single-protocol-limited ASIC (called SPASIC in this paper); the other is the processor-based general-purpose CPU (called GPCPU in this paper). Neither can meet the requirements of future computer communications. The SPASIC is only used for one protocol or a few specific protocols included in the design, so it obviously does not support future protocols. The GPCPU cannot work at very high speed because of its general architecture. As a redundant and speed-limited architecture, it is not a satisfactory solution for a relatively stable and control-extensive flow. From another point of view, the protocol processor must be compact because it is often used as a pre-processor and as a small part of a larger application. Therefore, a redundant architecture is not suitable for embedded or integrated solutions.
Most solutions available now use a specific circuit to process the protocol flow, and use a GPCPU for switching, routing, and other applications. Because of the limited SPASIC architecture, future flexibility is limited. For multiple applications, more SPASIC cores are integrated to cover more protocols and this makes the system redundant.
We need to recognize the protocol of the incoming packet and then configure the processor to fit that protocol, because the system might be used in a variable environment. Therefore, a new architecture is strongly needed, one that is as fast as a SPASIC, as flexible as a GPCPU, and as simple as possible.
FUNCTIONAL COVERAGE OF DPSP
The proposed system is a new architecture for control-extensive processes, e.g. protocol processing. One example is to take the data packet from the AUI (Attachment Unit Interface of 10 Mb Ethernet), the MII (Medium-Independent Interface of 100 Mb Ethernet), or the GMII (for Gbit Ethernet). Fast pre-processing of different protocol levels is then performed on the fly, for example from Ethernet to IP and even up to TCP.
We can solve all the problems mentioned above by introducing the Deep Pipeline Serial Processor (DPSP). It executes protocol processing based on a booted, predefined configuration. Since the control is based on the configuration instead of software programs, the DPSP can process protocols at real-time speed, e.g. for Gbit Ethernet. After booting, the configuration hardware can be shut down, which makes low power possible. In this way, the application, e.g. IP telephony or IP switching, can be separated from the protocol framing and de-framing. The advantages are:
1. Framing and de-framing are performed in a separate core, which acts as a platform or an accelerator and makes the integration of more applications possible.
2. The DPSP is separated as a stand-alone machine working at high speed with a standard implementation.
3. All functional blocks inside the DPSP are self-contained and configured; therefore, adaptation to unpredictable long-term future protocols is possible.
4. The protocol can be recognized by this solution, and a correct configuration can be booted to the DPSP after the recognition process. We define this feature as self-learning and self-adaptation for any product used in different environments, e.g. home RF.
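The recognize-then-boot idea in the last advantage can be sketched in software. This is only an illustrative model, not the hardware mechanism: the table `KNOWN_PREAMBLES`, the protocol name `demo_sync`, and both function names are assumptions introduced here. The Ethernet entry uses the familiar byte-level preamble (seven 0x55 bytes) followed by the start-of-frame delimiter 0xD5.

```python
from typing import Optional

# Hypothetical recognition table: protocol name -> leading byte pattern.
# "demo_sync" is a made-up protocol used only for illustration.
KNOWN_PREAMBLES = {
    "ethernet": bytes([0x55] * 7 + [0xD5]),  # preamble + start-of-frame delimiter
    "demo_sync": bytes([0x7E, 0x7E]),
}


def recognize_protocol(stream: bytes) -> Optional[str]:
    """Return the name of the first protocol whose preamble starts the stream."""
    for name, pattern in KNOWN_PREAMBLES.items():
        if stream.startswith(pattern):
            return name
    return None


def select_configuration(stream: bytes) -> str:
    """Pick which configuration to boot next (names are hypothetical)."""
    protocol = recognize_protocol(stream)
    if protocol is None:
        # Nothing matched: stay in the recognition configuration.
        return "recognition"
    return protocol + "_config"
```

For example, a stream starting with the Ethernet preamble would select `ethernet_config`, while an unknown stream would leave the recognition configuration in place.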
The architecture performs protocol processing based on both a pre-configured setting and a real-time control program. The pre-configured setting processes the protocol in every cycle inside each field of a data frame. The real-time control program only works at the higher level, on tasks such as branch decisions, macro selections, and job handover. Thus, the processing speed can be much higher because no program (which is slow in principle) is involved in sub-level processing. By planning the configuration, the architecture can supply flexibility as good as that of a GPCPU.
APPLICATION OVERVIEW
The goal is to provide a platform for all possible network applications. Some of the possible applications and features supported by the platform are:
1. Fast framing and de-framing for Internet switching: Gbit Ethernet source and destination address extraction, fast IP DA and SA extraction, etc.
2. Memory allocation prediction: relaxed memory traffic, payload reordering, etc.
3. Fast queue and priority checks for real-time network applications.
4. For certain applications, the product recognizes the protocol of the incoming data and boots the protocol configuration after learning.
5. The user can boot different protocols for different applications.
6. Fast prototyping or SoC integration.
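As an illustration of the address extraction in item 1, the following sketch captures fields from a byte stream using only a byte counter, in the spirit of on-the-fly processing. It assumes an untagged Ethernet frame (starting at the destination address, after the preamble) that carries an IPv4 header; the table `FIELD_WINDOWS` and the function name are hypothetical.

```python
# Byte-count windows (start inclusive, end exclusive) for each field.
# Assumption: untagged Ethernet II frame carrying IPv4, counted from the
# first byte of the destination address.
FIELD_WINDOWS = {
    "eth_da": (0, 6),
    "eth_sa": (6, 12),
    "ethertype": (12, 14),
    "ip_sa": (26, 30),  # 14-byte Ethernet header + IPv4 offset 12
    "ip_da": (30, 34),  # 14-byte Ethernet header + IPv4 offset 16
}


def extract_fields(frame: bytes) -> dict:
    """Capture each field as its bytes stream past, one byte per step."""
    captured = {name: bytearray() for name in FIELD_WINDOWS}
    for count, byte in enumerate(frame):
        for name, (start, end) in FIELD_WINDOWS.items():
            if start <= count < end:
                captured[name].append(byte)
    return {name: bytes(value) for name, value in captured.items()}
```

No byte is revisited and no frame buffer is needed; each field is complete as soon as the counter leaves its window, which is the property that would allow the extracted addresses to be used before any memory access.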
ARCHITECTURE
We introduce a new architecture that can work towards the physical limits of CMOS [3]. It can be implemented using a conventional ASIC design flow and can be configured by a program to suit different kinds of protocols and applications. The proposed architecture is divided into two parts. The first, which is the key part, is the Deep Pipeline Serial Processor (DPSP). Serial does not mean bit serial; it is a byte- or word-based serial architecture. The second part is a normal micro-controller, the µC. The µC supports the DPSP configuration, the interface between the DPSP and the application, and the real-time high-level job control. The DPSP can work much faster than the micro-controller.
The proposed architecture executes protocol processing based on both programs and pre-set configurations. The program only controls macro jobs, which run at the frame rate instead of the byte rate. The pre-set configuration controls real-time protocol processing at high speed with a relatively fixed control and working mode. Therefore, the speed limit induced by program control is completely eliminated.
The proposed architecture is configured for a specific protocol before protocol processing starts. The configuration is performed by writing coefficients and control codes into control registers in a Functional Page (FP). All Functional Pages are scheduled in the order in which the protocol is processed.
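A minimal software model of this configuration step might look as follows. The class `FunctionalPage`, the function `boot_configuration`, and the `(page, address, value)` layout of the configuration image are assumptions made for illustration; the paper does not specify a register map.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalPage:
    """Hypothetical model of one FP: a name plus its control registers."""
    name: str
    control_registers: dict = field(default_factory=dict)

    def write_register(self, address: int, value: int) -> None:
        self.control_registers[address] = value


def boot_configuration(pages, config_image):
    """Write each (page_name, address, value) entry into the matching FP.

    `pages` is already listed in the order in which the protocol is
    processed; booting only fills in coefficients and control codes.
    """
    by_name = {page.name: page for page in pages}
    for page_name, address, value in config_image:
        by_name[page_name].write_register(address, value)
    return pages
```

A configuration for a different protocol would then be a different `config_image` applied to the same set of pages, which is the sense in which flexibility comes from configuration rather than from a program.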
For implementation convenience, data coming into every functional page is pipelined. Functional pages are connected one by one following the job schedule. Each FP manages its process in its own sub-field. For example, the FP for CRC manages only the CRC check on the fly. As another example, the FP for header matching only matches the protocol header for its synchronization.
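To make the CRC example concrete, here is a sketch of a byte-serial CRC page that updates its state once per incoming byte. It uses the standard reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), which is the CRC used for the Ethernet frame check sequence. The class and function names are hypothetical, and the sketch assumes the received check value occupies the last four bytes of the frame in little-endian byte order.

```python
class CrcPage:
    """Byte-serial CRC-32 model: one state update per shifted-in byte."""

    POLY = 0xEDB88320  # reflected CRC-32 polynomial

    def __init__(self) -> None:
        self.state = 0xFFFFFFFF

    def shift_in(self, byte: int) -> None:
        self.state ^= byte
        for _ in range(8):  # hardware would unroll these eight bit steps
            if self.state & 1:
                self.state = (self.state >> 1) ^ self.POLY
            else:
                self.state >>= 1

    def value(self) -> int:
        return self.state ^ 0xFFFFFFFF


def check_frame(frame_with_fcs: bytes) -> bool:
    """Compare the running CRC with the trailing 4-byte check value."""
    page = CrcPage()
    for byte in frame_with_fcs[:-4]:
        page.shift_in(byte)
    return page.value() == int.from_bytes(frame_with_fcs[-4:], "little")
```

The page never sees more than the current byte and its own 32-bit state, so it fits the description of an FP that handles only its own sub-field while the data streams past.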
The system block diagram is given in Fig. 1. The left part is the DPSP and the right part is the µC for configuration, applications, and application support. Different protocols can be executed according to the configuration given by the µC. The µC performs the service support, which is divided into three parts. The first part is booting, including the boot of configurations for all FPs and of programs in the counter and controller. The second part is DPSP monitoring, including checking the DPSP execution status, receiving and transmitting payload data, and sending interactive control. The third part is coordinating the DPSP with the application hardware. The configuration is performed during the power-on boot. When the protocol of the incoming data is unknown, a configuration for protocol recognition is booted first, and then the normal protocol-specific configuration is booted according to the result of the recognition. The DPSP top-level architecture is given in Fig. 2. The following functions are allocated as FPs in the DPSP shown in Fig. 2:
1. Matching: sets up the synchronization by recognizing the preamble.
2. Error checking: checks errors according to the coding of the protocol.
... easy and fast way according to extracted fields.
Fast ACK, an important function, is performed on the fly in the DPSP. Necessary information such as the DA and SA is kept for building the fast ACK. The FP for ACK is allocated between the shift-in and shift-out. The fast ACK packet can get the TCP ACK, the IP address, and the LAN address (e.g. the Ethernet address) from the buffer.
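One way to picture how the kept fields could feed a fast ACK is the sketch below. It is an assumption-laden illustration, not the paper's ACK FP: the record `KeptFields` and the function `build_fast_ack` are hypothetical, addresses are simply swapped, and the acknowledgement number follows the usual TCP rule of received sequence number plus payload length modulo 2^32 (ignoring SYN and FIN flags, which also consume a sequence number).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeptFields:
    """Fields assumed to be held in the buffer from the received frame."""
    eth_da: bytes
    eth_sa: bytes
    ip_da: bytes
    ip_sa: bytes
    tcp_seq: int
    payload_len: int


def build_fast_ack(rx: KeptFields) -> dict:
    """Derive the header fields of an ACK by swapping source and destination."""
    return {
        "eth_da": rx.eth_sa,
        "eth_sa": rx.eth_da,
        "ip_da": rx.ip_sa,
        "ip_sa": rx.ip_da,
        # Next expected byte from the sender, wrapped to 32 bits.
        "tcp_ack": (rx.tcp_seq + rx.payload_len) % 2**32,
    }
```

Everything the ACK needs is available by the time the payload has streamed through, which is why keeping the DA and SA in a small buffer is enough.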
The data flow is given in Fig. 2. The data coming from the physical level has been converted to byte-level format, and the data rate is one eighth of the bit rate. Control signals (single pins) are handover start-finish strobes from the counter and controller. Control signals coming to the counter and controller give timing status. Shift-in and shift-out are the 8-bit input and output data of the DPSP. Other data bus widths can be configured.
Functional Pages
Simple FPs can be implemented by custom design. Complicated FPs will be implemented using synthesis. The flags are outputs from the synthesised logic, which uses the configuration, the incoming data, and the control conditions as inputs. As an example, the matching unit uses configuration registers to save the header pattern. When the shifted input data matches the pattern at a certain point, a matching flag is given as Y_match = &(data, configuration_register).
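The matching flag can be modelled in software under one plausible reading of the expression above: a bitwise XNOR of the shifted data with the configured pattern, followed by an AND reduction over all bits. That reading is an assumption, as are the function names `match_flag` and `find_match`.

```python
from typing import Optional


def match_flag(window: int, pattern: int, width: int) -> int:
    """Return 1 when every one of `width` bits in window equals pattern."""
    mask = (1 << width) - 1
    xnor = ~(window ^ pattern) & mask  # 1 wherever the bits agree
    return int(xnor == mask)           # AND reduction over all bits


def find_match(stream: bytes, pattern_bytes: bytes) -> Optional[int]:
    """Shift bytes in one at a time; return the index where the flag fires."""
    width = 8 * len(pattern_bytes)
    mask = (1 << width) - 1
    pattern = int.from_bytes(pattern_bytes, "big")
    window = 0
    for index, byte in enumerate(stream):
        window = ((window << 8) | byte) & mask
        # Only test once the shift register has been completely filled.
        if index + 1 >= len(pattern_bytes) and match_flag(window, pattern, width):
            return index
    return None
```

Here `pattern_bytes` plays the role of the configuration register contents, so changing the header pattern is a matter of writing a different value rather than changing the logic.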
Counter and Controller
The counter and controller is a counter-based finite state machine (FSM) adapted by configurations. A complete configuration set is written into a register file. Each line, or each group of a few lines, in the register file is configured for the control of one FP.
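The relation between the counter and the register file can be sketched as a lookup from byte count to the FP being controlled. The record layout `ControlLine` (FP name, start count, length) and the function names are assumptions chosen for illustration; the actual register file format is not given in the text.

```python
from typing import NamedTuple, Optional


class ControlLine(NamedTuple):
    """Hypothetical register file line controlling one FP."""
    fp_name: str
    start: int   # byte count at which the FP becomes active
    length: int  # number of byte cycles the FP stays active


def active_page(register_file, count: int) -> Optional[str]:
    """Return the FP addressed by the counter at this byte count, if any."""
    for line in register_file:
        if line.start <= count < line.start + line.length:
            return line.fp_name
    return None


def schedule(register_file, n_bytes: int) -> list:
    """List the active FP for every count from 0 to n_bytes - 1."""
    return [active_page(register_file, count) for count in range(n_bytes)]
```

With a register file such as `[ControlLine("match", 0, 8), ControlLine("da", 8, 6)]`, the first eight counts address the matching FP and the next six address the destination address FP; a different protocol only changes the lines, not the state machine.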
There are two levels of controls performed in the "counter and controller".
The upper-level control is specified as the handover process. The lower-level control supports only the counting status. The upper-level control is a kind of interactive control. The lower-level control is not interactive because the FP uses the status as a control reference without giving feedback. The deep pipeline is scheduled inside each FP. The control of the deep pipeline is given by the lower-level control from the "counter and controller". The status of the state machine is configured according to the recognized protocol. A group of control vectors for a specific FP is selected (addressed) by the counter. Therefore, the control procedure is scheduled following the configuration. The deep pipeline data path performs the protocol jobs in N+A cycles. Here N is the number of bytes (or words, depending on the protocol used) and A is the number of cycles used for handing over a job from one FP to another.
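As a worked illustration of the N+A cycle count, consider a byte-wide data path at a 1 Gbit/s line rate, so that the byte clock is one eighth of the bit rate. The values N = 64 and A = 2 are assumptions chosen only to make the arithmetic concrete; the text does not state a value for A.

```latex
\begin{align*}
f_{\text{byte}} &= \frac{10^{9}\ \text{bit/s}}{8\ \text{bit}} = 125\ \text{MHz},
  \qquad T_{\text{clk}} = \frac{1}{f_{\text{byte}}} = 8\ \text{ns} \\
T_{\text{job}} &= (N + A)\, T_{\text{clk}} = (64 + 2) \times 8\ \text{ns} = 528\ \text{ns}
\end{align*}
```

Under these assumptions the handover overhead is A/(N+A) = 2/66, about 3% of the job time, and it shrinks further for longer frames.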
The control is scheduled in the following way:
6. Change the control procedure if the micro-controller gives a new request.
7. Inform the micro-controller that the data is available.
8. Respond to the micro-controller to accept data.
9. Send the accepted data to the FP responsible for the acknowledgement.
CONCLUSION
We have described a configuration-based DPSP architecture as a platform for network applications. The architecture implements the infrastructure of an accelerator, which provides the necessary framing and de-framing and a fast acknowledgement. Most protocol processes can be supported by the DPSP architecture because of its flexible configuration. The configuration-based architecture can also support protocol recognition based on predefined protocol preambles. As the DPSP is a specific architecture for protocol processing, it can accelerate protocol processing on the fly for high-speed applications.
ACKNOWLEDGEMENT
The authors would like to thank Dr. Kenny Ranerup, Switchcore, Sweden, for useful discussions. The research is supported by the Center for Industrial Technology at Linköping University (CENIIT) and the Excellence Center in Computer Science and Systems Engineering in Linköping (ECSEL).
