6 research outputs found

    Cost effective RISC core supporting the large sending offload

    Ethernet speeds for sending and receiving frames have increased from 40 to 100 Gbps since the release of IEEE P802.3ba. Industry and academia have focused on scaling up TCP/IP protocol processing for 40-100 Gbps. Large send offload (LSO) is a de facto standard that is offloaded to the network interface for sending packets at up to 10 Gbps. It is not clear whether a network interface can support this function at the new 40-100 Gbps rates. The wide use of hardware-based NICs, such as fully customized logic-based network interfaces, can be attributed to two reasons: it is still unclear whether a General Purpose Processor (GPP) can provide the processing required for line rates beyond 10 Gbps, and the GPP's clock rate limits how much network interface processing it can support. However, using a RISC core engine to offload the LSO function can bring important features to network interface design, such as simplicity, scalability, and a shorter development cycle. In this paper, we investigate using a specialized RISC core to process the LSO functions for TCP/IP and UDP/IP at communication rates up to 100 Gbps. To achieve this, we have enhanced the LSO algorithm to scale it to 100 Gbps. A fast DMA is used to transfer data within the network interface. The LSO processing methodology on the network is presented. In addition, the RISC core's performance and data movements at communication rates up to 100 Gbps have been measured. A 148 MHz RISC core can support the sending-side processing for up to 100 Gbps transmission speed for the TCP/IP and UDP/IP protocols when the MTU (1500 bytes) is applied. A DMA clocked at 3759 MHz is required to eliminate idle cycles while transferring data over the 64-bit local bus.
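To put the quoted clock rates in perspective, the following is a rough back-of-the-envelope sketch, not the paper's own model. It assumes that the 1500 bytes are the full on-wire packet size (ignoring preamble, Ethernet header, FCS and inter-frame gap) and that the DMA moves one bus word per clock cycle. The function names are hypothetical.

```python
def packets_per_second(line_rate_bps, packet_bytes):
    """Packets per second needed to fill the line (framing overhead ignored)."""
    return line_rate_bps / (packet_bytes * 8)


def cycles_per_packet(core_hz, line_rate_bps, packet_bytes):
    """Core clock cycles available for each packet at the given line rate."""
    return core_hz / packets_per_second(line_rate_bps, packet_bytes)


def bus_throughput_bps(bus_width_bits, bus_hz):
    """Peak bus throughput, assuming one full-width transfer per clock cycle."""
    return bus_width_bits * bus_hz


if __name__ == "__main__":
    rate = 100e9  # 100 Gbps line rate
    print(packets_per_second(rate, 1500))        # packets per second at 1500 bytes
    print(cycles_per_packet(148e6, rate, 1500))  # cycle budget of a 148 MHz core
    print(bus_throughput_bps(64, 3759e6))        # peak rate of a 64-bit, 3759 MHz bus
```

Under these assumptions a 148 MHz core has fewer than 20 cycles per 1500-byte packet, which suggests the per-packet work left to the core must be very small, and the 64-bit bus at 3759 MHz peaks at a little over twice the 100 Gbps line rate.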

    Exploiting Task-Level Concurrency in a Programmable Network Interface

    Conference Paper
    Programmable network interfaces provide the potential to extend the functionality of network services but lead to instruction processing overheads when compared to application-specific network interfaces. This paper aims to offset those performance disadvantages by exploiting task-level concurrency in the workload to parallelize the network interface firmware for a programmable controller with two processors. By carefully partitioning the handler procedures that process various events related to the progress of a packet, the system can minimize sharing, achieve load balance, and efficiently utilize on-chip storage. Compared to the uniprocessor firmware released by the manufacturer, the parallelized network interface firmware increases throughput by 65% for bidirectional UDP traffic of maximum-sized packets, 157% for bidirectional UDP traffic of minimum-sized packets, and 32-107% for real network services. This parallelization results in performance within 10-20% of a modern ASIC-based network interface for real network services. National Science Foundatio

    Exploiting task-level concurrency in a programmable network interface


    Design of a scalable network interface to support enhanced TCP and UDP processing for high speed networks

    Communication networks have advanced rapidly in providing additional services, with improvements made to their bandwidth and the integration of advanced technology. As the speed of networks exceeds 10 Gbps, the time frame for completing the processing of TCP and UDP packets has become extremely short. The design and implementation of high-performance Network Interfaces (NIs) that can support offloaded protocol functions for current and next-generation networks is challenging. In this thesis, two software approaches are presented to enhance protocol processing of TCP and UDP in the network interface. A novel software Large Receive Offload (LRO) approach for enhancing the receiving side has been proposed. The LRO works by aggregating the incoming TCP and UDP packets into larger packets inside the NI’s buffer. The receiving-side software has been improved to support out-of-order packets. The second proposed software solution is applied to the Large Send Offload (LSO). The proposed LSO function is implemented by segmenting TCP and UDP messages that are larger than the Maximum Transmission Unit into segments of the Maximum Segment Size. New packet headers are generated for each new outgoing packet. A scalable programmable NI based on a 32-bit RISC core is presented that can support 100 Gbps network speeds. Acceleration of the processing time frame required at the NI has been implemented to prevent hazards (such as data hazards and control hazards) during the execution of the LRO and the LSO functions. An R2000/3000 RISC has been used to test the LRO and LSO functions and to discover the instruction set that is most suitable. Following this, the VHDL NI was implemented with three pipeline RISC cores, a simple DMA controller and Content Addressable Memory. An evaluation of the RISC clock rate required to process TCP and UDP streams at 100 Gbps was conducted.
    It was determined that a RISC core running at 752 MHz with a DMA clock of 3753 MHz was able to process packets 512 bytes or larger fast enough to support 100 Gbps network speeds.
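The LSO step described above (splitting an oversized message into MSS-sized segments, each with its own header) can be illustrated with a minimal sketch. This is a generic illustration of segmentation, not the thesis's firmware: the `Segment` class and `lso_segment` function are hypothetical names, the header is reduced to a TCP-style sequence number, and checksums, IP identification fields and flags are left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte in this segment
    payload: bytes  # at most MSS bytes of the original message


def lso_segment(payload: bytes, mss: int, initial_seq: int = 0) -> list:
    """Split one large send into MSS-sized segments with per-segment sequence numbers."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        # Sequence numbers wrap modulo 2**32, as TCP sequence numbers do.
        segments.append(Segment(seq=(initial_seq + offset) % 2**32, payload=chunk))
    return segments


if __name__ == "__main__":
    # 1460 is the usual MSS for a 1500-byte MTU with 20-byte IPv4 and TCP headers.
    for seg in lso_segment(bytes(4000), mss=1460, initial_seq=1000):
        print(seg.seq, len(seg.payload))
```

In a real offload engine the per-segment header would be derived from a template header supplied by the host stack; the sketch keeps only the sequence number to show how each new header differs from the previous one.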