Search CORE

3 research outputs found

Reducing Data Copying Overhead in Web Servers

Author: Yeung Gary
Publication venue: 'University of Waterloo'
Publication date: 01/06/2010
Field of study

Web servers that generate dynamic content are widely used in the development of Internet applications. With the Internet highly connected to people’s lifestyles, the service requirements of Internet applications have increased significantly. This increasing trend intensifies the need to improve server performance in dynamic content generation. In this thesis, we describe the opportunity to improve server performance by co-locating the web server and the application server on the same machine. We identify related work and discuss their respective advantages and deficiencies. We then introduce and explain our technique that passes the client socket’s file descriptor from the web server process to the application server. This allows the application server to reply to the client directly, reducing the amount of data copied and improving server performance. Experiments were designed to evaluate the performance of this technique and provide a detailed analysis of processor time and data copying during response delivery. A performance comparison against alternative approaches has been performed. We analyze the results to understand factors in data copying efficiency and determine that cache misses are an important factor in server performance. There are four major contributions in this thesis. First, we show that in multiprocessor environments, co-locating web servers and application servers can take advantage of faster communication. Second, we introduce a new technique that reduces the amount of data copied by two-thirds. This technique requires no modifications to the application server code (other existing techniques do), and it is also applicable in a variety of systems, allowing easy adoption in production environments. Third, we provide a performance comparison against other approaches and raise questions regarding data copying efficiency. Our technique attains an average peak throughput of 1.27 times the FastCGI with Unix domain sockets in both uniprocessor and multiprocessor environments. Finally, our analysis on the effect of cache misses on server performance provides valuable insights into why these benefits are obtained

University of Waterloo's Institutional Repository

Reducing Internet Latency : A Survey of Techniques and their Merit

Author: Briscoe Bob
Brunstrom Anna
Fairhurst Gorry
Gjessing Stein
Griwodz Carsten
Hayes David
Petlund Andreas
Ross David
Tsan Ing-Jyh
Welzl Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Bob Briscoe, Anna Brunstrom, Andreas Petlund, David Hayes, David Ros, Ing-Jyh Tsang, Stein Gjessing, Gorry Fairhurst, Carsten Griwodz, Michael WelzlPeer reviewedPreprin

Aberdeen University Research

Design of a scalable network interface to support enhanced TCP and UDP processing for high speed networks

Author: Elbeshti Mohamed
Publication venue
Publication date: 01/01/2014
Field of study

Communication networks have advanced rapidly in providing additional services, with improvements made to their bandwidth and the integration of advanced technology. As the speed of networks exceeds 10 Gbps, the time frame for completing the processing of TCP and UDP packets has become extremely short. The design and implementation of high performance Network Interfaces (NIs) that can support offload protocol functions for current and next-generation networks is challenging. In this thesis two software approaches are presented to enhance protocol processing of TCP and UDP in the network interface. A novel software Large Receive Offload (LRO) approach for enhancing the receiving side has been proposed. The LRO works by aggregating the incoming TCP and UDP packets into larger packets inside the NI’s buffer. The receiving side software has been improved to support out-of-order packets. The second proposed software solution is applied on the Large Send Offload (LSO). The proposed LSO function processing is implemented by segmenting TCP and UDP messages that are larger than the Maximum Transmission Unit to the Maximum Segment Size. New packet headers are generated for each new outgoing packet. A scalable programmable NI based 32-bit RISC core is presented that can support 100 Gbps network speeds. Acceleration of the processing time frame required at the NI has been implemented to prevent hazards (such as Data Hazard and Control Hazard) during the execution of the LRO and the LSO functions. An R2000/3000 RISC has been used in order to test the LRO and LSO functions and to discover the instruction set that is most suitable. Following this the VHDL NI was implemented with three pipeline RISC cores, a simple DMA controller and Content Addressable Memory. An evaluation of the desired RISC clock rate that is required to process TCP and UDP streams at 100 Gbps was conducted. It was determined that a RISC core running at 752 MHz with a DMA clock of 3753 MHz was able to process packets 512 bytes or larger fast enough to support 100 Gbps network speeds

Research Repository