    Enhancing HPC on Virtual Systems in Clouds through Optimizing Virtual Overlay Networks

    Virtual Ethernet overlay provides a powerful model for realizing virtual distributed and parallel computing systems with strong isolation, portability, and recoverability properties. However, in extremely high throughput and low latency networks, such overlays can suffer from bandwidth and latency limitations, which is of particular concern in HPC environments. Through a careful and quantitative analysis, I identify three core issues limiting performance: delayed and excessive virtual interrupt delivery into guests, copies between host and guest data buffers during encapsulation, and the semantic gap between virtual Ethernet features and underlying physical network features. I propose three novel optimizations in response: optimistic timer-free virtual interrupt injection, zero-copy cut-through data forwarding, and virtual TCP offload. These optimizations improve the latency and bandwidth of the overlay network on 10 Gbps Ethernet and InfiniBand interconnects, resulting in near-native performance for a wide range of microbenchmarks and MPI application benchmarks.

    Throughput-Delay Analysis of Interrupt-Driven Kernels with DMA Enabled and Disabled in High-Speed Networks

    Interrupt processing can be a major bottleneck in the end-to-end performance of high-speed networks. The performance of Gigabit network end hosts or servers can be severely degraded by the interrupt overhead caused by heavy incoming traffic. In particular, excessive latency and significant degradation in system throughput can be experienced. In this paper, we present a throughput-delay analysis of such behavior. We develop analytical models based on queueing theory and Markov processes. In our analysis, we consider and model three systems: ideal, PIO, and DMA. In the ideal system, the interrupt overhead is ignored. In PIO, DMA is disabled and copying of incoming packets is performed by the CPU. In DMA, copying of incoming packets is performed by DMA engines. For high-speed network hosts, both PIO and DMA can be desirable configuration options. The analysis yields insight into understanding and predicting the impact of system and network choices on the performance of interrupt-driven systems when subjected to light and heavy network loads. Simulations and reported experimental results show that our analytical models are valid and give a good approximation.

    Performance Analysis and Comparison of Interrupt-Handling Schemes in Gigabit Networks

    Interrupt processing can be a major bottleneck in the end-to-end performance of Gigabit networks. The performance of Gigabit network end hosts or servers can be severely degraded due to interrupt overhead caused by heavy incoming traffic. In particular, excessive latency and significant degradation in system throughput can be encountered. Also, user applications may livelock as the CPU power gets mostly consumed by interrupt handling and protocol processing. A number of interrupt handling schemes have been proposed and employed to mitigate the interrupt overhead and improve OS performance. Among the most popular interrupt handling schemes are normal interruption, polling, interrupt coalescing, and disabling and enabling of interrupts. In previous work, we presented a preliminary analytical study and models of normal interruption and interrupt coalescing. In this article, we extend our analysis and modeling to include polling and the scheme of interrupt disabling and enabling. For polling, we study both pure (or FreeBSD-style) polling and Linux NAPI polling. The performance of all these schemes is compared using both mathematical analysis and discrete-event simulation. The performance is studied in terms of three key performance indicators: throughput, system latency, and the residual CPU bandwidth available for user applications. As opposed to our previous work, we consider not only Poisson traffic, but also bursty traffic with empirical packet size distribution. Our analysis and simulation work gives insight into predicting the system performance and behavior when employing a certain interrupt handling scheme. It is concluded that no single interrupt handling scheme outperforms all other schemes under all traffic conditions. Based on the obtained results, we propose and discuss a novel hybrid scheme of interrupt disabling-enabling and pure polling in order to attain peak performance under low and heavy traffic loads.

    A Modular Reconfigurable Architecture for Asymmetric and Symmetric-key Cryptographic Algorithms

    It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Cryptographic algorithms are the central tools for achieving system security. Numerous such algorithms have been devised, and many have found popularity in different domains. High throughput and low-cost implementation of these algorithms is critical for achieving both high security and high-speed processing in an increasingly digital global economy. Conventional methods for implementing ciphers are unable to provide all three crucial characteristics in a single solution: high throughput, low-cost, and cipher-agility. This thesis develops a reconfigurable architecture capable of implementing most symmetric-key as well as asymmetric-key ciphers. The reconfigurable nature of the architecture provides flexibility equivalent to software implementations, with the low-cost and throughput figures approaching ASIC implementations of these ciphers. Detailed discussions of the development of this architecture, along with the top-level design and interconnection scheme, have been provided. The individual components developed have been synthesized on a standard-cell library to provide an estimate of the area/performance characteristics of the design. Preliminary results show throughput values equivalent to FPGA based implementations for most of the tested ciphers, and approaching ASIC based implementations.
    Keywords: Reconfigurable Computing, Cryptography, Symmetric-Key, Asymmetric-Key, Domain-specific Reconfigurable Architecture

    Throughput and Delay Analysis of Interrupt-Driven Kernels under Poisson and Bursty Traffic

    This paper studies the performance of interrupt-driven kernels when subjected to heavy network traffic such as that of Gigabit Ethernet. Under heavy network traffic, kernel performance is negatively affected by the interrupt overhead caused by the incoming traffic. In particular, excessive latency and significant degradation in system throughput can be experienced. We present analytical models to study the performance in terms of two key kernel performance metrics: throughput and delay. The performance is also studied using simulation. Both Poisson and bursty traffic with empirical packet size distribution are considered.