14 research outputs found

    Towards an Adaptive OS Noise Mitigation Technique for Microbenchmarking on Apple iPad Devices

    This study investigates levels of Operating System (OS) noise on Apple iPad mobile devices. OS noise causes variations in application performance that interfere with microbenchmark results. OS noise manifests in collected data through extreme outliers and variations in skewness. Using our collected data, we develop an iterative, semi-automated outlier removal process for Apple iPad OS noise profiles. The profiles generated by outlier removal represent the first step toward an adaptive noise mitigation technique, which presents opportunities for use in microbenchmarking across other mobile platforms.
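The abstract above does not say which criterion its iterative outlier removal uses. As a minimal sketch only, assuming Tukey interquartile-range fences reapplied until no sample falls outside them (the function names, the `k=1.5` multiplier, and the round limit are illustrative assumptions, not the authors' method), such a loop might look like this:

```python
import statistics


def iqr_fences(samples, k=1.5):
    """Return the (low, high) Tukey fences for a list of samples."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers_iteratively(samples, k=1.5, max_rounds=10):
    """Drop samples outside the fences, recompute, and repeat.

    Assumption: the stopping rule is "a round removes nothing" or
    "max_rounds reached". Returns (kept, removed).
    """
    kept = list(samples)
    removed = []
    for _ in range(max_rounds):
        if len(kept) < 4:  # too few points for meaningful quartiles
            break
        low, high = iqr_fences(kept, k)
        inliers = [x for x in kept if low <= x <= high]
        if len(inliers) == len(kept):
            break  # converged: no outliers left under this criterion
        removed.extend(x for x in kept if not (low <= x <= high))
        kept = inliers
    return kept, removed


if __name__ == "__main__":
    # Hypothetical microbenchmark timings (ms) with two noise spikes.
    timings = [10.0, 10.1, 9.9, 10.2, 10.0, 9.8, 10.1, 55.0, 10.0, 120.0]
    kept, removed = remove_outliers_iteratively(timings)
    print("kept:", kept)
    print("removed:", removed)
```

Recomputing the fences after each pass matters because extreme outliers inflate the quartile spread and can mask milder ones on the first pass.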

    The problems you're having may not be the problems you think you're having: results from a latency study of Windows NT

    This paper is intended to catalyze discussions on two intertwined systems topics. First, it presents early results from a latency study of Windows NT that identifies some specific causes of long thread scheduling latencies, many of which delay the dispatching of runnable threads for tens of milliseconds. Reasons for these delays, including technical, methodological, and economic ones, are presented, and possible solutions are discussed. Second, and equally important, it is intended to serve as a cautionary tale against believing one's own intuition about the causes of poor system performance. We went into this study believing we understood a number of the causes of these delays, with our beliefs informed more by conventional wisdom and hunches than by data. In nearly all cases, the reasons we discovered via instrumentation and measurement surprised us. In fact, some directly contradicted "facts" we thought we "knew".

    Profiling I/O interrupts in modern architectures

    As applications grow increasingly communication-oriented, interrupt performance quickly becomes a crucial component of high-performance I/O system design. At the same time, accurately measuring interrupt handler performance is difficult with the traditional simulation, instrumentation, or statistical sampling approaches. One of the most important components of interrupt performance is cache behavior. This paper presents a portable method for measuring the cache effects of I/O interrupt handling using native hardware performance counters. To provide a portability stress test, the method is demonstrated on two commercial platforms with different architectures, the SGI Origin 200 and the Sun Ultra-1. This case study uses the methodology to measure the overhead of the two most common forms of interrupt traffic: disk and network interrupts. The study demonstrates that the method works well and is reasonably robust. In addition, the results show that disk interrupts behave similarly on both platforms, while differences in OS organization cause network interrupts to behave very differently. Furthermore, network interrupts exhibit significantly larger cache footprints.

    Operating system profiling via latency analysis

    Operating systems are complex and their behavior depends on many factors. Source code, if available, does not directly help one to understand the OS’s behavior, as the behavior depends on actual workloads and external inputs. Runtime profiling is a key technique to prove new concepts, debug problems, and optimize performance. Unfortunately, existing profiling methods are lacking in important areas—they do not provide enough information about the OS’s behavior, they require OS modification and therefore are not portable, or they incur high overheads thus perturbing the profiled OS. We developed OSprof: a versatile, portable, and efficient OS profiling method based on latency distributions analysis. OSprof automatically selects important profiles for subsequent visual analysis. We have demonstrated that a suitable workload can be used to profile virtually any OS component. OSprof is portable because it can intercept operations and measure OS behavior from user-level or from inside the kernel without requiring source code. OSprof has typical CPU time overheads below 4%. In this paper we describe our techniques and demonstrate their usefulness through a series of profiles conducted on Linux, FreeBSD, and Windows, including client/server scenarios. We discovered and investigated a number of interesting interactions, including scheduler behavior, multi-modal I/O distributions, and a previously unknown lock contention, which we fixed.
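To illustrate the general idea of profiling through latency distributions, here is a minimal user-level sketch. It is not OSprof itself: the logarithmic (log2) bucketing, the nanosecond units, and the names `latency_bucket`, `profile`, and `render` are all assumptions made for the example. It times repeated calls to an operation and records how many fall into each power-of-two latency bucket, so that distinct peaks (for instance a fast cached path and a slow I/O path) become visible.

```python
import os
import time
from collections import Counter


def latency_bucket(latency_ns):
    """Map a latency in nanoseconds to its log2 bucket index.

    Bucket b holds latencies in [2**b, 2**(b + 1)); values below 1 ns
    are clamped into bucket 0.
    """
    return max(int(latency_ns), 1).bit_length() - 1


def profile(operation, repetitions=1000):
    """Call `operation` repeatedly and return a Counter: bucket -> count."""
    histogram = Counter()
    for _ in range(repetitions):
        start = time.perf_counter_ns()
        operation()
        elapsed = time.perf_counter_ns() - start
        histogram[latency_bucket(elapsed)] += 1
    return histogram


def render(histogram):
    """Return a text rendering of the histogram, one bucket per line."""
    lines = []
    for bucket in range(min(histogram), max(histogram) + 1):
        count = histogram.get(bucket, 0)
        lines.append(f">= 2^{bucket:<2} ns  {count:6d}  {'#' * min(count, 60)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example workload (an assumption): a cheap system call.
    print(render(profile(lambda: os.stat("."), repetitions=2000)))
```

Logarithmic buckets keep the histogram small while still separating latencies that differ by orders of magnitude, which is where multi-modal behavior tends to show up.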

    HAPPE: Human and Application-Driven Frequency Scaling for Processor Power Efficiency

    Conventional dynamic voltage and frequency scaling techniques use high CPU utilization as a predictor for user dissatisfaction, to which they react by increasing CPU frequency. In this paper, we demonstrate that for many interactive applications, perceived performance is highly dependent upon the particular user and application, and is not linearly related to CPU utilization. This observation reveals an opportunity for reducing power consumption. We propose Human and Application driven frequency scaling for Processor Power Efficiency (HAPPE), an adaptive user-and-application-aware dynamic CPU frequency scaling technique. HAPPE continuously adapts processor frequency and voltage to the learned performance requirement of the current user and application. Adaptation to user requirements is quick and requires minimal effort from the user (typically a handful of keystrokes). Once the system has adapted to the user's performance requirements, the user is not required to provide continued feedback but is permitted to provide additional feedback to adjust the control policy to changes in preferences. HAPPE was implemented on a Linux-based laptop and evaluated in 22 hours of controlled user studies. Compared to the default Linux CPU frequency controller, HAPPE reduces the measured system-wide power consumption of CPU-intensive interactive applications by 25 percent on average while maintaining user satisfaction. Index Terms: Power, CPU frequency scaling, user-driven study, mobile systems. 1 INTRODUCTION. Power efficiency has been a major technology driver for battery-powered mobile systems, such as mobile phones, personal digital assistants, MP3 players, and laptops. Power efficiency has also become a new focus for line-powered desktop systems and data centers because of its impact on power dissipation and chip temperature, which affect performance, reliability, and lifetime.
Processor power consumption is often a substantial portion of system power consumption in mobile systems. Traditional CPU power management approaches can lose sight of an important fact: the ultimate goal of any computer system is to satisfy its users, not to execute a particular number of instructions per second. Although CPU utilization is a good indication of processor performance, the actual perceivable system performance depends on individual users and applications, and user satisfaction is not linearly related to CPU utilization. We conducted a study on 10 users with four interactive applications and found that for some applications, some users are satisfied with system performance when the processor is at the lowest frequency, while other users may not be satisfied even when it operates at the highest frequency. We also found that users may be insensitive to varying processor frequency for one application, but may be very sensitive to such changes for another application. Traditional DVFS policies that consider only CPU utilization or other user-oblivious performance metrics are often too pessimistic about user performance requirements, and use a high frequency to satisfy all users, resulting in wasted power. Similar findings were also reported in other studies. In this paper, we propose Human and Application driven frequency scaling for Processor Power Efficiency (HAPPE), a CPU DVFS technique that adapts voltage and frequency to the performance requirement of the curren

    Understanding PCIe performance for end host networking

    In recent years, spurred on by the development and availability of programmable NICs, end hosts have increasingly become the enforcement point for core network functions such as load balancing, congestion control, and application-specific network offloads. However, implementing custom designs on programmable NICs is not easy: many potential bottlenecks can impact performance. This paper focuses on the performance implications of PCIe, the de facto I/O interconnect in contemporary servers, when interacting with the host architecture and device drivers. We present a theoretical model for PCIe and pcie-bench, an open-source suite that allows developers to gain an accurate and deep understanding of the PCIe substrate. Using pcie-bench, we characterize the PCIe subsystem in modern servers. We highlight surprising differences in PCIe implementations, evaluate the undesirable impact of PCIe features such as IOMMUs, and show the practical limits for common network cards operating at 40Gb/s and beyond. Furthermore, through pcie-bench we gained insights which guided software and future hardware architectures for both commercial and research-oriented network cards and DMA engines.
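As a rough illustration of why a PCIe link delivers less than its headline rate, the following back-of-the-envelope sketch may help. It is not the model from the paper above: it considers only line encoding and a fixed per-packet overhead for memory writes, and it ignores flow control, acknowledgements, read requests, and completions. The 8 GT/s rate and 128b/130b encoding are the standard PCIe Gen 3 figures; the default 24-byte per-packet overhead and the function names are assumptions for the example.

```python
def link_bandwidth_gbps(lanes, gigatransfers_per_s, encoding_efficiency):
    """Raw link bandwidth in Gb/s after line encoding."""
    return lanes * gigatransfers_per_s * encoding_efficiency


def effective_write_bandwidth_gbps(link_gbps, payload_bytes, overhead_bytes=24):
    """Usable bandwidth for back-to-back memory-write packets.

    Assumption: every packet carries `payload_bytes` of data plus a fixed
    `overhead_bytes` of framing, header, and CRC.
    """
    return link_gbps * payload_bytes / (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    # PCIe Gen 3 x8: 8 GT/s per lane, 128b/130b encoding.
    link = link_bandwidth_gbps(lanes=8, gigatransfers_per_s=8.0,
                               encoding_efficiency=128 / 130)
    print(f"link after encoding: {link:.2f} Gb/s")
    for payload in (64, 128, 256):
        eff = effective_write_bandwidth_gbps(link, payload)
        print(f"payload {payload:3d} B -> {eff:.2f} Gb/s")
```

Even this simplified view shows that small transfers pay proportionally more overhead, which is one reason transfer size matters for devices operating near the link rate.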

    Understanding and Leveraging Virtualization Technology in Commodity Computing Systems

    Commodity computing platforms are imperfect, requiring various enhancements for performance and security purposes. In the past decade, virtualization technology has emerged as a promising trend for commodity computing platforms, ushering in many opportunities to optimize the allocation of hardware resources. However, many abstractions offered by virtualization not only make enhancements more challenging, but also complicate the proper understanding of virtualized systems. The current understanding and analysis of these abstractions are far from satisfactory. This dissertation aims to tackle this problem from a holistic view, by systematically studying the system behaviors. The focus of our work lies in the performance implications and security vulnerabilities of a virtualized system. We start with the first abstraction, an intensive memory multiplexing for I/O of Virtual Machines (VMs), and present a new technique, called Batmem, to effectively reduce the memory multiplexing overhead of VMs and emulated devices by optimizing the operations of the conventional emulated Memory Mapped I/O in hypervisors. Then we analyze another particular abstraction, a nested file system, and attempt to both quantify and understand the crucial aspects of performance in a variety of settings. Our investigation demonstrates that the choice of a file system at both the guest and hypervisor levels has a significant impact upon I/O performance. Finally, leveraging utilities to manage VM disk images, we present a new patch management framework, called Shadow Patching, to achieve effective software updates. This framework allows system administrators to still take the offline patching approach but retain most of the benefits of live patching by using commonly available virtualization techniques. To demonstrate the effectiveness of the approach, we conduct a series of experiments applying a wide variety of software patches.
Our results show that our framework incurs only a small overhead in running systems, but can significantly reduce the maintenance window.