
    OS Scheduling Algorithms for Memory Intensive Workloads in Multi-socket Multi-core servers

    Major chip manufacturers have all introduced multicore microprocessors. Multi-socket systems built from these processors are routinely used for running various server applications. Depending on the application that is run on the system, remote memory accesses can impact overall performance. This paper presents a new operating system (OS) scheduling optimization to reduce the impact of such remote memory accesses. By observing the pattern of local and remote DRAM accesses for every thread in each scheduling quantum and applying different algorithms, we come up with a new schedule of threads for the next quantum. This new schedule potentially cuts down remote DRAM accesses in the next scheduling quantum and improves overall performance. We present three such new algorithms of varying complexity, followed by an algorithm which is an adaptation of the Hungarian algorithm. We used three different synthetic workloads to evaluate these algorithms, and we also performed a sensitivity analysis with respect to varying DRAM latency. We show that these algorithms can cut DRAM access latency by up to 55%, depending on the algorithm used. The benefit gained from the algorithms depends on their complexity: in general, the higher the complexity, the higher the benefit. The Hungarian algorithm yields an optimal solution. We find that two of the four algorithms provide a good trade-off between performance and complexity for the workloads we studied.
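
    A minimal sketch of the assignment-style formulation this abstract describes, not the paper's implementation: thread placement for the next scheduling quantum is treated as an assignment problem over a cost matrix of expected remote DRAM accesses and solved with the Hungarian algorithm (here via scipy's linear_sum_assignment). The cost values, socket count, and slots-per-socket are made up for illustration.

```python
# Hypothetical sketch (not the paper's implementation): treat thread placement
# for the next scheduling quantum as an assignment problem and solve it with
# the Hungarian algorithm via scipy's linear_sum_assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[t][s] = remote DRAM accesses thread t is expected to incur on socket s,
# estimated from per-thread access counters observed in the previous quantum.
cost = np.array([
    [120,  30],   # thread 0 mostly touches memory attached to socket 1
    [ 10, 200],   # thread 1 mostly touches memory attached to socket 0
    [ 80,  90],   # thread 2 is roughly balanced
    [ 60,  40],   # thread 3
])

# With more threads than sockets, each socket offers several "slots"
# (assume 2 runnable threads per socket here).
slots_per_socket = 2
expanded = np.repeat(cost, slots_per_socket, axis=1)

threads, slots = linear_sum_assignment(expanded)
for t, slot in zip(threads, slots):
    print(f"thread {t} -> socket {slot // slots_per_socket}")
```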

    COLAB: A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors

    Funding: Partially funded by the UK EPSRC grants Discovery: Pattern Discovery and Program Shaping for Many-core Systems (EP/P020631/1) and ABC: Adaptive Brokerage for Cloud (EP/R010528/1); Royal Academy of Engineering under the Research Fellowship scheme. Increasingly prevalent asymmetric multicore processors (AMPs) are necessary for delivering performance in the era of limited power budgets and dark silicon. However, software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only under restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single-program workloads. What we do not have is a scheduler that can handle all runtime factors affecting AMPs for multi-threaded multi-programmed workloads. This paper introduces the first general-purpose asymmetry-aware scheduler for multi-threaded multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core assignment and thread selection decisions that still provide each application its fair share of the processor's time. We evaluate our approach using the GEM5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH2 benchmarks. Compared to the state-of-the-art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25%, and of 5% to 15% on average, depending on the hardware setup.
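
    The abstract describes estimating per-thread performance on each core type and then making coordinated core assignments. The sketch below is only an illustrative heuristic in that spirit: the Thread fields, the speedup estimates, and the ranking rule are assumptions, not the actual COLAB policy.

```python
# Illustrative heuristic only (not the actual COLAB policy): rank threads by a
# bottleneck flag and an estimated big-vs-LITTLE speedup, give the top-ranked
# threads the big cores, and leave the rest on LITTLE cores.
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    speedup_on_big: float        # estimated perf(big core) / perf(LITTLE core)
    is_bottleneck: bool = False  # e.g. holds a lock that other threads wait on

def assign(threads, num_big):
    """Return {tid: 'big' | 'little'} for one scheduling interval."""
    ranked = sorted(threads,
                    key=lambda t: (t.is_bottleneck, t.speedup_on_big),
                    reverse=True)
    return {t.tid: ("big" if i < num_big else "little")
            for i, t in enumerate(ranked)}

threads = [Thread(0, 1.8), Thread(1, 1.2, is_bottleneck=True),
           Thread(2, 1.1), Thread(3, 1.6)]
print(assign(threads, num_big=1))   # the bottleneck thread gets the big core
```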

    HAPPY: Hybrid Address-based Page Policy in DRAMs

    Memory controllers have used static page-closure policies to decide whether a row should be left open (open-page policy) or closed immediately (close-page policy) after the row has been accessed. The appropriate choice for a particular access can reduce the average memory latency. However, since application access patterns change at run time, static page policies cannot guarantee optimum execution time. Hybrid page policies have been investigated as a means of covering these dynamic scenarios and are now implemented in state-of-the-art processors. Hybrid page policies switch between the open-page and close-page policies while the application is running, by monitoring the pattern of row hits and conflicts and predicting future behavior. Unfortunately, as the size of DRAM memory increases, fine-grained tracking and analysis of memory access patterns is no longer practical. We propose a compact memory-address-based encoding technique which can improve or maintain the performance of DRAM page-closure predictors while reducing the hardware overhead in comparison with state-of-the-art techniques. As a case study, we integrate our technique, HAPPY, with a state-of-the-art monitor, the Intel-adaptive open-page policy predictor employed by the Intel Xeon X5650, and with a traditional hybrid page policy. We evaluate them across 70 memory-intensive workload mixes consisting of single-threaded and multi-threaded applications. The experimental results show that applying the HAPPY encoding to the Intel-adaptive page-closure policy can reduce the hardware overhead by 5X for the evaluated 64 GB memory (up to 40X for a 512 GB memory) while maintaining the prediction accuracy.
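
    To make the general idea of compact, address-based page-policy prediction concrete, the sketch below uses a small hashed table of saturating counters instead of per-row state. The table size, hash function, and update rule are invented for illustration; this is not the HAPPY encoding itself.

```python
# Illustrative sketch of compact, address-based page-policy prediction (not
# the HAPPY encoding itself): a small table of 2-bit saturating counters,
# indexed by a hash of the row address, votes on keeping a DRAM row open.
NUM_COUNTERS = 256             # fixed-size table instead of per-row state
counters = [2] * NUM_COUNTERS  # start weakly biased toward open-page

def index(row_addr):
    # Fold the row address into the small table (hypothetical hash).
    return (row_addr ^ (row_addr >> 8)) % NUM_COUNTERS

def predict_keep_open(row_addr):
    return counters[index(row_addr)] >= 2

def update(row_addr, was_row_hit):
    # A row hit means leaving the row open paid off; a conflict means it did not.
    i = index(row_addr)
    counters[i] = min(3, counters[i] + 1) if was_row_hit else max(0, counters[i] - 1)

# Train on a short stream of (row address, row hit?) outcomes.
for row, hit in [(0x1A2, True), (0x1A2, True), (0x3F0, False), (0x1A2, True)]:
    update(row, hit)
print(predict_keep_open(0x1A2), predict_keep_open(0x3F0))   # True False
```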

    I/O Schedulers for Proportionality and Stability on Flash-Based SSDs in Multi-Tenant Environments

    The use of flash-based Solid State Drives (SSDs) has expanded rapidly into the cloud computing environment. In cloud computing, ensuring the service level objective (SLO) of each server is the major criterion in designing a system. In particular, eliminating performance interference among virtual machines (VMs) on shared storage is a key challenge. However, studies on SSD performance that guarantee SLOs in such environments are limited. In this paper, we present an analysis of the I/O behavior of a shared SSD used as storage, in terms of proportionality and stability. We show that the performance SLOs of SSD-based storage systems shared by VMs or tasks are not satisfied. We analyze the reasons behind this unexpected behavior by examining the components of SSDs such as channels, the DRAM buffer, and Native Command Queuing (NCQ). Based on our analysis and findings, we introduce two novel SSD-aware host-level I/O schedulers on Linux, called A+CFQ and H+BFQ. Through experiments on Linux, we analyze I/O proportionality and stability in multi-tenant environments. In addition, through experiments using real workloads, we analyze the performance interference between workloads on a shared SSD. We then show that the proposed I/O schedulers almost eliminate the interference effect seen in CFQ and BFQ, while still providing I/O proportionality and stability for various I/O-weighted scenarios.
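
    As a rough aid to the notion of I/O proportionality discussed above (not the A+CFQ/H+BFQ schedulers themselves, which are kernel-level), the sketch below compares each tenant's achieved share of SSD bandwidth with the share its I/O weight entitles it to. Tenant names, weights, and throughput figures are hypothetical.

```python
# Compare each tenant's achieved share of SSD bandwidth with the share its
# I/O weight entitles it to; large deviations indicate poor proportionality.
def proportionality_report(weights, throughputs):
    """Return {tenant: (entitled share, achieved share, relative deviation)}."""
    total_w = sum(weights.values())
    total_t = sum(throughputs.values())
    report = {}
    for tenant, w in weights.items():
        entitled = w / total_w
        achieved = throughputs[tenant] / total_t
        report[tenant] = (entitled, achieved, abs(achieved - entitled) / entitled)
    return report

# Hypothetical weights and measured throughputs (MB/s) for three VMs
# sharing one SSD.
weights = {"vm1": 100, "vm2": 200, "vm3": 400}
throughputs = {"vm1": 150, "vm2": 210, "vm3": 340}
for vm, (e, a, err) in proportionality_report(weights, throughputs).items():
    print(f"{vm}: entitled {e:.2f}, achieved {a:.2f}, deviation {err:.0%}")
```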

    Performance metrics for consolidated servers

    In spite of the widespread adoption of virtualization and consolidation, there exists no consensus on how to benchmark consolidated servers that run multiple guest VMs on the same physical hardware. For example, VMware proposes VMmark, which basically computes the geometric mean of normalized throughput values across the VMs; Intel uses vConsolidate, which reports a weighted arithmetic average of normalized throughput values. These benchmarking methodologies focus on total system throughput (i.e., across all VMs in the system) and do not take into account per-VM performance. We argue that a benchmarking methodology for consolidated servers should quantify both total system throughput and per-VM performance in order to provide a meaningful and precise performance characterization. We therefore present two performance metrics: Total Normalized Throughput (TNT), to characterize total system performance, and Average Normalized Reduced Throughput (ANRT), to characterize per-VM performance. We compare TNT and ANRT against VMmark using published performance numbers, and report several cases for which the VMmark score is misleading. That is, VMmark says one platform yields better performance than another, whereas TNT and ANRT show that the two platforms represent different trade-offs in total system throughput versus per-VM performance. Or, even worse, in a couple of cases we observe that VMmark yields the opposite conclusion to TNT and ANRT, i.e., VMmark says one system performs better than another, which is contradicted by the TNT/ANRT performance characterization.
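
    To make the metric differences concrete, the sketch below computes a VMmark-style geometric mean and a vConsolidate-style weighted average as described in the abstract. The TNT and ANRT formulas shown are assumptions inferred from the metric names, not definitions taken from the paper.

```python
# Worked comparison of the metrics named above. The VMmark-style geometric
# mean and vConsolidate-style weighted average follow the abstract; the TNT
# and ANRT formulas are assumptions inferred from the metric names.
import math

# Hypothetical per-VM throughputs, each normalized to the VM running alone.
normalized = [0.9, 0.8, 0.3]     # the third VM suffers badly from consolidation
weights    = [1.0, 1.0, 1.0]

vmmark_style = math.prod(normalized) ** (1 / len(normalized))          # geometric mean
vconsolidate_style = sum(w * n for w, n in zip(weights, normalized)) / sum(weights)

tnt  = sum(normalized)                    # assumed: total normalized throughput
anrt = sum(normalized) / len(normalized)  # assumed: average per-VM normalized throughput

print(f"geometric mean (VMmark-like):      {vmmark_style:.2f}")
print(f"weighted mean (vConsolidate-like): {vconsolidate_style:.2f}")
print(f"TNT (assumed): {tnt:.2f}   ANRT (assumed): {anrt:.2f}")
```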

    Resource provisioning in Science Clouds: Requirements and challenges

    Cloud computing has permeated the information technology industry in the last few years, and it is now emerging in scientific environments. Science user communities demand a broad range of computing resources, such as local clusters, high-performance computing systems, and computing grids, to satisfy the needs of their high-performance applications. Different computational models impose different workloads, and the cloud is already considered a promising paradigm for serving them. The scheduling and allocation of resources is always a challenging matter in any form of computation, and clouds are no exception. Scientific applications have unique features that differentiate their workloads; hence, their requirements have to be taken into account when building a Science Cloud. This paper discusses the main scheduling and resource allocation challenges for any Infrastructure as a Service provider supporting scientific applications.