
    Memory resource balancing for virtualized computing

    Virtualization has become a common abstraction layer in modern data centers. By multiplexing hardware resources into multiple virtual machines (VMs) and thus enabling several operating systems to run on the same physical platform simultaneously, it can effectively reduce power consumption and building size or improve security by isolating VMs. In a virtualized system, memory resource management plays a critical role in achieving high resource utilization and performance. Insufficient memory allocation to a VM degrades its performance dramatically; conversely, over-allocation wastes memory. Meanwhile, a VM’s memory demand may vary significantly. Effective memory resource management therefore calls for a dynamic memory balancer that, ideally, can adjust each VM’s memory allocation in a timely manner based on its current memory demand, achieving the best memory utilization and the optimal overall performance. To estimate the memory demand of each VM and to arbitrate possible memory resource contention, a widely proposed approach is to construct an LRU-based miss ratio curve (MRC), which provides not only the current working set size (WSS) but also the correlation between performance and the target memory allocation size. Unfortunately, the cost of constructing an MRC is nontrivial. In this dissertation, we first present a low-overhead LRU-based memory demand tracking scheme that includes three orthogonal optimizations: AVL-based LRU organization, dynamic hot set sizing, and intermittent memory tracking. Our evaluation shows that, across the whole SPEC CPU 2006 benchmark suite, applying the three optimizations lowers the mean overhead of MRC construction from 173% to only 2%. Based on the current WSS, we then predict its trend in the near future and take different strategies depending on the prediction.
When there is a sufficient amount of physical memory on the host, it locally balances memory among its VMs. Once local memory is insufficient and the memory pressure is predicted to persist for a sufficiently long time, a relatively expensive solution, VM live migration, is used to move one or more VMs from the overloaded host to other host(s). Finally, for transient memory pressure, a remote cache is used to alleviate the temporary performance penalty. Our experimental results show that this design achieves a 49% center-wide speedup.
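The MRC construction that the dissertation optimizes can be illustrated with Mattson's classic stack algorithm: every access's LRU stack distance goes into a histogram, and the miss ratio at each cache size falls out of the cumulative histogram. The sketch below is illustrative (the function name and the linear-time stack scan are mine); the dissertation's AVL-based LRU organization replaces exactly this O(n) scan with O(log n) tree operations.

```python
from collections import OrderedDict

def miss_ratio_curve(trace, max_size):
    """Build an LRU miss ratio curve from an access trace using
    Mattson's stack algorithm: an access's stack distance is the
    number of distinct pages touched since its previous access."""
    stack = OrderedDict()          # LRU stack; most-recent page last
    hist = [0] * (max_size + 1)    # hist[d] = accesses at stack distance d
    for page in trace:
        if page in stack:
            keys = list(stack.keys())
            d = len(keys) - keys.index(page)   # 1 = most recently used
            if d <= max_size:
                hist[d] += 1
            stack.pop(page)
        # cold misses (first touches) simply miss at every cache size
        stack[page] = True
    total = len(trace)
    # A cache of size c hits exactly the accesses with distance <= c.
    mrc, hits = [], 0
    for c in range(max_size + 1):
        hits += hist[c]
        mrc.append(1.0 - hits / total)
    return mrc

trace = [1, 2, 3, 1, 2, 3, 4, 1]
curve = miss_ratio_curve(trace, 4)
```

For this toy trace, `curve[4]` is 0.5: with room for four pages, only the four cold misses remain, which is exactly the WSS/performance trade-off the MRC exposes.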

    MARACAS: a real-time multicore VCPU scheduling framework

    This paper describes MARACAS, a multicore scheduling and load-balancing framework that addresses shared cache and memory bus contention. It builds upon prior work centered around the concept of virtual CPU (VCPU) scheduling. Threads are associated with VCPUs that have periodically replenished time budgets. VCPUs are guaranteed to receive their periodic budgets even if they are migrated between cores. A load-balancing algorithm ensures VCPUs are mapped to cores to fairly distribute surplus CPU cycles, after ensuring VCPU timing guarantees. MARACAS uses surplus cycles to throttle the execution of threads running on specific cores when memory contention exceeds a certain threshold. This enables threads on other cores to make better progress without interference from co-runners. Our scheduling framework features a novel memory-aware scheduling approach that uses performance counters to derive an average memory request latency. We show that latency-based memory throttling is more effective than rate-based memory access control in reducing bus contention. MARACAS also supports cache-aware scheduling and migration using page recoloring to improve performance isolation amongst VCPUs. Experiments show how MARACAS reduces multicore resource contention, leading to improved task progress.
    http://www.cs.bu.edu/fac/richwest/papers/rtss_2016.pdf
    Accepted manuscript
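The latency-based throttling decision can be sketched in a few lines: derive an average memory request latency from two performance counters sampled over a scheduling window, and throttle when it exceeds a threshold. The counter names and the threshold value here are assumptions for illustration, not MARACAS's actual interface.

```python
def should_throttle(mem_stall_cycles, mem_requests, latency_threshold):
    """Hypothetical latency-based throttling test in the spirit of
    MARACAS: average memory request latency is stall cycles divided
    by completed requests, both read from performance counters over
    one scheduling window. All names and values are illustrative."""
    if mem_requests == 0:
        return False          # no memory traffic, nothing to throttle
    avg_latency = mem_stall_cycles / mem_requests
    return avg_latency > latency_threshold

# e.g. 600,000 stall cycles over 2,000 requests -> 300 cycles/request
assert should_throttle(600_000, 2_000, 200) is True
assert should_throttle(600_000, 2_000, 400) is False
```

The point of using latency rather than request rate is that latency rises only when the bus is actually saturated, whereas a high request rate alone may still be served without contention.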

    mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores

    Web applications employ key-value stores to cache the data that is most commonly accessed. The cache improves a web application’s performance by serving requests from memory, avoiding fetches from the backend database. Since memory space is limited, maximizing memory utilization is key to delivering the best performance possible. This has led to the use of multi-tenant systems, allowing applications to share cache space. In addition, application data access patterns change over time, so the system should be adaptive in its memory allocation. In this thesis, we address both multi-tenancy (where a single cache is used for multiple applications) and dynamic workloads (changing access patterns) using a model that relates the cache size to the application miss ratio, known as a miss ratio curve. Intuitively, the larger the cache, the less likely the system will need to fetch the data from the database. Our efficient, online construction of the miss ratio curve allows us to determine a near-optimal memory allocation given the available system memory, while adapting to changing data access patterns. We show that our model outperforms an existing state-of-the-art sharing model, Memshare, in terms of cache hit ratio, and does so at a lower time cost. We show that the average hit ratio is consistently 1 percentage point greater and 99.9th percentile latency is reduced by as much as 2.9% under standard web application workloads containing millions of requests.
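A common way to turn per-tenant miss ratio curves into an allocation, and a plausible simplification of what an MRC-guided partitioner does, is greedy marginal-gain assignment: repeatedly give the next unit of memory to the tenant whose MRC predicts the largest hit gain. The function below is an illustrative sketch under that assumption, not mPart's actual algorithm.

```python
def partition_memory(mrcs, weights, total_units):
    """Greedy MRC-guided partitioning sketch: mrcs[i][c] is tenant
    i's miss ratio with c units of cache, weights[i] its request
    rate. Each unit of memory goes to the tenant whose expected
    hit gain from one more unit is largest."""
    alloc = [0] * len(mrcs)
    for _ in range(total_units):
        best, best_gain = None, -1.0
        for i, mrc in enumerate(mrcs):
            if alloc[i] + 1 >= len(mrc):
                continue                      # curve exhausted
            # expected extra hits from giving tenant i one more unit
            gain = weights[i] * (mrc[alloc[i]] - mrc[alloc[i] + 1])
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        alloc[best] += 1
    return alloc

alloc = partition_memory([[1.0, 0.5, 0.4, 0.35],
                          [1.0, 0.8, 0.3, 0.25]], [1.0, 1.0], 3)
```

Greedy assignment is optimal only when the miss ratio curves are convex; real curves have cliffs, which is one reason an online partitioner must keep re-evaluating as the curves change.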

    Individual Differences in the Experience of Cognitive Workload

    This study investigated the roles of four psychosocial variables – anxiety, conscientiousness, emotional intelligence, and Protestant work ethic – in subjective ratings of cognitive workload as measured by the Task Load Index (TLX), and the further connections between the four variables and TLX ratings of task performance. The four variables represented aspects of an underlying construct of elasticity versus rigidity in response to workload. Participants were 141 undergraduates who performed a vigilance task under different speeded conditions while working on a jigsaw puzzle for 90 minutes. Regression analysis showed that anxiety and emotional intelligence were the two variables most proximally related to TLX ratings. TLX ratings contributed to the prediction of performance on the puzzle, but not the vigilance task. Severity error bias was evident in some of the ratings. Although working in pairs improved performance, it also resulted in higher ratings of temporal demand and perceived performance pressure.

    DETECTION OF OPERATOR PERFORMANCE BREAKDOWN IN A MULTITASK ENVIRONMENT

    The purpose of this dissertation work is: 1) to empirically demonstrate an extreme human operator state, performance breakdown (PB), and 2) to develop an objective method for detecting such a state. PB has been anecdotally described as a state where the human operator “loses control of the context” and “cannot maintain the required task performance.” Preventing such a decline in performance could be important to assure the safety and reliability of human-integrated systems, and therefore PB could be useful as a point at which automation can be applied to support human performance. However, PB has never been scientifically defined or empirically demonstrated. Moreover, there exists no method for detecting such a state or the transition to that state. Therefore, after symbolically defining PB, an objective method of potentially identifying PB is proposed. Next, three human-in-the-loop studies were conducted to empirically demonstrate PB and to evaluate the proposed PB detection method. Study 1 was conducted: 1) to demonstrate PB by increasing workload until the subject reports being in a state of PB, and 2) to identify possible parameters of the PB detection method for objectively identifying the subjectively-reported PB point, and determine if they are idiosyncratic. In the experiment, fifteen participants were asked to manage three concurrent tasks (one primary and two secondary tasks) for 18 minutes. The primary task’s difficulty was manipulated over time to induce PB while the secondary tasks’ difficulty remained static. Data on participants’ task performance were collected. Three hypotheses were constructed: 1) increasing workload will induce subjectively-identified PB, 2) there exist criteria that identify the threshold parameters that best detect the performance characteristics that map to the subjectively-identified PB point, and 3) the criteria for choosing the threshold parameters are consistent across individuals.
The results show that increasing workload can induce subjectively-identified PB, although it might not be generalizable — 12 out of 15 participants declared PB. The PB detection method was applied to the performance data, and the results showed that PB can be identified using the method, particularly when the values of the parameters for the detection method were calibrated individually. Next, study 2 was conducted: 1) to repeat the demonstration of inducing PB, 2) to evaluate whether the threshold parameters established in study 1 for the PB detection method can be used in a subsequent study, or whether they have to be re-calibrated for each study, and 3) to examine whether a specific physiological measure (pulse rate) can be used to identify the subjectively-reported PB point. Study 2 was conducted in the same task environment (three concurrent tasks) as study 1. Three hypotheses were constructed: 1) increasing workload will induce subjectively-identified performance breakdown, 2) the threshold parameters established from study 2 will be the same as those from study 1 for all participants and will perform approximately as well or better, and 3) there exist criteria for choosing the threshold parameters that capture the characteristics at the subjectively-reported PB point using the PB detection method on pulse rate data. The results show that increasing workload induced the same participants (12 out of 15) from study 1 to declare PB. It was also found that the threshold parameters established in study 1 for the PB detection method cannot be reliably used in a subsequent study, suggesting that they may require re-calibration for each study. The results provided no evidence that pulse rate data can be used to detect PB. Study 3 was conducted: 1) to determine if PB is induced by the primary task workload or is affected by the presence of the secondary tasks, and 2) to re-test whether threshold parameters from study 1 can be used in a subsequent study.
In study 3, the same participants from studies 1 and 2 were asked to perform only the primary task while its difficulty increased in a similar manner to the first two studies. Two hypotheses were established: 1) PB will occur without the secondary tasks being present, and 2) the threshold parameters established from study 3 will be the same as those from study 1 and/or study 2 for all participants and will perform approximately as well or better. No participants declared PB without the secondary tasks present, even though the primary task workload was the same. Again, it was verified that the threshold parameters established in studies 1 and 2 for the PB detection method cannot be used in a subsequent study, suggesting that they may require re-calibration for each study.
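The abstract does not specify the detection method's form, but a threshold detector with a sustain window illustrates the role the individually calibrated "threshold parameters" play: declare PB only when performance stays below a cutoff long enough to rule out momentary dips. Everything below is a hypothetical sketch, not the dissertation's actual method.

```python
def detect_breakdown(scores, threshold, sustain):
    """Illustrative PB detector: flag breakdown when task
    performance stays below `threshold` for `sustain` consecutive
    samples. `threshold` and `sustain` stand in for the per-individual
    calibrated parameters discussed in the studies."""
    run = 0
    for t, score in enumerate(scores):
        run = run + 1 if score < threshold else 0
        if run >= sustain:
            return t        # sample index at which PB is declared
    return None             # no breakdown detected

# performance collapses at sample 2 and stays low
assert detect_breakdown([0.9, 0.8, 0.4, 0.3, 0.2], 0.5, 3) == 4
```

Even in this toy form, the studies' negative result is visible: the best (`threshold`, `sustain`) pair for one person's data need not transfer to another person or another session.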

    CLAM: Compiler Lease of Cache Memory

    Caching is a common solution to the data movement performance bottleneck of today’s computational systems and networks. Traditional caching examines program behavior and cache optimization separately, limiting performance. Recently, a new cache policy called Compiler Lease of cAche Memory (CLAM) has been suggested for program-based cache management. CLAM manages cache memory by allowing the compiler to assign leases, or lifespans, to cached items over a hardware-software interface, known as lease cache. Lease cache affords new performance potential by way of program-driven cache optimization. It is applicable to existing cache architecture optimizations, and can be used to emulate other cache policies. This paper presents the first functional hardware implementation of lease cache for CLAM support. The lease cache hardware architecture is first presented, along with CLAM hardware support systems. The cache is emulated on an FPGA and benchmarked using a collection of scientific kernels from the PolyBench/C suite, for three CLAM lease assignment policies: Compiler Assigned Reference Leasing (CARL), Phased Reference Leasing (PRL), and Fixed Uniform Leasing (FUL). CARL and PRL achieve superior performance to Least Recently Used (LRU) replacement, while FUL is shown to serve as a safety mechanism for CLAM. A novel spectrum-based cache tenancy analysis verifies PRL’s effectiveness in limiting cache utilization, and can identify changes in the working set that cause the policy to perform adversely. This suggests that CLAM is extendable to more complex workloads if working-set transitions can elicit a similar change in lease policy. Being able to do so could yield appreciable performance improvements for large and highly iterative workloads like tensors.
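The lease idea can be modeled in a few lines: each reference installs its line with a compiler-assigned lease counted in accesses, all leases tick down globally, and a line is evicted when its lease expires. The lease table and the countdown representation below are illustrative simplifications of the lease cache described in the paper, not its hardware design.

```python
def simulate_lease_cache(trace, leases, default_lease):
    """Minimal lease-cache model: `leases` maps an address to the
    lease the compiler assigned its references (a stand-in for
    CLAM's per-reference lease table); addresses not in the table
    get `default_lease`, loosely playing FUL's fallback role.
    Leases are measured in logical time (one tick per access)."""
    cache = {}   # address -> remaining lease (accesses)
    hits = 0
    for addr in trace:
        # age every outstanding lease; evict lines whose lease expired
        cache = {a: t - 1 for a, t in cache.items() if t - 1 > 0}
        if addr in cache:
            hits += 1
        # (re)install the line with the lease chosen for this reference
        cache[addr] = leases.get(addr, default_lease)
    return hits

# leases covering each address's reuse interval capture every reuse
assert simulate_lease_cache([1, 2, 1, 2], {1: 3, 2: 3}, 1) == 2
```

Because occupancy is just the set of unexpired leases, shortening leases directly bounds cache utilization, which is the behavior the paper's spectrum-based tenancy analysis measures for PRL.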