
    Memory resource balancing for virtualized computing

    Virtualization has become a common abstraction layer in modern data centers. By multiplexing hardware resources into multiple virtual machines (VMs) and thus enabling several operating systems to run on the same physical platform simultaneously, it can effectively reduce power consumption and building size or improve security by isolating VMs. In a virtualized system, memory resource management plays a critical role in achieving high resource utilization and performance. Insufficient memory allocation to a VM degrades its performance dramatically; conversely, over-allocation wastes memory. Meanwhile, a VM’s memory demand may vary significantly. Effective memory resource management therefore calls for a dynamic memory balancer that, ideally, can adjust each VM’s memory allocation in a timely manner based on its current memory demand, achieving the best memory utilization and the optimal overall performance. To estimate the memory demand of each VM and to arbitrate possible memory resource contention, a widely proposed approach is to construct an LRU-based miss ratio curve (MRC), which provides not only the current working set size (WSS) but also the correlation between performance and the target memory allocation size. Unfortunately, the cost of constructing an MRC is nontrivial. In this dissertation, we first present a low-overhead LRU-based memory demand tracking scheme that includes three orthogonal optimizations: AVL-based LRU organization, dynamic hot set sizing, and intermittent memory tracking. Our evaluation shows that, across the whole SPEC CPU 2006 benchmark suite, applying the three optimizations lowers the mean overhead of MRC construction from 173% to only 2%. Based on the current WSS, we then predict its trend in the near future and take different strategies depending on the prediction.
When there is a sufficient amount of physical memory on the host, it locally balances memory among its VMs. Once local memory is insufficient and the memory pressure is predicted to persist for a sufficiently long time, a relatively expensive solution, VM live migration, is used to move one or more VMs from the overloaded host to other host(s). Finally, for transient memory pressure, a remote cache is used to alleviate the temporary performance penalty. Our experimental results show that this design achieves a 49% center-wide speedup.
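The MRC construction that the dissertation optimizes can be illustrated with Mattson's classic stack algorithm: every access's LRU stack distance goes into a histogram, and the miss ratio at each cache size falls out of the cumulative histogram. The sketch below is illustrative (the function name and the linear-time stack scan are mine); the dissertation's AVL-based LRU organization replaces exactly this O(n) scan with O(log n) tree operations.

```python
from collections import OrderedDict

def miss_ratio_curve(trace, max_size):
    """Build an LRU miss ratio curve from an access trace using
    Mattson's stack algorithm: an access's stack distance is the
    number of distinct pages touched since its previous access."""
    stack = OrderedDict()          # LRU stack; most-recent page last
    hist = [0] * (max_size + 1)    # hist[d] = accesses at stack distance d
    for page in trace:
        if page in stack:
            keys = list(stack.keys())
            d = len(keys) - keys.index(page)   # 1 = most recently used
            if d <= max_size:
                hist[d] += 1
            stack.pop(page)
        # cold misses (first touches) simply miss at every cache size
        stack[page] = True
    total = len(trace)
    # A cache of size c hits exactly the accesses with distance <= c.
    mrc, hits = [], 0
    for c in range(max_size + 1):
        hits += hist[c]
        mrc.append(1.0 - hits / total)
    return mrc

trace = [1, 2, 3, 1, 2, 3, 4, 1]
curve = miss_ratio_curve(trace, 4)
```

For this toy trace, `curve[4]` is 0.5: with room for four pages, only the four cold misses remain, which is exactly the WSS/performance trade-off the MRC exposes.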

    MARACAS: a real-time multicore VCPU scheduling framework

    This paper describes MARACAS, a multicore scheduling and load-balancing framework that addresses shared cache and memory bus contention. It builds upon prior work centered around the concept of virtual CPU (VCPU) scheduling. Threads are associated with VCPUs that have periodically replenished time budgets. VCPUs are guaranteed to receive their periodic budgets even if they are migrated between cores. A load-balancing algorithm ensures VCPUs are mapped to cores to fairly distribute surplus CPU cycles, after ensuring VCPU timing guarantees. MARACAS uses surplus cycles to throttle the execution of threads running on specific cores when memory contention exceeds a certain threshold. This enables threads on other cores to make better progress without interference from co-runners. Our scheduling framework features a novel memory-aware scheduling approach that uses performance counters to derive an average memory request latency. We show that latency-based memory throttling is more effective than rate-based memory access control in reducing bus contention. MARACAS also supports cache-aware scheduling and migration using page recoloring to improve performance isolation amongst VCPUs. Experiments show how MARACAS reduces multicore resource contention, leading to improved task progress.
    http://www.cs.bu.edu/fac/richwest/papers/rtss_2016.pdf
    Accepted manuscript
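The latency-based throttling decision can be sketched in a few lines: derive an average memory request latency from two performance counters sampled over a scheduling window, and throttle when it exceeds a threshold. The counter names and the threshold value here are assumptions for illustration, not MARACAS's actual interface.

```python
def should_throttle(mem_stall_cycles, mem_requests, latency_threshold):
    """Hypothetical latency-based throttling test in the spirit of
    MARACAS: average memory request latency is stall cycles divided
    by completed requests, both read from performance counters over
    one scheduling window. All names and values are illustrative."""
    if mem_requests == 0:
        return False          # no memory traffic, nothing to throttle
    avg_latency = mem_stall_cycles / mem_requests
    return avg_latency > latency_threshold

# e.g. 600,000 stall cycles over 2,000 requests -> 300 cycles/request
assert should_throttle(600_000, 2_000, 200) is True
assert should_throttle(600_000, 2_000, 400) is False
```

The point of using latency rather than request rate is that latency rises only when the bus is actually saturated, whereas a high request rate alone may still be served without contention.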

    mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores

    Web applications employ key-value stores to cache the data that is most commonly accessed. The cache improves a web application’s performance by serving requests from memory, avoiding fetches from the backend database. Since memory space is limited, maximizing memory utilization is key to delivering the best performance possible. This has led to the use of multi-tenant systems, allowing applications to share cache space. In addition, application data access patterns change over time, so the system should be adaptive in its memory allocation. In this thesis, we address both multi-tenancy (where a single cache is used for multiple applications) and dynamic workloads (changing access patterns) using a model that relates the cache size to the application miss ratio, known as a miss ratio curve. Intuitively, the larger the cache, the less likely the system will need to fetch the data from the database. Our efficient, online construction of the miss ratio curve allows us to determine a near-optimal memory allocation given the available system memory, while adapting to changing data access patterns. We show that our model outperforms an existing state-of-the-art sharing model, Memshare, in terms of cache hit ratio, and does so at a lower time cost. We show that the average hit ratio is consistently 1 percentage point greater and 99.9th percentile latency is reduced by as much as 2.9% under standard web application workloads containing millions of requests.
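A common way to turn per-tenant miss ratio curves into an allocation, and a plausible simplification of what an MRC-guided partitioner does, is greedy marginal-gain assignment: repeatedly give the next unit of memory to the tenant whose MRC predicts the largest hit gain. The function below is an illustrative sketch under that assumption, not mPart's actual algorithm.

```python
def partition_memory(mrcs, weights, total_units):
    """Greedy MRC-guided partitioning sketch: mrcs[i][c] is tenant
    i's miss ratio with c units of cache, weights[i] its request
    rate. Each unit of memory goes to the tenant whose expected
    hit gain from one more unit is largest."""
    alloc = [0] * len(mrcs)
    for _ in range(total_units):
        best, best_gain = None, -1.0
        for i, mrc in enumerate(mrcs):
            if alloc[i] + 1 >= len(mrc):
                continue                      # curve exhausted
            # expected extra hits from giving tenant i one more unit
            gain = weights[i] * (mrc[alloc[i]] - mrc[alloc[i] + 1])
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        alloc[best] += 1
    return alloc

alloc = partition_memory([[1.0, 0.5, 0.4, 0.35],
                          [1.0, 0.8, 0.3, 0.25]], [1.0, 1.0], 3)
```

Greedy assignment is optimal only when the miss ratio curves are convex; real curves have cliffs, which is one reason an online partitioner must keep re-evaluating as the curves change.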

    Individual Differences in the Experience of Cognitive Workload

    This study investigated the roles of four psychosocial variables – anxiety, conscientiousness, emotional intelligence, and Protestant work ethic – in subjective ratings of cognitive workload as measured by the Task Load Index (TLX), and the further connections between the four variables and TLX ratings of task performance. The four variables represented aspects of an underlying construct of elasticity versus rigidity in response to workload. Participants were 141 undergraduates who performed a vigilance task under different speeded conditions while working on a jigsaw puzzle for 90 minutes. Regression analysis showed that anxiety and emotional intelligence were the two variables most proximally related to TLX ratings. TLX ratings contributed to the prediction of performance on the puzzle, but not the vigilance task. Severity error bias was evident in some of the ratings. Although working in pairs improved performance, it also resulted in higher ratings of temporal demand and perceived performance pressure.

    DETECTION OF OPERATOR PERFORMANCE BREAKDOWN IN A MULTITASK ENVIRONMENT

    The purpose of this dissertation work is: 1) to empirically demonstrate an extreme human operator state, performance breakdown (PB), and 2) to develop an objective method for detecting such a state. PB has been anecdotally described as a state where the human operator “loses control of the context” and “cannot maintain the required task performance.” Preventing such a decline in performance could be important to assure the safety and reliability of human-integrated systems, and therefore PB could be useful as a point at which automation can be applied to support human performance. However, PB has never been scientifically defined or empirically demonstrated. Moreover, there exists no method for detecting such a state or the transition to that state. Therefore, after symbolically defining PB, an objective method of potentially identifying PB is proposed. Next, three human-in-the-loop studies were conducted to empirically demonstrate PB and to evaluate the proposed PB detection method. Study 1 was conducted: 1) to demonstrate PB by increasing workload until the subject reports being in a state of PB, and 2) to identify possible parameters of the PB detection method for objectively identifying the subjectively-reported PB point, and determine if they are idiosyncratic. In the experiment, fifteen participants were asked to manage three concurrent tasks (one primary and two secondary tasks) for 18 minutes. The primary task’s difficulty was manipulated over time to induce PB while the secondary tasks’ difficulty remained static. Data on participants’ task performance were collected. Three hypotheses were constructed: 1) increasing workload will induce subjectively-identified PB, 2) there exist criteria that identify the threshold parameters that best detect the performance characteristics that map to the subjectively-identified PB point, and 3) the criteria for choosing the threshold parameters are consistent across individuals.
The results show that increasing workload can induce subjectively-identified PB, although it might not be generalizable — 12 out of 15 participants declared PB. The PB detection method was applied to the performance data, and the results showed that PB can be identified using the method, particularly when the values of the parameters for the detection method were calibrated individually. Next, study 2 was conducted: 1) to repeat the demonstration of inducing PB, 2) to evaluate whether the threshold parameters established in study 1 for the PB detection method can be used in a subsequent study, or whether they have to be re-calibrated for each study, and 3) to examine whether a specific physiological measure (pulse rate) can be used to identify the subjectively-reported PB point. Study 2 was conducted in the same task environment (three concurrent tasks) as study 1. Three hypotheses were constructed: 1) increasing workload will induce subjectively-identified performance breakdown, 2) the threshold parameters established from study 2 will be the same as those from study 1 for all participants and will perform approximately as well or better, and 3) there exist criteria for choosing the threshold parameters that capture the characteristics at the subjectively-reported PB point using the PB detection method on pulse rate data. The results show that increasing workload induced the same participants (12 out of 15) from study 1 to declare PB. It was also found that the threshold parameters established in study 1 for the PB detection method cannot be reliably used in a subsequent study, suggesting that they may require re-calibration for each study. The results provided no evidence that pulse rate data can be used to detect PB. Study 3 was conducted: 1) to determine if PB is induced by the primary task workload or is affected by the presence of the secondary tasks, and 2) to re-test whether threshold parameters from study 1 can be used in a subsequent study.
In study 3, the same participants from studies 1 and 2 were asked to perform only the primary task while its difficulty increased in a similar manner to the first two studies. Two hypotheses were established: 1) PB will occur without the secondary tasks being present, and 2) the threshold parameters established from study 3 will be the same as those from study 1 and/or study 2 for all participants and will perform approximately as well or better. No participants declared PB without the secondary tasks present, even though the primary task workload was the same. Again, it was verified that the threshold parameters established in studies 1 and 2 for the PB detection method cannot be used in a subsequent study, suggesting that they may require re-calibration for each study.
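The abstract does not specify the detection method's form, but a threshold detector with a sustain window illustrates the role the individually calibrated "threshold parameters" play: declare PB only when performance stays below a cutoff long enough to rule out momentary dips. Everything below is a hypothetical sketch, not the dissertation's actual method.

```python
def detect_breakdown(scores, threshold, sustain):
    """Illustrative PB detector: flag breakdown when task
    performance stays below `threshold` for `sustain` consecutive
    samples. `threshold` and `sustain` stand in for the per-individual
    calibrated parameters discussed in the studies."""
    run = 0
    for t, score in enumerate(scores):
        run = run + 1 if score < threshold else 0
        if run >= sustain:
            return t        # sample index at which PB is declared
    return None             # no breakdown detected

# performance collapses at sample 2 and stays low
assert detect_breakdown([0.9, 0.8, 0.4, 0.3, 0.2], 0.5, 3) == 4
```

Even in this toy form, the studies' negative result is visible: the best (`threshold`, `sustain`) pair for one person's data need not transfer to another person or another session.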

    CLAM: Compiler Lease of Cache Memory

    Caching is a common solution to the data movement performance bottleneck of today’s computational systems and networks. Traditional caching examines program behavior and cache optimization separately, limiting performance. Recently, a new cache policy called Compiler Lease of cAche Memory (CLAM) has been suggested for program-based cache management. CLAM manages cache memory by allowing the compiler to assign leases, or lifespans, to cached items over a hardware-software interface, known as lease cache. Lease cache affords new performance potential by way of program-driven cache optimization. It is applicable to existing cache architecture optimizations, and can be used to emulate other cache policies. This paper presents the first functional hardware implementation of lease cache for CLAM support. The lease cache hardware architecture is first presented, along with CLAM hardware support systems. The cache is emulated on an FPGA and benchmarked using a collection of scientific kernels from the PolyBench/C suite, for three CLAM lease assignment policies: Compiler Assigned Reference Leasing (CARL), Phased Reference Leasing (PRL), and Fixed Uniform Leasing (FUL). CARL and PRL achieve superior performance to Least Recently Used (LRU) replacement, while FUL is shown to serve as a safety mechanism for CLAM. A novel spectrum-based cache tenancy analysis verifies PRL’s effectiveness in limiting cache utilization, and can identify changes in the working set that cause the policy to perform adversely. This suggests that CLAM is extendable to more complex workloads if working-set transitions can elicit a similar change in lease policy. Being able to do so could yield appreciable performance improvements for large and highly iterative workloads like tensors.
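The lease idea can be modeled in a few lines: each reference installs its line with a compiler-assigned lease counted in accesses, all leases tick down globally, and a line is evicted when its lease expires. The lease table and the countdown representation below are illustrative simplifications of the lease cache described in the paper, not its hardware design.

```python
def simulate_lease_cache(trace, leases, default_lease):
    """Minimal lease-cache model: `leases` maps an address to the
    lease the compiler assigned its references (a stand-in for
    CLAM's per-reference lease table); addresses not in the table
    get `default_lease`, loosely playing FUL's fallback role.
    Leases are measured in logical time (one tick per access)."""
    cache = {}   # address -> remaining lease (accesses)
    hits = 0
    for addr in trace:
        # age every outstanding lease; evict lines whose lease expired
        cache = {a: t - 1 for a, t in cache.items() if t - 1 > 0}
        if addr in cache:
            hits += 1
        # (re)install the line with the lease chosen for this reference
        cache[addr] = leases.get(addr, default_lease)
    return hits

# leases covering each address's reuse interval capture every reuse
assert simulate_lease_cache([1, 2, 1, 2], {1: 3, 2: 3}, 1) == 2
```

Because occupancy is just the set of unexpired leases, shortening leases directly bounds cache utilization, which is the behavior the paper's spectrum-based tenancy analysis measures for PRL.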