
    On Data Management in Pervasive Computing Environments

    Abstract—This paper presents a framework to address new data management challenges introduced by data-intensive, pervasive computing environments. These challenges include spatio-temporal variation of data and data source availability, lack of a global catalog and schema, and no guarantee of reconnection among peers due to the serendipitous nature of the environment. An important aspect of our solution is to treat devices as semiautonomous peers guided in their interactions by profiles and context. The profiles are grounded in a semantically rich language and represent information about users, devices, and data described in terms of “beliefs,” “desires,” and “intentions.” We present a prototype implementation of this framework over combined Bluetooth and Ad Hoc 802.11 networks and present experimental and simulation results that validate our approach and measure system performance.
    Index Terms—Mobile data management, pervasive computing environments, data and knowledge representation, profile-driven caching algorithm, profile-driven data management, data-centric routing algorithm.
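
    As a purely illustrative reading of the profile idea, the sketch below models a peer profile with beliefs, desires, and intentions and uses it to rank items for the device cache. All field names and the scoring rule are hypothetical, not taken from the paper.

        from dataclasses import dataclass, field

        @dataclass
        class Profile:
            """Hypothetical peer profile in the spirit of the BDI description:
            beliefs about the current context, weighted desires for kinds of data,
            and intentions (queries the user plans to issue)."""
            beliefs: dict = field(default_factory=dict)      # e.g. {"location": "airport"}
            desires: dict = field(default_factory=dict)      # data topic -> interest weight
            intentions: list = field(default_factory=list)   # planned queries/topics

        def cache_priority(profile: Profile, item_topic: str) -> float:
            """Profile-driven caching: items matching stronger desires or pending
            intentions are kept in the device cache in preference to others."""
            score = profile.desires.get(item_topic, 0.0)
            if item_topic in profile.intentions:
                score += 1.0
            return score

        # Example: a traveller's device ranks flight-status data above restaurant ads.
        p = Profile(beliefs={"location": "airport"},
                    desires={"flight-status": 0.9, "restaurants": 0.2},
                    intentions=["flight-status"])
        print(cache_priority(p, "flight-status"), cache_priority(p, "restaurants"))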

    Concertina: Squeezing in cache content to operate at near-threshold voltage

    Scaling the supply voltage to values near the threshold voltage allows a dramatic decrease in the power consumption of processors; however, the lower the voltage, the higher the sensitivity to process variation and, hence, the lower the reliability. Large SRAM structures, like the last-level cache (LLC), are extremely vulnerable to process variation because they are aggressively sized to satisfy high density requirements. In this paper, we propose Concertina, an LLC designed to enable reliable operation at low voltages with conventional SRAM cells. Based on the observation that for many applications the LLC contains large amounts of null data, Concertina compresses cache blocks so that they can be allocated to cache entries with faulty cells, enabling use of 100 percent of the LLC capacity. To distribute blocks among cache entries, Concertina implements a compression- and fault-aware insertion/replacement policy that reduces the LLC miss rate. Concertina reaches the performance of an ideal system implementing an LLC that does not suffer from parameter variation, with a modest storage overhead. Specifically, performance degrades by less than 2 percent even when using small SRAM cells, which implies that over 90 percent of cache entries have defective cells; this represents a notable improvement over previously proposed techniques.
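
    As an illustration of how null data enables this, the sketch below (our own simplification, not code from the paper) compresses a block by keeping only its non-null sub-blocks plus a presence bitmap, and checks whether the compressed block fits an entry with a given number of faulty sub-entries. The sub-block granularity is an assumed parameter.

        BLOCK_SIZE = 64      # bytes per cache block
        SUBBLOCK_SIZE = 8    # compression granularity (assumed)

        def compress(block: bytes):
            """Null-subblock compression: keep only non-zero sub-blocks
            plus a bitmap recording which positions they occupy."""
            assert len(block) == BLOCK_SIZE
            bitmap, payload = 0, bytearray()
            for i in range(BLOCK_SIZE // SUBBLOCK_SIZE):
                sub = block[i * SUBBLOCK_SIZE:(i + 1) * SUBBLOCK_SIZE]
                if any(sub):                       # non-null sub-block must be stored
                    bitmap |= 1 << i
                    payload += sub
            return bitmap, bytes(payload)

        def fits(payload: bytes, faulty_subentries: int) -> bool:
            """A compressed block fits an entry if its non-null sub-blocks can be
            placed in the entry's non-faulty sub-entries."""
            usable = BLOCK_SIZE // SUBBLOCK_SIZE - faulty_subentries
            return len(payload) // SUBBLOCK_SIZE <= usable

        # A mostly-null block (one non-null sub-block) fits an entry with 5 faulty sub-entries.
        bitmap, payload = compress(bytes(48) + b"ABCDEFGH" + bytes(8))
        print(bin(bitmap), fits(payload, faulty_subentries=5))

    A fault-aware insertion policy along these lines would then prefer cache entries whose remaining healthy sub-entries accommodate the block's compressed size.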

    High-performance and hardware-aware computing: proceedings of the second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'11), San Antonio, Texas, USA, February 2011 (in conjunction with HPCA-17)

    High-performance system architectures are increasingly exploiting heterogeneity. The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach.

    Optimal Eviction Policies for Stochastic Address Traces

    The eviction problem for memory hierarchies is studied for the Hidden Markov Reference Model (HMRM) of the memory trace, showing how miss minimization can be naturally formulated in the optimal control setting. In addition to the traditional version assuming a buffer of fixed capacity, a relaxed version is also considered, in which buffer occupancy can vary and its average is constrained. Resorting to multiobjective optimization, viewing occupancy as a cost rather than as a constraint, the optimal eviction policy is obtained by composing solutions for the individual addressable items. This approach is then specialized to the Least Recently Used Stack Model (LRUSM), a type of HMRM often considered for traces, which includes V-1 parameters, where V is the size of the virtual space. A gain-optimal policy for any target average occupancy is obtained which (i) is computable in time O(V) from the model parameters, (ii) is also optimal for the fixed-capacity case, and (iii) is characterized in terms of priorities, under the name of the Least Profit Rate (LPR) policy. An O(log C) upper bound (where C is the buffer capacity) is derived for the ratio between the expected miss rate of LPR and that of OPT, the optimal off-line policy; the upper bound is tightened to O(1) under reasonable constraints on the LRUSM parameters. Using the stack-distance framework, an algorithm is developed to compute the number of misses incurred by LPR on a given input trace, simultaneously for all buffer capacities, in time O(log V) per access. Finally, some results are provided for miss minimization over a finite horizon and over an infinite horizon under bias optimality, a criterion more stringent than gain optimality.
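
    To make the priority characterization concrete, here is a minimal sketch of a priority-driven eviction loop in Python. The priority function is a placeholder: the paper derives the actual Least Profit Rate priorities from the LRUSM parameters, which are not reproduced here.

        def simulate(trace, capacity, priority):
            """Priority-driven eviction: on a miss with a full buffer, evict the
            resident item with the smallest priority value."""
            buffer, misses = set(), 0
            for addr in trace:
                if addr in buffer:
                    continue                              # hit
                misses += 1
                if len(buffer) >= capacity:
                    victim = min(buffer, key=priority)    # lowest "profit rate" goes first
                    buffer.remove(victim)
                buffer.add(addr)
            return misses

        # Hypothetical priorities standing in for the LPR values derived from the
        # LRUSM parameters; items believed more profitable to keep get larger values.
        weights = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
        print(simulate([0, 1, 2, 0, 3, 0, 1], capacity=2,
                       priority=lambda a: weights.get(a, 0.0)))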

    Compression-aware and performance-efficient insertion policies for long-lasting hybrid LLCs

    Emerging non-volatile memory (NVM) technologies can potentially replace large SRAM memories such as the last-level cache (LLC). However, despite recent advances, NVMs suffer from higher write latency and limited write endurance. Recently, NVM-SRAM hybrid LLCs have been proposed to combine the best of both worlds. Several policies have been proposed to improve the performance and lifetime of hybrid LLCs by intelligently steering the incoming LLC blocks into either the SRAM or NVM part, based on the cache behavior of the LLC blocks and the SRAM/NVM device properties. However, these policies neither consider compressing the contents of the cache block nor using partially worn-out NVM cache blocks. This paper proposes new insertion policies for byte-level fault-tolerant hybrid LLCs that collaboratively optimize for lifetime and performance. Specifically, we leverage data compression to utilize partially defective NVM cache entries, thereby improving the LLC hit rate. The key to our approach is to guide the insertion policy by both the reuse properties of the block and the size resulting from its compression. A block is inserted in NVM only if it is a read-reuse block or its compressed size is lower than a threshold. It is inserted in SRAM if the block is a write-reuse block or its compressed size is greater than the threshold. We use set-dueling to tune the compression threshold at runtime. This compression threshold provides a knob to control the NVM write rate and, together with a rule-based mechanism, allows balancing performance and lifetime. Overall, our evaluation shows that, with affordable hardware overheads, the proposed schemes can nearly reach the performance of an SRAM cache with the same associativity while improving lifetime by 17× compared to a hybrid NVM-unaware LLC. Our proposed scheme outperforms the state-of-the-art insertion policies by 9% while achieving a comparable lifetime. The rule-based mechanism shows that by sacrificing, for instance, 1.1% and 1.9% of performance, the NVM lifetime can be further increased by 28% and 44%, respectively. This work was partially funded by the HiPEAC collaboration grant 2020, the Center for Advancing Electronics Dresden (cfaed), the German Research Council (DFG) through the HetCIM project (502388442) under the Priority Program on ‘Disruptive Memory Technologies’ (SPP 2377), and by grants (1) PID2019-105660RB-C21 and PID2019-107255GB-C22/AEI/10.13039/501100011033 from Agencia Estatal de Investigación (AEI), and (2) gaZ: T5820R research group from Dept. of Science, University and Knowledge Society, Government of Aragon.
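
    The steering rule described above maps directly onto a small decision function. The sketch below is one reading of it, written for illustration only; the order of the checks and the default for blocks with no reuse information are our assumptions, and the threshold (tuned by set-dueling in the paper) is simply passed in.

        def choose_partition(read_reuse: bool, write_reuse: bool,
                             compressed_size: int, threshold: int) -> str:
            """Steer an incoming LLC block to the NVM or SRAM part of a hybrid LLC,
            following the rule quoted in the abstract. Giving write-reuse priority
            over the size test is our assumption, as is the final default."""
            if write_reuse:                                  # write-heavy blocks avoid NVM wear
                return "SRAM"
            if read_reuse or compressed_size < threshold:
                return "NVM"                                 # read-reuse or compresses well
            return "SRAM"                                    # large, low-reuse blocks stay in SRAM

        # A read-reuse block that compresses to 24 of 64 bytes is steered to NVM.
        print(choose_partition(read_reuse=True, write_reuse=False,
                               compressed_size=24, threshold=32))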

    Reference Speculation-driven Memory Management

    The “Memory Wall”, the vast gulf between processor execution speed and memory latency, has led to the development of large and deep cache hierarchies over the last twenty years. Although processor frequency is no longer on an exponential growth curve, the drive towards ever greater main memory capacity and limited off-chip bandwidth have kept this gap from closing significantly. In addition, future memory technologies such as Non-Volatile Memory (NVM) devices do not help to decrease the latency of the first reference to a particular memory address. To reduce the increasing off-chip memory access latency, this dissertation presents three intelligent speculation mechanisms that can predict and manage future memory usage.

    First, we propose a novel hardware data prefetcher called the Signature Path Prefetcher (SPP), which offers effective solutions for major challenges in prefetcher design. SPP uses a compressed history signature that learns and accurately predicts long, complex data access patterns. Moreover, unlike other history-based algorithms, which miss out on many prefetching opportunities when address patterns make a transition between physical pages, SPP tracks the stream of data accesses across physical page boundaries and continues prefetching as soon as the stream moves to a new page. Finally, SPP uses the confidence it has in its predictions to adaptively throttle itself on a per-prefetch-stream basis. In our analysis, we find that SPP outperforms the state-of-the-art hardware data prefetchers by 6.4% with higher prefetching accuracy and lower off-chip bandwidth usage.

    Second, we develop a holistic on-chip cache management system that tightly integrates data prefetching and cache replacement algorithms into one unified solution. We also eliminate the use of the Program Counter (PC) in the cache replacement module by using a simple dead-block prediction with global hysteresis. In addition to effectively predicting dead blocks in the Last-Level Cache (LLC) by observing program phase behaviors, the replacement component also gives feedback to the prefetching component to help decide on the optimal fill level for prefetches. Meanwhile, the prefetching component feeds confidence information about each individual prefetch to the LLC replacement component. A low-confidence prefetch is less likely to interfere with the contents of the LLC; as confidence in that prefetch increases, its position within the LLC replacement stack is solidified, and it is eventually brought into the L2 cache, close to where it will be used in the processor core.

    Third, we observe that the host machine in a virtualized system operates under different memory pressure regimes, as the memory demand from guest Virtual Machines (VMs) changes dynamically at runtime. Adapting to this runtime system state is critical to reduce the performance cost of VM memory management. We propose a novel dynamic memory management policy called Memory Pressure Aware (MPA) ballooning. MPA ballooning dynamically speculates and allocates memory resources to each VM based on the current memory pressure regime. Moreover, MPA ballooning proactively reacts and adapts to sudden changes in memory demand from guest VMs. MPA ballooning requires neither additional hardware support, nor incurs extra minor page faults in its memory pressure estimation.
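
    For readers unfamiliar with signature-based prefetching, the sketch below captures the flavor of the approach: recent per-page address deltas are folded into a compact signature, and the signature indexes a pattern table that predicts the next delta with a confidence value. It is a simplification written for this summary; the signature width, hashing rule, and confidence threshold are assumptions, not the dissertation's actual design.

        from collections import defaultdict

        SIG_BITS = 12          # assumed width of the compressed history signature

        class SimpleSignaturePrefetcher:
            """Toy signature-based prefetcher: per-page delta history is folded
            into a signature, which indexes a table of observed next deltas."""
            def __init__(self, confidence_threshold=2):
                self.page_state = {}                                        # page -> (signature, last block)
                self.pattern_table = defaultdict(lambda: defaultdict(int))  # signature -> {delta: count}
                self.threshold = confidence_threshold

            def access(self, page, block):
                """Record one demand access; return blocks to prefetch (possibly none)."""
                sig, last = self.page_state.get(page, (0, None))
                if last is not None:
                    delta = block - last
                    self.pattern_table[sig][delta] += 1                     # learn under the old signature
                    sig = ((sig << 3) ^ (delta & 0x7)) & ((1 << SIG_BITS) - 1)
                prefetches = []
                candidates = self.pattern_table.get(sig, {})
                if candidates:
                    best_delta, count = max(candidates.items(), key=lambda kv: kv[1])
                    if count >= self.threshold:                             # confidence-based throttling
                        prefetches.append(block + best_delta)
                self.page_state[page] = (sig, block)
                return prefetches

        # The pattern learned on page 1 triggers prefetches for the same stride on page 2.
        pf = SimpleSignaturePrefetcher(confidence_threshold=1)
        for page in (1, 2):
            for b in (0, 1, 2, 3):
                print(page, b, pf.access(page, b))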

    EFFECTIVE GROUPING FOR ENERGY AND PERFORMANCE: CONSTRUCTION OF ADAPTIVE, SUSTAINABLE, AND MAINTAINABLE DATA STORAGE

    The performance gap between processors and storage systems has been increasingly critical over the years. Yet the performance disparity remains, and further, storage energy consumption is rapidly becoming a new critical problem. While smarter caching and predictive techniques do much to alleviate this disparity, the problem persists, and data storage remains a growing contributor to latency and energy consumption.

    Attempts have been made at data layout maintenance, or intelligent physical placement of data, yet in practice, basic heuristics remain predominant. Problems that early studies sought to solve via layout strategies were proven to be NP-Hard, and data layout maintenance today remains more art than science. With unknown potential and a domain inherently full of uncertainty, layout maintenance persists as an area largely untapped by modern systems. But uncertainty in workloads does not imply randomness; access patterns have exhibited repeatable, stable behavior. Predictive information can be gathered, analyzed, and exploited to improve data layouts. Our goal is a dynamic, robust, sustainable predictive engine, aimed at improving existing layouts by replicating data at the storage device level.

    We present a comprehensive discussion of the design and construction of such a predictive engine, including workload evaluation, where we present and evaluate classical workloads as well as our own highly detailed traces collected over an extended period. We demonstrate significant gains through an initial static grouping mechanism, compare against an optimal grouping method of our own construction, and further show significant improvement over competing techniques. We also explore and illustrate the challenges faced when moving from static to dynamic (i.e., online) grouping, and provide motivation and solutions for addressing these challenges. These challenges include metadata storage, appropriate predictive collocation, online performance, and physical placement. We reduced the metadata needed by several orders of magnitude, reducing the required volume from more than 14% of total storage down to less than 12%. We also demonstrate how our collocation strategies outperform competing techniques. Finally, we present our complete model and evaluate a prototype implementation against real hardware. This model was demonstrated to be capable of reducing device-level accesses by up to 65%

    An architecture for recycling intermediates in a column-store

    Automatically recycling intermediate results to improve both query response time and throughput is a grand challenge…
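
    The abstract above is only a fragment, but the general idea of recycling in a column store can be sketched as a cache of operator results keyed by a normalized plan fragment. Everything below (the class name, keying scheme, and reuse-count eviction) is our illustration under that assumption, not the architecture proposed in the paper.

        class RecyclerCache:
            """Illustrative intermediate-result cache for a column-store executor:
            operator results are memoized under a key describing the plan fragment."""
            def __init__(self, capacity_bytes):
                self.capacity = capacity_bytes
                self.store = {}                  # key -> (result, size, hits)

            def lookup(self, key):
                entry = self.store.get(key)
                if entry is None:
                    return None
                result, size, hits = entry
                self.store[key] = (result, size, hits + 1)
                return result

            def admit(self, key, result, size):
                # Evict the least-reused intermediates until the new one fits.
                while self.used() + size > self.capacity and self.store:
                    victim = min(self.store, key=lambda k: self.store[k][2])
                    del self.store[victim]
                if size <= self.capacity:
                    self.store[key] = (result, size, 0)

            def used(self):
                return sum(size for _, size, _ in self.store.values())

        # Hypothetical usage inside an executor: reuse a selection result if present.
        cache = RecyclerCache(capacity_bytes=1 << 20)
        key = ("select", "lineitem.l_qty", ">", 24)       # normalized plan fragment
        rows = cache.lookup(key)
        if rows is None:
            rows = [25, 30, 42]                            # stand-in for executing the operator
            cache.admit(key, rows, size=len(rows) * 8)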