
    Refactoring intermediately executed code to reduce cache capacity misses

    The growing memory wall requires that more attention is given to the data cache behavior of programs. In this paper, attention is given to the capacity misses, i.e., the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored in a way that reduces the reuse distance to below the cache size, so that the capacity misses are eliminated. In a number of simple loops, the reuse distance can be calculated analytically. However, in most cases profiling is needed to pinpoint the areas where the program needs to be transformed for better data locality. This is achieved by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality optimization. Both tools have been used to analyze and to refactor a number of SPEC2000 benchmark programs with very positive results.
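
    The reuse distance metric described above can be sketched in a few lines: for each access, count the distinct addresses touched since the previous access to the same address. This is an illustrative, unoptimized sketch (the quadratic set construction is for clarity; production tools use tree-based algorithms), not the RDVIS or SLO implementation.

    ```python
    def reuse_distances(trace):
        """Reuse distance of each access: the number of distinct
        addresses accessed between the previous use of an address
        and its reuse. First-time accesses get infinity (cold miss)."""
        last_pos = {}  # address -> index of its most recent access
        dists = []
        for i, addr in enumerate(trace):
            if addr in last_pos:
                # distinct addresses strictly between use and reuse
                dists.append(len(set(trace[last_pos[addr] + 1:i])))
            else:
                dists.append(float("inf"))
            last_pos[addr] = i
        return dists

    # The reuse of 'a' sees two distinct addresses (b, c) in between
    print(reuse_distances(list("abcba")))  # [inf, inf, inf, 1, 2]
    ```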

    Wireless Device-to-Device Communications with Distributed Caching

    We introduce a novel wireless device-to-device (D2D) collaboration architecture that exploits distributed storage of popular content to enable frequency reuse. We identify a fundamental conflict between collaboration distance and interference and show how to optimize the transmission power to maximize frequency reuse. Our analysis depends on the user content request statistics, which are modeled by a Zipf distribution. Our main result is a closed-form expression of the optimal collaboration distance as a function of the content reuse distribution parameters. We show that if the Zipf exponent of the content reuse distribution is greater than 1, it is possible to have a number of D2D interference-free collaboration pairs that scales linearly in the number of nodes. If the Zipf exponent is smaller than 1, we identify the best possible scaling in the number of D2D collaborating links. Surprisingly, a very simple distributed caching policy achieves the optimal scaling behavior, and therefore there is no need to centrally coordinate what each node is caching. Comment: to appear in ISIT 201
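
    The Zipf request model underlying this analysis is easy to make concrete: the probability of requesting the r-th most popular of m files is proportional to 1/r^gamma, where gamma is the Zipf exponent. The sketch below is illustrative only; `hit_probability` (caching the k most popular files) is a simplified placeholder, not the paper's caching policy.

    ```python
    def zipf_pmf(m, gamma):
        """Zipf request probabilities over a library of m files:
        P(file of rank r) is proportional to 1 / r**gamma, r = 1..m."""
        weights = [r ** -gamma for r in range(1, m + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    def hit_probability(m, gamma, k):
        """Fraction of requests served locally if the k most popular
        files are cached (illustrative policy, not the paper's)."""
        return sum(zipf_pmf(m, gamma)[:k])

    # A Zipf exponent above 1 concentrates requests on few files,
    # so a small cache covers a much larger share of the traffic
    print(hit_probability(1000, 1.2, 10) > hit_probability(1000, 0.8, 10))
    ```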

    Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

    Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality. Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence preserving transformations that change the execution schedule of the operations in the computation. In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence preserving transformations. The execution trace of a code is analyzed to extract a computational directed acyclic graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. 
It can serve a valuable role in identifying promising code regions for manual transformation, as well as assessing the effectiveness of compiler transformations for data locality enhancement. We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance. Comment: Transactions on Architecture and Code Optimization (2014)
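
    The claim above that a single reuse distance analysis covers every cache size follows from a simple property of fully associative LRU caches: an access hits if and only if its reuse distance is smaller than the cache capacity. A minimal sketch of that miss estimate, assuming reuse distances measured in cache lines:

    ```python
    def lru_misses(reuse_dists, cache_lines):
        """Estimated misses in a fully associative LRU cache:
        an access misses iff its reuse distance (distinct lines
        since last use) is at least the capacity; cold accesses
        (distance = infinity) always miss."""
        return sum(1 for d in reuse_dists if d >= cache_lines)

    # One trace's distance histogram answers every cache size at once
    dists = [float("inf"), float("inf"), float("inf"), 1, 2]
    print([lru_misses(dists, c) for c in (1, 2, 3)])  # [5, 4, 3]
    ```

    Note the limitation the abstract points out: these distances, and hence the miss counts, are tied to the one execution order that produced the trace, which is exactly what the CDAG-based analysis goes beyond.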

    Mode Selection, Resource Allocation and Power Control for D2D-Enabled Two-Tier Cellular Network

    This paper proposes a centralized decision making framework at the macro base station (MBS) for device to device (D2D) communication underlaying a two-tier cellular network. We consider a D2D pair in the presence of an MBS and a femto access point, each serving a user, with quality of service constraints for all users. Our proposed solution encompasses mode selection (choosing between cellular, reuse, or dedicated mode), resource allocation (in cellular and dedicated mode) and power control (in reuse mode) within a single framework. The framework prioritizes D2D dedicated mode if the D2D pair are close to each other and orthogonal resources are available. Otherwise, it allows D2D reuse mode if the pair satisfies both a maximum-distance criterion and an additional interference criterion. For reuse mode, we present a geometric vertex search approach to solve the power allocation problem. We analytically prove the validity of this approach and show that it achieves near-optimal performance. For cellular and dedicated modes, we show that frequency sharing maximizes sum rate and solve the resource allocation problem in closed form. Our simulations demonstrate the advantages of the proposed framework in terms of the performance gains achieved in D2D mode. Comment: Submitted for possible journal publication
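
    The mode-selection order described above (dedicated first, then reuse, then fall back to cellular) can be sketched as a small decision function. All parameter names and thresholds here are hypothetical placeholders; the paper's actual criteria involve quality-of-service constraints not modeled in this sketch.

    ```python
    def select_mode(d2d_distance, orthogonal_free, interference_ok,
                    max_dedicated_dist, max_reuse_dist):
        """Illustrative D2D mode selection at the base station:
        prefer dedicated mode for close pairs with free orthogonal
        resources, then reuse mode if distance and interference
        criteria hold, otherwise fall back to cellular mode."""
        if d2d_distance <= max_dedicated_dist and orthogonal_free:
            return "dedicated"
        if d2d_distance <= max_reuse_dist and interference_ok:
            return "reuse"
        return "cellular"

    # A close pair with orthogonal resources gets dedicated mode
    print(select_mode(10, True, True, 20, 50))  # dedicated
    ```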

    RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors

    Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly explore the design space in an early design stage. Mechanistic analytical modeling is particularly interesting, as it provides deep insight and does not require expensive offline profiling as empirical modeling does. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance.

    Imaging cell lineage with a synthetic digital recording system

    Cell lineage plays a pivotal role in cell fate determination. Chow et al. demonstrate the use of an integrase-based synthetic barcode system called intMEMOIR, which uses the serine integrase Bxb1 to perform irreversible nucleotide edits. Inducible editing either deletes or inverts its target region, thus encoding information in three-state memory elements, or trits, and avoiding undesired recombination events. Using intMEMOIR combined with single-molecule fluorescence in situ hybridization, the authors were able to identify clonal structures as well as gene expression patterns in the fly brain, enabling both clonal analysis and expression profiling with intact spatial information. The ability to visualize cell lineage relationships directly within their native tissue context provides insights into development and disease.