Refactoring intermediately executed code to reduce cache capacity misses
The growing memory wall requires that more attention be given to the data cache behavior of programs. In this paper, attention is given to capacity misses, i.e., the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored in a way that reduces the reuse distance to below the cache size so that the capacity misses are eliminated. In a number of simple loops, the reuse distance can be calculated analytically. However, in most cases profiling is needed to pinpoint the areas where the program needs to be transformed for better data locality. This is achieved by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality optimization. Both tools have been used to analyze and refactor a number of SPEC2000 benchmark programs with very positive results.
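As a toy illustration of the reuse distance metric defined in this abstract (not the RDVIS or SLO implementation), the distance of each access can be computed by counting the distinct addresses touched since the previous access to the same location:

```python
def reuse_distances(trace):
    """For each access in the trace, count the distinct addresses touched
    since the previous access to the same address (inf on first use)."""
    last_pos = {}  # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            # Distinct addresses accessed strictly between use and reuse.
            window = set(trace[last_pos[addr] + 1:i])
            window.discard(addr)
            distances.append(len(window))
        else:
            distances.append(float("inf"))  # cold (compulsory) miss
        last_pos[addr] = i
    return distances

# In the trace a b c a, the reuse of 'a' skips over {b, c}:
print(reuse_distances(["a", "b", "c", "a"]))  # [inf, inf, inf, 2]
```

An access whose reuse distance exceeds the cache size is exactly the kind of capacity miss the refactoring described above targets.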
Wireless Device-to-Device Communications with Distributed Caching
We introduce a novel wireless device-to-device (D2D) collaboration
architecture that exploits distributed storage of popular content to enable
frequency reuse. We identify a fundamental conflict between collaboration
distance and interference and show how to optimize the transmission power to
maximize frequency reuse. Our analysis depends on the user content request
statistics which are modeled by a Zipf distribution. Our main result is a
closed form expression of the optimal collaboration distance as a function of
the content reuse distribution parameters. We show that if the Zipf exponent of
the content reuse distribution is greater than 1, it is possible to have a
number of D2D interference-free collaboration pairs that scales linearly in the
number of nodes. If the Zipf exponent is smaller than 1, we identify the best
possible scaling in the number of D2D collaborating links. Surprisingly, a very
simple distributed caching policy achieves the optimal scaling behavior and
therefore there is no need to centrally coordinate what each node is caching.Comment: to appear in ISIT 201
Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential
Emerging computer architectures will feature drastically decreased flops/byte
(ratio of peak processing rate to memory bandwidth) as highlighted by recent
studies on Exascale architectural trends. Further, flops are getting cheaper
while the energy cost of data movement is increasingly dominant. The
understanding and characterization of data locality properties of computations
is critical in order to guide efforts to enhance data locality. Reuse distance
analysis of memory address traces is a valuable tool to perform data locality
characterization of programs. A single reuse distance analysis can be used to
estimate the number of cache misses in a fully associative LRU cache of any
size, thereby providing estimates on the minimum bandwidth requirements at
different levels of the memory hierarchy to avoid being bandwidth bound.
However, such an analysis only holds for the particular execution order that
produced the trace. It cannot estimate potential improvement in data locality
through dependence preserving transformations that change the execution
schedule of the operations in the computation. In this article, we develop a
novel dynamic analysis approach to characterize the inherent locality
properties of a computation and thereby assess the potential for data locality
enhancement via dependence preserving transformations. The execution trace of a
code is analyzed to extract a computational directed acyclic graph (CDAG) of
the data dependences. The CDAG is then partitioned into convex subsets, and the
convex partitioning is used to reorder the operations in the execution trace to
enhance data locality. The approach enables us to go beyond reuse distance
analysis of a single specific order of execution of the operations of a
computation in characterization of its data locality properties. It can serve a
valuable role in identifying promising code regions for manual transformation,
as well as assessing the effectiveness of compiler transformations for data
locality enhancement. We demonstrate the effectiveness of the approach using a
number of benchmarks, including case studies where the potential shown by the
analysis is exploited to achieve lower data movement costs and better
performance.
Comment: Transactions on Architecture and Code Optimization (2014)
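The property mentioned above, that a single reuse distance profile yields miss estimates for a fully associative LRU cache of any size, can be sketched as follows (an illustrative toy, not the authors' tool): an access misses exactly when its reuse distance is at least the cache capacity.

```python
from collections import Counter

def miss_counts(distances, cache_sizes):
    """Given per-access reuse distances (inf = cold miss), estimate misses
    in a fully associative LRU cache of each candidate size: an access
    misses iff its reuse distance >= the cache capacity."""
    hist = Counter(distances)
    return {c: sum(n for d, n in hist.items() if d >= c) for c in cache_sizes}

# Distances from some trace; one profile answers all cache sizes at once.
dists = [float("inf"), float("inf"), float("inf"), 2, 0, 3]
print(miss_counts(dists, [1, 2, 4]))  # {1: 5, 2: 5, 4: 3}
```

Note that this holds only for the execution order that produced the trace, which is precisely the limitation the CDAG-based analysis in this paper goes beyond.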
Mode Selection, Resource Allocation and Power Control for D2D-Enabled Two-Tier Cellular Network
This paper proposes a centralized decision making framework at the macro base
station (MBS) for device to device (D2D) communication underlaying a two-tier
cellular network. We consider a D2D pair in the presence of an MBS and a femto
access point, each serving a user, with quality of service constraints for all
users. Our proposed solution encompasses mode selection (choosing between
cellular or reuse or dedicated mode), resource allocation (in cellular and
dedicated mode) and power control (in reuse mode) within a single framework.
The framework prioritizes D2D dedicated mode if the D2D pair are close to each
other and orthogonal resources are available. Otherwise, it allows D2D reuse
mode if the D2D satisfies both the maximum distance and an additional
interference criteria. For reuse mode, we present a geometric vertex search
approach to solve the power allocation problem. We analytically prove the
validity of this approach and show that it achieves near optimal performance.
For cellular and dedicated modes, we show that frequency sharing maximizes sum
rate and solve the resource allocation problem in closed form. Our simulations
demonstrate the advantages of the proposed framework in terms of the
performance gains achieved in D2D mode.
Comment: Submitted for possible journal publication
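The mode selection logic described above can be paraphrased as a simple decision rule. The thresholds and predicate names below are illustrative placeholders, not values from the paper:

```python
def select_mode(d2d_distance, orthogonal_rb_free, interference_ok,
                max_dedicated_dist=50.0, max_reuse_dist=100.0):
    """Illustrative D2D mode selection: prefer dedicated mode for a close
    pair when orthogonal resources exist; fall back to reuse mode when the
    pair meets the distance and interference criteria; else cellular mode."""
    if d2d_distance <= max_dedicated_dist and orthogonal_rb_free:
        return "dedicated"
    if d2d_distance <= max_reuse_dist and interference_ok:
        return "reuse"
    return "cellular"

print(select_mode(30.0, True, True))    # dedicated
print(select_mode(80.0, False, True))   # reuse
print(select_mode(80.0, False, False))  # cellular
```

In the paper the reuse branch additionally triggers the power-control optimization, and the dedicated/cellular branches trigger closed-form resource allocation; the sketch only captures the branching itself.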
RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors
Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly explore the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require expensive offline profiling as empirical modeling does. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors.
This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance.
Imaging cell lineage with a synthetic digital recording system
Cell lineage plays a pivotal role in cell fate determination. Chow et al. demonstrate the use of an integrase-based synthetic barcode system called intMEMOIR, which uses the serine integrase Bxb1 to perform irreversible nucleotide edits. Inducible editing either deletes or inverts its target region, thus encoding information in three-state memory elements, or trits, and avoiding undesired recombination events. Using intMEMOIR combined with single-molecule fluorescence in situ hybridization, the authors were able to identify clonal structures as well as gene expression patterns in the fly brain, enabling both clonal analysis and expression profiling with intact spatial information. The ability to visualize cell lineage relationships directly within their native tissue context provides insights into development and disease.