89,689 research outputs found

    Trace-level reuse

    Get PDF
    Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and in many cases, the instructions that make up such traces have the same source operand values. The execution of such traces will obviously produce the same outcome and thus, their execution can be skipped if the processor records the outcome of previous executions. This paper presents an analysis of the performance potential of trace-level reuse and discusses a preliminary realistic implementation. Like instruction-level reuse, trace-level reuse can improve performance by decreasing resource contention and the latency of some instructions. However, we show that trace-level reuse is more effective than instruction-level reuse because the former can avoid fetching the instructions of reused traces. This has two important benefits: it reduces the fetch bandwidth requirements, and it increases the effective instruction window size since these instructions do not occupy window entries. Moreover, trace-level reuse can compute all at once the result of a chain of dependent instructions, which may allow the processor to avoid the serialization caused by data dependences and thus, to potentially exceed the dataflow limit.Peer ReviewedPostprint (published version

    Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

    Get PDF
    Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality. Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence preserving transformations that change the execution schedule of the operations in the computation. In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence preserving transformations. The execution trace of a code is analyzed to extract a computational directed acyclic graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. It can serve a valuable role in identifying promising code regions for manual transformation, as well as assessing the effectiveness of compiler transformations for data locality enhancement. We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance.Comment: Transaction on Architecture and Code Optimization (2014

    Trace-level speculative multithreaded architecture

    Get PDF
    This paper presents a novel microarchitecture to exploit trace-level speculation by means of two threads working cooperatively in a speculative and non-speculative way respectively. The architecture presents two main benefits: (a) no significant penalties are introduced in the presence of a misspeculation and (b) any type of trace predictor can work together with this proposal. In this way, aggressive trace predictors can be incorporated since misspeculations do not introduce significant penalties. We describe in detail TSMA (trace-level speculative multithreaded architecture) and present initial results to show the benefits of this proposal. We show how simple trace predictors achieve significant speed-up in the majority of cases. Results of a simple trace speculation mechanism show an average speed-up of 16%.Peer ReviewedPostprint (published version

    Refactoring intermediately executed code to reduce cache capacity misses

    Get PDF
    The growing memory wall requires that more attention is given to the data cache behavior of programs. In this paper, attention is given to the capacity misses i.e. the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored in a way that reduces the reuse distance to below the cache size so that the capacity misses are eliminated. In a number of simple loops, the reuse distance can be calculated analytically. However, in most cases profiling is needed to pinpoint the areas where the program needs to be transformed for better data locality. This is achieved by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality ptimization. Both tools have been used to analyze and to refactor a number of SPEC2000 benchmark programs with very positive results

    Cooperative announcement-based caching for video-on-demand streaming

    Get PDF
    Recently, video-on-demand (VoD) streaming services like Netflix and Hulu have gained a lot of popularity. This has led to a strong increase in bandwidth capacity requirements in the network. To reduce this network load, the design of appropriate caching strategies is of utmost importance. Based on the fact that, typically, a video stream is temporally segmented into smaller chunks that can be accessed and decoded independently, cache replacement strategies have been developed that take advantage of this temporal structure in the video. In this paper, two caching strategies are proposed that additionally take advantage of the phenomenon of binge watching, where users stream multiple consecutive episodes of the same series, reported by recent user behavior studies to become the everyday behavior. Taking into account this information allows us to predict future segment requests, even before the video playout has started. Two strategies are proposed, both with a different level of coordination between the caches in the network. Using a VoD request trace based on binge watching user characteristics, the presented algorithms have been thoroughly evaluated in multiple network topologies with different characteristics, showing their general applicability. It was shown that in a realistic scenario, the proposed election-based caching strategy can outperform the state-of-the-art by 20% in terms of cache hit ratio while using 4% less network bandwidth

    Content Reuse and Interest Sharing in Tagging Communities

    Full text link
    Tagging communities represent a subclass of a broader class of user-generated content-sharing online communities. In such communities users introduce and tag content for later use. Although recent studies advocate and attempt to harness social knowledge in this context by exploiting collaboration among users, little research has been done to quantify the current level of user collaboration in these communities. This paper introduces two metrics to quantify the level of collaboration: content reuse and shared interest. Using these two metrics, this paper shows that the current level of collaboration in CiteULike and Connotea is consistently low, which significantly limits the potential of harnessing the social knowledge in communities. This study also discusses implications of these findings in the context of recommendation and reputation systems.Comment: 6 pages, 6 figures, AAAI Spring Symposium on Social Information Processin
    • …
    corecore