
    LLVM Static Analysis for Program Characterization and Memory Reuse Profile Estimation

    Dynamically profiling application characteristics, such as the number of arithmetic operations of each kind performed and the memory footprint, is time- and space-consuming. Static analysis methods, on the other hand, are fast but can be less accurate. This paper presents an LLVM-based probabilistic static analysis method that accurately predicts various program characteristics and estimates the reuse distance profile of a program by analyzing its LLVM IR file in constant time, regardless of program input size. We generate the basic-block-level control flow graph of the target application kernel and determine basic-block execution counts by solving the linear balance equations involving the transition probabilities between adjacent basic blocks. Finally, we represent the kernel's memory accesses in a bracketed format and employ a recursive algorithm to calculate the reuse distance profile. The results show that our approach predicts application characteristics accurately when compared with Byfl, an LLVM-based dynamic code analysis tool.
    Comment: This paper was accepted at the MEMSYS '23 conference, The International Symposium on Memory Systems, October 02, 2023 - October 05, 2023, Alexandria, V
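    To make the basic-block counting step concrete, here is a minimal sketch, not the paper's implementation. It assumes the CFG is given as a dictionary of edge probabilities, that the kernel's entry block runs exactly once, and that each block's expected count equals the probability-weighted inflow from its predecessors. The function name `block_counts` and the example loop are hypothetical, and the paper's bracketed reuse-distance algorithm is not reproduced here.

```python
from typing import Dict, List, Tuple


def block_counts(n: int, edges: Dict[Tuple[int, int], float], entry: int = 0) -> List[float]:
    """Expected execution count of each of n basic blocks per kernel invocation.

    edges maps (src, dst) to the probability that control flows from src to dst.
    Balance equation for every block j:
        count[j] = [j == entry] + sum_i count[i] * p(i -> j)
    i.e. the linear system (I - P^T) count = e_entry.
    """
    # Augmented matrix [I - P^T | e_entry].
    a = [[1.0 if r == j else 0.0 for j in range(n)] + [1.0 if r == entry else 0.0]
         for r in range(n)]
    for (src, dst), prob in edges.items():
        a[dst][src] -= prob  # inflow into dst from src

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system (e.g. a loop that never exits)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                factor = a[r][col] / a[col][col]
                for k in range(col, n + 1):
                    a[r][k] -= factor * a[col][k]
    # Row r now has its only nonzero coefficient in column r.
    return [a[r][n] / a[r][r] for r in range(n)]


# Hypothetical loop: B0 -> B1 (header); B1 -> B2 (body) with p = 0.9,
# B1 -> B3 (exit) with p = 0.1; B2 -> B1 (back edge).
print(block_counts(4, {(0, 1): 1.0, (1, 2): 0.9, (1, 3): 0.1, (2, 1): 1.0}))
```

    For the example loop the solution is approximately [1, 10, 9, 1]: the header runs 1/(1 - 0.9) = 10 times and the body 9 times. In this sketch the cost of the solve depends only on the number of blocks in the CFG, not on the program's input size.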

    Elastic Provisioning of Cloud Caches: a Cost-aware TTL Approach

    We consider elastic resource provisioning in the cloud, focusing on in-memory key-value stores used as caches. Our goal is to dynamically scale resources to the traffic pattern so as to minimize the overall cost, which includes not only the storage cost but also the cost of misses. Indeed, a small variation in the cache miss ratio can have a significant impact on user-perceived performance in modern web services, which in turn affects the overall revenue of the content provider that uses those services. We propose and study a dynamic algorithm for TTL caches that obtains close-to-minimal costs. Since high-throughput caches require low-complexity operations, we discuss a practical implementation of this scheme whose overhead per request is constant, independent of the cache size. We evaluate our solution with real-world traces collected from Akamai and show that it achieves a 17% decrease in the overall cost compared to a baseline static configuration.
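    The cost objective above can be illustrated with a small simulator. This is a sketch under stated assumptions, not the paper's dynamic algorithm: it evaluates one fixed TTL, assumes every request (hit or miss) restarts the object's timer, charges storage linearly per object-second, and charges a fixed price per miss. The function `ttl_cache_cost` and its parameters are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def ttl_cache_cost(trace: Iterable[Tuple[float, Hashable]], ttl: float,
                   storage_cost: float, miss_cost: float) -> float:
    """Total cost of a TTL cache with a fixed TTL on a time-ordered request trace.

    cost = storage_cost * (object-time spent in cache) + miss_cost * (number of misses)
    Every request (hit or miss) restarts its object's TTL timer. Each request
    costs O(1) dictionary work, independent of how many objects are cached.
    """
    last_request: Dict[Hashable, float] = {}  # key -> time of its latest request
    misses = 0
    object_time = 0.0
    for t, key in trace:
        prev = last_request.get(key)
        if prev is None or t - prev >= ttl:
            misses += 1  # never seen, or expired since its previous request
        if prev is not None:
            object_time += min(t - prev, ttl)  # cached until now or until expiry
        last_request[key] = t
    object_time += ttl * len(last_request)  # each key's last copy lives one full TTL
    return storage_cost * object_time + miss_cost * misses


trace = [(0.0, "a"), (1.0, "a"), (5.0, "a"), (6.0, "b")]
for ttl in (2.0, 5.0):
    print(ttl, ttl_cache_cost(trace, ttl, storage_cost=1.0, miss_cost=10.0))
```

    On this toy trace a TTL of 2 costs 37 (7 object-seconds, 3 misses) while a TTL of 5 costs 35 (15 object-seconds, 2 misses): a longer TTL pays more for storage but saves a miss. Picking the best fixed TTL offline like this resembles a static configuration; the paper's contribution is adapting the TTL online.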

    mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores

    Web applications employ key-value stores to cache their most commonly accessed data. The cache improves a web application's performance by serving requests from memory instead of fetching the data from the backend database. Since memory space is limited, maximizing memory utilization is key to delivering the best possible performance. This has led to the use of multi-tenant systems, which allow applications to share cache space. In addition, application data access patterns change over time, so the system should adapt its memory allocation accordingly. In this thesis, we address both multi-tenancy (where a single cache serves multiple applications) and dynamic workloads (changing access patterns) using a model that relates the cache size to the application's miss ratio, known as a miss ratio curve. Intuitively, the larger the cache, the less likely the system is to need to fetch data from the database. Our efficient online construction of the miss ratio curve allows us to determine a near-optimal memory allocation given the available system memory, while adapting to changing data access patterns. We show that our model outperforms an existing state-of-the-art sharing model, Memshare, in terms of cache hit ratio and does so at a lower time cost: the average hit ratio is consistently 1 percentage point higher, and the 99.9th percentile latency is reduced by as much as 2.9% under standard web application workloads containing millions of requests.
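    A miss ratio curve for an LRU cache can be computed from reuse (stack) distances. The sketch below uses the classic offline Mattson stack algorithm. That is a swapped-in textbook technique, not mPart's efficient online construction, and its linear list scan makes it quadratic in the worst case. The function name `lru_miss_ratio_curve` is hypothetical, and cache size is measured in objects rather than bytes.

```python
from typing import Hashable, List, Sequence


def lru_miss_ratio_curve(trace: Sequence[Hashable], max_size: int) -> List[float]:
    """Miss ratio of an LRU cache of every size 0..max_size (in objects).

    Mattson's stack algorithm: a request hits in an LRU cache holding s objects
    iff its stack distance (number of distinct other keys requested since the
    key's previous request) is less than s. First-time requests always miss.
    """
    stack: List[Hashable] = []   # LRU stack, most recently used key last
    reuse_hist = [0] * max_size  # reuse_hist[d] = requests with stack distance d
    for key in trace:
        if key in stack:
            pos = stack.index(key)
            distance = len(stack) - 1 - pos
            if distance < max_size:
                reuse_hist[distance] += 1
            stack.pop(pos)
        stack.append(key)

    total = len(trace)
    curve: List[float] = []
    hits = 0
    for size in range(max_size + 1):
        curve.append(1.0 - hits / total if total else 0.0)
        if size < max_size:
            hits += reuse_hist[size]  # distance `size` hits once the cache holds size + 1 objects
    return curve


print(lru_miss_ratio_curve(["a", "b", "a", "b", "c", "a"], max_size=3))
```

    For this trace, caches of 0 or 1 objects miss every request, 2 objects miss 4 of 6, and 3 objects miss 3 of 6. Given one such curve per tenant, an allocator can compare the hit gain per extra unit of memory across tenants. That allocation step is not sketched here.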

    Graph Locality Prefetcher for Graph Database

    This work presents a hardware prefetcher that improves the performance of accessing graph data representing large and complex networks. We represent complex networks as graphs, and queries amount to traversals of the graph. Whereas conventional memory hierarchies exploit spatial and temporal locality, we observe that graph traversals do not necessarily exhibit these notions of locality, which degrades the performance of the memory hierarchy. Our hardware prefetcher therefore exploits a form of locality intrinsic to graph traversals, which we call graph-locality, to improve the performance of the memory hierarchy. We design and evaluate our prototype using a microarchitectural simulator and run benchmarks from GDBench, which is designed to evaluate the performance of graph database systems.
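    The intuition behind exploiting locality along graph edges rather than addresses can be shown with a toy software model. This is not the paper's hardware design. It assumes each node record occupies one slot of an LRU cache and uses a hypothetical policy: when a node is visited, prefetch its neighbors' records. The function `bfs_demand_misses` and the example path graph are illustrative only.

```python
from collections import OrderedDict, deque
from typing import Dict, List


def bfs_demand_misses(adj: Dict[int, List[int]], start: int,
                      cache_size: int, prefetch_neighbors: bool) -> int:
    """Count demand misses on node records during a BFS from `start`.

    Toy model: each node record occupies one slot of an LRU cache. With
    prefetch_neighbors=True, visiting node v also brings v's neighbors into
    the cache, betting that the traversal will touch them soon.
    """
    cache: OrderedDict = OrderedDict()  # node -> None, least recently used first
    misses = 0

    def touch(node: int, demand: bool) -> None:
        nonlocal misses
        if node in cache:
            cache.move_to_end(node)  # refresh recency
            return
        if demand:
            misses += 1  # only demand accesses that miss are counted
        cache[node] = None
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used record

    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        touch(v, demand=True)
        for w in adj.get(v, []):
            if prefetch_neighbors:
                touch(w, demand=False)
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return misses


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bfs_demand_misses(path, 0, cache_size=2, prefetch_neighbors=False))
print(bfs_demand_misses(path, 0, cache_size=2, prefetch_neighbors=True))
```

    On the four-node path graph with a two-slot cache, the traversal takes 4 demand misses without prefetching and 1 with neighbor prefetching, because each record is already cached by the time the traversal reaches it. Real graphs with wide frontiers would need a larger cache or a more selective policy for the same effect.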