LLVM Static Analysis for Program Characterization and Memory Reuse Profile Estimation
Profiling various application characteristics, including the number of
different arithmetic operations performed, memory footprint, etc., dynamically
is time- and space-consuming. On the other hand, static analysis methods,
although fast, can be less accurate. This paper presents an LLVM-based
probabilistic static analysis method that accurately predicts different program
characteristics and estimates the reuse distance profile of a program by
analyzing the LLVM IR file in constant time, regardless of program input size.
We generate the basic-block-level control flow graph of the target application
kernel and determine basic-block execution counts by solving the linear balance
equation involving the adjacent basic blocks' transition probabilities.
Finally, we represent the kernel memory accesses in a bracketed format and
employ a recursive algorithm to calculate the reuse distance profile. The
results show that our approach can predict application characteristics
accurately compared to another LLVM-based dynamic code analysis tool, Byfl.
Comment: This paper was accepted at the MEMSYS '23 conference, The
International Symposium on Memory Systems, October 02, 2023 - October 05,
2023, Alexandria, V
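The balance-equation step described in this abstract can be illustrated with a small Python sketch. It assumes the expected execution count of each basic block equals its entry contribution plus the probability-weighted counts of its predecessors, and it solves the resulting linear system exactly. The function name `execution_counts`, the dictionary layout of `transitions`, and the four-block loop example are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def execution_counts(transitions, entry, n_blocks):
    """Solve N_j = e_j + sum_i N_i * p(i -> j) for every basic block j.

    transitions maps (src, dst) block indices to a branch probability;
    entry is the index of the block executed once on kernel entry.
    (Hypothetical interface, for illustration only.)
    """
    # Augmented matrix for (I - P^T) N = e.
    a = [[Fraction(0)] * (n_blocks + 1) for _ in range(n_blocks)]
    for j in range(n_blocks):
        a[j][j] = Fraction(1)
        a[j][n_blocks] = Fraction(1 if j == entry else 0)
    for (src, dst), prob in transitions.items():
        a[dst][src] -= Fraction(prob)

    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n_blocks):
        pivot = next(r for r in range(col, n_blocks) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n_blocks):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[r][n_blocks] for r in range(n_blocks)]


# Assumed CFG: 0 = entry, 1 = loop header, 2 = loop body, 3 = exit.
cfg = {
    (0, 1): Fraction(1),
    (1, 2): Fraction(9, 10),  # stay in the loop
    (1, 3): Fraction(1, 10),  # leave the loop
    (2, 1): Fraction(1),
}
counts = execution_counts(cfg, entry=0, n_blocks=4)
```

With a 9/10 back-edge probability the loop header is expected to run 10 times, the body 9 times, and the entry and exit blocks once each. The cost of solving the system depends on the number of basic blocks rather than on the program input size.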
Elastic Provisioning of Cloud Caches: a Cost-aware TTL Approach
We consider elastic resource provisioning in the cloud, focusing on in-memory
key-value stores used as caches. Our goal is to dynamically scale resources to
the traffic pattern so as to minimize the overall cost, which includes not only
the storage cost, but also the cost due to misses. In fact, a small variation in
the cache miss ratio may have a significant impact on user-perceived
performance in modern web services, which in turn has an impact on the overall
revenues for the content provider that uses those services. We propose and
study a dynamic algorithm for TTL caches, which is able to obtain
close-to-minimal costs. Since high-throughput caches require low complexity
operations, we discuss a practical implementation of such a scheme requiring
constant overhead per request, independent of the cache size. We evaluate
our solution with real-world traces collected from Akamai, and show that we are
able to obtain a 17% decrease in the overall cost compared to a baseline static
configuration.
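The cost trade-off described in this abstract can be made concrete with a minimal Python sketch. It evaluates the total cost of a fixed TTL on a timestamped trace, assuming the TTL is renewed on every request and that cost is a storage rate times the time items are kept, plus a fixed penalty per miss. This is not the paper's dynamic algorithm; the function `ttl_cache_cost`, its parameters, and the trace are hypothetical.

```python
def ttl_cache_cost(trace, ttl, storage_cost, miss_cost):
    """Return (total_cost, misses) for a TTL cache with renewal on access.

    trace is a list of (time, key) pairs sorted by time. Each request does
    one dictionary lookup and one update, so the work per request does not
    grow with the number of cached items. Expired entries are left in the
    dictionary because this sketch models cost only, not memory reclamation.
    """
    expiry = {}       # key -> time at which the cached copy expires
    misses = 0
    stored_time = 0   # total item-time the cache has committed to storing
    for now, key in trace:
        expires_at = expiry.get(key)
        if expires_at is None or expires_at <= now:
            # Miss: fetch the item and keep it for one full TTL.
            misses += 1
            stored_time += ttl
        else:
            # Hit: renewing the TTL only adds the extension beyond the
            # storage time that was already counted.
            stored_time += (now + ttl) - expires_at
        expiry[key] = now + ttl
    return storage_cost * stored_time + miss_cost * misses, misses


# Assumed trace and prices, for illustration only.
trace = [(0, "a"), (1, "b"), (2, "a"), (10, "a"), (11, "b")]
short_ttl = ttl_cache_cost(trace, ttl=5, storage_cost=1, miss_cost=10)
long_ttl = ttl_cache_cost(trace, ttl=10, storage_cost=1, miss_cost=10)
```

On this trace the longer TTL removes one miss but commits more storage time, so its total cost is higher. An adaptive scheme would move the TTL toward whichever side of that trade-off is cheaper for the current traffic.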
mPart: Miss Ratio Curve Guided Partitioning in Key-Value Stores
Web applications employ key-value stores to cache the data that is most commonly accessed. The cache improves a web application's performance by serving its requests from memory, avoiding fetching them from the backend database. Since the memory space is limited, maximizing memory utilization is key to delivering the best possible performance. This has led to the use of multi-tenant systems, which allow applications to share cache space. In addition, application data access patterns change over time, so the system should be adaptive in its memory allocation. In this thesis, we address both multi-tenancy (where a single cache is used for multiple applications) and dynamic workloads (changing access patterns) using a model that relates the cache size to the application miss ratio, known as a miss ratio curve. Intuitively, the larger the cache, the less likely the system will need to fetch the data from the database. Our efficient, online construction of the miss ratio curve allows us to determine a near-optimal memory allocation given the available system memory, while adapting to changing data access patterns. We show that our model outperforms an existing state-of-the-art sharing model, Memshare, in terms of cache hit ratio and does so at a lower time cost. We show that the average hit ratio is consistently 1 percentage point greater and the 99.9th percentile latency is reduced by as much as 2.9% under standard web application workloads containing millions of requests.
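As a rough illustration of how miss ratio curves can guide partitioning, the Python sketch below hands out memory one unit at a time to the tenant whose curve predicts the largest drop in misses per second. It is a generic greedy heuristic, which is only guaranteed to be optimal when every curve is convex, and it is not the allocation procedure of the thesis. The function `partition_by_mrc`, the curves, and the request rates are hypothetical.

```python
def partition_by_mrc(mrcs, rates, total_units):
    """Greedily split total_units of cache among tenants.

    mrcs[t][s] is the miss ratio of tenant t when it holds s units, and
    rates[t] is its request rate. Each step gives one unit to the tenant
    with the largest expected reduction in misses per unit time.
    """
    alloc = [0] * len(mrcs)
    for _ in range(total_units):
        best, best_gain = None, 0.0
        for tenant, curve in enumerate(mrcs):
            size = alloc[tenant]
            if size + 1 < len(curve):
                gain = rates[tenant] * (curve[size] - curve[size + 1])
                if gain > best_gain:
                    best, best_gain = tenant, gain
        if best is None:
            break  # no tenant benefits from more memory
        alloc[best] += 1
    return alloc


# Assumed curves: tenant 0 benefits quickly from cache, tenant 1 slowly.
mrcs = [
    [1.0, 0.5, 0.25, 0.125],
    [1.0, 0.75, 0.625, 0.5],
]
allocation = partition_by_mrc(mrcs, rates=[100, 60], total_units=4)
```

Here tenant 0 receives three of the four units, because its curve falls faster and its request rate is higher. Re-running the allocation whenever the curves are refreshed is one simple way to follow changing access patterns.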
Graph Locality Prefetcher for Graph Database
This work presents a hardware prefetcher to improve the performance of accessing graph data representing large and complex networks. We represent complex networks as graphs, and queries amount to traversals on the graph. Conventional memory hierarchies exploit spatial and temporal locality, but we observe that graph traversals do not necessarily exhibit these notions of locality, which degrades the performance of the memory hierarchy. Consequently, our hardware prefetcher exploits locality that is intrinsic to graph traversals, which we call graph-locality, to improve the performance of the memory hierarchy. We design and evaluate our prototype using a micro-architectural simulator, and run benchmarks from GDBench, a suite oriented toward evaluating the performance of graph database systems.