1,373 research outputs found
Recommended from our members
Flexible Pointer Analysis Using Assign-Fetch Graphs
We propose a new abstraction for pointer analysis that represents reads and writes to memory instead of traditional points-to relations. Compared to points-to graphs, our Assign-Fetch Graph (AFG) leads to concise procedure summaries that can be used in any calling context. Also, its flexibility supports new analysis techniques with different trade-offs between speed and precision. For efficiency, we build a summary for each procedure that assumes distinct pointers from the environment are not aliased and restore soundness when the summary is used in a context with aliases. We present two pointer analysis techniques based on our AFG. The first takes the flow-insensitive view adopted by many authors; the second considers statement ordering. In addition to being more precise, we find that this "flow-aware" analysis runs faster. We conclude with experimental results showing it is practical
Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access
Pregel is a popular distributed computing model for dealing with large-scale
graphs. However, it can be tricky to implement graph algorithms correctly and
efficiently in Pregel's vertex-centric model, especially when the algorithm has
multiple computation stages, complicated data dependencies, or even
communication over dynamic internal data structures. Some domain-specific
languages (DSLs) have been proposed to provide more intuitive ways to implement
graph algorithms, but due to the lack of support for remote access --- reading
or writing attributes of other vertices through references --- they cannot
handle the above mentioned dynamic communication, causing a class of Pregel
algorithms with fast convergence impossible to implement.
To address this problem, we design and implement Palgol, a more declarative
and powerful DSL which supports remote access. In particular, programmers can
use a more declarative syntax called chain access to naturally specify dynamic
communication as if directly reading data on arbitrary remote vertices. By
analyzing the logic patterns of chain access, we provide a novel algorithm for
compiling Palgol programs to efficient Pregel code. We demonstrate the power of
Palgol by using it to implement several practical Pregel algorithms, and the
evaluation result shows that the efficiency of Palgol is comparable with that
of hand-written code.Comment: 12 pages, 10 figures, extended version of APLAS 2017 pape
Graph Locality Prefetcher for Graph Database
This work presents a hardware prefetcher to improve the performance of accessing graph data representing large and complex networks. We represent complex networks as graphs, and queries amount to traversals on the graph. Unlike conventional memory hierarchies that exploit spatial and temporal locality, we observe that graph traversals do not necessarily exhibit these same notions of locality. This results in degraded performance of the memory hierarchy. Consequently, our hardware prefetcher exploits locality that is intrinsic to graph traversals, which we call graph-locality to improve the performance of the memory hierarchy. We design and evaluate our prototype using a micro-architectural simulator, and deploy benchmarks from GDBench that is oriented to evaluate the performance of graph database systems.1 yea
Recommended from our members
Summary-Based Pointer Analysis Framework for Modular Bug Finding
Modern society is irreversibly dependent on computers and, consequently, on software. However, as the complexity of programs increase, so does the number of defects within them. To alleviate the problem, automated techniques are constantly used to improve software quality. Static analysis is one such approach in which violations of correctness properties are searched and reported. Static analysis has many advantages, but it is necessarily conservative because it symbolically executes the program instead of using real inputs, and it considers all possible executions simultaneously. Being conservative often means issuing false alarms, or missing real program errors. Pointer variables are a challenging aspect of many languages that can force static analysis tools to be overly conservative. It is often unclear what variables are affected by pointer-manipulating expressions, and aliasing between variables is one of the banes of program analysis. To alleviate that, a common solution is to allow the programmer to provide annotations such as declaring a variable as unaliased in a given scope, or providing special constructs such as the "never-null" pointer of Cyclone. However, programmers rarely keep these annotations up-to-date. The solution is to provide some form of pointer analysis, which derives useful information about pointer variables in the program. An appropriate pointer analysis equips the static tool so that it is capable of reporting more errors without risking too many false alarms. This dissertation proposes a methodology for pointer analysis that is specially tailored for "modular bug finding." It presents a new analysis space for pointer analysis, defined by finer-grain "dimensions of precision," which allows us to explore and evaluate a variety of different algorithms to achieve better trade-offs between analysis precision and efficiency. This framework is developed around a new abstraction for computing points-to sets, the Assign-Fetch Graph, that has many interesting features. Empirical evaluation shows promising results, as some unknown errors in well-known applications were discovered
RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
Sparse matrix multiplication is an important kernel for large-scale graph
processing and other data-intensive applications. In this paper, we implement
various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times
sparse (SpGEMM) algorithms, evaluating their performance running in a
distributed memory setting on GPUs. Our RDMA-based implementations use the
NVSHMEM communication library for direct, asynchronous one-sided communication
between GPUs. We compare our asynchronous implementations to state-of-the-art
bulk synchronous GPU libraries as well as a CUDA-aware MPI implementation of
the SUMMA algorithm. We find that asynchronous RDMA-based implementations are
able to offer favorable performance compared to bulk synchronous
implementations, while also allowing for the straightforward implementation of
novel work stealing algorithms
- …