22 research outputs found
Dynamic Graphs on the GPU
We present a fast dynamic graph data structure for the GPU. Our dynamic graph structure uses one hash table per vertex to store adjacency lists and achieves 3.4–14.8x faster insertion rates over the state of the art across a diverse set of large datasets, as well as deletion speedups up to 7.8x. The data structure supports queries and dynamic updates through both edge and vertex insertion and deletion. In addition, we define a comprehensive evaluation strategy based on operations, workloads, and applications that we believe better characterizes and evaluates dynamic graph data structures.
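The "one hash table per vertex" layout can be illustrated with a minimal CPU-side sketch. This is not the authors' GPU implementation: the class name `DynamicGraph`, its method names, the use of Python sets as the per-vertex hash tables, and the choice of directed edges are all assumptions made for illustration.

```python
class DynamicGraph:
    """Hypothetical CPU sketch: one hash set per vertex stores its out-neighbors."""

    def __init__(self):
        self.adj = {}  # vertex -> set of destination vertices

    def insert_vertex(self, v):
        # Create an empty adjacency set if the vertex is new.
        self.adj.setdefault(v, set())

    def insert_edge(self, u, v):
        # Returns True if the directed edge (u, v) was newly added.
        self.insert_vertex(u)
        self.insert_vertex(v)
        if v in self.adj[u]:
            return False
        self.adj[u].add(v)
        return True

    def delete_edge(self, u, v):
        # Returns True if the edge existed and was removed.
        if u in self.adj and v in self.adj[u]:
            self.adj[u].remove(v)
            return True
        return False

    def delete_vertex(self, v):
        # Drop the vertex's own set, then scan every other set for
        # incoming edges (a simplification chosen for this sketch).
        if v not in self.adj:
            return False
        del self.adj[v]
        for neighbors in self.adj.values():
            neighbors.discard(v)
        return True

    def has_edge(self, u, v):
        # Expected constant-time membership query in u's hash set.
        return v in self.adj.get(u, ())
```

Edge insertion, deletion, and lookup each touch only the hash set of the source vertex, which is the property that makes a per-vertex table attractive for update-heavy workloads.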
Virtual Clay Modeling using Adaptive Distance Fields
This paper describes an approach for the parametrization and modeling of objects represented by adaptive distance fields (ADFs). ADFs support the construction of powerful solid modeling tools. They can represent surfaces of arbitrary and even changing topology, while providing a more intuitive user interface than control-point based structures such as B-splines. Using the octree structure, an adaptively refined quadrilateral mesh is constructed that is topologically equivalent to the surface. The mesh is then projected onto the surface using multiple projection and smoothing steps. The resulting mesh serves as the "interface" for interactive modeling operations and high-quality rendering.
RXMesh: A GPU Mesh Data Structure
We propose a new static high-performance mesh data structure for triangle surface meshes on the GPU. Our data structure is carefully designed for parallel execution while capturing mesh locality and confining data access, as much as possible, within the GPU's fast shared memory. We achieve this by subdividing the mesh into patches and representing these patches compactly using a matrix-based representation. Our patching technique is decorated with ribbons, thin mesh strips around patches that eliminate the need to communicate between different computation thread blocks, resulting in consistently high throughput. We call our data structure RXMesh: Ribbon-matriX Mesh. We hide the complexity of our data structure behind a flexible but powerful programming model that helps deliver high performance by inducing load balance even in highly irregular input meshes. We show the efficacy of our programming model on common geometry processing applications: mesh smoothing and filtering, geodesic distance, and vertex normal computation. For evaluation, we benchmark our data structure against well-optimized GPU and (single and multi-core) CPU data structures and show significant speedups.
Essentials of Parallel Graph Analytics
We identify the graph data structure, frontiers, operators, an iterative loop structure, and convergence conditions as essential components of graph analytics systems based on the native-graph approach. Using these essential components, we propose an abstraction that captures all the significant programming models within graph analytics, such as bulk-synchronous, asynchronous, shared-memory, message-passing, and push vs. pull traversals. Finally, we demonstrate the power of our abstraction with an elegant modern C++ implementation of single-source shortest path and its required components.
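The five components named in this abstract (graph, frontier, operator, loop, convergence condition) can be seen together in a small single-source shortest path sketch. It is written in Python rather than the authors' C++, it is sequential, and the function name `sssp` and the dictionary-based graph layout are assumptions made for illustration. It assumes non-negative edge weights and that every neighbor also appears as a key of `graph`.

```python
import math


def sssp(graph, source):
    """Frontier-based SSSP sketch.

    graph: dict mapping each vertex to a list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from source.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    frontier = {source}                  # active vertices for this iteration

    while frontier:                      # convergence: the frontier is empty
        next_frontier = set()
        for u in frontier:               # operator: advance along out-edges (push)
            for v, w in graph[u]:
                candidate = dist[u] + w
                if candidate < dist[v]:  # relax the edge
                    dist[v] = candidate
                    next_frontier.add(v) # v must be revisited next iteration
        frontier = next_frontier
    return dist
```

A vertex re-enters the frontier whenever its distance improves, so the loop ends only once no edge can be relaxed further.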
Maximum Clique Enumeration on the GPU
We present an iterative breadth-first approach to maximum clique enumeration on the GPU. The memory required to store all of the intermediate clique candidates poses a significant challenge. To mitigate this issue, we employ a variety of strategies to prune away non-maximum candidates and present a thorough examination of the performance and memory benefits of each of these options. We also explore a windowing strategy as a middle ground between breadth-first and depth-first approaches, and investigate the resulting tradeoff between parallel efficiency and memory usage. Our results demonstrate that when we are able to manage the memory requirements, our approach achieves high throughput for large graphs, indicating that this approach is a good choice for GPU performance. We demonstrate an average speedup of 1.9x over previous parallel work, and obtain our best performance on graphs with low average degree.
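A sequential sketch can illustrate the breadth-first idea: level k holds every surviving k-clique, and each level is extended by one vertex until nothing survives. The pruning rule used here (a greedy lower bound on the maximum clique size) is one simple choice assumed for illustration and is not claimed to be among the strategies the paper evaluates. The names `greedy_clique_size` and `maximum_cliques` are hypothetical, and vertices are assumed to be comparable (for example, integers).

```python
def greedy_clique_size(adj):
    """Size of some clique found greedily; a valid lower bound on the maximum."""
    best = 0
    for v in adj:
        clique = [v]
        # Try high-degree neighbors first; keep those adjacent to all members.
        for u in sorted(adj[v], key=lambda x: -len(adj[x])):
            if all(u in adj[w] for w in clique):
                clique.append(u)
        best = max(best, len(clique))
    return best


def maximum_cliques(adj):
    """adj: dict vertex -> set of neighbors (undirected, no self-loops).

    Returns all maximum cliques as sorted tuples.
    """
    if not adj:
        return []
    lower = greedy_clique_size(adj)
    # Level 1: one candidate per vertex; only larger-id neighbors may extend it,
    # so each clique is generated exactly once, in sorted order.
    level = [((v,), {u for u in adj[v] if u > v}) for v in sorted(adj)]
    level = [(c, cand) for c, cand in level if len(c) + len(cand) >= lower]
    while True:
        next_level = []
        for clique, cand in level:
            for u in sorted(cand):
                new_cand = {w for w in cand if w > u and w in adj[u]}
                # Prune candidates that cannot reach the lower bound.
                if len(clique) + 1 + len(new_cand) >= lower:
                    next_level.append((clique + (u,), new_cand))
        if not next_level:
            return [c for c, _ in level]
        level = next_level
```

The list `level` is the memory bottleneck the abstract refers to: without pruning it holds every k-clique of the graph at once.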
A GPU Multiversion B-Tree
We introduce a GPU B-Tree that supports snapshots and offers updates, point queries, and linearizable multipoint queries. The supported operations can be performed in a phase-concurrent, asynchronous, or fully-concurrent fashion. Our B-Tree uses cache-line-sized nodes linked together to form a version list and a GPU epoch-based reclamation scheme to reclaim older nodes' versions safely. Our data structure supports snapshots with minimal overhead in point queries (1.04× slower) and insertions (1.11× slower) versus a B-Tree that does not support versioning. Our linearizable B-Tree performs similarly to the non-linearizable baseline for read-heavy workloads and 2.39× slower for write-heavy workloads when performing concurrent range queries and insertions. In addition, we introduce different GPU-aware snapshot scopes that allow the use of our data structure for phase-concurrent (synchronous), stream-concurrent (asynchronous), and on-device fully-concurrent operations.
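The snapshot semantics can be illustrated with a much simpler structure than a B-Tree. The sketch below swaps the tree for a plain dictionary whose values are per-key version lists, and omits concurrency and epoch-based reclamation entirely. The class name `VersionedMap` and its methods are hypothetical and chosen only for illustration.

```python
import bisect


class VersionedMap:
    """Hypothetical sketch: each key keeps a timestamp-ordered version list."""

    def __init__(self):
        self.timestamp = 0
        self.versions = {}  # key -> (ascending timestamps, matching values)

    def insert(self, key, value):
        # Every update gets a fresh timestamp and appends a new version;
        # older versions stay readable (no reclamation in this sketch).
        self.timestamp += 1
        stamps, values = self.versions.setdefault(key, ([], []))
        stamps.append(self.timestamp)
        values.append(value)

    def take_snapshot(self):
        # A snapshot is just the timestamp at which it was taken.
        return self.timestamp

    def find(self, key, snapshot=None):
        # Return the newest version no later than the snapshot, or the
        # latest version when no snapshot is given.
        if key not in self.versions:
            return None
        stamps, values = self.versions[key]
        ts = self.timestamp if snapshot is None else snapshot
        i = bisect.bisect_right(stamps, ts)
        return values[i - 1] if i > 0 else None
```

A reader holding a snapshot sees a consistent view even while later insertions proceed, because those insertions only append versions with larger timestamps.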
A Programming Model for GPU Load Balancing
We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior to our work, the only way to unleash the GPU's potential on irregular problems has been to balance the workload through application-specific, tightly coupled load-balancing techniques. With our open-source framework for load balancing, we hope to improve programmers' productivity when developing irregular-parallel algorithms on the GPU, and also improve the overall performance characteristics for such applications by allowing a quick path to experimentation with a variety of existing load-balancing techniques. Consequently, we also hope that by separating the concerns of load balancing from work processing within our abstraction, managing and extending existing code to future architectures becomes easier.
Dynamic Mesh Processing on the GPU
We propose a system for dynamic triangle mesh processing entirely on the GPU. Our system offers an efficient data structure that allows fast updates of the underlying mesh connectivity and attributes. Our data structure partitions the mesh into small patches, which allows processing all dynamic updates for each patch within the GPU's fast shared memory. This allows us to rely on speculative processing for conflict handling, which has low rollback cost while maximizing parallelism and reducing the cost of locking. Our system also introduces a new programming model for dynamic mesh processing. The programming model offers concise semantics for dynamic updates, relieving the user from having to worry about conflicting updates in the context of parallel execution. Our programming model relies on the cavity operator, which is a general mesh update operator that formulates any dynamic operation as an element reinsertion by removing a set of mesh elements and inserting others in the created void. We used our system to implement Delaunay edge flips and isotropic remeshing applications on the GPU. Our system achieves a 3–18x speedup on large models compared to multithreaded CPU solutions. Despite our additional dynamic features, our data structure also outperforms state-of-the-art GPU static data structures in terms of speed and memory requirements.