Dynamic Graphs on the GPU
We present a fast dynamic graph data structure for the GPU. Our dynamic graph structure uses one hash table per vertex to store adjacency lists and achieves 3.4–14.8x faster insertion rates than the state of the art across a diverse set of large datasets, as well as deletion speedups up to 7.8x. The data structure supports queries and dynamic updates through both edge and vertex insertion and deletion. In addition, we define a comprehensive evaluation strategy based on operations, workloads, and applications that we believe better characterize and evaluate dynamic graph data structures.
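The hash-table-per-vertex layout can be illustrated with a small CPU-side Python sketch (the class and method names are ours, not the paper's API); the actual structure lives in GPU memory and applies updates in bulk, but the per-operation semantics are the same.

```python
# Minimal CPU-side sketch of the "one hash table per vertex" idea:
# each vertex maps to a hash set holding its adjacency list, so edge
# insertions, deletions, and membership queries take expected O(1) time.
# Illustrative only; the paper's structure is a GPU data structure that
# batches updates across threads.

class DynamicGraph:
    def __init__(self):
        self.adj = {}                      # vertex -> hash set of neighbors

    def insert_vertex(self, v):
        self.adj.setdefault(v, set())

    def delete_vertex(self, v):
        self.adj.pop(v, None)              # drop v's own table
        for nbrs in self.adj.values():     # remove edges pointing to v
            nbrs.discard(v)

    def insert_edge(self, u, v):
        self.insert_vertex(u)
        self.insert_vertex(v)
        self.adj[u].add(v)                 # directed edge u -> v

    def delete_edge(self, u, v):
        if u in self.adj:
            self.adj[u].discard(v)

    def has_edge(self, u, v):
        return u in self.adj and v in self.adj[u]


if __name__ == "__main__":
    g = DynamicGraph()
    g.insert_edge(0, 1)
    g.insert_edge(0, 2)
    g.delete_edge(0, 1)
    print(g.has_edge(0, 2), g.has_edge(0, 1))   # True False
```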
On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters
The predominance of Kohn-Sham density functional theory (KS-DFT) for the
theoretical treatment of large experimentally relevant systems in molecular
chemistry and materials science relies primarily on the existence of efficient
software implementations which are capable of leveraging the latest advances in
modern high performance computing (HPC). With recent trends in HPC leading
towards an increasing reliance on heterogeneous, accelerator-based architectures
such as graphics processing units (GPUs), existing code bases must embrace these
architectural advances to maintain the high levels of performance which have
come to be expected for these methods. In this work, we propose a three-level
parallelism scheme for the distributed numerical integration of the
exchange-correlation (XC) potential in the Gaussian basis set discretization of
the Kohn-Sham equations on large computing clusters consisting of multiple GPUs
per compute node. In addition, we propose and demonstrate the efficacy of the
use of batched kernels, including batched level-3 BLAS operations, in achieving
high levels of performance on the GPU. We demonstrate the performance and
scalability of the implementation of the proposed method in the NWChemEx
software package by comparing to the existing scalable CPU XC integration in
NWChem.
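The batched level-3 BLAS idea can be sketched in a few lines of NumPy, with stacked matmul standing in for a cuBLAS-style batched GEMM; all array names, shapes, and the density-collocation step shown are illustrative assumptions, not NWChemEx's actual interfaces.

```python
# Hedged sketch of the batched level-3 BLAS pattern: the XC quadrature grid is
# split into many small batches of points, and the per-batch matrix products
# (e.g. collocation matrix times density matrix) are issued as a single batched
# GEMM instead of many small GEMMs. NumPy's stacked matmul stands in for a
# cuBLAS-style batched GEMM here.
import numpy as np

n_batches, n_grid, n_basis = 64, 256, 32

# X[b]: values of the basis functions on the b-th batch of grid points.
X = np.random.rand(n_batches, n_grid, n_basis)
# P: symmetric density matrix in the Gaussian basis, shared by all batches.
P = np.random.rand(n_basis, n_basis)
P = 0.5 * (P + P.T)

# One batched level-3 BLAS call: Z[b] = X[b] @ P for every batch at once.
Z = X @ P                                   # shape (n_batches, n_grid, n_basis)

# Collocated density on each grid batch: rho[b, g] = sum_k X[b, g, k] * Z[b, g, k].
# The XC potential would then be evaluated pointwise from rho and contracted
# back into the Fock matrix with another batched GEMM.
rho = np.einsum('bgk,bgk->bg', X, Z)
print(rho.shape)                            # (64, 256)
```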
Large-Scale Gaussian Processes via Alternating Projection
Gaussian process (GP) hyperparameter optimization requires repeatedly solving
linear systems with n x n kernel matrices. To address the prohibitive
O(n^3) time complexity, recent work has employed fast iterative
numerical methods, like conjugate gradients (CG). However, as datasets increase
in magnitude, the corresponding kernel matrices become increasingly
ill-conditioned and still require O(n^2) space without
partitioning. Thus, while CG increases the size of datasets GPs can be trained
on, modern datasets reach scales beyond its applicability. In this work, we
propose an iterative method which only accesses subblocks of the kernel matrix,
effectively enabling mini-batching. Our algorithm, based on alternating
projection, has O(n) per-iteration time and space complexity,
solving many of the practical challenges of scaling GPs to very large datasets.
Theoretically, we prove our method enjoys linear convergence and empirically we
demonstrate its robustness to ill-conditioning. On large-scale benchmark
datasets up to four million datapoints our approach accelerates training by a
factor of 2 to 27 compared to CG.
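As a rough illustration of the subblock access pattern, the following NumPy sketch solves a regularized kernel system with block Gauss-Seidel sweeps, a simple member of the alternating-projection family; it is not the authors' exact algorithm, and the kernel, block size, and sweep count are placeholder choices.

```python
# Hedged sketch of the subblock ("mini-batch") idea: solve (K + s*I) x = y by
# sweeping over blocks of coordinates, forming only one block of rows of the
# kernel matrix per update. This is plain block Gauss-Seidel, shown only to
# illustrate the access pattern, not the paper's exact method.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def block_solve(Xtrain, y, noise=1e-2, block=128, sweeps=50):
    n = len(y)
    x = np.zeros(n)
    blocks = [np.arange(i, min(i + block, n)) for i in range(0, n, block)]
    for _ in range(sweeps):
        for idx in blocks:
            # Only the rows of (K + s*I) belonging to this block are formed.
            K_rows = rbf_kernel(Xtrain[idx], Xtrain)        # (b, n)
            K_rows[np.arange(len(idx)), idx] += noise       # add s on the diagonal
            r = y[idx] - K_rows @ x                         # block residual
            K_bb = K_rows[:, idx]                           # (b, b) diagonal block
            x[idx] += np.linalg.solve(K_bb, r)              # local correction
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xtrain = rng.normal(size=(512, 3))
    y = rng.normal(size=512)
    x = block_solve(Xtrain, y)
    K = rbf_kernel(Xtrain, Xtrain) + 1e-2 * np.eye(512)
    print(np.linalg.norm(K @ x - y))        # residual shrinks with more sweeps
```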
Trajectory Similarity Measurement: An Efficiency Perspective
Trajectories that capture object movement have numerous applications, in
which similarity computation between trajectories often plays a key role.
Traditionally, the similarity between two trajectories is quantified by means
of heuristic measures, e.g., Hausdorff or ERP, that operate directly on the
trajectories. In contrast, recent studies exploit deep learning to map
trajectories to d-dimensional vectors, called embeddings. Then, some distance
measure, e.g., Manhattan or Euclidean, is applied to the embeddings to quantify
trajectory similarity. The resulting similarities are inaccurate: they only
approximate the similarities obtained using the heuristic measures. As distance
computation on embeddings is efficient, the focus has been on learning
embeddings that yield high accuracy.
Adopting an efficiency perspective, we analyze the time complexities of both
the heuristic and the learning-based approaches, finding that the time
complexities of the former approaches are not necessarily higher. Through
extensive experiments on open datasets, we find that, on both CPUs and GPUs,
only a few learning-based approaches deliver the promised higher
efficiency, and only when the embeddings can be pre-computed, while heuristic
approaches are more efficient for one-off computations. Among the learning-based
approaches, the self-attention-based ones are the fastest to learn embeddings
that also yield the highest accuracy for similarity queries. These results have
implications for the use of trajectory similarity approaches given different
application requirements.
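The contrast between the two families can be made concrete with a short Python sketch: a discrete Hausdorff distance computed directly on the trajectories versus a Manhattan or Euclidean distance on fixed-size embeddings. The embedding vectors here are random stand-ins for the output of a learned model.

```python
# Hedged sketch contrasting the two approaches discussed above: a heuristic
# measure computed directly on the trajectories (discrete Hausdorff, quadratic
# in the trajectory lengths) versus a distance on d-dimensional embeddings
# (constant time once the embeddings have been computed).
import numpy as np

def hausdorff(T1, T2):
    """Discrete Hausdorff distance between two trajectories (arrays of 2D points)."""
    d = np.linalg.norm(T1[:, None, :] - T2[None, :, :], axis=-1)   # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def embedding_distance(e1, e2, metric="euclidean"):
    """Distance between two precomputed d-dimensional trajectory embeddings."""
    diff = e1 - e2
    return np.abs(diff).sum() if metric == "manhattan" else np.linalg.norm(diff)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T1 = rng.random((200, 2)).cumsum(axis=0)     # synthetic trajectories
    T2 = rng.random((250, 2)).cumsum(axis=0)
    print("Hausdorff:", hausdorff(T1, T2))       # O(|T1| * |T2|) per pair
    e1, e2 = rng.random(128), rng.random(128)    # stand-ins for learned embeddings
    print("Embedding distance:", embedding_distance(e1, e2))
```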