A pilgrimage to gravity on GPUs
In this short review we present the developments over the last five decades that
have led to the use of Graphics Processing Units (GPUs) for astrophysical
simulations. Since the introduction of NVIDIA's Compute Unified Device
Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body
simulations and is now so popular that almost all papers about high-precision
N-body simulations use GPU-accelerated methods. With GPU hardware becoming more
advanced and being used for more sophisticated algorithms such as gravitational
tree-codes, we see a bright future for GPU-like hardware in computational
astrophysics.
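The core kernel such GPU codes accelerate is the all-pairs gravitational sum. As a minimal sketch, assuming G = 1 units and a Plummer-style softening (both illustrative choices, not taken from the review), the brute-force O(N^2) computation looks like the NumPy version below; production codes implement the same loop as a CUDA kernel or, as the review notes, replace it with a tree-code:

```python
import numpy as np

def gravitational_accelerations(pos, mass, softening=1e-3):
    """Direct-summation O(N^2) pairwise gravity (G = 1 units).

    pos: (N, 3) positions; mass: (N,) masses. The softening length
    avoids the singularity when two bodies coincide."""
    dr = pos[None, :, :] - pos[:, None, :]          # r_ij = x_j - x_i
    dist2 = (dr ** 2).sum(axis=-1) + softening ** 2
    inv_r3 = dist2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                   # no self-interaction
    # a_i = sum_j m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2)
    return (inv_r3[:, :, None] * dr * mass[None, :, None]).sum(axis=1)

rng = np.random.default_rng(0)
pos, mass = rng.standard_normal((1024, 3)), rng.random(1024)
acc = gravitational_accelerations(pos, mass)        # (1024, 3) accelerations
```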
Persistent Homology Guided Force-Directed Graph Layouts
Graphs are commonly used to encode relationships among entities, yet their
abstractness makes them difficult to analyze. Node-link diagrams are popular
for drawing graphs, and force-directed layouts provide a flexible method for
node arrangements that use local relationships in an attempt to reveal the
global shape of the graph. However, clutter and overlap of unrelated structures
can lead to confusing graph visualizations. This paper leverages the persistent
homology features of an undirected graph as derived information for interactive
manipulation of force-directed layouts. We first discuss how to efficiently
extract 0-dimensional persistent homology features from both weighted and
unweighted undirected graphs. We then introduce the interactive persistence
barcode used to manipulate the force-directed graph layout. In particular, the
user adds and removes contracting and repulsing forces generated by the
persistent homology features, eventually selecting the set of persistent
homology features that most improve the layout. Finally, we demonstrate the
utility of our approach across a variety of synthetic and real datasets.
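As a rough sketch of the 0-dimensional persistence extraction described above, assuming the usual conventions that every vertex class is born at filtration value 0 and each surviving component yields an infinite bar, the pairs can be computed with a Kruskal-style union-find over the edges sorted by weight. This is a generic illustration, not the paper's implementation:

```python
def persistence_0d(num_vertices, edges):
    """0-dimensional persistence pairs of a weighted undirected graph.

    edges: iterable of (weight, u, v). Every vertex class is born at 0;
    a component dies at the weight of the edge that merges it away."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    pairs = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # this edge merges two components,
            pairs.append((0.0, w))          # killing one 0-dimensional class
            parent[ru] = rv
    # each surviving connected component is an essential (infinite) bar
    roots = {find(x) for x in range(num_vertices)}
    pairs.extend((0.0, float("inf")) for _ in roots)
    return pairs

# Example: a weighted path 0-1-2 plus an isolated vertex 3
print(persistence_0d(4, [(0.4, 0, 1), (0.9, 1, 2)]))
# [(0.0, 0.4), (0.0, 0.9), (0.0, inf), (0.0, inf)]
```

For an unweighted graph the same routine applies with all edge weights set to a common value (e.g., 1).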
A High-Performance Domain-Specific Language and Code Generator for General N-body Problems
General N-body problems are a set of problems in which an update to a single element in the system depends on every other element. N-body problems are ubiquitous, with applications in domains ranging from scientific computing simulations in molecular dynamics, astrophysics, acoustics, and fluid dynamics to computer vision, data mining, and machine learning. Different N-body algorithms have been designed and implemented in these fields. However, there is a large gap between the algorithm one designs on paper and code that runs efficiently on a parallel system: writing fast, parallel, and scalable code for these problems is time-consuming. At the same time, the sheer scale and growth of modern scientific datasets necessitate exploiting both parallel and approximation algorithms where accuracy can be traded off for performance. The main problem tackled in this thesis is how to automatically generate asymptotically optimal N-body algorithms from a high-level specification of the problem. We combine the bodies of work on performance optimization, compilers, and N-body problems to build a unified system in which domain scientists can write programs at a high level while attaining the performance of code written by an expert at the low level.
To generate high-performance, scalable code for this class of problems, this thesis takes the following steps. First, we propose a unified algorithmic framework named PASCAL to address the challenge of designing a general algorithmic template for the class of N-body problems. PASCAL utilizes space-partitioning trees and user-controlled pruning/approximations to reduce the asymptotic runtime complexity from linear to logarithmic in the number of data points. In PASCAL, we design an algorithm that automatically derives pruning or approximation conditions for an N-body problem from the problem's definition. To evaluate PASCAL, we developed tree-based algorithms for six well-known problems: k-nearest neighbors, range search, minimum spanning tree, kernel density estimation, expectation maximization, and Hausdorff distance. We show that applying domain-specific optimizations and parallelization to algorithms written in PASCAL achieves 10x to 230x speedups over state-of-the-art libraries on real-world datasets, on a dual-socket Intel Xeon processor with 16 cores.
Second, we extend PASCAL to build PASCAL-X, which adds support for NUMA-aware parallelization. PASCAL-X also offers insights into the influence of tuning parameters: tuning the leaf size (which shapes the tree) and the cut-off level (which controls task granularity) of the space-partitioning trees yields performance improvements of up to 4.6x.
A key goal is to generate scalable, high-performance code automatically without sacrificing productivity, which means minimizing the effort users must invest to obtain the desired high-performance code. Another critical factor is adaptivity: the effort required to extend high-performance code generation to new N-body problems. Finally, we consider these factors and develop a domain-specific language and code generator named Portal, built on top of PASCAL-X.
Portal's language design is inspired by the mathematical representation of N-body problems, resulting in an intuitive language for rapidly implementing a variety of problems. Portal's back-end generates optimized, parallel, and scalable implementations for multi-core systems. We demonstrate that the performance achieved with Portal is comparable to that of expert hand-optimized code while providing productivity for domain scientists: for instance, Portal's k-nearest neighbors implementation matches the performance of hand-optimized code while reducing the lines of code by 68x. To the best of our knowledge, no existing libraries or frameworks implement parallel, asymptotically optimal algorithms for the class of general N-body problems, and this thesis primarily aims to fill that gap. Finally, we present a case study of Portal on the real-world problem of face clustering, showing that Portal not only provides a fast solution with accuracy similar to the state-of-the-art algorithm, but also delivers productivity: the face clustering algorithm takes only 14 lines of Portal code.
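To illustrate the kind of tree-based pruning PASCAL automates, consider its k-nearest-neighbors benchmark in the simplest 1-nearest-neighbor form. The kd-tree sketch below is generic textbook code, not PASCAL- or Portal-generated output; the pruning test against the splitting plane is the sort of condition PASCAL derives automatically from a problem's definition:

```python
import numpy as np

class KDNode:
    __slots__ = ("point", "axis", "left", "right")
    def __init__(self, point, axis, left, right):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points, depth=0):
    """Build a kd-tree by median split on a cycling axis."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1))

def nearest(node, q, best=None):
    """Return (distance, point) of the nearest neighbor of q."""
    if node is None:
        return best
    d = np.linalg.norm(node.point - q)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # Prune: visit the far subtree only if the splitting plane is closer
    # than the current best -- this test is what cuts the work per query.
    if abs(diff) < best[0]:
        best = nearest(far, q, best)
    return best

pts = np.random.default_rng(1).random((1000, 3))
tree = build(pts)
dist, nn = nearest(tree, np.array([0.5, 0.5, 0.5]))
```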
Approximated and User Steerable tSNE for Progressive Visual Analytics
Progressive Visual Analytics aims at improving the interactivity in existing
analytics techniques by means of visualization as well as interaction with
intermediate results. One key method for data analysis is dimensionality
reduction, for example, to produce 2D embeddings that can be visualized and
analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a
well-suited technique for the visualization of high-dimensional data.
tSNE can create meaningful intermediate results but suffers from a slow
initialization that constrains its application in Progressive Visual Analytics.
We introduce a controllable tSNE approximation (A-tSNE), which trades off speed
and accuracy, to enable interactive data exploration. We offer real-time
visualization techniques, including a density-based solution and a Magic Lens
to inspect the degree of approximation. With this feedback, the user can decide
on local refinements and steer the approximation level during the analysis. We
demonstrate our technique on several datasets, in a real-world research
scenario, and for the real-time analysis of high-dimensional streams,
illustrating its effectiveness for interactive data analysis.
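The progressive pattern can be pictured as an optimization loop that yields intermediate embeddings for immediate display. The toy below uses exact kNN affinities and an untuned tSNE-style gradient purely for illustration; A-tSNE's actual contribution, a controllable approximate neighborhood computation plus user steering, is only hinted at by the yield points:

```python
import numpy as np

def progressive_tsne(X, k=15, lr=10.0, iters=500, chunk=25, seed=0):
    """Toy progressive tSNE-like loop: yields an embedding every `chunk`
    gradient steps so a UI can render (and in A-tSNE, steer) it."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Symmetric binary kNN affinities stand in for tSNE's P matrix;
    # A-tSNE's point is to build this step with a cheap approximate kNN.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]
    P = np.zeros((n, n))
    P[np.arange(n)[:, None], idx] = 1.0
    P = (P + P.T) / (2.0 * P.sum())

    Y = rng.standard_normal((n, 2)) * 1e-2
    for _ in range(0, iters, chunk):
        for _ in range(chunk):
            dy2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
            Q = 1.0 / (1.0 + dy2)               # Student-t kernel
            np.fill_diagonal(Q, 0.0)
            W = (P - Q / Q.sum()) * Q           # KL-divergence gradient weights
            grad = 4.0 * (W[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(1)
            Y -= lr * grad
        yield Y.copy()                          # intermediate result for the UI

X = np.random.default_rng(2).random((200, 10))
for embedding in progressive_tsne(X):
    pass  # hand each intermediate embedding to the visualization
```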
The Cluster Multipole Algorithm for Far-Field Computations
Computer simulations of N-body systems are used to study the overall behavior of physical systems in fields such as astrophysics, molecular dynamics, and computational fluid dynamics. This research proposes a new approach for computer simulations of N-body systems, called the Cluster Multipole Algorithm (CMA). The goals of the new algorithm are to improve applicability to non-point sources and to provide more control over accuracy than current algorithms offer. Owing to current limitations in the construction of its data structure, the algorithm targets applications that do not require rebuilding the data structure every time step; examples of such slowly changing systems can be found in molecular dynamics, capacitance, and computational fluid dynamics simulations. As data structure construction improves, the new algorithm will become applicable to a wider range of applications.
The CMA exhibits the flexibility of both Appel's algorithm and the Fast Multipole Method (FMM) without sacrificing the O(N) order of computation for well-structured clusters. The CMA provides more control over the accuracy of computations than either the FMM or Appel's algorithm, resulting in enhanced performance.
A set of requirements is imposed on the applicable data structures to maintain O(N) computation. However, the algorithm is capable of handling a wide range of data structures beyond that of the FMM.
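The far-field principle behind the CMA (shared with Appel's algorithm and the FMM) is that a well-separated source cluster can be summarized by a few moments. The sketch below keeps only the monopole term with a fixed opening criterion theta; the CMA's higher-order moments and finer accuracy control are omitted, so treat it as an illustration of the idea, not the algorithm itself:

```python
import numpy as np

def cluster_potential(q, pts, m, theta=0.5, leaf=8):
    """Potential at q (assumed distinct from all sources) from sources
    pts with masses m, using a monopole far-field approximation."""
    center = np.average(pts, axis=0, weights=m)
    radius = np.linalg.norm(pts - center, axis=1).max()
    dist = np.linalg.norm(q - center)
    if len(pts) <= leaf:                      # small cluster: direct sum
        return (m / np.linalg.norm(pts - q, axis=1)).sum()
    if radius / dist < theta:                 # well separated: monopole term
        return m.sum() / dist
    axis = np.ptp(pts, axis=0).argmax()       # otherwise split and recurse
    lo = pts[:, axis] <= np.median(pts[:, axis])
    if lo.all() or (~lo).all():               # degenerate split: direct sum
        return (m / np.linalg.norm(pts - q, axis=1)).sum()
    return (cluster_potential(q, pts[lo], m[lo], theta, leaf) +
            cluster_potential(q, pts[~lo], m[~lo], theta, leaf))

rng = np.random.default_rng(3)
src, mass = rng.random((5000, 3)), rng.random(5000)
q = np.array([2.0, 2.0, 2.0])
approx = cluster_potential(q, src, mass)
exact = (mass / np.linalg.norm(src - q, axis=1)).sum()
```

With theta = 0 the far-field branch never fires and the recursion reduces to an exact direct sum, which is one way to sanity-check the approximation.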
Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
Transformer-based models have achieved state-of-the-art performance in many
areas. However, the quadratic complexity of self-attention with respect to the
input length hinders the applicability of Transformer-based models to long
sequences. To address this, we present Fast Multipole Attention, a new
attention mechanism that uses a divide-and-conquer strategy to reduce the time
and memory complexity of attention for sequences of length n from O(n^2) to
O(n log n) or O(n), while retaining a global receptive field. The hierarchical
approach groups queries, keys, and values into O(log n) levels of resolution,
where groups at greater distances are increasingly larger in size and the
weights to compute group quantities are learned. As such, the interaction
between tokens far from each other is considered in lower resolution in an
efficient hierarchical manner. The overall complexity of Fast Multipole
Attention is O(n) or O(n log n), depending on whether the queries are
down-sampled or not. This multi-level divide-and-conquer strategy is inspired
by fast summation methods from n-body physics and the Fast Multipole Method. We
perform evaluation on autoregressive and bidirectional language modeling tasks
and compare our Fast Multipole Attention model with other efficient attention
variants on medium-size datasets. We find empirically that the Fast Multipole
Transformer performs much better than other efficient transformers in terms of
memory size and accuracy. The Fast Multipole Attention mechanism has the
potential to empower large language models with much greater sequence lengths,
taking the full context into account in an efficient, naturally hierarchical
manner during training and when generating long sequences