    A pilgrimage to gravity on GPUs

    In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is now so widely adopted that almost all papers on high-precision N-body simulations use GPU-accelerated methods. With GPU hardware becoming more advanced and being used for more sophisticated algorithms such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics. Comment: To appear in: European Physical Journal "Special Topics": "Computer Simulations on Graphics Processing Units". 18 pages, 8 figures.
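    To make concrete what GPUs accelerate so well, the following minimal NumPy sketch (not taken from the review) shows the direct-summation gravity kernel at the heart of classical N-body codes: each body's acceleration is an independent sum over all other bodies, which is exactly the data-parallel structure that maps onto GPU hardware.

```python
import numpy as np

def direct_accelerations(pos, mass, softening=1e-3):
    """Direct-summation gravitational accelerations, O(N^2).

    This is the pairwise kernel that GPU N-body codes parallelize:
    every body's acceleration is an independent sum over all others.
    Units with G = 1 are assumed.
    """
    # Pairwise separation vectors r_j - r_i, shape (N, N, 3)
    dx = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    # Softened squared distances avoid the singularity at r = 0
    r2 = np.sum(dx * dx, axis=-1) + softening**2
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)  # no self-interaction
    # a_i = sum_j m_j (r_j - r_i) / |r_j - r_i|^3
    return np.einsum('ij,ijk->ik', mass * inv_r3, dx)

# Example: 1024 random bodies
pos = np.random.rand(1024, 3)
mass = np.random.rand(1024)
acc = direct_accelerations(pos, mass)
```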

    Persistent Homology Guided Force-Directed Graph Layouts

    Graphs are commonly used to encode relationships among entities, yet their abstractness makes them difficult to analyze. Node-link diagrams are popular for drawing graphs, and force-directed layouts provide a flexible method for node arrangements that use local relationships in an attempt to reveal the global shape of the graph. However, clutter and overlap of unrelated structures can lead to confusing graph visualizations. This paper leverages the persistent homology features of an undirected graph as derived information for interactive manipulation of force-directed layouts. We first discuss how to efficiently extract 0-dimensional persistent homology features from both weighted and unweighted undirected graphs. We then introduce the interactive persistence barcode used to manipulate the force-directed graph layout. In particular, the user adds and removes contracting and repulsing forces generated by the persistent homology features, eventually selecting the set of persistent homology features that most improve the layout. Finally, we demonstrate the utility of our approach across a variety of synthetic and real datasets.
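    As an illustration of the extraction step mentioned above, here is a minimal sketch of 0-dimensional persistent homology for a weighted undirected graph, computed with a Kruskal-style union-find sweep; it follows the standard construction rather than the paper's specific implementation.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True

def zero_dim_persistence(num_vertices, weighted_edges):
    """0-dimensional persistence of a weighted graph.

    Every vertex is born at filtration value 0; a connected component
    dies at the weight of the edge that merges it into another one
    (a Kruskal-style sweep over edges sorted by weight). Components
    that never merge persist to infinity. Returns (birth, death) pairs.
    """
    uf = UnionFind(num_vertices)
    pairs = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        if uf.union(u, v):            # this edge merges two components
            pairs.append((0.0, w))    # one component dies at weight w
    roots = {uf.find(v) for v in range(num_vertices)}
    pairs.extend((0.0, float('inf')) for _ in roots)  # essential classes
    return pairs

# Example: a path graph 0-1-2 with edge weights 0.3 and 0.7
print(zero_dim_persistence(3, [(0, 1, 0.3), (1, 2, 0.7)]))
```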

    Approximated and User Steerable tSNE for Progressive Visual Analytics

    Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams, to illustrate its effectiveness for interactive data analysis.
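    A minimal sketch of the underlying speed/accuracy trade-off: tSNE spends most of its initialization on nearest-neighbour computation, so an approximate KNN step with a tunable budget, plus user-driven refinement of selected points, captures the idea. The random-sampling scheme below is a deliberately simple stand-in, not the approximate-KNN method used by A-tSNE.

```python
import numpy as np

def approximate_knn(data, k, num_candidates):
    """Cheap, tunable K-nearest-neighbour approximation.

    Each point inspects only `num_candidates` randomly sampled points,
    so raising `num_candidates` trades speed for accuracy -- the same
    dial that a controllable approximation exposes to the user.
    """
    n = len(data)
    neighbors = np.empty((n, k), dtype=int)
    for i in range(n):
        cand = np.random.choice(n, size=min(num_candidates, n), replace=False)
        cand = cand[cand != i]
        d = np.linalg.norm(data[cand] - data[i], axis=1)
        neighbors[i] = cand[np.argsort(d)[:k]]
    return neighbors

def refine(data, neighbors, points_to_refine, k):
    """Exact recomputation of the neighbourhoods the user asked to refine."""
    for i in points_to_refine:
        d = np.linalg.norm(data - data[i], axis=1)
        d[i] = np.inf
        neighbors[i] = np.argsort(d)[:k]
    return neighbors

# Example: coarse neighbourhoods first, then refine a few points on demand
data = np.random.rand(2000, 50)
nn = approximate_knn(data, k=30, num_candidates=100)
nn = refine(data, nn, points_to_refine=[0, 1, 2], k=30)
```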

    The Cluster Multipole Algorithm for Far-Field Computations

    Computer simulations of N-body systems are beneficial for studying the overall behavior of a number of physical systems in fields such as astrophysics, molecular dynamics, and computational fluid dynamics. A new approach for computer simulations of N-body systems is proposed in this research. The new algorithm is called the Cluster Multipole Algorithm (CMA). The goals of the new algorithm are to improve the applicability to non-point sources and to provide more control over the accuracy than current algorithms. The algorithm is targeted at applications that do not require rebuilding the data structure about the system every time step, due to current limitations in the construction of the data structure. Examples of slowly changing systems can be found in molecular dynamics, capacitance, and computational fluid dynamics simulations. As the data structure development is improved, the new algorithm will be applicable to a wider range of applications. The CMA exhibits the flexibility of both Appel's algorithm and the Fast Multipole Method (FMM) without sacrificing the order of computation (O(N)) for well-structured clusters. The CMA provides more control over the accuracy of computations compared to both the FMM and Appel's algorithm, resulting in enhanced performance. A set of requirements is imposed on the applicable data structures to maintain O(N) computation. However, the algorithm is capable of handling a wide range of data structures beyond the FMM.
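    To illustrate the far-field idea that multipole methods such as the CMA build on, the sketch below approximates the potential of a distant source cluster by its lowest-order (monopole) moment and compares it with the exact pairwise sum; the CMA itself uses higher-order expansions and a more general cluster structure, so this is only a conceptual example.

```python
import numpy as np

def cluster_moments(pos, q):
    """Lowest-order multipole moments of a source cluster:
    total 'charge' and its centre."""
    total = q.sum()
    center = (q[:, None] * pos).sum(axis=0) / total
    return total, center

def far_field_potential(targets, total, center):
    """Monopole approximation of the potential a distant cluster
    induces at the targets: phi(x) ~= Q / |x - c|."""
    r = np.linalg.norm(targets - center, axis=1)
    return total / r

def direct_potential(targets, pos, q):
    """Exact pairwise potential, for checking the approximation."""
    r = np.linalg.norm(targets[:, None, :] - pos[None, :, :], axis=2)
    return (q[None, :] / r).sum(axis=1)

# A compact source cluster far from the targets: the monopole term alone
# is already accurate, which is what lets multipole codes replace O(N^2)
# pairwise interactions with cluster-level ones.
sources = np.random.rand(500, 3) * 0.1
charges = np.random.rand(500)
targets = np.random.rand(10, 3) + 5.0

Q, c = cluster_moments(sources, charges)
print(far_field_potential(targets, Q, c))
print(direct_potential(targets, sources, charges))
```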

    Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences

    Transformer-based models have achieved state-of-the-art performance in many areas. However, the quadratic complexity of self-attention with respect to the input length hinders the applicability of Transformer-based models to long sequences. To address this, we present Fast Multipole Attention, a new attention mechanism that uses a divide-and-conquer strategy to reduce the time and memory complexity of attention for sequences of length n from O(n^2) to O(n log n) or O(n), while retaining a global receptive field. The hierarchical approach groups queries, keys, and values into O(log n) levels of resolution, where groups at greater distances are increasingly larger in size and the weights used to compute group quantities are learned. As such, the interaction between tokens far from each other is considered at lower resolution, in an efficient hierarchical manner. The overall complexity of Fast Multipole Attention is O(n) or O(n log n), depending on whether or not the queries are down-sampled. This multi-level divide-and-conquer strategy is inspired by fast summation methods from n-body physics and the Fast Multipole Method. We evaluate on autoregressive and bidirectional language modeling tasks and compare our Fast Multipole Attention model with other efficient attention variants on medium-size datasets. We find empirically that the Fast Multipole Transformer performs much better than other efficient transformers in terms of memory size and accuracy. The Fast Multipole Attention mechanism has the potential to empower large language models with much greater sequence lengths, taking the full context into account in an efficient, naturally hierarchical manner during training and when generating long sequences.
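    A two-level caricature of the mechanism, written in plain NumPy: each query attends at full resolution to a local window and at coarse resolution to pooled key/value groups for everything else. The actual Fast Multipole Attention uses O(log n) levels and learned down-sampling, so this sketch only illustrates the divide-and-conquer structure.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def two_level_attention(q, k, v, window, group):
    """Two-level hierarchical attention (bidirectional, unmasked).

    Each query sees (i) full-resolution keys inside a local window and
    (ii) average-pooled key/value groups for the rest of the sequence.
    Average pooling stands in for the learned down-sampling of the
    multi-level scheme.
    """
    n, d = q.shape
    n_groups = n // group
    # Coarse level: pool keys/values into groups of size `group`.
    k_coarse = k[:n_groups * group].reshape(n_groups, group, d).mean(axis=1)
    v_coarse = v[:n_groups * group].reshape(n_groups, group, d).mean(axis=1)
    out = np.empty_like(q)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Exclude coarse groups already covered by the local window.
        mask = np.ones(n_groups, dtype=bool)
        mask[lo // group:(hi - 1) // group + 1] = False
        keys = np.vstack([k[lo:hi], k_coarse[mask]])
        vals = np.vstack([v[lo:hi], v_coarse[mask]])
        w = softmax(keys @ q[i] / np.sqrt(d))
        out[i] = w @ vals
    return out

# Example: 128 tokens, 16-dimensional heads
n, d = 128, 16
q, k, v = (np.random.randn(n, d) for _ in range(3))
print(two_level_attention(q, k, v, window=8, group=16).shape)
```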