16 research outputs found
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find that the advanced semantics for
Exascale computing improve load balancing at runtime and enable automatic
parallelism discovery, thereby improving efficiency
Visualization analysis of astrophysics n-bodied problem using image morphological processing techniques
This project's primary goal is to detect points of interest within the output data resulting from running a simulation of the Astrophysics N-Bodied problem (GRAPEcluster). Morphological Image Processing techniques will be applied to the visualized data in order to detect areas of interest within the original data. Several Morphological Image Processing techniques will be used and the results compared in the analysis. The final output of the VRAD (Visualization of Raw Astrophysics Data) System will be two-fold: first, the VRAD system will output a text file that contains the x, y and z coordinates of each region of interest in each time slice that is examined; second, the VRAD system will output three image files for each time slice with the 2D regions of interest highlighted by a bounding box. In this way the VRAD system can act as a stand-alone program or be used in conjunction with the Spiegel visualization framework
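The kind of morphological region detection the abstract describes can be sketched in a few lines. The following is a hypothetical illustration, not the VRAD code: it thresholds a density grid, applies a morphological closing (dilation followed by erosion) to merge nearby bright cells, and reports one bounding box per connected region, mirroring the text-file output described above. All function names and thresholds are assumptions.

```python
# Hypothetical sketch of a VRAD-style pipeline: threshold a density image,
# apply a morphological closing, then report bounding boxes of regions of
# interest. Names and thresholds are illustrative, not from the paper.
import numpy as np

def dilate(img, k=1):
    """Binary dilation with a (2k+1)x(2k+1) square structuring element."""
    out = np.zeros_like(img)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out |= np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out

def erode(img, k=1):
    """Binary erosion, via duality: complement of dilation of complement."""
    return ~dilate(~img, k)

def regions_of_interest(density, threshold=0.5):
    """Close small gaps, then label 4-connected components by flood fill."""
    mask = dilate(density > threshold)      # dilation ...
    mask = erode(mask)                      # ... then erosion = closing
    labels = np.zeros(mask.shape, dtype=int)
    boxes, current = [], 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        current += 1
        stack, ys, xs = [(y, x)], [], []
        labels[y, x] = current
        while stack:
            cy, cx = stack.pop()
            ys.append(cy); xs.append(cx)
            for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                if 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1] \
                        and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    stack.append((ny, nx))
        boxes.append((min(ys), min(xs), max(ys), max(xs)))  # (y0, x0, y1, x1)
    return boxes

grid = np.zeros((10, 10))
grid[2:4, 2:4] = 1.0        # one dense clump of particles
grid[7, 7] = 1.0            # an isolated bright cell
print(regions_of_interest(grid))
```

A 3D version for the x, y, z output described in the abstract would use a cubic structuring element and 6-connected flood fill, but the closing-then-label structure is identical.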
Extreme scale parallel NBody algorithm with event driven constraint based execution model
Traditional scientific applications, such as Computational Fluid Dynamics and numerical methods based on Partial Differential Equations (like Finite Difference and Finite Element Methods), achieve sufficient efficiency on state-of-the-art high-performance computing systems and have been widely studied and implemented using conventional programming models. For emerging application domains such as graph applications, however, scalability and efficiency are significantly constrained by conventional systems and their supporting programming models. Furthermore, technology trends like multicore, manycore, and heterogeneous system architectures are introducing new challenges and possibilities. These emerging technologies require rethinking how to expose the underlying parallelism more effectively to applications and end users. This thesis explores the space of effective parallel execution of ephemeral graphs that are dynamically generated. A standard particle-based simulation, solved using the Barnes-Hut algorithm, is chosen to exemplify such dynamic workloads. In this thesis the workloads are expressed using sequential execution semantics, a conventional parallel programming model (shared-memory semantics), and the semantics of ParalleX, an innovative execution model designed for efficient, scalable performance toward Exascale computing. The main outcomes of this research are parallel processing of dynamic ephemeral workloads, dynamic load balancing during runtime, and the use of advanced semantics to expose parallelism in scaling-constrained applications
A High-Performance Domain-Specific Language and Code Generator for General N-body Problems
General N-body problems are a set of problems in which an update to a single element in the system depends on every other element. N-body problems are ubiquitous, with applications in domains ranging from scientific computing simulations in molecular dynamics, astrophysics, acoustics, and fluid dynamics all the way to computer vision, data mining, and machine learning. Different N-body algorithms have been designed and implemented in these various fields. However, there is a big gap between the algorithm one designs on paper and code that runs efficiently on a parallel system; it is time-consuming to write fast, parallel, and scalable code for these problems. On the other hand, the sheer scale and growth of modern scientific datasets necessitate exploiting the power of both parallel and approximation algorithms where there is potential to trade off accuracy for performance. The main problem that we tackle in this thesis is how to automatically generate asymptotically optimal N-body algorithms from a high-level specification of the problem. We combine the body of work in performance optimizations, compilers, and the domain of N-body problems to build a unified system where domain scientists can write programs at a high level while attaining the performance of code written by an expert at the low level. In order to generate high-performance, scalable code for this group of problems, we take the following steps in this thesis. First, we propose a unified algorithmic framework named PASCAL to address the challenge of designing a general algorithmic template representing the class of N-body problems. PASCAL utilizes space-partitioning trees and user-controlled pruning/approximations to reduce the asymptotic runtime complexity from linear to logarithmic in the number of data points.
In PASCAL, we design an algorithm that automatically generates conditions for pruning or approximation of an N-body problem from the problem's definition. To evaluate PASCAL, we developed tree-based algorithms for six well-known problems: k-nearest neighbors, range search, minimum spanning tree, kernel density estimation, expectation maximization, and Hausdorff distance. We show that applying domain-specific optimizations and parallelization to the algorithms written in PASCAL achieves 10x to 230x speedup over state-of-the-art libraries on real-world datasets, on a dual-socket Intel Xeon processor with 16 cores. Second, we extend the PASCAL framework to build PASCAL-X, which adds support for NUMA-aware parallelization. PASCAL-X also offers insight into tuning parameters of the space-partitioning trees: leaf size (which influences the shape of the tree) and cut-off level (which controls the granularity of tasks) yield performance improvements of up to 4.6x. A key goal is to generate scalable, high-performance code automatically without sacrificing productivity, which implies minimizing the effort users must put in to obtain the desired high-performance code. Another critical factor is adaptivity: the effort required to extend high-performance code generation to new N-body problems. Finally, considering these factors, we develop a domain-specific language and code generator named Portal, built on top of PASCAL-X. Portal's language design is inspired by the mathematical representation of N-body problems, resulting in an intuitive language for rapid implementation of a variety of problems. Portal's back end is designed and implemented to generate optimized, parallel, and scalable implementations for multicore systems.
We demonstrate that the performance achieved using Portal is comparable to that of expert hand-optimized code while providing productivity for domain scientists. For instance, using Portal for the k-nearest neighbors problem yields performance similar to hand-optimized code while reducing the lines of code by 68x. To the best of our knowledge, there are no known libraries or frameworks that implement parallel asymptotically optimal algorithms for the class of general N-body problems, and this thesis primarily aims to fill that gap. Finally, we present a case study of Portal on the real-world problem of face clustering. In this case study, we show that Portal not only provides a fast solution for the face clustering problem with accuracy similar to the state-of-the-art algorithm, but also provides productivity: the face clustering algorithm takes only 14 lines of Portal code
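The pruning idea behind PASCAL-style tree traversal can be illustrated with a toy k-d tree for k-nearest neighbors, one of the six problems evaluated above. This is a sketch of the general technique only, not Portal's generated code or API: a subtree is visited only if its splitting plane could hide a point closer than the current k-th best candidate.

```python
# Illustrative sketch (not PASCAL's actual code) of tree-based pruning:
# a k-d tree query descends a subtree only if its splitting plane could
# contain a point closer than the current k-th best distance.
import heapq

def build(points, depth=0):
    """Recursively split points on alternating axes (2-d k-d tree)."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"pt": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def knn(node, q, k, heap=None):
    """Max-heap of (-dist2, point); the root of the heap is the k-th best."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    d2 = (node["pt"][0] - q[0]) ** 2 + (node["pt"][1] - q[1]) ** 2
    if len(heap) < k:
        heapq.heappush(heap, (-d2, node["pt"]))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, node["pt"]))
    axis = node["axis"]
    diff = q[axis] - node["pt"][axis]
    near, far = (node["right"], node["left"]) if diff > 0 else (node["left"], node["right"])
    knn(near, q, k, heap)
    # Pruning condition: descend the far side only if the distance to the
    # splitting plane beats the current k-th best (or we lack k candidates).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, q, k, heap)
    return heap

pts = [(1, 1), (2, 2), (5, 5), (9, 9), (4, 1)]
best = sorted((-d, p) for d, p in knn(build(pts), (2, 1), 2))
print([p for _, p in best])
```

PASCAL generalizes exactly this structure: the pruning/approximation condition is derived automatically from the problem definition instead of being hand-written per problem, which is what makes one traversal template cover all six benchmarks.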
Estudio e implementación del algoritmo Barnes-Hut para el cálculo de la interacción gravitatoria entre N-cuerpos
[EN] In this work, we provide an original implementation of the Barnes-Hut algorithm in Python 3.7, in both 2D and 3D. This algorithm solves the N-body problem approximately and is well known for achieving O(N log N) order by treating groups of nearby bodies as single individuals when observed from a far enough distance. Besides, we came up with a clever scheme for grouping those bodies: a proof of concept that turned out to perform accurately. Further, we analyzed the validity range of our prototype and compared it against the direct-sum algorithm by means of a selected set of test examples. Our implementation of the BH algorithm relies heavily on a deeply nested tree data structure. As such, its manipulation is fundamentally recursive and highly complex. This is the reason why we have additionally included an extended explanation, with drawings and schemes, of the rather cumbersome bookkeeping strategy involved in its use. Finally, we have uploaded the full code
to the GitHub platform and thereby made it publicly available.
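The grouping idea described in the abstract above can be sketched compactly. The 2D quadtree below is an independent illustration of the standard Barnes-Hut scheme, not the code from this work: each internal cell stores total mass and centre of mass, and a cell is treated as a single pseudo-body whenever its size-to-distance ratio falls below an opening angle theta. All names are assumptions.

```python
# Minimal 2-D Barnes-Hut sketch (illustrative only): a quadtree whose cells
# store total mass and centre of mass, queried with the s/d < theta rule.
class Cell:
    def __init__(self, cx, cy, size):
        self.cx, self.cy, self.size = cx, cy, size   # centre and side length
        self.m, self.comx, self.comy = 0.0, 0.0, 0.0
        self.body, self.kids = None, None

    def insert(self, x, y, m):
        if self.m == 0 and self.body is None and self.kids is None:
            self.body = (x, y, m)                    # empty leaf: store body
        else:
            if self.kids is None:                    # occupied leaf: subdivide
                self.kids, h = [], self.size / 2
                for ox in (-h / 2, h / 2):
                    for oy in (-h / 2, h / 2):
                        self.kids.append(Cell(self.cx + ox, self.cy + oy, h))
                bx, by, bm = self.body
                self.body = None
                self._child(bx, by).insert(bx, by, bm)
            self._child(x, y).insert(x, y, m)
        # Update aggregate mass and centre of mass incrementally.
        self.comx = (self.comx * self.m + x * m) / (self.m + m)
        self.comy = (self.comy * self.m + y * m) / (self.m + m)
        self.m += m

    def _child(self, x, y):
        return self.kids[2 * (x >= self.cx) + (y >= self.cy)]

def accel(cell, x, y, theta=0.5, g=1.0):
    """Acceleration at (x, y); open a cell only when s/d >= theta."""
    if cell is None or cell.m == 0:
        return (0.0, 0.0)
    dx, dy = cell.comx - x, cell.comy - y
    d2 = dx * dx + dy * dy
    if d2 == 0:
        return (0.0, 0.0)                 # skip self-interaction
    if cell.kids is None or cell.size * cell.size < theta * theta * d2:
        f = g * cell.m / d2 ** 1.5        # far enough: one pseudo-body
        return (f * dx, f * dy)
    ax = ay = 0.0
    for kid in cell.kids:                 # too close: open the cell
        kx, ky = accel(kid, x, y, theta, g)
        ax += kx; ay += ky
    return (ax, ay)

root = Cell(0.5, 0.5, 1.0)
for x, y, m in [(0.1, 0.1, 1.0), (0.15, 0.12, 1.0), (0.9, 0.9, 1.0)]:
    root.insert(x, y, m)
print(accel(root, 0.9, 0.9))
```

The deep nesting and recursive bookkeeping the abstract mentions are visible even here: insertion, subdivision, and traversal all recurse over the same tree, and the incremental centre-of-mass update is the part most easily gotten wrong.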
Distribution independent parallel algorithms and software for hierarchical methods with applications to computational electromagnetics
Octrees are tree data structures used to represent multidimensional points in space. They are widely used to support hierarchical methods in scientific applications such as the N-body problem, molecular dynamics, and smoothed particle hydrodynamics. The size of an octree is known to depend on the spatial distribution of points in the computational domain, not just on the number of points. For this reason, the run time of any octree-based algorithm that depends on the octree's size is unknown for arbitrary distributions. In this thesis, we present the design and implementation of parallel algorithms for constructing compressed octrees and answering the queries typically used by hierarchical methods. Our parallel algorithms and implementation strategies perform well irrespective of the spatial distribution of data, are communication efficient, and require no explicit load balancing. We also developed a software library that provides parallel tree construction and various queries on compressed octrees. The purpose of the library is to enable rapid application development and to let developers use efficient parallel algorithms without needing detailed knowledge of the algorithms or having to implement them. To demonstrate the performance of our algorithms and the effectiveness of the library, we developed a complete end-to-end parallel electromagnetics code for computing the scattered electromagnetic fields from a Perfectly Electrically Conducting surface. We used the functions provided by the software library to develop a Fast Multipole Method based solution to this problem. Experimental results show that our algorithms scale well and have bounded communication irrespective of the shape of the scatterer
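Distribution-independent octree construction of the kind described above typically rests on space-filling-curve keys. The sketch below is a generic illustration, not the thesis's library: Morton keys interleave coordinate bits so that sorting points by key groups them by octant at every refinement level, which is the building block of linear and compressed octrees. The helper names are assumptions.

```python
# Generic Morton-key sketch (not the thesis's library): interleave the bits
# of integer coordinates so that sorting by key sorts points by octant.
def morton3(x, y, z, bits=10):
    """Interleave the low `bits` bits of integer coordinates x, y, z."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def ancestor(key, level, bits=10):
    """Octree node containing `key` at a coarser `level` (0 = root)."""
    return key >> (3 * (bits - level))

pts = [(1, 0, 0), (0, 1, 0), (5, 5, 5), (1023, 0, 0)]
keys = sorted(morton3(*p) for p in pts)
# Points whose keys share a high-bit prefix share an octree node; a
# compressed octree keeps only the nodes where the point set actually
# splits, so its size tracks the distribution, not the depth.
print(keys)
print([ancestor(k, 1) for k in keys])
```

In a parallel setting, sorting the keys (e.g. with a distributed sample sort) and splitting the sorted sequence among processors is what gives construction costs that are independent of the spatial distribution, since each processor receives an equal share of keys regardless of where the points cluster.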