
255 research outputs found

Scalable Fast Multipole Methods on Heterogeneous Architecture

Author: Hu Qi
Publication date: 01/01/2013

The N-body problem appears in many computational physics simulations. At each time step the computation involves an all-pairs sum whose complexity is quadratic, followed by an update of particle positions. This cost makes it impractical to solve such dynamic N-body problems at large scale. To improve this situation, we use both algorithmic and hardware approaches. Our algorithmic approach is to use the Fast Multipole Method (FMM), a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop, to reduce this quadratic complexity to linear with guaranteed accuracy. Our hardware approach is to use heterogeneous clusters, which are composed of nodes that contain multi-core CPUs tightly coupled with accelerators such as graphics processing units (GPUs), as our underlying parallel processing hardware, on which efficient implementations require highly non-trivial redesigned algorithms. In this dissertation, we fundamentally reconsider the FMM algorithms on heterogeneous architectures to achieve a significant improvement over previous implementations in the literature and to make the algorithm ready for use as a workhorse simulation tool for both time-dependent vortex flow problems and boundary element methods. Our major contributions include: 1. Novel FMM data structures using parallel construction algorithms for dynamic problems. 2. A fast heterogeneous FMM algorithm for both single and multiple computing nodes. 3. Efficient inter-node communication management using fast parallel data structures. 4. A scalable FMM algorithm using a novel Helmholtz decomposition for Vortex Methods (VM). The proposed algorithms can handle non-uniform distributions with irregular partition shapes to achieve workload balance, and their MPI-CUDA implementations are highly tuned and demonstrate state-of-the-art performance.
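To make the quadratic cost mentioned in the abstract concrete, here is a minimal Python sketch of the direct all-pairs sum that the FMM is meant to replace. It is not code from the dissertation: the 1/r kernel and the names `direct_potentials` and `pair_count` are assumptions chosen for illustration.

```python
import math


def direct_potentials(positions, charges):
    """Direct evaluation of phi_i = sum over j != i of q_j / |x_i - x_j|.

    Assumption: a 1/r (Coulomb-like) kernel; the abstract does not fix one.
    The two nested loops visit every ordered pair, hence O(N^2) work.
    """
    n = len(positions)
    potentials = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the self-interaction
            r = math.dist(positions[i], positions[j])
            potentials[i] += charges[j] / r
    return potentials


def pair_count(n):
    """Number of kernel evaluations performed by direct_potentials."""
    return n * (n - 1)


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(direct_potentials(pts, [1.0, 1.0, 1.0]))
    print(pair_count(1_000), pair_count(10_000))
```

Growing N by a factor of 10 multiplies the pair count by roughly 100, which is the growth a linear-complexity method avoids.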

Digital Repository at the University of Maryland

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Author: Cheung Alvin
Kemper Alfons
Palkar Shoumik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/06/2018

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster compared to the original. Comment: 12 pages, additional 4 pages of references and appendi
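The idea of pairing a sequential fragment with a MapReduce-style summary can be illustrated in a few lines. This is a hand-written sketch in Python, not Casper's output, intermediate language, or Java input; both function names are hypothetical. The final check only tests equality on sample inputs, whereas the abstract describes verification with a theorem prover.

```python
from functools import reduce


def sequential_sum_of_squares(values):
    """Sequential fragment: an accumulator updated inside a loop."""
    total = 0
    for v in values:
        total += v * v
    return total


def mapreduce_sum_of_squares(values):
    """Hand-written summary of the loop above in map/reduce form."""
    mapped = map(lambda v: v * v, values)          # map: square each element
    return reduce(lambda a, b: a + b, mapped, 0)   # reduce: associative sum


if __name__ == "__main__":
    # Testing on samples is weaker than the formal equivalence proof
    # described in the abstract; it is used here only as a sanity check.
    for sample in ([], [3], [1, 2, 3, 4], [-2, 5, 0]):
        assert sequential_sum_of_squares(sample) == mapreduce_sum_of_squares(sample)
    print(mapreduce_sum_of_squares([1, 2, 3, 4]))
```

Because the reduce step is an associative sum, a framework is free to split the input across workers and combine the partial results in any order.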

arXiv.org e-Print Archive

Crossref

Irregular Coarse-Grain Data Parallelism under LPARX

Publication venue: 'Hindawi Limited'
Publication date: 01/01/1996

Crossref

Static Analysis-based Debugging, Certification, Testing, and Optimization with CiaoPP

Author: Albert Albiol Elvira
Arenas Sánchez Purificación
Bueno Carrillo Francisco
Carro Liñares Manuel
Casas Amadeo
Correas Fernandez Jesús
Haemmerlé R.
Hermenegildo Manuel V.
López García Pedro
Mera E.
Morales J.
Méndez-Lojo Mario
Navas J.
Puebla Sánchez Alvaro Germán
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2010

Facilitate the development of safe, efficient programs. Approach:
• Next-generation, higher-level, multiparadigm prog. languages.
• Improved program development environments.
• A framework (CiaoPP) which integrates:
  • Debugging.
  • Verification and certification.
  • Testing.
  • Optimization (optimized compilation, parallelization, ...

Archivo Digital UPM

Graphical Reasoning in Compact Closed Categories for Quantum Computation

Author: A Asperti
A Bundy
A Schfürr
AK Pati
AP Sexton
B Coecke
D Janssens
GM Kelly
H Ehrig
J Kock
LC Paulson
Lucas Dixon
M Pollet
PPP Velasco
PW Shor
R Prince
R Raussendorf
Ross Duncan
S Abramsky
S Abramsky
W Wootters
Publication date: 01/01/2009

Compact closed categories provide a foundational formalism for a variety of important domains, including quantum computation. These categories have a natural visualisation as a form of graphs. We present a formalism for equational reasoning about such graphs and develop this into a generic proof system with a fixed logical kernel for equational reasoning about compact closed categories. Automating this reasoning process is motivated by the slow and error-prone nature of manual graph manipulation. A salient feature of our system is that it provides a formal and declarative account of derived results that can include 'ellipses'-style notation. We illustrate the framework by instantiating it for a graphical language of quantum computation and show how this can be used to perform symbolic computation. Comment: 21 pages, 9 figures. This is the journal version of the paper published at AIS

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

Oxford University Research Archive

Scalable parallel molecular dynamics algorithms for organic systems

Author: Vemparala Satyavani
Publication venue: LSU Digital Commons
Publication date: 01/01/2003

A scalable parallel algorithm, Macro-Molecular Dynamics (MMD), has been developed for large-scale molecular dynamics simulations of organic macromolecules, based on space-time multi-resolution techniques and dynamic management of distributed lists. The algorithm also includes the calculation of long-range forces using the Fast Multipole Method (FMM). FMM is based on the octree data structure, in which each parent cell is divided into 8 child cells, and this division continues until the cell size is equal to the non-bonded interaction cutoff length. Because a constant number of operations is performed at each stage of the octree, the FMM algorithm scales as O(N). Design and analysis of the MMD and FMM algorithms are presented. Scalability tests are performed on three tera-flop machines: the 1024-processor Intel Xeon-based Linux cluster SuperMike at LSU, the 1184-processor IBM SP4 Marcellus, and the 512-processor Compaq AlphaServer Emerald at the U.S. Army Engineer Research and Development Center (ERDC) MSRC. The tests show that the Linux cluster outperforms the SP4 for the MMD application. The tests also show significant effects of memory- and cache-sharing on performance.
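The octree subdivision rule in the abstract can be sketched as a short calculation. This is an illustrative Python sketch, not the MMD code: the function names are hypothetical, and stopping once the cell edge is no larger than the cutoff is an assumption made to handle box sizes that are not an exact power-of-two multiple of the cutoff.

```python
def octree_depth(box_size, cutoff):
    """Number of halvings until the cell edge is no larger than the cutoff.

    Assumption: a cubic simulation box of edge box_size, halved along each
    axis per level, so each parent cell produces 8 children.
    """
    if box_size <= 0 or cutoff <= 0:
        raise ValueError("box_size and cutoff must be positive")
    depth = 0
    cell = float(box_size)
    while cell > cutoff:
        cell /= 2.0
        depth += 1
    return depth


def leaf_cell_count(depth):
    """Cells at the finest level: 8 children per parent, depth times."""
    return 8 ** depth


def total_cell_count(depth):
    """All cells from the root down to the leaves (geometric series)."""
    return (8 ** (depth + 1) - 1) // 7


if __name__ == "__main__":
    d = octree_depth(64.0, 8.0)
    print(d, leaf_cell_count(d), total_cell_count(d))
```

The total cell count stays within a constant factor of the leaf count, which is consistent with the abstract's argument that a fixed amount of work per cell yields O(N) scaling.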

Louisiana State University

Verified lifting of stencil computations

Author: Alur Rajeev
Alvin Cheung
Amarasinghe Saman P
Armando Solar-Lezama
Catanzaro Bryan
Heroux Michael A.
Jeon Jinseong
Kamil Shoaib
Karbyshev Aleksandr
Lezama Armando Solar
Mallinson A.C.
Plotkin Gordon D
Reynolds John C.
Shachar Itzhaky
Shoaib Kamil
Vasco Diego
Zaharia Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2016

This paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and leverages counter-example guided inductive synthesis (CEGIS) to find provably correct translations. Lifting existing code to a high-performance description language has a number of benefits, including maintainability and performance portability. For example, our experiments show that the lifted summaries can enable domain-specific compilers to do a better job of parallelization as compared to an off-the-shelf compiler working on the original code, and can even support fully automatic migration to hardware accelerators such as GPUs. We have implemented verified lifting in a system called STNG and have evaluated it using microbenchmarks, mini-apps, and real-world applications. We demonstrate the benefits of verified lifting by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation. United States. Department of Energy. Office of Science (Award DE-SC0008923). United States. Department of Energy. Office of Science (Award DE-SC0005288)
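As a rough illustration of what a stencil and its summary look like, the following Python sketch pairs a low-level index loop with a predicate that describes its result. It is not STNG's predicate language, Fortran input, or Halide output; the three-point stencil and both function names are assumptions made for the example.

```python
def stencil_loop(a):
    """Low-level style: explicit index loop over the interior points."""
    n = len(a)
    b = [0.0] * n
    for i in range(1, n - 1):
        b[i] = a[i - 1] + a[i] + a[i + 1]
    return b


def satisfies_summary(a, b):
    """Summary stated as a predicate over the output array:
    for every interior index i, b[i] == a[i-1] + a[i] + a[i+1].
    """
    return all(
        b[i] == a[i - 1] + a[i] + a[i + 1]
        for i in range(1, len(a) - 1)
    )


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = stencil_loop(a)
    print(b, satisfies_summary(a, b))
```

The predicate says what each output element equals without fixing a loop order, which is the kind of freedom a domain-specific compiler can exploit. Checking it on one input, as done here, is only a test and not the proof of correctness the abstract describes.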

DSpace@MIT

Crossref

Composable architecture for rack scale big data computing

Author: Abali Bulent
Chang Victor
Franke Hubertus
Kesavan Mukil
Li Chung-Sheng
Parris Colin
Publication venue: 'Elsevier BV'
Publication date: 01/02/2017

The rapid growth of cloud computing, both in terms of the spectrum and volume of cloud workloads, necessitates revisiting the traditional datacenter design based on rack-mountable servers. Next-generation datacenters need to offer enhanced support for: (i) fast-changing system configuration requirements due to workload constraints, (ii) timely adoption of emerging hardware technologies, and (iii) maximal sharing of systems and subsystems in order to lower costs. Disaggregated datacenters, constructed as a collection of individual resources such as CPU, memory, and disks, and composed into workload execution units on demand, are an interesting new trend that can address the above challenges. In this paper, we demonstrate the feasibility of composable systems by building a rack-scale composable system prototype using a PCIe switch. Through empirical approaches, we develop an assessment of the opportunities and challenges for leveraging the composable architecture for rack-scale cloud datacenters with a focus on big data and NoSQL workloads. In particular, we compare and contrast the programming models that can be used to access the composable resources, and develop the implications for network and resource provisioning and management for rack-scale architecture.

Southampton (e-Prints Soton)

Crossref