Search CORE

67,693 research outputs found

A New Parallel N-body Gravity Solver: TPM

Author: Xu Guohong
Publication venue: 'University of Chicago Press'
Publication date: 09/09/1994
Field of study

We have developed a gravity solver based on combining the well developed Particle-Mesh (PM) method and TREE methods. It is designed for and has been implemented on parallel computer architectures. The new code can deal with tens of millions of particles on current computers, with the calculation done on a parallel supercomputer or a group of workstations. Typically, the spatial resolution is enhanced by more than a factor of 20 over the pure PM code with mass resolution retained at nearly the PM level. This code runs much faster than a pure TREE code with the same number of particles and maintains almost the same resolution in high density regions. Multiple time step integration has also been implemented with the code, with second order time accuracy. The performance of the code has been checked in several kinds of parallel computer configuration, including IBM SP1, SGI Challenge and a group of workstations, with the speedup of the parallel code on a 32 processor IBM SP2 supercomputer nearly linear (efficiency

\approx 80\%

) in the number of processors. The computation/communication ratio is also very high (

\sim 50

), which means the code spends

95\%

of its CPU time in computation.Comment: 21 Pages Latex file Figures available from anonymous ftp to astro.princeton.edu under /xu/tpm.ps, POP-57

arXiv.org e-Print Archive

Crossref

CERN Document Server

Recommended from our members

Architecture and Applications of DADO: A Large-Scale Parallel Computer for Artificial Intelligence

Author: Miranker Daniel P.
Shaw David Elliot
Stolfo Salvatore
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1983
Field of study

As part of our research on very high performance parallel architectures, we have been investigating; machine architectures specially adapted to the highly efficient implementation of artificial intelligence (AI) software. In the course of our research we designed DADO, a highly parallel, VLSI-based, tree-structured machine, and implemented a high-speed algorithm for production systems on a simulator for DADO. Subsequent research has convinced us that DADO can support many other AI applications, including the very rapid execution of PROLOG programs, and a large share of the symbolic processing typical of contemporary knowledge-based systems. In this brief report, we outline the hardware design of a moderate size DADO prototype, comprising 1023 processing elements, which is currently under construction at Columbia University. We then sketch the software base being implemented on a small 15 processing element prototype system including several applications written in PPL/M, a high-level language designed for specifying parallel computations on DADO

Columbia University Academic Commons

High Performance Implementation of Planted Motif Problem using Suffix trees

Author
Publication venue
Publication date
Field of study

In this paper we present a high performance implementation of suffix tree based solution to the planted motif problem on two different parallel architectures: NVIDIA GPU and Intel Multicore machines. An (l,d) planted motif problem(PMP) is defined as: Given a sequence of n DNA sequences, each of length L, find M, the set of sequences(or motifs) of length l which have atleast one d-neighbor in each of the n sequences. Here, a d-neighbor of a sequence is a sequence of same length that differs in at-most d positions. PMP is a well studied problem in computational biology. It is useful in developing methods for finding transcription factor binding sites, sequence classification and for building phylogenetic trees. The problem is computationally challenging to solve, for example a (19,7) PMP takes 9.9 hours on a sequential machine. Many approaches to solve planted motif problem can be found in literature. One approach is based on use of suffix tree data structure. Though suffix tree based methods are the most efficient ones for solving large planted motif problems on sequential machines, they are quite difficult to parallelize. We present suffix tree based parallel solutions for PMP on NVIDIA GPU and Intel Multicore architectures that are efficient and scalable. The solutions are based on a suffix tree algorithm previously presented but use extensive adaptation to individual architectures to ensure that the implementations work efficiently and scale well

CiteSeerX

MASSIVELY PARALLEL ALGORITHMS FOR POINT CLOUD BASED OBJECT RECOGNITION ON HETEROGENEOUS ARCHITECTURE

Author: Hu Linjia
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2017
Field of study

With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the possibilities of using modern heterogeneous computing platforms and its supporting ecosystem such as massively parallel architecture (MPA), computing cluster, compute unified device architecture (CUDA), and multithreaded programming to accelerate the point cloud based object recognition. The aforementioned computing platforms would not yield high performance unless the specific features are properly utilized. Failing that the result actually produces an inferior performance. To achieve the high-speed performance in image descriptor computing, indexing, and matching in point cloud based object recognition, this work explores both coarse and fine grain level parallelism, identifies the acceptable levels of algorithmic approximation, and analyzes various performance impactors. A set of heterogeneous parallel algorithms are designed and implemented in this work. These algorithms include exact and approximate scalable massively parallel image descriptors for descriptor computing, parallel construction of k-dimensional tree (KD-tree) and the forest of KD-trees for descriptor indexing, parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time performance of feature computing, indexing, and matching. Meanwhile, this work demonstrates that the heterogeneous computing architectures, with appropriate architecture specific algorithms design and optimization, have the distinct advantages of improving the performance of multimedia applications

Michigan Technological University

Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures

Author: Descombes Stéphane
Duarte Max
Dumont Thierry
Guillet Thomas
Louvet Violaine
Massot Marc
Publication venue: 'Cellule MathDoc/CEDRAM'
Publication date: 14/10/2016
Field of study

A new solver featuring time-space adaptation and error control has been recently introduced to tackle the numerical solution of stiff reaction-diffusion systems. Based on operator splitting, finite volume adaptive multiresolution and high order time integrators with specific stability properties for each operator, this strategy yields high computational efficiency for large multidimensional computations on standard architectures such as powerful workstations. However, the data structure of the original implementation, based on trees of pointers, provides limited opportunities for efficiency enhancements, while posing serious challenges in terms of parallel programming and load balancing. The present contribution proposes a new implementation of the whole set of numerical methods including Radau5 and ROCK4, relying on a fully different data structure together with the use of a specific library, TBB, for shared-memory, task-based parallelism with work-stealing. The performance of our implementation is assessed in a series of test-cases of increasing difficulty in two and three dimensions on multi-core and many-core architectures, demonstrating high scalability

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL-UJM

The SMAI journal of computational mathematics

Numérisation de Documents Anciens Mathématiques

Hal-Diderot

HAL-Polytechnique

HAL-Rennes 1

An Implementation of List Successive Cancellation Decoder with Large List Size for Polar Codes

Author: Chen Ji
Fan YouZhe
Jin Jie
Li Bin
Tsui Chi-ying
Xia ChenYang
Zeng ChongYang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/05/2018
Field of study

Polar codes are the first class of forward error correction (FEC) codes with a provably capacity-achieving capability. Using list successive cancellation decoding (LSCD) with a large list size, the error correction performance of polar codes exceeds other well-known FEC codes. However, the hardware complexity of LSCD rapidly increases with the list size, which incurs high usage of the resources on the field programmable gate array (FPGA) and significantly impedes the practical deployment of polar codes. To alleviate the high complexity, in this paper, two low-complexity decoding schemes and the corresponding architectures for LSCD targeting FPGA implementation are proposed. The architecture is implemented in an Altera Stratix V FPGA. Measurement results show that, even with a list size of 32, the architecture is able to decode a codeword of 4096-bit polar code within 150 us, achieving a throughput of 27MbpsComment: 4 pages, 4 figures, 4 tables, Published in 27th International Conference on Field Programmable Logic and Applications (FPL), 201

arXiv.org e-Print Archive

Crossref