
    Highly parallel computation

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines. Current research focuses on which architectures are best suited to particular classes of problems; the architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date, but neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

    Visibility-Related Problems on Parallel Computational Models

    Visibility-related problems find applications in seemingly unrelated and diverse fields such as computer graphics, scene analysis, robotics and VLSI design. While there are common threads running through these problems, most existing solutions do not exploit these commonalities. With this in mind, this thesis identifies these common threads, provides a unified approach to solving these problems, and develops solutions that can be viewed as template algorithms for an abstract computational model. A template algorithm provides an architecture-independent solution for a problem, from which solutions can be generated for diverse computational models. In particular, the template algorithms presented in this work lead to optimal solutions to various visibility-related problems on fine-grain mesh-connected computers such as meshes with multiple broadcasting and reconfigurable meshes, and also on coarse-grain multicomputers. The visibility-related problems studied in this thesis can be broadly classified into Object Visibility and Triangulation problems. To demonstrate the practical relevance of these algorithms, two of the fundamental template algorithms, identified as powerful tools in almost every algorithm designed in this work, were implemented on an IBM SP2. The code was developed in the C language, using MPI, and can easily be ported to many commercially available parallel computers.
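    A flavour of what such a template algorithm computes can be given with the simplest visibility primitive, a prefix-maxima sweep; the height profile and the left-visibility formulation below are illustrative assumptions and not taken from the thesis, whose actual templates and mesh mappings are far more general.

```c
#include <stdio.h>

/* Illustrative only: mark which elements of a height profile are visible
 * from the left, i.e. strictly taller than every element before them.
 * Prefix-maxima sweeps of this kind are typical building blocks of
 * visibility algorithms; the thesis's actual template algorithms and
 * their mesh mappings are not reproduced here. */
int main(void) {
    double height[] = {2.0, 1.5, 3.0, 2.5, 4.0, 4.0};
    int n = sizeof height / sizeof height[0];
    double prefix_max = -1.0;          /* running maximum of heights seen so far */

    for (int i = 0; i < n; i++) {
        int visible = height[i] > prefix_max;
        if (visible)
            prefix_max = height[i];
        printf("segment %d (height %.1f): %s\n",
               i, height[i], visible ? "visible from the left" : "occluded");
    }
    return 0;
}
```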

    A Work-Optimal Algorithm on log delta n Processors for a P-Complete Problem

    We present a parallel algorithm for the Lexicographically First Maximal Independent Set Problem on graphs with bounded degree 3 that is work-optimal on a shared memory machine with up to $\log^\delta n$ processors, for any $0 < \delta < 1$. Since this problem is P-complete, it follows (assuming $\mathcal{NC} \neq \mathcal{P}$) that the algorithmics of coarse grained parallel machines and of fine grained parallel machines differ substantially.
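    For reference, the problem itself has a simple sequential specification: scan the vertices in index order and take a vertex whenever none of its smaller-indexed neighbours has already been taken. The minimal C sketch below (graph data illustrative, not from the paper) only fixes that specification; the paper's contribution is computing the same set work-optimally in parallel.

```c
#include <stdio.h>
#include <string.h>

#define MAXDEG 3   /* the paper considers graphs of maximum degree 3 */

/* Lexicographically first maximal independent set, sequential greedy:
 * visit vertices in index order, take a vertex iff no smaller-indexed
 * neighbour was already taken.  Illustrative specification only. */
int main(void) {
    enum { N = 6 };
    /* adjacency lists, -1 terminated; example graph of maximum degree 3 */
    int adj[N][MAXDEG + 1] = {
        {1, 2, -1, -1}, {0, 3, -1, -1}, {0, 3, 4, -1},
        {1, 2, 5, -1},  {2, 5, -1, -1}, {3, 4, -1, -1}
    };
    int in_set[N];
    memset(in_set, 0, sizeof in_set);

    for (int v = 0; v < N; v++) {
        int blocked = 0;
        for (int k = 0; adj[v][k] != -1; k++)
            if (adj[v][k] < v && in_set[adj[v][k]])
                blocked = 1;
        in_set[v] = !blocked;
    }

    printf("LFMIS:");
    for (int v = 0; v < N; v++)
        if (in_set[v]) printf(" %d", v);
    printf("\n");
    return 0;
}
```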

    Entropy-based High Performance Computation of Boolean SNP-SNP Interactions Using GPUs

    It is increasingly accepted that traditional statistical Single Nucleotide Polymorphism (SNP) analysis of Genome-Wide Association Studies (GWAS) reveals just a small part of the heritability in complex diseases. The study of SNP interactions identifies additional SNPs that contribute to disease but that do not reach genome-wide significance or exhibit only epistatic effects. We have introduced a methodology for genome-wide screening of epistatic interactions that is feasible with state-of-the-art high performance computing technology. Unlike standard software, our method computes all Boolean binary interactions between SNPs across the whole genome without assuming a particular model of interaction. Our extensive search for epistasis comes at the expense of higher computational complexity, which we tackled using graphics processors (GPUs) to reduce the computational time from several months on a cluster of CPUs to 3-4 days on a multi-GPU platform. Here, we contribute a new entropy-based function to evaluate the interaction between SNPs which does not compromise findings about the most significant SNP interactions, but is more than 4000 times lighter in terms of computational time when running on GPUs and provides more than 100x faster code than a CPU of similar cost. We deploy a number of optimization techniques to tune the implementation of this function using CUDA and show the way to enhance scalability on larger data sets. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was also supported by the Australian Research Council Future Fellowship to Prof. Moscato, by a funded grant from the ARC Discovery Project Scheme and by the Ministry of Education of Spain under Project TIN2006-01078 and mobility grant PR2011-0144. We also thank NVIDIA for hardware donation under CUDA Teaching and Research Center awards.
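    The paper's exact entropy-based function is not reproduced in the abstract; as a hedged sketch of the general idea, the C code below scores one Boolean SNP-SNP combination against a binary phenotype by mutual information. The toy data, the AND combination and the helper name mutual_information are illustrative assumptions, standing in for the kind of per-pair score that the CUDA kernels evaluate across all pairs.

```c
#include <stdio.h>
#include <math.h>

/* Hedged sketch: score one Boolean SNP-SNP interaction against a binary
 * phenotype by mutual information (in bits).  The data, the AND
 * combination and the function below are illustrative assumptions, not
 * the published GPU implementation. */
static double mutual_information(const int *x, const int *y, int n) {
    double joint[2][2] = {{0}}, px[2] = {0}, py[2] = {0}, mi = 0.0;
    for (int i = 0; i < n; i++) joint[x[i]][y[i]] += 1.0;
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++) {
            joint[a][b] /= n;
            px[a] += joint[a][b];
            py[b] += joint[a][b];
        }
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            if (joint[a][b] > 0.0)
                mi += joint[a][b] * log2(joint[a][b] / (px[a] * py[b]));
    return mi;
}

int main(void) {
    /* binarised genotypes for two SNPs and case/control labels (toy data) */
    int snp1[]  = {1, 0, 1, 1, 0, 0, 1, 0};
    int snp2[]  = {1, 1, 0, 1, 0, 1, 1, 0};
    int label[] = {1, 0, 0, 1, 0, 0, 1, 0};
    int n = sizeof label / sizeof label[0];

    int interaction[8];
    for (int i = 0; i < n; i++)
        interaction[i] = snp1[i] & snp2[i];   /* one Boolean combination */

    printf("MI(snp1 AND snp2; label) = %.4f bits\n",
           mutual_information(interaction, label, n));
    return 0;
}
```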

    List Ranking on a Coarse Grained Multiprocessor

    We present a deterministic algorithm for the List Ranking Problem on a Coarse Grained p-Multiprocessor (CGM) that is only a factor of log*(p) away from optimality: it achieves O(log(p) log*(p)) communication rounds and O(n log*(p)) for the required communication cost and total computation time. We report on experimental studies of this algorithm on a variety of platforms that show the validity of the chosen CGM model, and also show the possible gains and limits of such an algorithm. Finally, we suggest extending the CGM model by the communication blow-up to allow better a priori predictions of the communication costs of algorithms.
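    For readers unfamiliar with the problem: list ranking asks, for every element of a linked list given by successor pointers, how many links separate it from the tail. The sequential reference below (data and names illustrative) is trivial; the difficulty the CGM algorithm addresses is computing these ranks with the list distributed over p processors in few communication rounds.

```c
#include <stdio.h>

/* Sequential reference for list ranking: succ[i] is the successor of
 * element i, with succ[i] == i marking the tail.  rank[i] is the number
 * of links from i to the tail.  Illustrative only. */
int main(void) {
    enum { N = 6 };
    int succ[N] = {3, 0, 4, 5, 1, 5};   /* tail is element 5 */
    int rank[N];

    /* find the head: the element no one points to */
    int has_pred[N] = {0};
    for (int i = 0; i < N; i++)
        if (succ[i] != i) has_pred[succ[i]] = 1;
    int head = 0;
    for (int i = 0; i < N; i++)
        if (!has_pred[i]) head = i;

    /* walk the list once, assigning decreasing ranks */
    int r = N - 1, cur = head;
    for (;;) {
        rank[cur] = r--;
        if (succ[cur] == cur) break;
        cur = succ[cur];
    }

    for (int i = 0; i < N; i++)
        printf("element %d: rank %d\n", i, rank[i]);
    return 0;
}
```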

    List Ranking on PC Clusters

    We present two algorithms for the List Ranking Problem in the Coarse Grained Multicomputer model (CGM for short). If $p$ is the number of processors and $n$ the size of the list, we give a deterministic algorithm that achieves $O(\log p \log^* p)$ communication rounds and $O(n \log^* p)$ for the required communication cost and total computation time, and a randomized one that requires $O(\log p)$ communication rounds and $O(n)$ for the required communication cost and total computation time. We report on experimental studies of these algorithms on a PC cluster interconnected by a Myrinet network. As far as we know, this is the first portable code for this problem that runs on a cluster. With these experimental studies, we assess the validity of the chosen CGM model, and also show the possible gains and limits of such algorithms on PC clusters.
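    For contrast, the classic fine-grained PRAM technique for this problem is pointer jumping (Wyllie's algorithm), simulated sequentially below on illustrative data; it finishes in O(log n) rounds but performs O(n log n) total work, which is the kind of overhead the coarse-grained algorithms above are designed to avoid.

```c
#include <stdio.h>
#include <string.h>

/* Sequential simulation of pointer jumping (Wyllie's algorithm): in each
 * of O(log n) rounds every element adds its successor's partial rank to
 * its own and then jumps over it.  Total work is O(n log n); shown only
 * for contrast with the coarse-grained algorithms described above. */
int main(void) {
    enum { N = 6 };
    int succ[N] = {3, 0, 4, 5, 1, 5};          /* succ[i] == i marks the tail */
    int rank[N];

    for (int i = 0; i < N; i++)
        rank[i] = (succ[i] == i) ? 0 : 1;      /* one link to the successor */

    int changed = 1;
    while (changed) {                          /* O(log n) rounds */
        int next_succ[N], next_rank[N];
        changed = 0;
        for (int i = 0; i < N; i++) {          /* all i in parallel on a PRAM */
            next_rank[i] = rank[i] + rank[succ[i]];
            next_succ[i] = succ[succ[i]];
            if (next_succ[i] != succ[i]) changed = 1;
        }
        memcpy(rank, next_rank, sizeof rank);
        memcpy(succ, next_succ, sizeof succ);
    }

    for (int i = 0; i < N; i++)
        printf("element %d: rank %d\n", i, rank[i]);
    return 0;
}
```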

    Automatic visual recognition using parallel machines

    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed here. In contrast to approaches that use epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on a hypercube parallel architecture: object recognition is achieved by matching the scene projective invariants to the model projective invariants, a process called transfer, and the hypothesis-generation-testing scheme is implemented on the hypercube.
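    The dynamic programming technique for shape matching can be illustrated by an edit-distance comparison of two 8-direction chain codes, as in the hedged C sketch below; the chain codes and cost model are assumptions for illustration, not the dissertation's data. A recurrence of this form is commonly parallelized along anti-diagonals, since each cell depends only on its three upper-left neighbours, which is how n processors can bring the per-object cost from O(n^2) down to O(n).

```c
#include <stdio.h>
#include <string.h>

/* Illustrative edit-distance comparison of two 8-direction chain codes.
 * Each cell D[i][j] depends only on the three cells above and to its
 * left, so a whole anti-diagonal of cells can be computed in parallel. */
#define MAXLEN 64

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

int main(void) {
    const char *shape_a = "00112233";   /* chain codes: digits 0..7 */
    const char *shape_b = "0012233";
    int la = (int)strlen(shape_a), lb = (int)strlen(shape_b);
    int D[MAXLEN + 1][MAXLEN + 1];

    for (int i = 0; i <= la; i++) D[i][0] = i;
    for (int j = 0; j <= lb; j++) D[0][j] = j;

    for (int i = 1; i <= la; i++)
        for (int j = 1; j <= lb; j++) {
            int subst = (shape_a[i - 1] == shape_b[j - 1]) ? 0 : 1;
            D[i][j] = min3(D[i - 1][j] + 1,         /* delete a code      */
                           D[i][j - 1] + 1,         /* insert a code      */
                           D[i - 1][j - 1] + subst  /* match / substitute */);
        }

    printf("edit distance between the two chain codes: %d\n", D[la][lb]);
    return 0;
}
```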