
    Highly parallel computation

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines. Current research focuses on which architectures are best suited to particular classes of problems; the architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date, but neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

    Visibility-Related Problems on Parallel Computational Models

    Visibility-related problems find applications in seemingly unrelated and diverse fields such as computer graphics, scene analysis, robotics and VLSI design. While there are common threads running through these problems, most existing solutions do not exploit these commonalities. With this in mind, this thesis identifies these common threads, provides a unified approach to solving these problems, and develops solutions that can be viewed as template algorithms for an abstract computational model. A template algorithm provides an architecture-independent solution for a problem, from which solutions can be generated for diverse computational models. In particular, the template algorithms presented in this work lead to optimal solutions to various visibility-related problems on fine-grain mesh-connected computers such as meshes with multiple broadcasting and reconfigurable meshes, and also on coarse-grain multicomputers. The visibility-related problems studied in this thesis can be broadly classified into Object Visibility and Triangulation problems. To demonstrate the practical relevance of these algorithms, two of the fundamental template algorithms, identified as powerful tools in almost every algorithm designed in this work, were implemented on an IBM SP2. The code was developed in the C language, using MPI, and can easily be ported to many commercially available parallel computers.
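    A flavour of what such a template algorithm computes can be given with the simplest visibility primitive, a prefix-maxima sweep; the height profile and the left-visibility formulation below are illustrative assumptions and not taken from the thesis, whose actual templates and mesh mappings are far more general.

```c
#include <stdio.h>

/* Illustrative only: mark which elements of a height profile are visible
 * from the left, i.e. strictly taller than every element before them.
 * Prefix-maxima sweeps of this kind are typical building blocks of
 * visibility algorithms; the thesis's actual template algorithms and
 * their mesh mappings are not reproduced here. */
int main(void) {
    double height[] = {2.0, 1.5, 3.0, 2.5, 4.0, 4.0};
    int n = sizeof height / sizeof height[0];
    double prefix_max = -1.0;          /* running maximum of heights seen so far */

    for (int i = 0; i < n; i++) {
        int visible = height[i] > prefix_max;
        if (visible)
            prefix_max = height[i];
        printf("segment %d (height %.1f): %s\n",
               i, height[i], visible ? "visible from the left" : "occluded");
    }
    return 0;
}
```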

    A Work-Optimal Algorithm on log delta n Processors for a P-Complete Problem

    We present a parallel algorithm for the Lexicographically First Maximal Independent Set Problem on graphs with bounded degree 3 that is work-optimal on a shared memory machine with up to $\log^\delta n$ processors, for any $0 < \delta < 1$. Since this problem is P-complete, it follows (assuming $\mathcal{NC} \neq \mathcal{P}$) that the algorithmics of coarse grained parallel machines and of fine grained parallel machines differ substantially.
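    For reference, the problem itself has a simple sequential specification: scan the vertices in index order and take a vertex whenever none of its smaller-indexed neighbours has already been taken. The minimal C sketch below (graph data illustrative, not from the paper) only fixes that specification; the paper's contribution is computing the same set work-optimally in parallel.

```c
#include <stdio.h>
#include <string.h>

#define MAXDEG 3   /* the paper considers graphs of maximum degree 3 */

/* Lexicographically first maximal independent set, sequential greedy:
 * visit vertices in index order, take a vertex iff no smaller-indexed
 * neighbour was already taken.  Illustrative specification only. */
int main(void) {
    enum { N = 6 };
    /* adjacency lists, -1 terminated; example graph of maximum degree 3 */
    int adj[N][MAXDEG + 1] = {
        {1, 2, -1, -1}, {0, 3, -1, -1}, {0, 3, 4, -1},
        {1, 2, 5, -1},  {2, 5, -1, -1}, {3, 4, -1, -1}
    };
    int in_set[N];
    memset(in_set, 0, sizeof in_set);

    for (int v = 0; v < N; v++) {
        int blocked = 0;
        for (int k = 0; adj[v][k] != -1; k++)
            if (adj[v][k] < v && in_set[adj[v][k]])
                blocked = 1;
        in_set[v] = !blocked;
    }

    printf("LFMIS:");
    for (int v = 0; v < N; v++)
        if (in_set[v]) printf(" %d", v);
    printf("\n");
    return 0;
}
```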

    Entropy-based High Performance Computation of Boolean SNP-SNP Interactions Using GPUs

    It is increasingly accepted that traditional statistical Single Nucleotide Polymorphism (SNP) analysis of Genome-Wide Association Studies (GWAS) reveals just a small part of the heritability in complex diseases. The study of SNP interactions identifies additional SNPs that contribute to disease but that do not reach genome-wide significance or exhibit only epistatic effects. We have introduced a methodology for genome-wide screening of epistatic interactions that is feasible with state-of-the-art high performance computing technology. Unlike standard software, our method computes all Boolean binary interactions between SNPs across the whole genome without assuming a particular model of interaction. Our extensive search for epistasis comes at the expense of higher computational complexity, which we tackled using graphics processors (GPUs) to reduce the computational time from several months on a cluster of CPUs to 3-4 days on a multi-GPU platform. Here, we contribute a new entropy-based function to evaluate the interaction between SNPs which does not compromise findings about the most significant SNP interactions, but is more than 4000 times lighter in terms of computational time when running on GPUs and provides more than 100x faster code than a CPU of similar cost. We deploy a number of optimization techniques to tune the implementation of this function using CUDA and show the way to enhance scalability on larger data sets. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was also supported by the Australian Research Council Future Fellowship to Prof. Moscato, by a funded grant from the ARC Discovery Project Scheme and by the Ministry of Education of Spain under Project TIN2006-01078 and mobility grant PR2011-0144. We also thank NVIDIA for hardware donation under CUDA Teaching and Research Center awards.
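    The paper's exact entropy-based function is not reproduced in the abstract; as a hedged sketch of the general idea, the C code below scores one Boolean SNP-SNP combination against a binary phenotype by mutual information. The toy data, the AND combination and the helper name mutual_information are illustrative assumptions, standing in for the kind of per-pair score that the CUDA kernels evaluate across all pairs.

```c
#include <stdio.h>
#include <math.h>

/* Hedged sketch: score one Boolean SNP-SNP interaction against a binary
 * phenotype by mutual information (in bits).  The data, the AND
 * combination and the function below are illustrative assumptions, not
 * the published GPU implementation. */
static double mutual_information(const int *x, const int *y, int n) {
    double joint[2][2] = {{0}}, px[2] = {0}, py[2] = {0}, mi = 0.0;
    for (int i = 0; i < n; i++) joint[x[i]][y[i]] += 1.0;
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++) {
            joint[a][b] /= n;
            px[a] += joint[a][b];
            py[b] += joint[a][b];
        }
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            if (joint[a][b] > 0.0)
                mi += joint[a][b] * log2(joint[a][b] / (px[a] * py[b]));
    return mi;
}

int main(void) {
    /* binarised genotypes for two SNPs and case/control labels (toy data) */
    int snp1[]  = {1, 0, 1, 1, 0, 0, 1, 0};
    int snp2[]  = {1, 1, 0, 1, 0, 1, 1, 0};
    int label[] = {1, 0, 0, 1, 0, 0, 1, 0};
    int n = sizeof label / sizeof label[0];

    int interaction[8];
    for (int i = 0; i < n; i++)
        interaction[i] = snp1[i] & snp2[i];   /* one Boolean combination */

    printf("MI(snp1 AND snp2; label) = %.4f bits\n",
           mutual_information(interaction, label, n));
    return 0;
}
```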

    List Ranking on a Coarse Grained Multiprocessor

    We present a deterministic algorithm for the List Ranking Problem on a Coarse Grained p-Multiprocessor (CGM) that is only a factor of log*(p) away from optimality: it achieves O(log(p) log*(p)) communication rounds and O(n log*(p)) for the required communication cost and total computation time. We report on experimental studies of this algorithm on a variety of platforms that show the validity of the chosen CGM model, and also show the possible gains and limits of such an algorithm. Finally, we suggest extending the CGM model by the communication blow-up to allow better a priori predictions of the communication costs of algorithms.
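    For readers unfamiliar with the problem: list ranking asks, for every element of a linked list given by successor pointers, how many links separate it from the tail. The sequential reference below (data and names illustrative) is trivial; the difficulty the CGM algorithm addresses is computing these ranks with the list distributed over p processors in few communication rounds.

```c
#include <stdio.h>

/* Sequential reference for list ranking: succ[i] is the successor of
 * element i, with succ[i] == i marking the tail.  rank[i] is the number
 * of links from i to the tail.  Illustrative only. */
int main(void) {
    enum { N = 6 };
    int succ[N] = {3, 0, 4, 5, 1, 5};   /* tail is element 5 */
    int rank[N];

    /* find the head: the element no one points to */
    int has_pred[N] = {0};
    for (int i = 0; i < N; i++)
        if (succ[i] != i) has_pred[succ[i]] = 1;
    int head = 0;
    for (int i = 0; i < N; i++)
        if (!has_pred[i]) head = i;

    /* walk the list once, assigning decreasing ranks */
    int r = N - 1, cur = head;
    for (;;) {
        rank[cur] = r--;
        if (succ[cur] == cur) break;
        cur = succ[cur];
    }

    for (int i = 0; i < N; i++)
        printf("element %d: rank %d\n", i, rank[i]);
    return 0;
}
```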

    List Ranking on PC Clusters

    We present two algorithms for the List Ranking Problem in the Coarse Grained Multicomputer model (CGM for short). If $p$ is the number of processors and $n$ the size of the list, we give a deterministic algorithm that achieves $O(\log p \log^* p)$ communication rounds and $O(n \log^* p)$ for the required communication cost and total computation time, and a randomized one that requires $O(\log p)$ communication rounds and $O(n)$ for the required communication cost and total computation time. We report on experimental studies of these algorithms on a PC cluster interconnected by a Myrinet network. As far as we know, this is the first portable code for this problem that runs on a cluster. With these experimental studies, we assess the validity of the chosen CGM model, and also show the possible gains and limits of such algorithms on PC clusters.
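    For contrast, the classic fine-grained PRAM technique for this problem is pointer jumping (Wyllie's algorithm), simulated sequentially below on illustrative data; it finishes in O(log n) rounds but performs O(n log n) total work, which is the kind of overhead the coarse-grained algorithms above are designed to avoid.

```c
#include <stdio.h>
#include <string.h>

/* Sequential simulation of pointer jumping (Wyllie's algorithm): in each
 * of O(log n) rounds every element adds its successor's partial rank to
 * its own and then jumps over it.  Total work is O(n log n); shown only
 * for contrast with the coarse-grained algorithms described above. */
int main(void) {
    enum { N = 6 };
    int succ[N] = {3, 0, 4, 5, 1, 5};          /* succ[i] == i marks the tail */
    int rank[N];

    for (int i = 0; i < N; i++)
        rank[i] = (succ[i] == i) ? 0 : 1;      /* one link to the successor */

    int changed = 1;
    while (changed) {                          /* O(log n) rounds */
        int next_succ[N], next_rank[N];
        changed = 0;
        for (int i = 0; i < N; i++) {          /* all i in parallel on a PRAM */
            next_rank[i] = rank[i] + rank[succ[i]];
            next_succ[i] = succ[succ[i]];
            if (next_succ[i] != succ[i]) changed = 1;
        }
        memcpy(rank, next_rank, sizeof rank);
        memcpy(succ, next_succ, sizeof succ);
    }

    for (int i = 0; i < N; i++)
        printf("element %d: rank %d\n", i, rank[i]);
    return 0;
}
```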

    Automatic visual recognition using parallel machines

    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed here. In contrast to approaches that use epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on a hypercube parallel architecture: object recognition is achieved by matching the scene projective invariants to the model projective invariants, a process called transfer, and the hypothesis-generation-testing scheme is implemented on the hypercube.
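    The dynamic programming technique for shape matching can be illustrated by an edit-distance comparison of two 8-direction chain codes, as in the hedged C sketch below; the chain codes and cost model are assumptions for illustration, not the dissertation's data. A recurrence of this form is commonly parallelized along anti-diagonals, since each cell depends only on its three upper-left neighbours, which is how n processors can bring the per-object cost from O(n^2) down to O(n).

```c
#include <stdio.h>
#include <string.h>

/* Illustrative edit-distance comparison of two 8-direction chain codes.
 * Each cell D[i][j] depends only on the three cells above and to its
 * left, so a whole anti-diagonal of cells can be computed in parallel. */
#define MAXLEN 64

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

int main(void) {
    const char *shape_a = "00112233";   /* chain codes: digits 0..7 */
    const char *shape_b = "0012233";
    int la = (int)strlen(shape_a), lb = (int)strlen(shape_b);
    int D[MAXLEN + 1][MAXLEN + 1];

    for (int i = 0; i <= la; i++) D[i][0] = i;
    for (int j = 0; j <= lb; j++) D[0][j] = j;

    for (int i = 1; i <= la; i++)
        for (int j = 1; j <= lb; j++) {
            int subst = (shape_a[i - 1] == shape_b[j - 1]) ? 0 : 1;
            D[i][j] = min3(D[i - 1][j] + 1,         /* delete a code      */
                           D[i][j - 1] + 1,         /* insert a code      */
                           D[i - 1][j - 1] + subst  /* match / substitute */);
        }

    printf("edit distance between the two chain codes: %d\n", D[la][lb]);
    return 0;
}
```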