10 research outputs found

    Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets

    Background: Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks.
    Results: We evaluate the CBE-driven PlayStation 3 as a high-performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reduction and loop unrolling techniques, with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de.
    Conclusions: The results demonstrate that the CBE processor in a PlayStation 3 accelerates computationally intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low-cost CBE-based platform offers an efficient alternative to conventional hardware for solving computational problems in image processing and bioinformatics.

    Exploiting different levels of parallelism in the biological sequence comparison problem

    In recent years the fast growth of the bioinformatics field has attracted the attention of computer scientists. At the same time, the exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work, we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case of analysis, we study the performance behavior of the Ssearch application, which implements the Smith-Waterman algorithm (SW), a dynamic programming approach that explores the similarity between a pair of sequences. The inherent large parallelism of the application makes it ideal for architectures supporting multiple dimensions of parallelism (thread-level parallelism, TLP; data-level parallelism, DLP; instruction-level parallelism, ILP). We study how this algorithm can take advantage of different parallel machines such as the SGI Altix, IBM Power6, IBM Cell BE and MareNostrum machines. Our study includes a qualitative analysis of the parallelization opportunities and also a quantification of the performance in terms of speedup and execution time. These measures are collected taking into account the specific characteristics of each architecture. As an example, our results show that a shared-memory multiprocessor architecture (SMP) like the PowerPC 970MP of the MareNostrum machine can surpass a heterogeneous multiprocessor machine like the current IBM Cell BE. Peer reviewed. Postprint (published version).
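For readers unfamiliar with the dynamic programming recurrence mentioned in this abstract, the following is a minimal sequential sketch of Smith-Waterman local alignment scoring. It is not the Ssearch implementation and contains none of the parallel optimisations studied in the paper. The function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Illustrative sketch: linear gap penalty and hypothetical default scores.
    Only two rows of the scoring matrix are kept in memory at a time.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is what makes the algorithm amenable to the data-level and thread-level parallelism the abstract describes.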

    Accelerated Profile HMM Searches

    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of the significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

    Modern Computational Techniques for the HMMER Sequence Analysis


    Applications on emerging paradigms in parallel computing

    The area of computing is seeing parallelism increasingly being incorporated at various levels: from the lowest levels of vector processing units following Single Instruction Multiple Data (SIMD) processing, Simultaneous Multi-threading (SMT) architectures, and multi/many-cores with thread-level shared memory and SIMT parallelism, to the higher levels of distributed memory parallelism as in supercomputers and clusters, and scaling them to large distributed systems such as server farms and clouds. Together these form a large hierarchy of parallelism. Developing high-performance parallel algorithms and efficient software tools that make use of the available parallelism is essential in order to harness the raw computational power these emerging systems have to offer. In the work presented in this thesis, we develop architecture-aware parallel techniques on such emerging paradigms in parallel computing, specifically, parallelism offered by the emerging multi- and many-core architectures, as well as the emerging area of cloud computing, to target large scientific applications. First, we develop efficient parallel algorithms to compute optimal pairwise alignments of genomic sequences on heterogeneous multi-core processors, and demonstrate them on the IBM Cell Broadband Engine. Then, we develop parallel techniques for scheduling all-pairs computations on heterogeneous systems, including clusters of Cell processors, and NVIDIA graphics processors. We compare the performance of our strategies on Cell, GPU and Intel Nehalem multi-core processors. Further, we apply our algorithms to specific applications taken from the areas of systems biology, fluid dynamics and materials science: pairwise Mutual Information computations for reconstruction of gene regulatory networks; pairwise Lp-norm distance computations for coherent structures discovery in the design of flapping-wing Micro Air Vehicles; and construction of stochastic models for a set of properties of heterogeneous materials. Lastly, in the area of cloud computing, we propose and develop an abstract framework to enable computations in parallel on large tree structures, to facilitate easy development of a class of scientific applications based on trees. Our framework, in the style of Google's MapReduce paradigm, is based on two generic user-defined functions through which a user writes an application. We implement our framework as a generic programming library for a large cluster of homogeneous multi-core processors, and demonstrate its applicability through two applications: all-k-nearest neighbors computations, and Fast Multipole Method (FMM) based simulations.

    Desarrollo de un workflow genérico para el modelado de problemas de barrido paramétrico en sistemas distribuidos

    This work presents the development and experimental validation of a generic workflow model applicable to any parameter sweep problem: the Parameter Sweep Scientific Workflow (PSWF) model. As part of it, a model for the monitoring and management of scientific workflows on distributed systems is designed and tested. This model, Star Superscalar Status (SsTAT), is applicable to the Star Superscalar (StarSs) family of programming models. PSWF and SsTAT can be used by the scientific community as a reference for solving problems using the parameter sweep strategy. As an integral part of the work, the treatment of the parameter sweep problem is formalized. This is achieved by developing a general solution based on the PSNSS (Parameter Sweep Nested Summation Symbol) algorithm, in both its original sequential version and a concurrent version. Both versions are implemented and validated, showing their applicability to all automatable PSWF lifecycle phases. Load testing shows that large-scale parameter sweep problems can be addressed efficiently with the proposed approach. In addition, the generic SsTAT monitoring and management model is instantiated for a Grid environment. Thus, an operational implementation of SsTAT based on GRIDSs, GSTAT (GRID Superscalar Status), is developed. A series of tests performed on an actual heterogeneous Grid of computers shows that GSTAT performs its functions properly even in such a demanding environment. As a practical case, the model proposed here is applied to determining molecular potential energy hypersurfaces. For this purpose, a specific instance of the workflow, called PSHYP (Parameter Sweep Hypersurfaces), is created.

    Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications

    This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine recently developed at IBM. In particular we focus on two highly popular bioinformatics applications – FASTA and ClustalW. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code and embarrassingly parallel computation, make them uniquely suitable for the Cell processing platform. The price and power advantages afforded by the Cell processor also make it an attractive alternative to general purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.

    Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications

    This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine (Cell/B.E.) recently developed at IBM. In particular we focus on three highly popular bioinformatics applications – FASTA, ClustalW, and HMMER. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code and embarrassingly parallel computation, make them uniquely suitable for the Cell/B.E. processing platform. The price and power advantages afforded by the Cell/B.E. processor also make it an attractive alternative to general purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.
    1 Computational Biology and High-Performance Computing
    With the discovery of the structure of DNA and the development of new techniques for sequencing the entire genome of organisms, biology is rapidly moving towards a data-intensive, computational science. Biologists search for biomolecular sequence data to compare with other known genomes in order to determine functions and improve understanding of biochemical pathways. Computational biology has been aided by recen