160 research outputs found

    Bioinformatics

    This book is divided into different research areas relevant to Bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so that both novice and expert readers can benefit from the information and research works presented here.

    FPGA acceleration of sequence analysis tools in bioinformatics

    Thesis (Ph.D.)--Boston University. With advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high-impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching, so its acceleration is of fundamental importance. It is a complex, highly optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on the FPGA. Using our FPGA engine, we quickly reduce the database to a small fraction of its size, and then use the original code to process the query. Using a standard FPGA-based system, we achieved a 12x speedup over a highly optimized multithreaded reference code. Multiple Sequence Alignment (MSA)--the extension of pairwise sequence alignment to multiple sequences--is critical to solving many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of 80x to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately, many software heuristics do not map well to hardware, so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models, we have achieved significant speedups over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study introduces the pros and cons of these acceleration models for biosequence analysis tools.
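
    To illustrate the seed-and-filter idea described in this abstract, the following is a minimal, hypothetical Python sketch of k-mer prefiltering: database entries sharing no exact k-mer seed with the query are discarded so that the expensive aligner only runs on the small surviving fraction. The function names and the choice of k are illustrative assumptions; the thesis performs this filtering in FPGA hardware rather than in software.

        def kmers(seq, k=11):
            """Yield all overlapping k-mers of a sequence."""
            for i in range(len(seq) - k + 1):
                yield seq[i:i + k]

        def prefilter(query, database, k=11):
            """Keep only database sequences that share at least one k-mer
            seed with the query; the expensive aligner then runs on this
            much smaller candidate set (software illustration only)."""
            seeds = set(kmers(query, k))
            return [name for name, seq in database.items()
                    if any(w in seeds for w in kmers(seq, k))]

        # Toy usage: only seqA shares an 11-mer with the query.
        db = {"seqA": "ACGTACGTACGTAGCTAGCTA", "seqB": "TTTTTTTTTTTTTTTTTTTTT"}
        print(prefilter("ACGTACGTACGTA", db))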

    Computational Biology and Chemistry

    The use of computers and software tools in biochemistry and biology has led to a deep revolution in basic sciences and medicine. Bioinformatics and systems biology are the direct results of this revolution. With the involvement of computers, software tools, and internet services in scientific disciplines comprising biology and chemistry, new terms, technologies, and methodologies have appeared and become established. Bioinformatic software tools, versatile databases, and easy internet access have given rise to computational biology and chemistry. Today, we have new types of surveys and laboratories, including “in silico studies” and “dry labs”, in which bioinformaticians conduct their investigations to obtain invaluable outcomes. These developments have led to three-dimensional illustrations of different molecules and complexes, providing a better understanding of nature.

    Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures

    The protein-folding problem has been extensively studied during the last fifty years. Understanding the dynamics of the global shape of a protein and its influence on biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Different computational approaches have been developed in order to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of this problem makes the search for new models, novel algorithmic strategies, and hardware platforms that provide solutions in a reasonable time frame mandatory. In this review we present past and current trends in protein folding simulations from both perspectives, hardware and software. Of particular interest to us are the use of inexact solutions to this computationally hard problem as well as the hardware platforms that have been used to run this kind of Soft Computing technique. This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and the European Commission FEDER under grants TEC2012-37945-C02-02 and TIN2012-31345, and by the Nils Coordinated Mobility programme under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within the UCAM GPU educational and research centers.

    Decomposing Genomics Algorithms: Core Computations for Accelerating Genomics

    Technological advances in genomic analyses and computing sciences have led to a burst in genomics data. With those advances, there has also been parallel growth in dedicated accelerators for specific genomic analyses. However, biologists need a reconfigurable machine that allows them to perform multiple analyses without requiring a dedicated compute platform for each analysis. This work addresses the first steps in the design of such a reconfigurable machine. We hypothesize that this machine design can consist of accelerators for computations common across various genomic analyses. This work studies a subset of genomic analyses and identifies such core computations. We further investigate the possibility of additional acceleration through a deeper analysis of these computational primitives. National Science Foundation (NSF CNS 13-37732); Infosys; IBM Faculty Award; Office of the Vice Chancellor for Research, University of Illinois at Urbana-Champaign

    Optimization of bio-inspired algorithms on heterogeneous CPU-GPU systems

    The scientific challenges of the 21st century require the processing and analysis of an enormous amount of information in what is known as the Big Data era. Future advances in different sectors of society, such as medicine, engineering, or efficient energy production, to mention just a few examples, depend on the continued growth in the computational power of modern computers. However, this computational growth, traditionally guided by the well-known Moore's Law, has been compromised in recent decades mainly due to the physical limitations of silicon. Computer architects have developed numerous contributions (multicore, manycore, heterogeneity, dark silicon, etc.) to try to mitigate this computational slowdown, leaving in the background other factors that are fundamental to problem solving, such as programmability, reliability, and precision. Software development, however, has followed a completely opposite path, where ease of programming through abstraction models, automatic code debugging to avoid undesired effects, and fast deployment to production are key to the economic viability and efficiency of the digital business sector. This path often compromises the performance of the applications themselves, a consequence that is completely unacceptable in the scientific context. The starting hypothesis of this doctoral thesis is to reduce the distance between the hardware and software fields in order to help solve the scientific challenges of the 21st century. Hardware development is marked by the consolidation of processors oriented towards massive data parallelism, mainly GPUs (Graphics Processing Units) and vector processors, which are combined to build heterogeneous processors or computers (HSA). In particular, we focus on the use of GPUs to accelerate scientific applications. GPUs have established themselves as one of the most promising platforms for implementing algorithms that simulate complex scientific problems. From their birth, the trajectory and history of graphics cards have been shaped by the world of video games, reaching very high levels of popularity as more realism was achieved in this area. An important milestone occurred in 2006, when NVIDIA (the leading manufacturer of graphics cards) carved out a place in high-performance computing and in research with the development of CUDA (Compute Unified Device Architecture). This architecture makes it possible to use the GPU for the development of scientific applications in a versatile way. Despite the importance of the GPU, the improvement that can be obtained by using it together with the CPU is of great interest, which leads us to the heterogeneous systems referred to in the title of this work. It is in heterogeneous CPU-GPU environments that performance reaches its maximum, since it is not only the GPUs that support the scientific computing of researchers; it is in a heterogeneous system, combining different types of processors, that we can achieve the highest performance. In this environment the processors are not meant to compete with one another; on the contrary, each architecture specializes in the part where it can best exploit its capabilities.
    The highest performance is reached in heterogeneous clusters, where multiple nodes are interconnected and may differ not only in their CPU-GPU architectures but also in the computational capabilities within those architectures. With this type of scenario in mind, new challenges arise in making the chosen software run as efficiently as possible and obtain the best possible results. These new platforms require a redesign of the software to take full advantage of the available computational resources. Existing algorithms must therefore be redesigned and optimized so that the contributions in this field are relevant, and we must find algorithms that, by their very nature, are candidates for optimal execution on such high-performance platforms. At this point we find a family of algorithms called bio-inspired algorithms, which use collective intelligence as the core of their problem solving. It is precisely this collective intelligence that makes them perfect candidates for implementation on these platforms under the new parallel computing paradigm, since solutions can be built from individuals that, through some form of communication, are able to jointly construct a common solution. This thesis focuses especially on one of these bio-inspired algorithms, which falls under the term metaheuristics within the Soft Computing paradigm: Ant Colony Optimization (ACO). The algorithm is contextualized, studied, and analyzed. Its most critical parts are identified and redesigned with a view to optimization and parallelization, while maintaining or improving the quality of its solutions. The possible alternatives are then implemented and tested on various high-performance platforms. The knowledge acquired in the preceding theoretical and practical study is applied to real cases; more specifically, its application to protein folding is shown. All of this analysis is carried over to a concrete application. In this work, we bring together new high-performance hardware platforms and the software redesign and implementation of a bio-inspired algorithm applied to a scientific problem of great complexity, protein folding. When implementing a solution to a real problem, it is necessary to carry out a prior study that allows the problem to be understood in depth, since anyone new to the field will encounter new terminology and issues; in this case, amino acids, molecules, and simulation models that are unfamiliar to those without a biomedical background.
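
    As a rough illustration of the collective-intelligence mechanism behind Ant Colony Optimization mentioned in this abstract, the following generic Python skeleton builds candidate solutions probabilistically from a shared pheromone matrix and lets the best ant of each iteration reinforce its choices. It is a hedged sketch of the general ACO scheme, not the thesis's GPU implementation or its protein-folding encoding; all names and parameters are illustrative.

        import random

        def run_aco(n_choices, n_steps, n_ants, evaluate,
                    iterations=50, evaporation=0.1, deposit=1.0):
            """Generic Ant Colony Optimization skeleton (illustration only).
            evaluate(path) returns a cost to minimize; a path is a sequence
            of n_steps decisions, each among n_choices options."""
            pheromone = [[1.0] * n_choices for _ in range(n_steps)]
            best_path, best_cost = None, float("inf")
            for _ in range(iterations):
                ants = []
                for _ in range(n_ants):
                    # Each ant builds a solution step by step, picking options
                    # with probability proportional to their pheromone level.
                    path = [random.choices(range(n_choices),
                                           weights=pheromone[s])[0]
                            for s in range(n_steps)]
                    ants.append((evaluate(path), path))
                cost, path = min(ants)
                if cost < best_cost:
                    best_cost, best_path = cost, path
                # Evaporate, then let the iteration-best ant deposit pheromone.
                for s in range(n_steps):
                    pheromone[s] = [p * (1 - evaporation) for p in pheromone[s]]
                    pheromone[s][path[s]] += deposit
            return best_path, best_cost

        # Toy usage: the optimum is to pick option 0 at every step.
        print(run_aco(n_choices=3, n_steps=5, n_ants=10,
                      evaluate=lambda p: sum(p)))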

    Deep learning for peptide identification from metaproteomics datasets

    This article explores a proposed deep-learning-based algorithm called DeepFilter for improving peptide identifications from a collection of tandem mass spectra. The authors find that DeepFilter generalizes well to new, previously unseen peptide-spectrum matches and can be readily applied to peptide identification from metaproteomics data.

    Graphics Processing Unit Accelerated Coarse-Grained Protein-Protein Docking

    Graphics processing unit (GPU) architectures are increasingly used for general purpose computing, providing the means to migrate algorithms from the SISD paradigm, synonymous with CPU architectures, to the SIMD paradigm. Generally programmable commodity multi-core hardware can result in significant speed-ups for migrated codes. Because of their computational complexity, molecular simulations in particular stand to benefit from GPU acceleration. Coarse-grained molecular models reduce complexity compared to the traditional, computationally expensive all-atom models. However, even though coarse-grained models are much less computationally expensive than the all-atom approach, the pairwise energy calculations required at each iteration of the algorithm remain a computational bottleneck for a serial implementation. In this work, we describe a GPU implementation of the Kim-Hummer coarse-grained model for protein docking simulations, using a Replica Exchange Monte Carlo (REMC) method. Our highly parallel implementation vastly increases the size and time scales accessible to molecular simulation. We describe in detail the complex process of migrating the algorithm to a GPU as well as the effect of various GPU approaches and optimisations on algorithm speed-up. Our benchmarking and profiling show that the GPU implementation scales very favourably compared to a CPU implementation. Small reference simulations benefit from a modest speedup of between 4 and 10 times. However, large simulations, containing many thousands of residues, benefit from asynchronous GPU acceleration to a far greater degree and exhibit speed-ups of up to 1400 times. We demonstrate the utility of our system on some model problems. We investigate the effects of macromolecular crowding, using a repulsive crowder model, and find our results to agree with those predicted by scaled particle theory. We also perform initial studies into the simulation of viral capsid assembly, demonstrating the crude assembly of capsid pieces into a small fragment. This is the first implementation of REMC docking on a GPU, and the resulting speed-ups alter the tractability of large-scale simulations: simulations that would otherwise require months or years can be performed in days or weeks using a GPU.
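
    For readers unfamiliar with the method, a minimal Python sketch of the two ingredients named above may help: the O(N²) pairwise energy sum that dominates a serial implementation, and the Metropolis-style criterion used to exchange conformations between replicas held at different temperatures. This is a generic illustration in reduced units, not the authors' Kim-Hummer GPU code; the function names are assumptions.

        import math, random

        def pairwise_energy(coords, pair_potential):
            """O(N^2) sum over residue pairs -- the per-iteration cost that a
            serial implementation bottlenecks on (illustration only)."""
            total = 0.0
            for i in range(len(coords)):
                for j in range(i + 1, len(coords)):
                    total += pair_potential(coords[i], coords[j])
            return total

        def accept_swap(energy_i, energy_j, temp_i, temp_j):
            """Metropolis acceptance test for exchanging two replicas in
            replica-exchange Monte Carlo (reduced units, k_B = 1)."""
            delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_j - energy_i)
            return delta <= 0 or random.random() < math.exp(-delta)

        # If the hotter replica (j) has found a lower energy than the colder
        # one (i), the exchange is always accepted.
        print(accept_swap(energy_i=-10.0, energy_j=-50.0, temp_i=1.0, temp_j=2.0))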

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15x and 130x faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 Intel Core 2 Duo processors running at 3 GHz. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
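
    The Nussinov algorithm mentioned above is a convenient example of the dynamic-programming recurrences these accelerators target. The serial Python reference below (an illustrative sketch, not the FPGA design) fills the table in order of increasing subsequence length, which is exactly the dependence structure that polyhedral analysis exploits to extract parallelism.

        def nussinov(seq, min_loop=3):
            """Maximum number of base pairs for an RNA sequence via the
            Nussinov dynamic-programming recurrence (serial reference)."""
            pairs = {("A", "U"), ("U", "A"), ("G", "C"),
                     ("C", "G"), ("G", "U"), ("U", "G")}
            n = len(seq)
            dp = [[0] * n for _ in range(n)]
            # Cell (i, j) depends only on strictly shorter subsequences, so the
            # table can be filled in order of increasing span (diagonal by diagonal).
            for span in range(min_loop + 1, n):
                for i in range(n - span):
                    j = i + span
                    best = dp[i + 1][j - 1] + (1 if (seq[i], seq[j]) in pairs else 0)
                    best = max(best, dp[i + 1][j], dp[i][j - 1])
                    for k in range(i + 1, j):  # bifurcation into two subproblems
                        best = max(best, dp[i][k] + dp[k + 1][j])
                    dp[i][j] = best
            return dp[0][n - 1] if n else 0

        print(nussinov("GCGGAUUUAGCUC"))  # toy sequence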