
    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly, for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and we compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
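    The hashing motif highlighted here is easiest to picture as k-mer counting over a shared table. Below is a minimal single-process sketch (a hypothetical illustration, not code from the paper); in a distributed setting, the dictionary would become a partitioned hash table updated asynchronously across ranks.

```python
from collections import defaultdict

def count_kmers(reads, k):
    """Count k-mer occurrences across reads using a hash table.

    A single in-memory dict stands in for the shared data structure;
    a parallel version would hash each k-mer to an owning rank and
    apply asynchronous remote increments.
    """
    table = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            table[read[i:i + k]] += 1
    return table

reads = ["ACGTACGTACGTACGTA", "CGTACGTACGTACGTAC"]
print(sorted(count_kmers(reads, k=5).items())[:3])
```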

    Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations

    We propose Paraiso, a domain-specific language embedded in the functional programming language Haskell, for the automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE-solving algorithms succinctly using tensor equation notation. Hydrodynamic properties, interpolation methods, and other building blocks are described in abstract, modular, reusable, and combinable forms, which lets us generate versatile solvers from a small amount of Paraiso source code. We demonstrate Paraiso by implementing a compressible hydrodynamics solver. A single source file of fewer than 500 lines can be used to generate solvers of arbitrary dimensions, for both multicore CPUs and GPUs. We demonstrate both manual annotation-based tuning and evolutionary-computing-based automated tuning of the program.
    Comment: 52 pages, 14 figures, accepted for publication in Computational Science and Discovery
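    Paraiso itself is embedded in Haskell, but the kernels it generates are ordinary explicit stencil updates. As a language-neutral illustration (not Paraiso output; all names and constants are hypothetical), here is one explicit finite-difference step for the 1-D diffusion equation:

```python
import numpy as np

def diffusion_step(u, dt, dx, nu):
    """One explicit step of u_t = nu * u_xx with periodic boundaries;
    the kind of stencil kernel a tensor equation would compile to."""
    lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return u + dt * nu * lap

u = np.exp(-np.linspace(-3.0, 3.0, 64) ** 2)  # initial Gaussian profile
for _ in range(100):
    u = diffusion_step(u, dt=1e-3, dx=6.0 / 63, nu=0.1)
print(u.max())
```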

    Accelerated Profile HMM Searches

    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of the significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
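    The rescaling idea can be sketched generically: renormalize the forward vector at each position and accumulate the logarithms of the scale factors, so probabilities never underflow. The toy below shows plain per-position rescaling for a dense HMM (illustrative parameters, not HMMER3 code); "sparse rescaling" applies the same trick selectively inside HMMER3's striped SIMD implementation.

```python
import numpy as np

def forward_scaled(obs, A, B, pi):
    """Forward algorithm with per-step rescaling.

    Scaling each forward vector to sum to 1 keeps values within
    floating-point range; the log-likelihood is recovered by
    accumulating the logs of the scale factors.
    """
    alpha = pi * B[:, obs[0]]
    log_likelihood = np.log(alpha.sum())
    alpha /= alpha.sum()
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]
        scale = alpha.sum()
        log_likelihood += np.log(scale)
        alpha /= scale
    return log_likelihood

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition probabilities
B = np.array([[0.7, 0.3], [0.4, 0.6]])   # emission probabilities
pi = np.array([0.5, 0.5])                # initial state distribution
print(forward_scaled([0, 1, 1, 0], A, B, pi))
```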

    A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

    GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.
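    One way to picture such a model: a nucleotide substitution rate matrix in which a gBGC disparity parameter scales weak-to-strong (A/T to G/C) rates up and strong-to-weak rates down through the standard fixation-probability factor b / (1 - e^{-b}). The toy below sketches only that idea with illustrative names and constants; phastBias itself embeds biased substitution rates in a phylo-HMM that segments the genome into tracts.

```python
import numpy as np

def gbgc_rate_matrix(mu, b):
    """Toy 4x4 rate matrix over (A, C, G, T) with a gBGC bias b:
    weak->strong substitutions are favored, strong->weak disfavored,
    each scaled by the relative fixation probability x/(1 - e^(-x))."""
    strong = {1, 2}                      # indices of C and G

    def fix(x):
        return 1.0 if x == 0 else x / (1.0 - np.exp(-x))

    Q = np.full((4, 4), mu)
    for i in range(4):
        for j in range(4):
            if j in strong and i not in strong:
                Q[i, j] *= fix(b)        # weak -> strong, favored
            elif i in strong and j not in strong:
                Q[i, j] *= fix(-b)       # strong -> weak, disfavored
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a rate matrix sum to 0
    return Q

print(gbgc_rate_matrix(mu=1.0, b=1.5))
```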

    Good Features to Correlate for Visual Tracking

    In recent years, correlation filters have shown dominant and spectacular results for visual object tracking. The types of features employed in this family of trackers significantly affect tracking performance. The ultimate goal is to utilize robust features that are invariant to any kind of appearance change of the object, while predicting the object location as accurately as in the case of no appearance change. As deep learning based methods have emerged, the study of learning features for specific tasks has accelerated. For instance, discriminative visual tracking methods based on deep architectures have been studied with promising performance. Nevertheless, correlation filter based (CFB) trackers confine themselves to pre-trained networks trained for the object classification problem. To this end, this manuscript formulates the problem of learning deep, fully convolutional features for CFB visual tracking. To learn the proposed model, a novel and efficient backpropagation algorithm is presented based on the loss function of the network. The proposed learning framework enables the network model to be flexible for a custom design. Moreover, it alleviates the dependency on networks trained for classification. Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework. By fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, the top-performing one of VOT2016, an 18% increase is achieved in terms of expected average overlap, and tracking failures are decreased by 25%, while maintaining superiority over state-of-the-art methods on the OTB-2013 and OTB-2015 tracking datasets.
    Comment: Accepted version of an IEEE Transactions on Image Processing article
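    The CFB baseline such trackers build on can be sketched with the classic single-channel closed-form filter learned in the Fourier domain (MOSSE-style). The sketch below is that generic baseline, not the paper's method; the paper's contribution is learning the convolutional features that feed such a filter.

```python
import numpy as np

def train_filter(patches, target, lam=1e-2):
    """Closed-form correlation filter in the Fourier domain:
    H = sum(G * conj(F)) / (sum(F * conj(F)) + lam)."""
    G = np.fft.fft2(target)
    num = np.zeros_like(G)
    den = np.zeros_like(G)
    for p in patches:
        F = np.fft.fft2(p)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + lam)

def respond(H, patch):
    """Correlation response map; its peak is the predicted location."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))

rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
y, x = np.mgrid[0:32, 0:32]
target = np.exp(-((y - 16) ** 2 + (x - 16) ** 2) / (2 * 2.0 ** 2))
H = train_filter([patch], target)
print(np.unravel_index(respond(H, patch).argmax(), (32, 32)))  # ~(16, 16)
```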

    Manycore high-performance computing in bioinformatics

    Mining the increasing amount of genomic data requires very efficient tools. Greater efficiency can come from better algorithms, but one can also take advantage of the hardware itself to reduce application runtimes. For several years now, heat-dissipation issues have prevented processors from reaching higher clock frequencies, and parallel processing has become one of the main ways of sustaining Moore's law. Grid environments provide tools for the effective implementation of coarse-grain parallelization. Recently, another kind of hardware has attracted interest: multicore processors. Graphics processing units (GPUs) are a first step towards massively multicore processors, putting teraflops of cheap computing power in everyone's personal computer. The CUDA library (released in 2007) and the new OpenCL standard (specified in 2008) make programming such devices very convenient, and OpenCL is likely to gain wide industrial support and become a standard of choice for parallel programming. In all cases, the best speedups are obtained by combining careful algorithmic studies with knowledge of the computing architecture. This is especially true for the memory hierarchy: algorithms have to strike a good balance between using large (but slow) global memories and fast (but small) local memories. In this chapter, we show how these manycore devices enable more efficient bioinformatics applications. We first give some insights into architectures and parallelism. We then describe recent implementations specifically designed for manycore architectures, including algorithms for sequence alignment and RNA structure prediction. We conclude with some thoughts about the dissemination of these algorithms and implementations: are they available off the shelf for everyone today?
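    The coarse-grain parallelization mentioned above amounts to mapping independent tasks over a pool of workers; on a GPU the same decomposition assigns one task per thread block, with the working set kept in fast local memory rather than slow global memory. A minimal sketch with a toy scoring kernel (illustrative names, not from the chapter):

```python
from multiprocessing import Pool

def match_score(pair):
    """Toy per-pair scoring kernel: count matching positions.

    Each task is independent, so the decomposition is embarrassingly
    parallel; a GPU version would keep each pair's working set in
    fast shared/local memory.
    """
    a, b = pair
    return sum(x == y for x, y in zip(a, b))

if __name__ == "__main__":
    pairs = [("ACGT" * 8, "ACGA" * 8)] * 1000
    with Pool() as pool:                 # coarse-grain data parallelism
        scores = pool.map(match_score, pairs)
    print(scores[0])
```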

    Methodology for the Accelerated Reliability Analysis and Prognosis of Underground Cables based on FPGA

    Dependable electrical power distribution systems demand high reliability levels, which increase maintenance costs for the utilities. Often, the extra costs are the result of unnecessary maintenance procedures, which can be avoided by monitoring the equipment and predicting the future evolution of the system by means of statistical methods (prognostics). The present thesis aims at designing accurate methods for predicting the degradation of high- and medium-voltage underground Cross-Linked Polyethylene (XLPE) cables within an electrical power distribution grid, and for predicting their remaining useful life, in order to inform maintenance procedures. However, electric power distribution grids are large, their components interact with each other, and they degrade with time and use. Solving the statistics of the predictive models of the power grids currently requires long numerical simulations that demand large computational resources and long simulation times, even when using advanced parallel architectures; often, approximate models are used to reduce the simulation time and the required resources. In this context, Field Programmable Gate Arrays (FPGAs) can be employed to accelerate the simulation of these stochastic processes, although adapting the physics-based degradation models of underground cables for FPGA simulation can be complex. Accordingly, this thesis proposes an FPGA-based framework for the on-line monitoring and prognosis of underground cables, based on an electro-thermal degradation model adapted for accelerated simulation in the programmable logic of an FPGA.
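    A Monte Carlo prognosis of this kind can be sketched as many independent degradation paths whose damage accumulates at a temperature-dependent, Arrhenius-like rate, with remaining useful life read off from the time cumulative damage reaches 1; an FPGA accelerates exactly this embarrassingly parallel sampling in hardware pipelines. The toy below uses illustrative constants only and is not the thesis's electro-thermal model.

```python
import numpy as np

def simulate_rul(n_paths, life_years=40.0, ea_k=0.1, t_ref=90.0, seed=0):
    """Toy Monte Carlo remaining-useful-life estimate.

    Each path draws a yearly operating temperature and consumes life
    at an Arrhenius-like rate exp(ea_k * (T - t_ref)) / life_years;
    the path ends when cumulative damage reaches 1.
    """
    rng = np.random.default_rng(seed)
    rul = np.empty(n_paths)
    for p in range(n_paths):
        damage, years = 0.0, 0
        while damage < 1.0:
            temp = t_ref + rng.normal(0.0, 5.0)   # yearly mean load temp
            damage += np.exp(ea_k * (temp - t_ref)) / life_years
            years += 1
        rul[p] = years
    return rul.mean(), rul.std()

print(simulate_rul(n_paths=2000))
```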