Search CORE

792 research outputs found

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020
Field of study

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

Prediction of secondary structures for large RNA molecules

Author: Mathuriya Amrita
Publication venue: Georgia Institute of Technology
Publication date: 12/01/2009
Field of study

The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as the length increases. We present a new parallel multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable accuracy of prediction. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations called internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³) and show that the exact algorithms such as ILSA are executed with our method in affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures which provides a base to implement future algorithmic improvements and improved thermodynamic model in GTfold. GTfold is written in C/C++ and freely available as open source from our website.M.S.Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richar

Scholarly Materials And Research @ Georgia Tech

Recommended from our members

Computational Strategies for Scalable Genomics Analysis.

Author: Shi Lizhen
Wang Zhong
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for genomics analysis. Various big data technologies have been explored to scale up/out current bioinformatics solutions to mine the big genomics data. In this review, we survey some of these exciting developments in the applications of parallel distributed computing and special hardware to genomics. We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency. Although this review is written for an audience from the genomics and bioinformatics fields, it may also be informative for the audience of computer science with interests in genomics applications

eScholarship - University of California

Smith-Waterman Protein Search with OpenCL on an FPGA

Author: Botella Juan Guillermo
De Giusti Armando Eduardo
García Sanchez Carlos
Naiouf Marcelo
Prieto-Matias Manuel
Rucci Enzo
Publication venue
Publication date: 07/10/2019
Field of study

The well-known Smith-Waterman (SW) algorithm is a high-sensitivity method for local alignments. Unfortunately, SW is expensive in terms of both execution time and memory usage, which makes it impractical in many scenarios. Previous research has shown that massively parallel architectures such as GPUs and FPGAs are able to mitigate the computational problems and achieve impressive speedups. In this paper we explore SW acceleration on an FPGA with OpenCL. We efficiently exploit data and thread-level parallelism on an Altera Stratix V FPGA, obtaining up to 39 GCUPS with less than 25 watt of power consumption.Facultad de Informátic

Servicio de Difusión de la Creación Intelectual

Smith-Waterman Protein Search with OpenCL on an FPGA

Author: Botella Juan Guillermo
De Giusti Armando Eduardo
García Sanchez Carlos
Naiouf Marcelo
Prieto-Matias Manuel
Rucci Enzo
Publication venue
Publication date: 22/08/2015
Field of study

Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

Author: Lan H
Liu W
Lu M
Tan G
Vasilakos AV
Yin Z
Publication venue: 'Elsevier BV'
Publication date: 23/08/2022
Field of study

The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. At last we discuss the open issues in big biological data analytics

OPUS - University of Technology Sydney