MPI-dot2dot: A Parallel Tool to Find DNA Tandem Repeats on Multicore Clusters

Abstract

Open access publication funded by: Universidade da Coruña/CISUG

Tandem Repeats (TRs) are segments that occur several times in a DNA sequence, with each copy adjacent to the next. In the last few years, TRs have gained significant attention because they are thought to be related to certain human diseases. Identifying and classifying TRs has therefore become a highly important task in bioinformatics, in order to analyze the disorders associated with them and their relationship with illnesses. Dot2dot, a tool recently developed to find TRs, provides more accurate results than the previous state of the art, but it requires a long execution time even when using multiple threads. This work presents MPI-dot2dot, a novel version of this tool that combines MPI and OpenMP so that it can be executed on a cluster of multicore nodes, thus reducing its execution time. The performance of this new parallel implementation has been tested using different real datasets. Depending on the characteristics of the input genomes, it obtains the same biological results as Dot2dot more than 100 times faster on a 16-node multicore cluster (384 cores). MPI-dot2dot is publicly available to download from https://sourceforge.net/projects/mpi-dot2dot.

This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), and by Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019-2022 and Consolidation Program of Competitive Reference Groups, under Grants ED431G 2019/01 and ED431C 2021/30, respectively). The authors would like to thank the Galician Supercomputing Center (CESGA) for providing access to the Finis Terrae II supercomputer. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
