1,445 research outputs found
Parallelization of ARACNe, an Algorithm for the Reconstruction of Gene Regulatory Networks
[Abstract] Gene regulatory networks are graphical representations of molecular regulators that interact with each other and with other substances in the cell to govern gene expression. There are different computational approaches for the reverse engineering of these networks. Most of them require evaluating all gene-gene pairs using mathematical methods such as Pearson/Spearman correlation, Mutual Information or topology patterns, among others. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) is one of the most effective and widely used tools to reconstruct gene regulatory networks. However, the high computational cost of ARACNe prevents its use on large biological datasets. In this work, we present a hybrid MPI/OpenMP parallel implementation of ARACNe to accelerate its execution on multi-core clusters, obtaining a speedup of 430.46 on 32 nodes (each with 24 cores) for an input dataset with 41,100 genes and 108 samples.
Ministerio de Economía y Competitividad; TIN2016-75845-P
Xunta de Galicia; ED431G/01
Xunta de Galicia; ED431C 2017/0
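The all-pairs evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It uses a simple plug-in (counting) estimate of mutual information on already-discretized expression values, which is an assumption made here for brevity and not ARACNe's own estimator; the function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def mutual_information(x, y):
    """Plug-in mutual information estimate (in bits) of two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed (a, b) cells of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def all_pairs_mi(expr):
    """expr maps a gene name to its discretized expression levels (one value per sample)."""
    return {
        (g, h): mutual_information(expr[g], expr[h])
        for g, h in combinations(sorted(expr), 2)
    }


if __name__ == "__main__":
    # Toy data (hypothetical): three genes, four samples.
    expr = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
    for pair, score in all_pairs_mi(expr).items():
        print(pair, round(score, 3))
    print("pairs evaluated:", comb(len(expr), 2))
```

For the 41,100-gene dataset cited above, the number of gene pairs is `comb(41100, 2)` = 844,584,450, which gives a sense of why the pairwise stage benefits from a parallel implementation.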
GesEPOC 2021 and GOLD 2021: perfect is the enemy of good
Scientific letter
Fiuncho: a program for any-order epistasis detection in CPU clusters
Funded for open access publication: CRUE/CISUG
[Abstract]: Epistasis can be defined as the statistical interaction of genes during the expression of a phenotype. It is believed to play a fundamental role in gene expression, as individual genetic variants have been associated with only a very small increase in disease risk in previous Genome-Wide Association Studies. The most successful approach to epistasis detection is the exhaustive method, although its exponential time complexity requires a highly parallel implementation in order to be used. This work presents Fiuncho, a program that exploits all levels of parallelism present in x86_64 CPU clusters in order to mitigate the complexity of this approach. It supports epistasis interactions of any order, and when compared with other exhaustive methods, it is on average 358, 7 and 3 times faster than MDR, MPI3SNP and BitEpi, respectively.
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), the Xunta de Galicia and FEDER funds of the EU (CITIC-Centro de Investigación de Galicia accreditation 2019–2022, Grant no. ED431G 2019/01), Consolidation Program of Competitive Research (Grant no. ED431C 2021/30), and the FPU Program of the Ministry of Education of Spain (Grant no. FPU16/01333).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2021/3
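To make the cost of the exhaustive method concrete, the sketch below counts the SNP combinations to be tested at each interaction order and splits them across workers with a static round-robin scheme. This is only an illustration under stated assumptions: the split shown is not claimed to be Fiuncho's distribution strategy, and `combos_for_worker` and the SNP count of 1,000 are hypothetical.

```python
from itertools import combinations, islice
from math import comb


def combos_for_worker(n_snps, order, rank, n_workers):
    """Yield every n_workers-th combination of SNP indices, starting at position `rank`."""
    return islice(combinations(range(n_snps), order), rank, None, n_workers)


if __name__ == "__main__":
    n_snps = 1000  # hypothetical dataset size
    for order in (2, 3, 4):
        # The search space grows quickly with the interaction order.
        print(f"order {order}: {comb(n_snps, order):,} combinations")

    # Small demonstration of the split: 6 SNPs, order 3, 4 workers.
    for rank in range(4):
        print(rank, list(combos_for_worker(6, 3, rank, 4)))
```

Each worker enumerates the full sequence and skips what it does not own, which keeps the sketch short; a production tool would more likely compute each worker's share directly.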
Applying dynamic balancing to improve the performance of MPI parallel genomics applications
© ACM 2024. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (SAC '24).
[Abstract]: Genomics applications are becoming more and more important in the field of bioinformatics, as they allow researchers to extract meaningful information from the huge amount of data generated by the new sequencing technologies. The analysis of these data is a very time-consuming task and, therefore, the use of High Performance Computing (HPC) and parallel processing techniques is essential. Although the structure of these applications can be easily adapted to parallel systems by distributing the data to be processed among the available processors, load imbalance is a common cause of performance degradation. In this paper we propose a dynamic load balancing method based on MPI RMA one-sided communications to minimize the synchronization among processes and the overhead due to communications while improving the workload balance. The strategy is applied, as a case study, to ParRADMeth, an MPI/OpenMP parallel application for the identification of Differentially Methylated Regions (DMRs). Results show that the new version of the tool outperforms the previous one in all cases, achieving high performance and scalability. For example, our approach is up to 243 times faster than the sequential version and 1.74 times faster than the previous parallel version when processing a real dataset on a cluster with 8 nodes, each one with 32 CPU cores.
This work has been supported by grants PID2019-104184RB-I00 and PID2022-136435NB-I00, both grants funded by MCIN/AEI/10.13039/501100011033, PID2022 also funded by "ERDF A way of making Europe", EU; the Ministry of Universities of Spain under grant FPU21/03408; and by Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019-2022 and Consolidation Program of Competitive Reference Groups, under Grants ED431G 2019/01 and ED431C 2021/30, respectively).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2021/3
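The general idea of dynamic load balancing through a shared work counter can be sketched in Python as follows. This is a single-process stand-in that uses threads and a lock-protected counter in place of an MPI RMA window with an atomic fetch-and-add; it is an assumption about the general technique, not the scheme implemented in ParRADMeth, and all names (`SharedCounter`, `run_dynamic`) are hypothetical.

```python
import threading


class SharedCounter:
    """Lock-protected counter standing in for an atomically updated remote memory location."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, amount):
        """Atomically return the current value and then increase it by `amount`."""
        with self._lock:
            old = self._value
            self._value += amount
            return old


def run_dynamic(tasks, n_workers, chunk=1):
    """Run zero-argument callables; each worker claims the next chunk whenever it becomes idle."""
    counter = SharedCounter()
    results = [None] * len(tasks)
    processed_by = [0] * n_workers  # how many tasks each worker ended up running

    def worker(wid):
        while True:
            start = counter.fetch_and_add(chunk)
            if start >= len(tasks):
                return  # no work left
            for i in range(start, min(start + chunk, len(tasks))):
                results[i] = tasks[i]()
                processed_by[wid] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, processed_by


if __name__ == "__main__":
    tasks = [lambda i=i: i * i for i in range(20)]
    results, processed_by = run_dynamic(tasks, n_workers=4, chunk=2)
    print(results)
    print("tasks per worker:", processed_by)
```

Because workers claim chunks on demand rather than receiving a fixed share up front, a worker that draws cheap tasks simply claims more of them, which is the property that counters load imbalance.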
PARamrfinder: detecting allele-specific DNA methylation on multicore clusters
Funded for open access publication: CRUE-CSIC
[Abstract]: The discovery of Allele-Specific Methylation (ASM) is an important research field in biology as it regulates genomic imprinting, which has been identified as the cause of some genetic diseases. Nevertheless, the high computational cost of the bioinformatic tools developed for this purpose prevents their application to large-scale datasets. Hence, much faster tools are required to further progress in this research field. In this work we present PARamrfinder, a parallel tool that applies a statistical model to identify ASM in data from high-throughput short-read bisulfite sequencing. It is based on the state-of-the-art sequential tool amrfinder, which is able to detect ASM at regional level from Bisulfite Sequencing (BS-Seq) experiments in the absence of Single Nucleotide Polymorphism information. PARamrfinder provides the same Allelically Methylated Regions as amrfinder but at significantly reduced runtime thanks to exploiting the compute capabilities of common multicore CPU clusters and MPI RMA operations to attain an efficient dynamic workload balance. As an example, our tool is up to 567 times faster for real data experiments on a cluster with 8 nodes, each one containing two 16-core processors. The source code of PARamrfinder, as well as a reference manual, is available at https://github.com/UDC-GAC/PARamrfinder.
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 and PID2022-136435NB-I00 / AEI / 10.13039/501100011033), PID2022 also funded by “ERDF A way of making Europe”. It was also supported by the Ministry of Universities of Spain under grant FPU21/03408, and by Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019–2022 and Consolidation Program of Competitive Reference Groups, under Grants ED431G 2019/01 and ED431C 2021/30, respectively).
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2021/3
Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis
[Abstract] Background and objectives
The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. The authors have previously employed multithreading successfully to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers.
Methods
The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability.
Results
The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that the maximum runtime is reduced from almost two minutes using the previous multithreading-only approach to less than ten seconds using the hybrid method.
Conclusions
The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of dry eye syndrome.
Ministerio de Economía y Competitividad; TIN2013-42148-P
Portugal. Fundação para a Ciência e a Tecnologia; POCI-01-0145-FEDER-006961
Portugal. Fundação para a Ciência e a Tecnologia; UID/EEA/50014/2013
Portugal. Fundação para a Ciência e a Tecnologia; SFRH/BPD/111177/2015
A SIMD Algorithm for the Detection of Epistatic Interactions of Any Order
Funded for open access publication: Universidade da Coruña/CISUG
[Abstract] Epistasis is a phenomenon in which a phenotype outcome is determined by the interaction of genetic variation at two or more loci and it cannot be attributed to the additive combination of effects corresponding to the individual loci. Although it has been more than 100 years since William Bateson introduced this concept, it is still a topic under active research. Locating epistatic interactions is a computationally expensive challenge that involves analyzing an exponentially growing number of combinations. Authors in this field have resorted to a multitude of hardware architectures in order to speed up the search, but little to no attention has been paid to the vector instructions that current CPUs include in their instruction sets. This work extends an existing third-order exhaustive algorithm to support the search of epistasis interactions of any order and discusses multiple SIMD implementations of the different functions that compose the search using Intel AVX Intrinsics. Results using GCC and the Intel compiler show that the 512-bit explicit vector implementation proposed here performs best among all the implementations evaluated. The proposed 512-bit vectorization accelerates the original implementation of the algorithm by an average factor of 7 and 12, for GCC and the Intel Compiler, respectively, in the scenarios tested.
This work is supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00/AEI/10.13039/501100011033), the Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, grant no. ED431G2019/01), Consolidation Program of Competitive Research (grant no. ED431C 2021/30), the FPU Program of the Ministry of Education of Spain (grant no. FPU16/01333), and the Universidade da Coruña/CISUG for funding the open access charge.
Xunta de Galicia; ED431G2019/01
Xunta de Galicia; ED431C2021/3
The Influence of Functional Movement and Strength upon Linear and Change of Direction Speed in Male and Female Basketball Players
The present study aimed to analyse the influence of functional movement and strength variables upon linear speed (Ls) and change of direction (COD) based on gender. It also aimed to identify the determinants of performance of Ls and COD according to gender. Fifty basketball players (54% female) completed the assessment in which the weight-bearing dorsiflexion test, the Y-balance test, the unilateral countermovement jump, the unilateral drop jump, the unilateral triple hop test, Ls and CODs were performed. Speed variables were divided according to execution time into "low-performance" and "high-performance" to establish a comparison between performance groups. Strength variables significantly influenced speed tests’ performance in both genders (p < 0.05). For males, the greater the Ls, the higher the change of direction deficit (p < 0.001). Multiple regression analysis revealed that a long and vertical stretch-shortening cycle (SSC) was the most influential physical ability for speed performance in females (45–65% variance explained; p < 0.001), while in males, a short and horizontal SSC played a significant role (30–61% variance explained; p < 0.022). These results suggest that gender should be considered in programming strength training to improve speed, as each gender will benefit most from the application of different force-orientations and different SSC. Also, the faster the male players were in Ls, the less efficient they were in the COD performance. For this reason, it would be recommended that men perform eccentric exercises along with deceleration and technique drills to improve COD speed.
This paper is part of the first author's doctoral thesis carried out in the Doctoral Programme of the University of Huelva (Spain), thanks to the support and funding of the Formación del Profesorado Universitario Programme (FPU22/01057), run by the Ministerio de Ciencias, Innovación y Universidades, Government of Spain. This study was also supported by the Grupo de Educación, Motricidad e Investigación onubense (HUM643) and by the Centro de Investigación en Pensamiento Contemporáneo e Innovación para el Desarrollo Social (COIDESO) of the University of Huelva (Spain).
PyToxo: a Python tool for calculating penetrance tables of high-order epistasis models
[Abstract] Background
Epistasis is the interaction between different genes when expressing a certain phenotype. If epistasis involves more than two loci it is called high-order epistasis. High-order epistasis is an area under active research because it could be the cause of many complex traits. The most common way to specify an epistasis interaction is through a penetrance table.
Results
This paper presents PyToxo, a Python tool for generating penetrance tables from any-order epistasis models. Unlike other tools available in the literature, PyToxo is able to work with high-order models and realistic penetrance and heritability values, achieving high-precision results in a short time. In addition, PyToxo is distributed as open-source software and includes several interfaces to ease its use.
Conclusions
PyToxo provides the scientific community with a useful tool to evaluate algorithms and methods that can detect high-order epistasis to continue advancing in the discovery of the causes behind complex diseases.
This study and publication costs were funded by the Ministry of Science and Innovation of Spain (grant PID2019-104184RB-I00/AEI/10.13039/501100011033) and by Xunta de Galicia and FEDER funds of the EU (CITIC-Centro de Investigación de Galicia accreditation, grant ED431G 2019/01; Consolidation Program of Competitive Reference Groups, grant ED431C 2021/30). CP was funded by the Ministry of Education of Spain (grant FPU16/01333). The funders did not play any role in the design of the study, the collection, analysis, and interpretation of data, or in writing of the manuscript.
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C 2021/3
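As an illustration of what a penetrance table encodes, the sketch below computes the prevalence and heritability implied by a given table. It assumes Hardy-Weinberg equilibrium, independent loci, and the commonly used definitions (prevalence as the genotype-weighted mean penetrance, heritability as the penetrance variance divided by K(1 − K)). It does not use PyToxo's API, and every name in it is hypothetical.

```python
from itertools import product


def genotype_probs(maf):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the minor allele."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def prevalence_and_heritability(penetrance, mafs):
    """penetrance maps a genotype tuple (0/1/2 per locus) to P(disease | genotype)."""
    per_locus = [genotype_probs(m) for m in mafs]
    prevalence = 0.0
    second_moment = 0.0
    for genotype in product(range(3), repeat=len(mafs)):
        # Joint genotype probability, assuming the loci are independent.
        p = 1.0
        for locus, g in enumerate(genotype):
            p *= per_locus[locus][g]
        f = penetrance[genotype]
        prevalence += p * f
        second_moment += p * f * f
    variance = second_moment - prevalence ** 2
    heritability = variance / (prevalence * (1 - prevalence))
    return prevalence, heritability


if __name__ == "__main__":
    # Hypothetical two-locus table: risk is raised only when both loci carry a minor allele.
    table = {
        g: (0.2 if g[0] > 0 and g[1] > 0 else 0.05)
        for g in product(range(3), repeat=2)
    }
    k, h2 = prevalence_and_heritability(table, mafs=[0.3, 0.3])
    print(f"prevalence={k:.4f} heritability={h2:.4f}")
```

This sketch only evaluates a table that is already given; a generator has to solve the inverse problem of finding penetrance values that match target prevalence or heritability.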