Organic matter content influence on soil physical properties
[Abstract] Soil physical characteristics of agricultural soils with a range of textures and organic matter contents, i.e., dry and wet pore space organisation, were investigated. To study the specific effect of organic matter for each soil, adjacent grassland and cultivated land were frequently both sampled. Because of the complexity of the soil particle structure, measurements were performed at the textural level on 2-3 mm aggregates. The compactness of grassland horizons was found to be lower than that of their cultivated counterparts. Mercury intrusion porosimetry showed that lacunar pores prevailed and that their volume increased with organic carbon content. The volume of clay-fabric pores was very small and did not appear to depend on organic matter content. Water content near saturation increased with increasing organic matter content, and at potentials of about 1,500 kPa the water retention curves tended to converge. Pore size distribution patterns measured by mercury intrusion porosimetry and derived from water retention characteristics were compared. The low shrinkage potential of moderately coarse and medium textured soils was also verified. A lack of potential for regenerating good soil structure by fragmentation was deduced from the shrinkage curves.
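Comparing mercury porosimetry with water retention data relies on converting a pressure into an equivalent pore radius through the capillary (Young-Laplace/Washburn) relation r = 2γ|cos θ| / P. The sketch below shows that conversion; the surface tensions and contact angles are typical textbook values assumed for illustration, not the values used in the study, and pores are idealised as cylinders.

```python
import math

# Assumed constants (typical values, not taken from the study)
WATER_SURFACE_TENSION = 0.0728      # N/m, water at about 20 degrees C
WATER_CONTACT_ANGLE_DEG = 0.0       # assumes perfect wetting
MERCURY_SURFACE_TENSION = 0.485     # N/m
MERCURY_CONTACT_ANGLE_DEG = 130.0   # commonly assumed for mercury on soil

def radius_from_pressure(pressure_pa, surface_tension, contact_angle_deg):
    """Equivalent cylindrical pore radius (m): r = 2 * gamma * |cos(theta)| / P.
    Using |cos| covers both wetting water and non-wetting mercury (cos < 0)."""
    cos_theta = abs(math.cos(math.radians(contact_angle_deg)))
    return 2.0 * surface_tension * cos_theta / pressure_pa

def water_pore_radius(matric_potential_kpa):
    """Largest pore radius (m) that stays water-filled at the given suction (kPa)."""
    return radius_from_pressure(matric_potential_kpa * 1e3,
                                WATER_SURFACE_TENSION, WATER_CONTACT_ANGLE_DEG)

def mercury_pore_radius(intrusion_pressure_mpa):
    """Smallest pore radius (m) that mercury enters at the given pressure (MPa)."""
    return radius_from_pressure(intrusion_pressure_mpa * 1e6,
                                MERCURY_SURFACE_TENSION, MERCURY_CONTACT_ANGLE_DEG)
```

With these assumed constants, `water_pore_radius(1500)` is about 9.7e-8 m, i.e. only pores narrower than roughly 0.1 µm in radius still hold water at 1,500 kPa, which is the range where the retention curves above tend to converge.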
PyToxo: a Python tool for calculating penetrance tables of high-order epistasis models
[Abstract] Background
Epistasis is the interaction between different genes when expressing a certain phenotype. If epistasis involves more than two loci, it is called high-order epistasis. High-order epistasis is an area under active research because it could be the cause of many complex traits. The most common way to specify an epistasis interaction is through a penetrance table.
Results
This paper presents PyToxo, a Python tool for generating penetrance tables from epistasis models of any order. Unlike other tools available in the literature, PyToxo can work with high-order models and realistic penetrance and heritability values, achieving high-precision results in a short time. In addition, PyToxo is distributed as open-source software and includes several interfaces to ease its use.
Conclusions
PyToxo provides the scientific community with a useful tool to evaluate algorithms and methods that can detect high-order epistasis, helping to advance the discovery of the causes behind complex diseases. This study and publication costs were funded by the Ministry of Science and Innovation of Spain (grant PID2019-104184RB-I00/AEI/10.13039/501100011033) and by Xunta de Galicia and FEDER funds of the EU (CITIC-Centro de Investigación de Galicia accreditation, grant ED431G 2019/01; Consolidation Program of Competitive Reference Groups, grant ED431C 2021/30). CP was funded by the Ministry of Education of Spain (grant FPU16/01333). The funders did not play any role in the design of the study, the collection, analysis, and interpretation of data, or in writing of the manuscript.
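A penetrance table gives P(disease | multi-locus genotype). The minimal sketch below goes in the forward direction only: given a table and minor allele frequencies, it computes the model's prevalence and heritability, assuming Hardy-Weinberg and linkage equilibrium and using one common (broad-sense) heritability definition. PyToxo solves the harder inverse problem of finding tables that hit target values; this is not its API, and the function names are hypothetical.

```python
from itertools import product

def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the minor allele."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]

def prevalence_and_heritability(penetrance, mafs):
    """penetrance maps a genotype tuple (one 0/1/2 per locus) to P(disease | genotype)."""
    freqs = [genotype_freqs(m) for m in mafs]
    # Joint genotype probabilities, assuming independent loci
    probs = {}
    for g in product(range(3), repeat=len(mafs)):
        p = 1.0
        for locus, copies in enumerate(g):
            p *= freqs[locus][copies]
        probs[g] = p
    prevalence = sum(probs[g] * penetrance[g] for g in probs)
    # Variance of penetrance across genotypes, scaled by the phenotype variance
    variance = sum(probs[g] * (penetrance[g] - prevalence) ** 2 for g in probs)
    heritability = variance / (prevalence * (1 - prevalence))
    return prevalence, heritability
```

For a hypothetical two-locus model where risk is 0.5 when both loci carry at least one minor allele and 0.1 otherwise, with both MAFs at 0.5, this gives a prevalence of 0.325 and a heritability of 7/39 (about 0.18).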
Fiuncho: a program for any-order epistasis detection in CPU clusters
Funded for open-access publication: CRUE/CISUG. [Abstract]: Epistasis can be defined as the statistical interaction of genes during the expression of a phenotype. It is believed to play a fundamental role in gene expression, as individual genetic variants have shown only very small increases in disease risk in previous Genome-Wide Association Studies. The most successful approach to epistasis detection is the exhaustive method, although its exponential time complexity requires a highly parallel implementation to be practical. This work presents Fiuncho, a program that exploits all levels of parallelism present in x86_64 CPU clusters to mitigate the complexity of this approach. It supports epistasis interactions of any order and, compared with other exhaustive methods, it is on average 358, 7 and 3 times faster than MDR, MPI3SNP and BitEpi, respectively. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), the Xunta de Galicia and FEDER funds of the EU (CITIC-Centro de Investigación de Galicia accreditation 2019–2022, Grant no. ED431G 2019/01), Consolidation Program of Competitive Research (Grant no. ED431C 2021/30), and the FPU Program of the Ministry of Education of Spain (Grant no. FPU16/01333).
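The exhaustive method scores every combination of k SNPs, so the work grows as C(n, k): with 10,000 SNPs there are already about 1.7 × 10^11 third-order combinations. The sequential sketch below shows the search itself; mutual information is used as the association score as an assumption for illustration, and the function names are hypothetical rather than Fiuncho's interface.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(genotypes, phenotype, snps):
    """MI (bits) between the joint genotype of `snps` and a binary phenotype.
    genotypes[s][i] is the 0/1/2 genotype of SNP s for individual i."""
    n = len(phenotype)
    joint, geno_counts = Counter(), Counter()
    pheno_counts = Counter(phenotype)
    for i in range(n):
        key = tuple(genotypes[s][i] for s in snps)
        joint[(key, phenotype[i])] += 1
        geno_counts[key] += 1
    mi = 0.0
    for (key, y), count in joint.items():
        p_xy = count / n
        p_x, p_y = geno_counts[key] / n, pheno_counts[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

def exhaustive_search(genotypes, phenotype, order):
    """Return (best score, SNP indices) over all C(num_snps, order) combinations."""
    scored = ((mutual_information(genotypes, phenotype, combo), combo)
              for combo in combinations(range(len(genotypes)), order))
    return max(scored)
```

On a toy dataset where the phenotype is the XOR of SNPs 0 and 2, each SNP alone has zero MI, but the pair (0, 2) scores a full bit, which is exactly the kind of interaction only an exhaustive pairwise search finds. Every combination is independent, which is what makes the search easy to distribute across cores and nodes.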
Applying dynamic balancing to improve the performance of MPI parallel genomics applications
© ACM 2024. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (SAC '24).
[Abstract]: Genomics applications are becoming increasingly important in bioinformatics, as they allow researchers to extract meaningful information from the huge amount of data generated by new sequencing technologies. The analysis of these data is very time-consuming, so the use of High Performance Computing (HPC) and parallel processing techniques is essential. Although these applications can easily be adapted to parallel systems by distributing the data to be processed among the available processors, load imbalance is a common cause of performance degradation. In this paper we propose a dynamic load balancing method based on MPI RMA one-sided communications that minimizes synchronization among processes and communication overhead while improving workload balance. As a case study, the strategy is applied to ParRADMeth, an MPI/OpenMP parallel application for the identification of Differentially Methylated Regions (DMRs). Results show that the new version of the tool outperforms the previous one in all cases, achieving high performance and scalability. For example, our approach is up to 243 times faster than the sequential version and 1.74 times faster than the previous parallel version when processing a real dataset on a cluster with 8 nodes, each with 32 CPU cores. This work has been supported by grants PID2019-104184RB-I00
and PID2022-136435NB-I00, both grants funded by MCIN/AEI/10.13039/501100011033, PID2022 also funded by "ERDF A way of making Europe", EU; the Ministry of Universities of Spain under grant FPU21/03408; and by Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019-2022 and Consolidation Program of Competitive Reference Groups, under Grants ED431G 2019/01 and ED431C 2021/30, respectively).
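Why dynamic balancing helps can be seen with a small simulation. One common way to implement it with MPI RMA is a shared task counter exposed in an RMA window and incremented atomically with `MPI_Fetch_and_op`, so an idle process claims the next task without synchronizing with the others; whether ParRADMeth uses exactly this scheme is an assumption here. The sketch below is a sequential simulation (no MPI) comparing that self-scheduling with a fixed block distribution; the function names are hypothetical.

```python
import heapq

def static_makespan(costs, workers):
    """Block distribution: worker w is given a fixed contiguous chunk of tasks."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[w * chunk:(w + 1) * chunk]) for w in range(workers))

def dynamic_makespan(costs, workers):
    """Self-scheduling: whenever a worker becomes idle it claims the next task
    index from a shared counter (the role an atomic fetch-and-add would play)."""
    next_task = 0                    # the shared counter
    idle_times = [0.0] * workers     # min-heap of times at which workers become idle
    heapq.heapify(idle_times)
    while next_task < len(costs):
        idle_at = heapq.heappop(idle_times)   # earliest idle worker
        task = next_task                      # fetch ...
        next_task += 1                        # ... and add
        heapq.heappush(idle_times, idle_at + costs[task])
    return max(idle_times)
```

With hypothetical task costs `[8, 8, 1, 1, 1, 1, 1, 1]` on 4 workers, the block distribution gives both expensive tasks to the same worker and finishes at time 16, while self-scheduling finishes at time 8. The cost of dynamic balancing is one counter update per task, which is why keeping that update cheap (one-sided, no matching receive) matters.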
PARamrfinder: detecting allele-specific DNA methylation on multicore clusters
Funded for open-access publication: CRUE-CSIC. [Abstract]: The discovery of Allele-Specific Methylation (ASM) is an important research field in biology, as ASM regulates genomic imprinting, which has been identified as the cause of some genetic diseases. Nevertheless, the high computational cost of the bioinformatic tools developed for this purpose prevents their application to large-scale datasets. Hence, much faster tools are required to make further progress in this research field. In this work we present PARamrfinder, a parallel tool that applies a statistical model to identify ASM in data from high-throughput short-read bisulfite sequencing. It is based on the state-of-the-art sequential tool amrfinder, which detects ASM at regional level from Bisulfite Sequencing (BS-Seq) experiments in the absence of Single Nucleotide Polymorphism information. PARamrfinder reports the same Allelically Methylated Regions as amrfinder but in significantly less time, by exploiting the compute capabilities of common multicore CPU clusters and using MPI RMA operations to attain an efficient dynamic workload balance. As an example, our tool is up to 567 times faster for real data experiments on a cluster with 8 nodes, each containing two 16-core processors. The source code of PARamrfinder, as well as a reference manual, is available at https://github.com/UDC-GAC/PARamrfinder. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 and PID2022-136435NB-I00 / AEI / 10.13039/501100011033), PID2022 also funded by “ERDF A way of making Europe”.
It was also supported by the Ministry of Universities of Spain under grant FPU21/03408, and by Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019–2022 and Consolidation Program of Competitive Reference Groups, under Grants ED431G 2019/01 and ED431C 2021/30, respectively).
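Detecting ASM without SNP information comes down to asking whether the reads in a region are better explained by two epialleles than by a single methylation profile. The sketch below illustrates that comparison under simplifying assumptions of our own (each read covers every CpG, the two epialleles are equally frequent, a fixed number of EM iterations, no significance test or multiple-testing correction); it is not amrfinder's or PARamrfinder's exact statistical model, and all names are hypothetical.

```python
import math

EPS = 1e-3  # keeps probabilities away from 0 and 1 (assumed value)

def clamp(p):
    return min(max(p, EPS), 1.0 - EPS)

def read_loglik(read, profile):
    """Log-likelihood of one read's CpG states (1 = methylated, 0 = unmethylated)."""
    return sum(math.log(p if s else 1.0 - p) for s, p in zip(read, profile))

def loglik_single(reads):
    """Null model: one methylation level per CpG shared by both alleles."""
    profile = [clamp(sum(col) / len(reads)) for col in zip(*reads)]
    return sum(read_loglik(r, profile) for r in reads)

def loglik_two_alleles(reads, iterations=50):
    """Alternative model: each read comes from one of two equally frequent
    epialleles with their own CpG methylation levels, fitted with EM."""
    n_cpg = len(reads[0])
    a, b = [0.9] * n_cpg, [0.1] * n_cpg  # arbitrary asymmetric start
    for _ in range(iterations):
        # E-step: posterior probability that each read comes from epiallele a
        w = []
        for r in reads:
            la, lb = read_loglik(r, a), read_loglik(r, b)
            m = max(la, lb)
            w.append(math.exp(la - m) / (math.exp(la - m) + math.exp(lb - m)))
        # M-step: weighted methylation level of each CpG for each epiallele
        wa = max(sum(w), EPS)
        wb = max(len(reads) - sum(w), EPS)
        a = [clamp(sum(wi * r[j] for wi, r in zip(w, reads)) / wa) for j in range(n_cpg)]
        b = [clamp(sum((1 - wi) * r[j] for wi, r in zip(w, reads)) / wb) for j in range(n_cpg)]
    total = 0.0
    for r in reads:
        la, lb = read_loglik(r, a), read_loglik(r, b)
        m = max(la, lb)
        total += m + math.log(0.5 * math.exp(la - m) + 0.5 * math.exp(lb - m))
    return total
```

For a region where half the reads are fully methylated and half unmethylated, the two-epiallele log-likelihood exceeds the single-profile one by about 41; for a region where all reads share the same pattern, the two models fit equally well. Since regions are scored independently, this per-region work is what a parallel tool can distribute across processes.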
A SIMD Algorithm for the Detection of Epistatic Interactions of Any Order
Funded for open-access publication: Universidade da Coruña/CISUG. [Abstract] Epistasis is a phenomenon in which a phenotype outcome is determined by the interaction of genetic variation at two or more loci and cannot be attributed to the additive combination of effects corresponding to the individual loci. Although it has been more than 100 years since William Bateson introduced this concept, it is still a topic under active research. Locating epistatic interactions is a computationally expensive challenge that involves analyzing an exponentially growing number of combinations. Authors in this field have resorted to a multitude of hardware architectures to speed up the search, but little to no attention has been paid to the vector instructions that current CPUs include in their instruction sets. This work extends an existing third-order exhaustive algorithm to support the search of epistasis interactions of any order and discusses multiple SIMD implementations of the different functions that compose the search using Intel AVX Intrinsics. Results with GCC and the Intel compiler show that the 512-bit explicit vector implementation proposed here performs best of all the implementations evaluated. The proposed 512-bit vectorization accelerates the original implementation of the algorithm by an average factor of 7 and 12, for GCC and the Intel Compiler, respectively, in the scenarios tested. This work is supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00/AEI/10.13039/501100011033), the Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, grant no. ED431G2019/01), Consolidation Program of Competitive Research (grant no. ED431C 2021/30), the FPU Program of the Ministry of Education of Spain (grant no. FPU16/01333), and the Universidade da Coruña/CISUG for funding the open access charge.
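The abstract does not describe the data layout, but exhaustive epistasis searches are commonly vectorized by encoding each SNP as one bit vector per genotype and filling the contingency table of each combination with bitwise AND and population counts; a 512-bit register then handles 512 individuals per instruction. The sketch below, an assumption rather than this work's code, uses Python's arbitrary-precision integers as stand-ins for those vectors; the function names are hypothetical.

```python
from itertools import product

def bit_masks(values, num_values):
    """masks[v] has bit i set when individual i has value v."""
    masks = [0] * num_values
    for i, v in enumerate(values):
        masks[v] |= 1 << i
    return masks

def popcount(x):
    return bin(x).count("1")  # int.bit_count() also works on Python 3.10+

def contingency_table(snps, phenotype):
    """Map each genotype combination of `snps` to (controls, cases),
    using AND + popcount instead of a loop over individuals."""
    geno_masks = [bit_masks(s, 3) for s in snps]
    controls, cases = bit_masks(phenotype, 2)
    table = {}
    for combo in product(range(3), repeat=len(snps)):
        selected = controls | cases  # start from all individuals
        for masks, g in zip(geno_masks, combo):
            selected &= masks[g]     # keep individuals with genotype g at this SNP
        table[combo] = (popcount(selected & controls), popcount(selected & cases))
    return table
```

The result matches a naive per-individual count; the design point is that the inner work becomes a few wide AND and popcount operations per genotype combination, which is exactly what AVX-512 registers (and the VPOPCNTDQ extension, where available) process efficiently.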
- …