33 research outputs found

    High-Order Epistasis Detection in High Performance Computing Systems

    Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01 [Abstract] In recent years, Genome-Wide Association Studies (GWAS) have become increasingly popular as a way to find a genetic explanation for the presence or absence of particular diseases in humans. There is consensus in these studies that genetic interactions condition the expression of complex diseases, a phenomenon called epistasis. This thesis studies this phenomenon using High-Performance Computing (HPC) and from its statistical definition: the deviation of the expression of a phenotype from the sum of the individual contributions of multiple genetic variants. For this purpose, we first developed MPI3SNP, a program that identifies interactions of three variants from an input dataset. MPI3SNP implements an exhaustive search using an association test based on Mutual Information, and exploits the resources of CPU or GPU clusters to speed up the search. With the help of MPI3SNP, we then evaluated the state of the art in epistasis detection in a study that compares the performance of twenty-seven tools. The most important conclusion of this study is the inability of non-exhaustive approaches to locate epistasis in the absence of marginal effects (small association effects of the individual variants that take part in an epistatic interaction). For this reason, the thesis went on to focus on optimizing the exhaustive search. First, we improved the efficiency of the association test through a vector implementation of the procedure. Then, we developed a distributed algorithm that implements an exhaustive search capable of locating epistatic interactions of any order. These two milestones were achieved in Fiuncho, a program that incorporates all the research carried out and outperforms all state-of-the-art alternatives on CPU clusters. In addition, we developed Toxo, a library for simulating biological scenarios with epistasis. This library allows the simulation of epistasis following existing genetic interaction models for high-order interactions.
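The role of marginal effects can be made concrete with a small sketch of a Mutual Information association test. This is an illustration only, not the MPI3SNP or Fiuncho code: the function name `mutual_information` and the toy data are assumptions introduced here, and the score is the textbook mutual information between a genotype combination and a binary phenotype.

```python
from collections import Counter
from math import log2


def mutual_information(genotypes, phenotype):
    """Mutual information (in bits) between genotype combinations and a phenotype.

    genotypes: one tuple per sample, e.g. (0, 2, 1) for a three-variant combination
    phenotype: one 0/1 label per sample (control/case), same length as genotypes
    """
    n = len(phenotype)
    joint = Counter(zip(genotypes, phenotype))  # counts of (genotype, label) pairs
    geno = Counter(genotypes)                   # counts of each genotype combination
    pheno = Counter(phenotype)                  # counts of each label
    mi = 0.0
    for (g, y), c in joint.items():
        # p(g, y) * log2(p(g, y) / (p(g) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (geno[g] * pheno[y]))
    return mi


# Toy XOR-style interaction (hypothetical data): the phenotype depends on
# both variants jointly, but neither variant alone carries any signal.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
joint_score = mutual_information(pairs, labels)
single_score = mutual_information([(a,) for a, _ in pairs], labels)
```

In this toy case `joint_score` is 1 bit while `single_score` is 0, which is the situation the abstract describes: a method that first filters variants by their individual association would discard both variants before ever testing the pair.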

    Fiuncho: a program for any-order epistasis detection in CPU clusters

    Funded for open access publication: CRUE/CISUG. [Abstract] Epistasis can be defined as the statistical interaction of genes during the expression of a phenotype. It is believed to play a fundamental role in gene expression, since previous Genome-Wide Association Studies have found that individual genetic variants account for only a very small increase in disease risk. The most successful approach to epistasis detection is the exhaustive method, although its exponential time complexity requires a highly parallel implementation to be practical. This work presents Fiuncho, a program that exploits all levels of parallelism present in x86_64 CPU clusters in order to mitigate the complexity of this approach. It supports epistasis interactions of any order, and when compared with other exhaustive methods, it is on average 358, 7 and 3 times faster than MDR, MPI3SNP and BitEpi, respectively. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), the Xunta de Galicia and FEDER funds of the EU (CITIC-Centro de Investigación de Galicia accreditation 2019–2022, Grant no. ED431G 2019/01), Consolidation Program of Competitive Research (Grant no. ED431C 2021/30), and the FPU Program of the Ministry of Education of Spain (Grant no. FPU16/01333). Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2021/3

    A SIMD Algorithm for the Detection of Epistatic Interactions of Any Order

    Funded for open access publication: Universidade da Coruña/CISUG. [Abstract] Epistasis is a phenomenon in which a phenotype outcome is determined by the interaction of genetic variation at two or more loci and cannot be attributed to the additive combination of the effects of the individual loci. Although more than 100 years have passed since William Bateson introduced this concept, it is still a topic under active research. Locating epistatic interactions is a computationally expensive challenge that involves analyzing an exponentially growing number of combinations. Authors in this field have resorted to a multitude of hardware architectures in order to speed up the search, but little to no attention has been paid to the vector instructions that current CPUs include in their instruction sets. This work extends an existing third-order exhaustive algorithm to support the search of epistasis interactions of any order and discusses multiple SIMD implementations of the different functions that compose the search using Intel AVX Intrinsics. Results using GCC and the Intel compiler show that the 512-bit explicit vector implementation proposed here performs best among all the implementations evaluated. The proposed 512-bit vectorization accelerates the original implementation of the algorithm by an average factor of 7 and 12, for GCC and the Intel Compiler, respectively, in the scenarios tested. This work is supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00/AEI/10.13039/501100011033), the Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, grant no. ED431G2019/01), Consolidation Program of Competitive Research (grant no. ED431C 2021/30), the FPU Program of the Ministry of Education of Spain (grant no. FPU16/01333), and the Universidade da Coruña/CISUG for funding the open access charge. Xunta de Galicia; ED431G2019/01. Xunta de Galicia; ED431C2021/3
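To see why vector instructions suit this kind of search, consider a common way of building genotype contingency tables in the exhaustive-search literature: each variant is stored as one bitmask per genotype value, and combination counts come from bitwise AND followed by a population count. Whether the paper uses exactly this layout is an assumption, since the abstract does not say so. The sketch below uses Python integers as stand-ins for wide SIMD registers, and the names `pack` and `contingency_counts` are hypothetical.

```python
def pack(genotypes, value):
    """Bitmask with bit i set when sample i carries the given genotype value."""
    mask = 0
    for i, g in enumerate(genotypes):
        if g == value:
            mask |= 1 << i
    return mask


def contingency_counts(snps):
    """Count samples for every genotype combination of the given variants.

    snps: list of per-variant genotype lists (values 0, 1, 2), all of equal length.
    Returns a dict mapping each genotype tuple to its number of samples.
    """
    n_samples = len(snps[0])
    table = {(): (1 << n_samples) - 1}  # start with every sample selected
    for snp in snps:
        planes = [pack(snp, v) for v in (0, 1, 2)]
        # Extending a combination by one variant is a single bitwise AND;
        # a SIMD implementation would process many samples per instruction.
        table = {combo + (v,): mask & planes[v]
                 for combo, mask in table.items() for v in (0, 1, 2)}
    # Population count of each mask gives the cell count of the table.
    return {combo: bin(mask).count("1") for combo, mask in table.items()}


counts = contingency_counts([[0, 1, 2, 0], [0, 1, 2, 1]])
```

Because the table for order k is built by AND-ing the order k-1 masks with one more variant, the same routine works for any interaction order, at the cost of a table with 3^k cells.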

    Fast search of third-order epistatic interactions on CPU and GPU clusters

    [Abstract] Genome-Wide Association Studies (GWASs), analyses that try to find a link between a given phenotype (such as a disease) and genetic markers, have been growing in popularity in recent years. Relations between phenotypes and genotypes are not easy to identify, as most phenotypes are the product of the interaction between multiple genes, a phenomenon known as epistasis. Many authors have resorted to different approaches and hardware architectures in order to mitigate the exponential time complexity of the problem. However, these studies make compromises in order to keep a reasonable execution time, such as limiting the number of genetic markers involved in the interaction, or discarding some of these markers in an initial filtering stage. This work presents MPI3SNP, a tool that implements a three-way exhaustive search for cluster architectures with the aim of mitigating the exponential growth of the run-time. Modern cluster solutions usually incorporate GPUs. Thus, MPI3SNP includes implementations for both multi-CPU and multi-GPU clusters. To contextualize the performance achieved, MPI3SNP is able to analyze an input of 6300 genetic markers and 3200 samples in less than 6 min using 768 CPU cores or 4 min using 8 NVIDIA K80 GPUs. The source code is available at https://github.com/chponte/mpi3snp. Ministerio de Economía y Competitividad and FEDER; TIN2016-75845-P. Xunta de Galicia and FEDER funds; ED431G/01. Consolidation Program of Competitive Research; ED431C 2017/04. Ministerio de Educación; FPU16/0133
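The size of the search space behind those timings follows directly from counting combinations. The sketch below is a minimal single-process illustration, not the MPI3SNP implementation: `search_space` and `exhaustive_search` are hypothetical names, and `score` stands for whatever association test is applied to each combination.

```python
from itertools import combinations
from math import comb


def search_space(n_markers, order):
    """Number of marker combinations an exhaustive search of this order must test."""
    return comb(n_markers, order)


def exhaustive_search(n_markers, order, score):
    """Return the best-scoring combination of marker indices by brute force."""
    return max(combinations(range(n_markers), order), key=score)


# Search space for the 6300-marker input mentioned in the abstract.
three_way = search_space(6300, 3)   # 41,654,657,100 triplets
four_way = search_space(6300, 4)    # grows by a further factor of about 1574
```

For 6300 markers the three-way search already covers roughly 4.2 x 10^10 triplets, each evaluated over 3200 samples, which is why the work is spread across CPU cores or GPUs. Since the combinations are independent of one another, they can be partitioned among workers without communication during the search.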

    PyToxo: a Python tool for calculating penetrance tables of high-order epistasis models

    [Abstract] Background: Epistasis is the interaction between different genes when expressing a certain phenotype. If epistasis involves more than two loci it is called high-order epistasis. High-order epistasis is an area under active research because it could be the cause of many complex traits. The most common way to specify an epistasis interaction is through a penetrance table. Results: This paper presents PyToxo, a Python tool for generating penetrance tables from any-order epistasis models. Unlike other tools available in the literature, PyToxo is able to work with high-order models and realistic penetrance and heritability values, achieving high-precision results in a short time. In addition, PyToxo is distributed as open-source software and includes several interfaces to ease its use. Conclusions: PyToxo provides the scientific community with a useful tool to evaluate algorithms and methods that can detect high-order epistasis, helping to advance the discovery of the causes behind complex diseases. This study and publication costs were funded by the Ministry of Science and Innovation of Spain (grant PID2019-104184RB-I00/AEI/10.13039/501100011033) and by Xunta de Galicia and FEDER funds of the EU (CITIC-Centro de Investigación de Galicia accreditation, grant ED431G 2019/01; Consolidation Program of Competitive Reference Groups, grant ED431C 2021/30). CP was funded by the Ministry of Education of Spain (grant FPU16/01333). The funders did not play any role in the design of the study, the collection, analysis, and interpretation of data, or in writing of the manuscript. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2021/3

    Evaluation of Existing Methods for High-Order Epistasis Detection

    [Abstract] Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive. 10.13039/501100010801-Xunta de Galicia (Grant Number: ED431C2016-037, ED431C2017/04 and ED431G2019/01) 10.13039/501100003176-Ministerio de Educacion Cultura y Deporte (Grant Number: FPU16/01333) 10.13039/501100003329-Ministerio de Economia y Competitividad (Grant Number: CGL2016-75482-P, PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/50110 and TIN2016-75845-P). Xunta de Galicia; ED431C2016-037. Xunta de Galicia; ED431G2019/01. Xunta de Galicia; ED431C 2017/0

    Toxo: A Library for Calculating Penetrance Tables of High-Order Epistasis Models

    [Abstract] Background: Epistasis is defined as the interaction between different genes when expressing a specific phenotype. The most common way to characterize an epistatic relationship is using a penetrance table, which contains the probability of expressing the phenotype under study given a particular allele combination. Available simulators can only create penetrance tables for well-known epistasis models involving a small number of genes and under a large number of limitations. Results: Toxo is a MATLAB library designed to calculate penetrance tables of epistasis models of any interaction order which resemble real data more closely. The user specifies the desired heritability (or prevalence) and the program maximizes the table’s prevalence (or heritability) according to the input epistatic model boundaries. Conclusions: Toxo extends the capabilities of existing simulators that define epistasis using penetrance tables. These tables can be directly used as input for software simulators such as GAMETES so that they are able to generate data samples with larger interactions and more realistic prevalences/heritabilities. This research was supported by the Ministry of Economy and Competitiveness of Spain (CGL2016-75482-P), the Ministry of Economy and Competitiveness of Spain and FEDER funds of the EU (TIN2016-75845-P), the Xunta de Galicia (Grupo de Referencia Competitiva, ED431C2016-037), the Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, ref. ED431G2019/01), Consolidation Program of Competitive Research (ED431C 2017/04) and the FPU Program of the Ministry of Education of Spain (FPU16/01333). Xunta de Galicia; ED431C2016-037. Xunta de Galicia; ED431G2019/01. Xunta de Galicia; ED431C 2017/0
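The two quantities a penetrance table is tuned against can be computed in a few lines. The following is a sketch under stated assumptions, not Toxo's MATLAB code: genotype frequencies are assumed to follow Hardy-Weinberg equilibrium with independent loci, prevalence and heritability use the definitions commonly applied to penetrance tables, and all function names are hypothetical.

```python
from itertools import product


def genotype_probs(mafs):
    """Joint genotype probabilities, assuming Hardy-Weinberg equilibrium
    and independent loci. mafs: minor allele frequency of each locus."""
    per_locus = [((1 - m) ** 2, 2 * m * (1 - m), m ** 2) for m in mafs]
    probs = {}
    for combo in product((0, 1, 2), repeat=len(mafs)):
        p = 1.0
        for locus, g in enumerate(combo):
            p *= per_locus[locus][g]
        probs[combo] = p
    return probs


def prevalence(penetrance, probs):
    """P(D) = sum over genotypes of P(g) * P(D | g)."""
    return sum(probs[g] * penetrance[g] for g in probs)


def heritability(penetrance, probs):
    """h^2 = sum of P(g) * (P(D | g) - P(D))^2, divided by P(D) * (1 - P(D))."""
    k = prevalence(penetrance, probs)
    return sum(probs[g] * (penetrance[g] - k) ** 2 for g in probs) / (k * (1 - k))


# Hypothetical two-locus table: only the double minor homozygote raises risk.
probs = genotype_probs([0.3, 0.3])
table = {g: (0.9 if g == (2, 2) else 0.1) for g in probs}
k = prevalence(table, probs)
h2 = heritability(table, probs)
```

A tool of this kind works in the opposite direction: given a model shape and a target heritability (or prevalence), it solves for the penetrance values that maximize the other quantity. The forward calculation above is what such a solver has to satisfy.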

    StarHorse results for spectroscopic surveys + Gaia DR3: Chrono-chemical populations in the solar vicinity, the genuine thick disk, and young-alpha rich stars

    The Gaia mission has provided an invaluable wealth of astrometric data for more than a billion stars in our Galaxy. The synergy between Gaia astrometry, photometry, and spectroscopic surveys gives us comprehensive information about the Milky Way. Using the Bayesian isochrone-fitting code StarHorse, we derive distances and extinctions for more than 10 million unique stars observed by both Gaia Data Release 3 and public spectroscopic surveys: GALAH DR3, LAMOST DR7 LRS, LAMOST DR7 MRS, APOGEE DR17, RAVE DR6, SDSS DR12 (optical spectra from BOSS and SEGUE), Gaia-ESO DR5 survey, and the Gaia RVS part of the Gaia DR3 release. We use StarHorse for the first time to derive stellar ages for main-sequence turnoff and subgiant branch stars (MSTO-SGB), around 2.5 million stars with age uncertainties typically around 30%, 15% for only SGB stars, depending on the resolution of the survey. With the derived ages in hand, we investigate the chemical-age relations. In particular, the α and neutron-capture element ratios versus age in the solar neighbourhood show trends similar to previous works, validating our ages. We use the chemical abundances from local subgiant samples of GALAH DR3, APOGEE DR17 and LAMOST MRS DR7 to map groups with similar chemical compositions and StarHorse ages with the dimensionality reduction technique t-SNE and the clustering algorithm HDBSCAN. We identify three distinct groups in all three samples. Their kinematic properties confirm them to be the genuine chemical thick disk, the thin disk and a considerable number of young alpha-rich stars. We confirm that the genuine thick disk's kinematics and age properties are radically different from those of the thin disk and compatible with high-redshift (z ≈ 2) star-forming disks with high dispersion velocities. Comment: 27 pages, 19 figures. Accepted for publication in Astronomy & Astrophysics. Catalogues can be downloaded at https://data.aip.de

    Modeling the natural gas knocking behaviour using gas-phase infrared spectra and multivariate calibration

    [Abstract] Assessing the knocking properties of natural gas (NG) when it is used as fuel for vehicles is vital to optimize the design and functioning of their engines. Analytical efforts in this field are needed because the engines used to define these properties empirically are no longer available, and existing mathematical algorithms differ in accuracy. The combination of gas-phase infrared spectrometry and partial least squares multivariate regression is presented for the first time to determine the methane number (MN) of NG samples. It circumvents the need for prior knowledge of the NG composition required to apply dedicated equations. The use of true NG samples to develop the models is also quite new in the field. Proof-of-concept studies were made with synthetic spectra; then, a collection of liquefied NG samples, whose MN values were computed from their composition by the National Physical Laboratory (NPL) algorithm, was used to develop operative models. Additional validation was made with a collection of synthetic standard mixtures prepared for two European projects (EMRP LNG II and EMPIR LNG III) whose service methane numbers (SMN) were measured with an engine. The FTIR-PLS approach yielded statistically unbiased predictions with average standard errors around 0.4% MN when compared to the NPL-MN and SMN values, and standard deviations of the means ca. 1% MN. The approach is fast, cost effective as it involves standard instrumentation, and can be considered compliant with the green chemistry principles. This work is part of the EMPIR 16ENG09 project ‘Metrological support for LNG and LBG as transport fuel (LNG III)’. This project has received funding from the EMPIR programme co-financed by the Participant States and from the European Union's Horizon 2020 Research and Innovation programme. The authors from TU Braunschweig would like to thank IAV, Mahle, MAN Truck & Bus and Motortech for their support in preparing the test engine. The Group of Applied Analytical Chemistry of the University of A Coruña acknowledges Mestrelab, Reganosa and Naturgy for hiring its services for FTIR method development. Funded for open access publication: Universidade da Coruña/CISU

    Programa Curricular de Educación Básica Alternativa. Ciclo Avanzado

    The EBA curricular programme for the advanced cycle is structured in two parts. The first contains general aspects such as the graduate profile, the characteristics of the students, the characteristics and organization of the modality, and the cross-cutting approaches, as well as guidelines for planning and assessment, guidelines for the development of learning, tutoring and educational guidance, and educational spaces. The second part presents the cross-cutting competencies and the competencies organized by curricular area, together with the performance descriptors aligned with the levels of the standards prescribed for the cycles of the modality.