308 research outputs found

    Methods in statistical genomics

    The objective of this book is to describe procedures for analyzing genome-wide association studies (GWAS). Some chapters contain commentary and previously unpublished research; others (Chapters 4 through 7) have already been published in journals. Each previously published chapter investigates a different genomics model, but all focus on identifying the strengths and limitations of various statistical procedures that have been applied to different GWAS scenarios.

    Using population biobanks to understand complex traits, rare diseases, and their shared genetic architecture

    The study of the role of genetic variability in common traits has led to a growing number of studies aimed at representing whole populations. These studies gather multiple layers of information on healthy and non-healthy individuals at large scale, constituting what are known as population biobanks. In this thesis I took advantage of these population biobanks to measure the influence of genetic variation on common and rare traits. I explored the mechanisms behind these influences by examining their interaction with conditions, physiological measurements, and habits in the general and healthy population. First, I used the Lifelines cohort, which holds genetic information on the Dutch population. Here, my colleagues and I explored traits with different levels of genetic influence: we uncovered associations of both blood type and dairy consumption with human gut microbiome function and composition, and we identified a protective factor for a rare type of cardiomyopathy with potential use in diagnosis. Additionally, within a global collaboration across worldwide biobanks totaling more than 2 million individuals, we demonstrated the robustness of the connections between genetic variation and 14 different diseases across populations. We also provided methodological guidance for combining the effects of genetic variation to calculate disease risk in studies that include biobanks with populations of different ethnic backgrounds. Overall, my PhD research contributed to identifying and validating which factors are relevant for potential clinical applications, and provided guidelines for future genetic studies of common traits and diseases at a global scale.

    Genetic landscape of multiple sclerosis susceptibility by leveraging multi-omics data

    The main objective of the research presented in this thesis is to study the genetic variants and gene expression related to multiple sclerosis (MS). MS is a polygenic disease with the HLA-DRB1*15:01 allele as a strong risk factor. More than 200 non-HLA regions have currently been identified for MS. However, most of the risk loci identified in those studies are driven primarily by the relapsing-remitting form of MS (RRMS). To identify risk factors specific to the primary progressive form of MS (PPMS), which affects a smaller group of MS patients, we examined the exomes of PPMS and RRMS patients matched to population-based controls in a case-control setting, and reported risk variants and mutations associated with PPMS and RRMS. This study is set in the ‘post-GWAS’ era, in which researchers focus primarily on understanding the functional consequences of genetic risk factors. Using transcriptomic and genotyping data, genes that correlate with the risk loci are identified in cell types relevant to MS. Several statistical methods are implemented to characterize the risk loci and replicate the findings in the context of disease. MicroRNAs (miRNAs), small non-coding RNAs that regulate gene expression at the post-transcriptional level, have been found to be dysregulated in autoimmune diseases, including MS. We used experimental autoimmune encephalomyelitis (EAE), a commonly used animal model for MS, to understand the role of miRNA in the immune activation of EAE. Next-generation sequencing (NGS) methods were applied widely in all of these studies, specifically at the transcriptomic and genomic levels of the disease. NGS methods are data intensive but have higher reliability. To test this reliability, we compared reported gene expression measurements for ostensibly similar tissue samples collected from different RNA-seq studies. We found overall consistency in the expression data obtained from different studies and identified the factors contributing to systematic differences. This thesis gives an overview of progress in MS genetics, the EAE model for neuroinflammation, and omics data analysis to address the genetic regulation of disease.

    Genetic association analysis of complex diseases through information theoretic metrics and linear pleiotropy

    The main goal of this thesis was to help identify genetic variants responsible for complex traits by combining linear and nonlinear approaches. First, two one-locus approaches were proposed. The first defined and characterized a novel nonlinear test of genetic association based on the mutual information measure. This test takes into account the genetic structure of the population. It was applied to the GAW17 dataset and compared with the standard linear test of association. Since the solution of the GAW17 simulation model was known, this study served to characterize the performance of the proposed nonlinear method in comparison with the linear one. The proposed nonlinear test recovered the results obtained with linear methods and also detected an additional SNP in a gene related to the phenotype. In addition, the two tests performed similarly in terms of classification accuracy (AUC). The second approach, in contrast, was an exploratory study of the relationship between SNP variability among species and SNP association with disease in different genetic regions. Two sets of SNPs were compared, one containing deleterious SNPs and the other containing neutral SNPs. Both sets were stratified by the region where the polymorphisms were located, a feature that may have influenced their conservation across species. For most functional regions, SNPs associated with diseases tended to be significantly less variable across species than neutral SNPs. Second, a novel nonlinear methodology for multi-loci genetic association was proposed with the goal of detecting association between combinations of SNPs and a phenotype. The proposed method, called MISS, was based on the mutual information of statistical significance. This approach was compared with MLR, the standard linear method for genetic association based on multiple linear regression. Both were applied as the relevance criterion of a new multi-solution floating feature selection algorithm (MSSFFS), proposed in the context of multi-loci genetic association for complex diseases. Both were also compared with MECPM, an algorithm that searches for predictive multi-loci interactions using a maximum entropy criterion. The three methods were tested on the SNPs of the F7 gene and the FVII levels in blood, with data from the GAIT project. The proposed nonlinear method (MISS) improved on the results of traditional genetic association methods, detecting new SNP-SNP interactions. Most of the SNP sets obtained agreed with functional results in the literature, where the SNPs have been described as functional elements correlated with the phenotype. Third, a linear methodological framework for the simultaneous study of several phenotypes was proposed. The methodology consisted of building new phenotypic variables, named metaphenotypes, that capture the joint activity of sets of phenotypes involved in a metabolic pathway. These new variables were used in further association tests with the aim of identifying genetic elements related to the underlying biological process as a whole. As a practical implementation, the methodology was applied to the GAIT project dataset to identify genetic markers that could be related to the coagulation process as a whole and thus to thrombosis. Three mathematical models were used to define metaphenotypes, corresponding to one PCA model and two ICA models. Using this novel approach, already known associations were retrieved and new candidates were proposed as regulatory genes with a global effect on the coagulation pathway.
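As a rough illustration of the quantity behind the one-locus nonlinear test in the abstract above, the following sketch computes the empirical mutual information between a genotype vector and a discrete phenotype vector. It is not the thesis's test: it ignores population structure and performs no significance assessment, and the function name and the 0/1/2 genotype coding are assumptions made only for this example.

```python
from collections import Counter
from math import log2


def mutual_information(genotypes, phenotypes):
    """Empirical mutual information (in bits) between two discrete sequences.

    Hypothetical helper for illustration: `genotypes` could hold minor-allele
    counts (0, 1, 2) and `phenotypes` case/control labels (0, 1).
    """
    if len(genotypes) != len(phenotypes) or len(genotypes) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(genotypes)
    joint = Counter(zip(genotypes, phenotypes))  # counts of (g, p) pairs
    g_counts = Counter(genotypes)                # marginal genotype counts
    p_counts = Counter(phenotypes)               # marginal phenotype counts

    mi = 0.0
    for (g, p), c in joint.items():
        # p(g, p) * log2( p(g, p) / (p(g) * p(p)) ), written with raw counts
        mi += (c / n) * log2(c * n / (g_counts[g] * p_counts[p]))
    return mi
```

A genotype that determines the phenotype exactly yields the phenotype's full entropy, while an independent pair yields zero; in practice a permutation scheme would be one way to attach a p-value to the statistic.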

    Strategies for the intelligent integration of genetic variance information in multiscale models of neurodegenerative diseases

    A more complete understanding of the genetic architecture of complex traits and diseases can maximize the utility of human genetics in disease screening, diagnosis, prognosis, and therapy. The identification of genetic variants linked to polygenic and complex diseases is of great interest to clinicians, geneticists, patients, and the public. Furthermore, determining how genetic variants affect an individual’s health and translating this knowledge into the development of new medicines could transform the treatment of the most common deleterious diseases. However, this requires correlating genetic variants with specific diseases, and accurate functional assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics. Assigning functional consequences and clinical significance to genetic variants is an important step in human genome interpretation. Translating genetic variants into functional molecular mechanisms is essential for understanding disease pathogenesis and, eventually, for therapy design. Although various statistical methods help to short-list genetic variants for fine-mapping investigation, demonstrating their role in a molecular mechanism requires knowledge of their functional consequences, which calls for comprehensive investigation. Experimental interpretation of all observed genetic variants is still impractical. Thus, predicting the functional and regulatory consequences of genetic variants with in-silico approaches is an important step in the discovery of clinically actionable knowledge. Because the interactions between phenotypes and genotypes are multi-layered and biologically complex, such associations present several challenges and simultaneously offer many opportunities to design new protocols for in-silico variant evaluation. This thesis presents a comprehensive protocol based on a causal reasoning algorithm that harvests and integrates multifaceted genetic and biomedical knowledge, with various types of entities from several resources and repositories, to understand how genetic variants perturb molecular interactions and initiate a disease mechanism. Firstly, as a case study of genetic susceptibility loci of Alzheimer’s disease, I reviewed and summarized the existing methodologies for interpreting genome-wide association studies (GWAS), the currently available algorithms, and computable modelling approaches. In addition, I formulated a new approach for modelling and simulating genetic regulatory networks as an extension of the syntax of the Biological Expression Language (OpenBEL). This could allow genetic variation information to be represented in cause-and-effect models in order to predict the functional consequences of disease-associated genetic variants. Secondly, using the new OpenBEL syntax, I generated an OpenBEL model for Alzheimer’s disease (AD) together with genetic variants, including their DNA, RNA, or protein position, variant type, and associated allele. To better understand the role of genetic variants in a disease context, I subsequently tried to predict the consequences of genetic variation based on the functional context provided by the network model. I further explained how genetic variation information could help to identify candidate molecular mechanisms for aetiologically complex diseases such as AD and Parkinson’s disease (PD). Because integrating genetic variation information can strengthen the evidence base for shared pathophysiological pathways in complex diseases, I addressed one of the key questions, namely the role of shared genetic variants in initiating shared molecular mechanisms between neurodegenerative diseases. I systematically analysed the shared genetic variation information of AD and PD and mapped it to find shared molecular aetiology between neurodegenerative diseases. My methodology highlighted that a comprehensive understanding of genetic variation requires the integration and analysis of all omics data in order to build a joint model that captures all datasets concurrently. Moreover, to investigate the effects of GWAS variants, and shared pathology in particular, genomic loci should be considered rather than individual genetic variants, whose effects are hard to predict in a biologically complex molecular mechanism.

    Cellular dissection of psoriasis for transcriptome analyses and the post-GWAS era

    Background: Genome-scale studies of psoriasis have been used to identify genes of potential relevance to disease mechanisms. For many identified genes, however, the cell type mediating disease activity is uncertain, which has limited our ability to design gene functional studies based on genomic findings. Methods: We identified differentially expressed genes (DEGs) with altered expression in psoriasis lesions (n = 216 patients), as well as candidate genes near susceptibility loci from psoriasis GWAS studies. These gene sets were characterized based upon their expression across 10 cell types present in psoriasis lesions. Susceptibility-associated variation at intergenic (non-coding) loci was evaluated to identify sites of allele-specific transcription factor binding. Results: Half of DEGs showed highest expression in skin cells, although the dominant cell type differed between psoriasis-increased DEGs (keratinocytes, 35%) and psoriasis-decreased DEGs (fibroblasts, 33%). In contrast, psoriasis GWAS candidates tended to have highest expression in immune cells (71%), with a significant fraction showing maximal expression in neutrophils (24%, P < 0.001). By identifying candidate cell types for genes near susceptibility loci, we could identify and prioritize SNPs at which susceptibility variants are predicted to influence transcription factor binding. This led to the identification of potentially causal (non-coding) SNPs for which susceptibility variants influence binding of AP-1, NF-κB, IRF1, STAT3 and STAT4. Conclusions: These findings underscore the role of innate immunity in psoriasis and highlight neutrophils as a cell type linked with pathogenetic mechanisms. Assignment of candidate cell types to genes emerging from GWAS studies provides a first step towards functional analysis, and we have proposed an approach for generating hypotheses to explain GWAS hits at intergenic loci. http://deepblue.lib.umich.edu/bitstream/2027.42/109537/1/12920_2013_Article_485.pd

    Genomic analysis of divergently selected experimental lines in rabbit

    Thesis by compendium of publications. Divergent selection can alter the frequencies of genetic markers in opposite directions, leading to intermediate allelic frequencies when both divergent lines are considered jointly in the genetic analyses. Therefore, divergent selection experiments increase the detection power of genome-wide association studies (GWAS) and of genomic scan studies based on selection signature methods. Bayesian GWASs using the Bayes B model were used to analyse genomic data on litter size traits from the uterine capacity experiment with 181 does. The associations were tested by computing Bayes factors for each SNP and by computing percentages of the genomic variance for each 1-Mb non-overlapping window. The GWASs uncovered SNPs associated with total number born and implanted embryos. Moreover, relevant genomic regions were revealed for total number born (1 region), number born alive (1 region), implanted embryos (3 regions), and ovulation rate (5 regions). The percentages of genomic variance accounted for in these litter size traits were 39.48%, 10.36%, 37.21%, and 3.95%, respectively, under a model excluding the line effect; and 7.36%, 1.27%, 15.87%, and 3.95%, respectively, under a model with the line effect. The genomic region located on rabbit chromosome (OCU) 17 at 70.0 - 73.3 Mb was deemed a novel quantitative trait locus (QTL) for reproductive traits in rabbits, since this region overlapped for total number born, number born alive, and implanted embryos. The bone morphogenetic protein 4 gene, BMP4, is the most promising candidate gene within the novel QTL. A combination of GWASs was performed to analyse the genomic data of the intramuscular fat experiment with 480 rabbits. The GWAS methods included a Bayesian method, the Bayes B model, and a frequentist method, single-marker regressions with the data adjusted for genomic relatedness. This study revealed four relevant genomic regions on OCU1 (1 region), OCU8 (2 regions), and OCU13 (1 region) associated with intramuscular fat. The most important associated region was on OCU8 at 24.59 - 26.95 Mb and accounted for 7.34% of the genomic variance. The low percentage explained by the main relevant genomic regions indicates a large polygenic component for intramuscular fat. Functional analyses retrieved genes linked to the pathways and functions of energy, carbohydrate, and lipid metabolism. In addition, a genome scan study was performed using rabbits from the divergent selection experiment for intramuscular fat and three selection signature methods: Wright's fixation index (Fst), cross-population composite likelihood ratio (XP-CLR), and cross-population extended haplotype homozygosity (XP-EHH). The results showed multiple selection signatures across the rabbit genome. None of these selection signatures agreed with the associated genomic regions from the GWAS findings. In summary, the results of both analyses, the GWASs and the genome scan study, suggest that the genomic architecture of intramuscular fat in rabbits is highly polygenic and that its causative variants would be hardly detectable. This study demonstrates that the detection of causative variants and associated genetic markers depends on the hypothetical genomic architecture of the traits, regardless of the successful responses attained in the two divergent selection experiments. To date, these findings would not have worthwhile implications for rabbit breeding programs.
    Sosa Madrid, BS. (2020). Genomic analysis of divergently selected experimental lines in rabbit [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/141376
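To make the Wright's fixation index (Fst) named in the abstract above more concrete, here is a minimal sketch for a single biallelic marker in two divergent lines, using the textbook definition Fst = (H_T - H_S) / H_T with expected heterozygosities. It assumes equally weighted lines and known allele frequencies; it is not the estimator or the windowed genome scan used in the thesis, and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst for one biallelic marker in two equally weighted populations.

    p1, p2: frequency of the same allele in each population (0.0 to 1.0).
    Illustrative only: no sample-size correction is applied.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled population
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Both populations are fixed for the same allele: no differentiation
        return 0.0
    # Mean expected heterozygosity within populations
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t
```

For instance, frequencies of 0.8 and 0.2 in the two lines give an Fst of 0.36 under this definition, whereas identical frequencies give 0.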

    Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases

    Personalized medicine will revolutionize our ability to combat disease. Working toward this goal, a fundamental task is deciphering the genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have given researchers the opportunity to reveal new genotype-phenotype relationships through extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining such data with methods other than univariate statistics is a challenging task that requires advanced algorithms scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to create effective predictive models for various genotype-phenotype relationships. This work explores the problem of selecting the genetic variant subsets that are most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper, and embedded algorithms. The examined machine learning algorithms were shown not only to predict the disease phenotypes effectively, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some of it was extended to run on parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype-phenotype relationships and biological insights from genetic data sets.

    Grand Celebration: 10th Anniversary of the Human Genome Project

    In 1990, scientists began working together on one of the largest biological research projects ever proposed: sequencing the three billion nucleotides in the human genome. The Human Genome Project took 13 years and was completed in April 2003, at a cost of approximately three billion dollars. It was a major scientific achievement that forever changed the understanding of our own nature. The sequencing of the human genome was in many ways a triumph for technology as much as for science. From the Human Genome Project, powerful technologies have been developed (e.g., microarrays and next-generation sequencing) and new branches of science have emerged (e.g., functional genomics and pharmacogenomics), paving new ways for advancing genomic research and medical applications of genomics in the 21st century. The investigations have provided new tests and drug targets, as well as insights into the basis of human development and the diagnosis and treatment of cancer and several mysterious human diseases. This genomic revolution is prompting a new era in medicine, which brings both challenges and opportunities. Parallel to the promising advances of the last decade, the study of the human genome has also revealed how complicated human biology is and how much remains to be understood. The legacy of understanding our genome has just begun. To celebrate the 10th anniversary of the essential completion of the Human Genome Project, in April 2013 Genes launched this Special Issue, which highlights recent scientific breakthroughs in human genomics with a collection of papers written by leading experts in the field.

    Genetics and metabolomics of elite athletes: Genome-wide association study and Metabolomics profiling of elite athletes

    AIM: The outstanding performance of elite athletes is a product of a complex interaction between genetic and environmental factors. The aims of this study were to compare genetic and metabolic profiles among different classes of elite athletes and to identify genetically influenced metabolic profiles (metabotypes) underlying these differences. METHODS: A genome-wide association study (GWAS) was conducted in 1259 elite athlete samples using Drug core BeadChip arrays, followed by non-targeted metabolomics of 692 serum samples. Genotype distribution, differences in metabolic levels, and genetically influenced metabotypes were compared between high and moderate endurance and power sports, as well as among sports with different cardiovascular demands (CVD). RESULTS: Out of 341385 SNPs, two novel associations are reported for endurance status: rs56330321 in ATP2B2 (p=1.47E-7) and rs2635438 in SYNE1 (p=2.54E-7). A meta-analysis confirmed the association of rs56330321 and rs2635438 with endurance athlete status at the GWAS level of significance. Metabolomics analysis of 740 metabolites was performed in 191 (discovery cohort) and 500 (replication cohort) elite athletes. These studies revealed changes in various metabolites involved in steroid biosynthesis, fatty acid oxidation, oxidative stress response, xenobiotics, and various mediators of cell signaling among different groups of endurance, power, and CVD athletes. By combining GWAS with metabolomics profiling data (mGWAS), 19 common variant metabolic quantitative trait loci (mQTLs) were identified, of which 5 were novel. When focusing on metabolites associated with endurance, power, and CVD, 4 common variant mQTLs were found, of which one novel mQTL, linking 4-androsten-3alpha,17alpha-diol monosulfate and SULT2A1 (involved in steroid sulfation), was identified in association with endurance. CONCLUSIONS: GWAS, metabolomics, and mGWAS of elite athletes identified novel markers associated with elite athletic performance, with potential application in biomarker discovery.