Strategies For Improving Epistasis Detection And Replication
Genome-wide association studies (GWAS) have been extensively critiqued for their perceived inability to elucidate the genetic underpinnings of complex disease. Of particular concern is “missing heritability,” the difference between the total estimated heritability of a phenotype and the heritability explained by GWAS-identified loci. Numerous explanations for this missing heritability have been proposed, but a frequently overlooked and potentially highly informative one is the ubiquity of epistasis underlying complex phenotypes.
Given our understanding of how biomolecules interact in networks and pathways, it is reasonable to expect that the effect of variation at an individual locus may depend non-additively on its interacting partners and should be analyzed in that context. It has been recognized for over a century that deviations from expected Mendelian proportions can be explained by the interaction of multiple loci, and the epistatic underpinnings of phenotypes in model organisms have been extensively quantified experimentally. The scarcity of compelling single-locus GWAS hits for complex human phenotypes (and their inconsistent replication across populations) should therefore not be surprising: one might expect the joint effect of multiple perturbations to interacting partners within a functional biological module to matter more than individual main effects.
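The claim that a joint effect can matter even when individual main effects vanish can be made concrete with a two-locus penetrance table. The sketch below is a toy, XOR-like model chosen purely for illustration: the penetrance values and the assumption of Hardy-Weinberg genotype frequencies with allele frequency 0.5 at both loci are not taken from any study. Each locus on its own shows the same disease risk for every genotype, yet risk depends strongly on the combination of genotypes.

```python
from typing import List, Sequence

# Genotype frequencies (aa, Aa, AA) under Hardy-Weinberg equilibrium with
# allele frequency 0.5 at both loci (illustrative assumption).
GENOTYPE_FREQ = (0.25, 0.5, 0.25)

# Hypothetical penetrance table: PENETRANCE[i][j] = P(disease | genotype i at
# locus A, genotype j at locus B). Values are chosen so that the marginal
# effects cancel out (a purely epistatic, XOR-like toy model).
PENETRANCE = (
    (0.2, 0.8, 0.2),
    (0.8, 0.2, 0.8),
    (0.2, 0.8, 0.2),
)


def marginal_penetrance_a(table: Sequence[Sequence[float]],
                          freq_b: Sequence[float]) -> List[float]:
    """P(disease | genotype at locus A), averaged over locus B genotypes."""
    return [sum(pen * freq for pen, freq in zip(row, freq_b)) for row in table]


def marginal_penetrance_b(table: Sequence[Sequence[float]],
                          freq_a: Sequence[float]) -> List[float]:
    """P(disease | genotype at locus B), averaged over locus A genotypes."""
    return [
        sum(freq_a[i] * table[i][j] for i in range(len(freq_a)))
        for j in range(len(table[0]))
    ]


if __name__ == "__main__":
    print("Locus A marginal:", marginal_penetrance_a(PENETRANCE, GENOTYPE_FREQ))
    print("Locus B marginal:", marginal_penetrance_b(PENETRANCE, GENOTYPE_FREQ))
```

Both marginals come out at 0.5 (up to floating-point rounding) for every genotype, so a single-locus test has no signal to find, while the joint table ranges from 0.2 to 0.8.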
Current methods for analyzing GWAS data are not well equipped to detect epistasis or to replicate significant interactions. Because the number of pairwise interactions grows quadratically with the number of loci, the multiple testing burden of testing every pair quickly becomes nearly insurmountable. Statistical and machine learning approaches that have worked well for other types of high-dimensional data are appealing and may be useful for detecting epistasis, but they may need adaptation to work appropriately. Biological knowledge may also be leveraged to guide the search for epistasis candidates, but it must be applied in a context-appropriate way (for example, two loci with significant main effects may not have a significant interaction, and vice versa).
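A short calculation shows how quickly the pairwise testing burden grows. The sketch below counts unordered SNP pairs, n(n - 1)/2, and applies a Bonferroni correction; Bonferroni is used here only as one common, conservative way to control the family-wise error rate (the text does not name a specific correction), and the 500,000-SNP array size is an illustrative assumption.

```python
from math import comb


def n_pairwise_tests(n_loci: int) -> int:
    """Number of distinct unordered locus pairs, n * (n - 1) / 2."""
    return comb(n_loci, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 500_000  # illustrative array size, not from any specific study
    pairs = n_pairwise_tests(n_snps)
    print(f"{pairs:,} pairwise tests")  # 124,999,750,000 pairwise tests
    print(f"per-test threshold: {bonferroni_threshold(0.05, pairs):.1e}")  # 4.0e-13
```

Because LD makes many of these tests correlated, Bonferroni is conservative here; the point is only the order of magnitude, a per-test threshold several orders of magnitude stricter than those used for single-locus scans.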
Rather than dismissing GWAS, and the wealth of associated data that has accumulated, as a failure, I propose developing new techniques and incorporating diverse data sources to analyze GWAS data in an epistasis-centric framework.
Mapping gene associations in human mitochondria using clinical disease phenotypes
Nuclear genes encode most mitochondrial proteins, and their mutations cause diverse and debilitating clinical disorders. To date, 1,200 of these mitochondrial genes have been recorded, but no standardized catalog of the associated clinical phenotypes exists. Such a catalog would be useful for developing methods to analyze human phenotypic data, for determining genotype-phenotype relations across many genes and diseases, and for supporting the clinical diagnosis of mitochondrial disorders. Here we establish a clinical phenotype catalog of 174 mitochondrial disease genes and study associations between diseases and genes. Phenotypic features such as clinical signs and symptoms were manually annotated from full-text medical articles and classified using the hierarchical MeSH ontology. This classification of each gene's phenotypic features allowed diseases to be compared across genes, and we quantified the phenotypic association between disease genes by a value based on their shared phenotypic features. Genes sharing more similar phenotypes showed a stronger tendency toward functional interactions, demonstrating the usefulness of phenotype similarity values in disease gene network analysis. We then constructed a functional network of mitochondrial genes and found higher connectivity for non-disease genes than for disease genes, as well as a tendency of disease genes to interact with each other. Using these differences, we propose 168 candidate genes that resemble the characteristic interaction patterns of mitochondrial disease genes. Through their network associations, the candidates are further prioritized for the study of specific disorders such as optic neuropathies and Parkinson disease.
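The abstract says only that the phenotypic association between two genes is a quantitative value based on their shared phenotypic features; it does not give the formula. As a minimal sketch, assuming a Jaccard index over sets of MeSH-style terms (a swapped-in, commonly used set-overlap measure, not necessarily the one used in the study), the pairwise comparison could look like the following. Gene identifiers and annotations are hypothetical, and the MeSH hierarchy is ignored for simplicity.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared features divided by all features annotated to either gene."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(annotations: Dict[str, Set[str]]) -> List[Tuple[str, str, float]]:
    """Score every gene pair and return the pairs from most to least similar."""
    scores = [
        (g1, g2, jaccard(annotations[g1], annotations[g2]))
        for g1, g2 in combinations(sorted(annotations), 2)
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)


# Hypothetical annotations using MeSH-style descriptor names.
ANNOTATIONS = {
    "GENE_1": {"Optic Atrophy", "Ataxia", "Acidosis, Lactic"},
    "GENE_2": {"Optic Atrophy", "Ataxia", "Hearing Loss"},
    "GENE_3": {"Cardiomyopathies", "Acidosis, Lactic"},
}

if __name__ == "__main__":
    for g1, g2, score in pairwise_similarity(ANNOTATIONS):
        print(f"{g1}-{g2}: {score:.2f}")
```

Here GENE_1 and GENE_2 share two of four terms (0.50), GENE_1 and GENE_3 share one of four (0.25), and GENE_2 and GENE_3 share none (0.00). In the study's framing, higher-scoring pairs are the ones expected to show a stronger tendency toward functional interaction.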
Most mitochondrial disease phenotypes involve several clinical categories, including neurologic, metabolic, and gastrointestinal disorders, which might indicate the effects of gene defects within the mitochondrial system. The accompanying knowledgebase (http://www.mitophenome.org/) supports the study of clinical diseases and associated genes.
Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci
Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS, the ability to detect genetic association through linkage disequilibrium (LD), is also its limitation. Whilst ever-increasing study sizes and improved designs have increased the power of GWAS to detect effects, distinguishing causal variants or genes from other variants and genes correlated with them through LD remains the real challenge. This has severely hindered the biological insight and clinical translation of GWAS findings: although thousands of disease susceptibility loci have been reported, the causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in complexity, ranging from relatively simple logistic regression to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models (neural networks). Paired with functional validation, these methods show considerable promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review examines the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on prioritizing loci associated with complex disease. Overall, we explore the contributions ML has made towards reaching the GWAS end-game and their consequent wide-ranging translational impact.
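To show what prioritization means operationally for the simplest model class listed above, the sketch below fits a logistic regression by plain gradient descent: each candidate gene at a locus is described by a feature vector, the model assigns it a probability of being causal, and genes are ranked by that probability. The feature names, values, and labels are hypothetical, and a real pipeline would use a library implementation with regularization and held-out evaluation rather than this bare loop.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_genes(w: Sequence[float], b: float,
               candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score each candidate gene and sort from most to least likely causal."""
    scored = [(g, sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b))
              for g, x in candidates.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Hypothetical features per gene: [colocalization score, coding variant in LD
# (0/1), nearest gene to lead SNP (0/1)]; labels: causal (1) or not (0).
X_TRAIN = [
    [0.9, 1, 1], [0.8, 0, 1], [0.7, 1, 0],
    [0.1, 0, 0], [0.2, 0, 1], [0.05, 0, 0],
]
Y_TRAIN = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(X_TRAIN, Y_TRAIN)
    for gene, p in rank_genes(w, b, {"GENE_A": [0.85, 1, 1], "GENE_B": [0.1, 0, 0]}):
        print(f"{gene}: {p:.3f}")
```

Swapping in the ensemble or deep learning models the review describes changes the scoring function, not this overall rank-the-candidates workflow.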
Uncovering the complex genetic architecture of human plasma lipidome using machine learning methods
The genetic architecture of the plasma lipidome provides insights into the regulation of lipid metabolism and related diseases. We applied an unsupervised machine learning method, PGMRA, to discover phenotype-genotype many-to-many relations between genotype and the plasma lipidome (phenotype) in order to identify the genetic architecture of the plasma lipidome profiled from 1,426 Finnish individuals aged 30–45 years. PGMRA involves biclustering the genotype and lipidome data independently, followed by their inter-domain integration based on hypergeometric tests of the number of shared individuals. Pathway enrichment analysis was performed on the SNP sets to identify their associated biological processes. We identified 93 statistically significant (hypergeometric p-value < 0.01) lipidome-genotype relations. Genotype biclusters in these 93 relations contained 5,977 SNPs across 3,164 genes. Twenty-nine of the 93 relations contained genotype biclusters with more than 50% unique SNPs and participants, thus representing the most distinct subgroups. We identified 30 significantly enriched biological processes among the SNPs involved in 21 of these 29 most distinct genotype-lipidome subgroups, through which the identified genetic variants can influence and regulate plasma lipid metabolism and profiles. This study identified 29 distinct genotype-lipidome subgroups in the studied Finnish population that may have distinct disease trajectories and could therefore be useful in precision medicine research.
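PGMRA's integration step, as described above, asks whether a genotype bicluster and a lipidome bicluster share more individuals than expected by chance. A minimal standard-library sketch of that one-sided hypergeometric tail follows; the function name and bicluster sizes are hypothetical, and this is not PGMRA's own code. In SciPy, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` (population size M, n marked individuals, N draws) computes the same tail.

```python
from fractions import Fraction
from math import comb


def hypergeom_overlap_pvalue(n_total: int, n_geno: int,
                             n_lipid: int, n_shared: int) -> float:
    """P(X >= n_shared) for the overlap of two random subsets of individuals.

    n_total  -- individuals in the cohort
    n_geno   -- individuals in the genotype bicluster
    n_lipid  -- individuals in the lipidome bicluster
    n_shared -- individuals found in both biclusters
    """
    denom = comb(n_total, n_lipid)
    upper = min(n_geno, n_lipid)
    # Sum the exact hypergeometric probabilities from the observed overlap up.
    tail = sum(
        comb(n_geno, k) * comb(n_total - n_geno, n_lipid - k)
        for k in range(n_shared, upper + 1)
    )
    # Exact rational arithmetic avoids underflow in intermediate terms.
    return float(Fraction(tail, denom))


if __name__ == "__main__":
    # Hypothetical bicluster sizes within the 1,426-person cohort.
    p = hypergeom_overlap_pvalue(n_total=1426, n_geno=200, n_lipid=150, n_shared=40)
    print(f"p = {p:.3g}; significant at 0.01: {p < 0.01}")
```

With these hypothetical sizes the expected overlap is about 21 individuals, so an observed overlap of 40 yields a p-value well below the 0.01 cutoff used in the study.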
Funding: Research Council of Finland; Social Insurance Institution of Finland; Competitive State Research Financing of the Expert Responsibility Area of Kuopio, Tampere and Turku University Hospitals; Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation for Cardiovascular Research; Finnish Cultural Foundation; Finnish IT Center for Science; Sigrid Juselius Foundation; Tampere Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association (322098); Horizon 2020; European Research Council (ERC); European Commission (349708); Tampere University Hospital Supporting Foundation; Finnish Society of Clinical Chemistry; Spanish Government (RTI2018-098983-B-100); Laboratoriolääketieteen Edistämissäätiö sr; Ida Montinin säätiö; Kalle Kaiharin säätiö; Aarne Koskelon säätiö; Faculty of Medicine and Health Technology, Tampere University; Project HPC-EUROPA3 (X51001); EC Research Innovation Action under H2020 Programme (75532).
Further grant numbers listed: 286284, 134309, 126925, 121584, 124282, 255381, 256474, 283115, 319060, 320297, 314389, 338395, 330809, 104821, 129378, 117797, 141071, INFRAIA-2016-1-730897, 50191928.
- …