163 research outputs found

    Prioritizing single-nucleotide polymorphisms and variants associated with clinical mastitis

    Next-generation sequencing technology has provided resources to easily explore and identify candidate single-nucleotide polymorphisms (SNPs) and variants. However, identifying and inferring the causal SNPs from sequence data remains a challenge. A problem with the different methods that predict the effect of mutations is that they produce false positives. In this hypothesis, we provide an overview of methods known for identifying causal variants and discuss the challenges, fallacies, and prospects in discerning candidate SNPs. We then propose a three-point classification strategy, which could serve as an additional annotation method in identifying causalities.

    Genome-wide association study for detecting autoimmune-disease-associated genetic pattern differences in specific HLA type carriers

    The HLA locus variants are among the strongest genetic predictors for most, if not all, human autoimmune diseases. The HLA locus genes include the antigen-presenting cell surface peptide encoding genes, which form an essential component in the maturation of the T-cell population in the thymus and their subsequent activation in the periphery. Leveraging modern population-wide genotype information that captures even the most polymorphic loci, this work aims to design a case-control genome-wide association study (GWAS) that detects non-HLA genetic variants whose effect on an autoimmune disease differs statistically between carriers of certain HLA types and non-carriers. To this end, study groups are assembled based on specific HLA allele doses, so that for the 42 HLA allele types selected for this study there are 42 HLA-specific groups in which every individual carries at least one copy of the HLA allele type. The effect sizes from the summary statistics of the HLA-specific GWASs are compared to those of a general population GWAS (performed here on all participants of the Estonian Biobank). Variants are considered relevant to this aim if their effect size in the HLA-specific groups is statistically different from that in the general population GWAS.
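The abstract does not say which test is used to compare effect sizes between an HLA-specific GWAS and the general population GWAS. As a rough illustration only, one common choice is a two-sided z-test on the difference of two effect estimates. The function name and inputs below are hypothetical, and the sketch assumes the two estimates are independent, which is not strictly true here because HLA carriers are a subset of the general population.

```python
import math


def effect_size_difference_z(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for a difference between two effect sizes.

    Assumption (not from the source): the two estimates are treated as
    independent, so their covariance is ignored. With overlapping samples
    this approximation can misstate the significance of the difference.
    """
    # Standard error of the difference under the independence assumption.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (beta_a - beta_b) / se_diff
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Hypothetical summary statistics for a single variant.
    z, p = effect_size_difference_z(beta_a=0.5, se_a=0.1, beta_b=0.2, se_b=0.1)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

In a genome-wide setting the resulting p-values would still need a multiple-testing correction across all tested variants and HLA groups.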

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a multifactor dimensionality reduction-based method built on associative classification is employed to identify higher-order SNP interactions and so improve the understanding of the genetic architecture of complex diseases. The thesis further explores the application of deep learning techniques to provide new clues for interaction analysis. The performance of the deep learning method is maximized by combining deep neural networks with a random forest to obtain reliable interactions in the presence of noise.

    Optimization, random resampling, and modeling in bioinformatics

    Quantitative phenotypes regulated by multiple genes are prevalent in nature, and many diseases fall into this category. High-throughput sequencing and high-performance computing provide a basis for understanding quantitative phenotypes. However, finding a statistical approach that correctly models the phenotypes remains a challenging problem. In this work, I present a resampling-based approach to obtain biological functional categories from gene sets and apply the approach to analyze lithium sensitivity in neurological diseases and cancer. Then, a non-parametric permutation-based approach is applied to evaluate the performance of a GWAS modeling procedure. While the procedure performs well statistically, search space reduction is required to address the computational challenge.

    Systematic permutation testing in GWAS pathway analyses: identification of genetic networks in dilated cardiomyopathy and ulcerative colitis

    Background: Genome-wide association studies (GWAS) are applied to identify genetic loci that are associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools to uncover functional networks in genome-wide association data. Usually, pathway analyses combine statistical methods with a priori available biological knowledge. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation testing are applied. Results: We systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluated them on genome-wide association data of Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard, permuting the case–control status, effectively improves the specificity of GWAS pathway analysis. Although permutation of SNPs does not maintain linkage disequilibrium (LD), these permutations represent an alternative for GWAS data when case–control permutations are not possible. Gene permutations, however, did not add significantly to the specificity. Finally, we provide estimates of the required number of permutations for the investigated approaches. Conclusions: To discover potential false positive functional pathway candidates and to support the results from standard statistical tests such as the hypergeometric test, permutation tests of case–control data should be carried out. Case–control permutation was the most reasonable approach; if this is not possible, SNP permutations may be carried out. Our study also demonstrates that significance values converge rapidly with an increasing number of permutations. By applying the described statistical framework, we were able to discover axon guidance, focal adhesion, and calcium signaling as important DCM-related pathways and intestinal immune network for IgA production as the most significant UC pathway.
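The idea of permuting the case–control status can be sketched in a few lines. This is a toy illustration and not the authors' pipeline. The per-SNP score (absolute difference in mean allele dose), the hit cutoff, and all function names are assumptions chosen for brevity; a real analysis would use a proper association test and map SNPs to genes before testing pathways.

```python
import random


def snp_scores(genotypes, labels):
    """Absolute case-control difference in mean allele dose, per SNP."""
    cases = [row for row, y in zip(genotypes, labels) if y == 1]
    controls = [row for row, y in zip(genotypes, labels) if y == 0]
    n_snps = len(genotypes[0])
    return [
        abs(sum(r[j] for r in cases) / len(cases)
            - sum(r[j] for r in controls) / len(controls))
        for j in range(n_snps)
    ]


def pathway_hits(genotypes, labels, pathway_snps, cutoff):
    """Count pathway SNPs whose score reaches the (assumed) cutoff."""
    scores = snp_scores(genotypes, labels)
    return sum(1 for j in pathway_snps if scores[j] >= cutoff)


def permutation_p_value(genotypes, labels, pathway_snps, cutoff,
                        n_perm=1000, seed=0):
    """Empirical p-value from shuffling case-control labels.

    Shuffling labels keeps each individual's genotype row intact, so the
    correlation between SNPs (LD) is preserved in every permutation.
    """
    rng = random.Random(seed)
    observed = pathway_hits(genotypes, labels, pathway_snps, cutoff)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pathway_hits(genotypes, shuffled, pathway_snps, cutoff) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Because the smallest attainable value is 1 / (n_perm + 1), the number of permutations bounds the resolution of the p-value, which is one reason the required number of permutations matters in practice.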

    Strategies For Improving Epistasis Detection And Replication

    Genome-wide association studies (GWAS) have been extensively critiqued for their perceived inability to adequately elucidate the genetic underpinnings of complex disease. Of particular concern is “missing heritability,” or the difference between the total estimated heritability of a phenotype and that explained by GWAS-identified loci. There are numerous proposed explanations for this missing heritability, but a frequently ignored and potentially vastly informative alternative explanation is the ubiquity of epistasis underlying complex phenotypes. Given our understanding of how biomolecules interact in networks and pathways, it is not unreasonable to conclude that the effect of variation at individual genetic loci may non-additively depend on and should be analyzed in the context of their interacting partners. It has been recognized for over a century that deviation from expected Mendelian proportions can be explained by the interaction of multiple loci, and the epistatic underpinnings of phenotypes in model organisms have been extensively experimentally quantified. Therefore, the dearth of inspiring single locus GWAS hits for complex human phenotypes (and the inconsistent replication of these between populations) should not be surprising, as one might expect the joint effect of multiple perturbations to interacting partners within a functional biological module to be more important than individual main effects. Current methods for analyzing data from GWAS are not well-equipped to detect epistasis or replicate significant interactions. The multiple testing burden associated with testing each pairwise interaction quickly becomes nearly insurmountable with increasing numbers of loci. Statistical and machine learning approaches that have worked well for other types of high-dimensional data are appealing and may be useful for detecting epistasis, but potentially require tweaks to function appropriately. Biological knowledge may also be leveraged to guide the search for epistasis candidates, but requires context-appropriate application (as, for example, two loci with significant main effects may not have a significant interaction, and vice versa). Rather than renouncing GWAS and the wealth of associated data that has been accumulated as a failure, I propose the development of new techniques and incorporation of diverse data sources to analyze GWAS data in an epistasis-centric framework.
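The scale of the pairwise multiple testing burden mentioned above can be made concrete with a little arithmetic. The sketch below counts the pairwise tests for a given number of loci and derives a Bonferroni-style per-test threshold. Bonferroni is used here only as a familiar reference point and is an assumption, not a correction named in the abstract; the function name is hypothetical.

```python
import math


def pairwise_bonferroni(n_loci, alpha=0.05):
    """Return (number of pairwise tests, Bonferroni per-test threshold)."""
    if n_loci < 2:
        raise ValueError("at least two loci are needed to form a pair")
    # n choose 2 = n * (n - 1) / 2 unordered pairs of loci.
    n_tests = math.comb(n_loci, 2)
    return n_tests, alpha / n_tests


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        n_tests, threshold = pairwise_bonferroni(n)
        print(f"{n:>9} loci -> {n_tests:.3e} pairs, threshold {threshold:.2e}")
```

The number of pairs grows quadratically with the number of loci, so a million loci already imply roughly 5 × 10^11 pairwise tests.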

    High-throughput computational methods and software for quantitative trait locus (QTL) mapping

    In recent years many new technologies such as tiling arrays and high-throughput sequencing have come to play an important role in systems genetics research. For researchers it is of the utmost importance to understand how this affects their research. This work describes possible solutions to this ‘Big Data’ avalanche which has hit systems genetics. This thesis describes the work carried out during the author’s 4-year PhD project at the Groningen Bioinformatics Centre to develop smarter and more optimized algorithms such as Pheno2Geno and MQM, and to use a collaborative approach such as xQTL workbench to store and analyse high-throughput systems genetics data.