    Graph-Regularized Dual Lasso for Robust eQTL Mapping

    Motivation: As a promising tool for dissecting the genetic basis of complex traits, expression quantitative trait loci (eQTL) mapping has attracted increasing research interest. An important issue in eQTL mapping is how to effectively integrate networks representing interactions among genetic markers and genes. Recently, several Lasso-based methods have been proposed to leverage such network information. Despite their success, existing methods have three common limitations: (i) a preprocessing step is usually needed to cluster the networks; (ii) the incompleteness of the networks and the noise in them are not considered; (iii) other available information, such as the location of genetic markers and pathway information, is not integrated. Results: To address these limitations, we propose Graph-regularized Dual Lasso (GDL), a robust approach for eQTL mapping. GDL integrates the correlation structures among genetic markers and among traits simultaneously. It also takes into account the incompleteness of the networks and is robust to noise. GDL uses graph-based regularizers to model the prior networks and does not require an explicit clustering step. Moreover, it enables further refinement of the partial and noisy networks. We further generalize GDL to incorporate the location of genetic markers and gene-pathway information. We perform extensive experimental evaluations using both simulated and real datasets. Experimental results demonstrate that the proposed methods can effectively integrate various sources of prior knowledge and significantly outperform state-of-the-art eQTL mapping methods.
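The graph-based regularizers can be illustrated with a minimal Python sketch, assuming a simplified objective: a multi-output Lasso plus Laplacian penalties tr(W^T L_S W) on a SNP network and tr(W L_G W^T) on a gene network, solved by plain proximal gradient descent (ISTA). For an unnormalized Laplacian L = D - A, tr(W^T L W) sums A_ij ||w_i - w_j||^2 over each unordered connected pair of rows, so connected SNPs (or genes) are pushed toward similar association profiles. The names `graph_regularized_lasso`, `lam`, `g_snp`, and `g_gene` are hypothetical, and this is not the authors' GDL algorithm, which additionally refines the partial and noisy networks.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def soft_threshold(z, t):
    """Elementwise proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def graph_regularized_lasso(X, Y, snp_adj, gene_adj, lam=0.1,
                            g_snp=1.0, g_gene=1.0, n_iter=500):
    """ISTA for the hypothetical objective
        0.5*||Y - X W||_F^2 + lam*||W||_1
        + 0.5*g_snp*tr(W^T L_snp W) + 0.5*g_gene*tr(W L_gene W^T)
    X: n x p genotypes, Y: n x q expression traits, W: p x q coefficients.
    """
    L_snp, L_gene = laplacian(snp_adj), laplacian(gene_adj)
    # Step size = 1 / (upper bound on the Lipschitz constant of the smooth gradient).
    lip = (np.linalg.norm(X, 2) ** 2
           + g_snp * np.linalg.norm(L_snp, 2)
           + g_gene * np.linalg.norm(L_gene, 2))
    step = 1.0 / lip
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        # Gradient of the squared loss plus both Laplacian terms (L symmetric).
        grad = X.T @ (X @ W - Y) + g_snp * (L_snp @ W) + g_gene * (W @ L_gene)
        W = soft_threshold(W - step * grad, step * lam)
    return W
```

With `g_snp` and `g_gene` set to zero the solver reduces to an ordinary multi-output Lasso, which makes the effect of the network terms easy to compare.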

    Toward Robust Group-Wise eQTL Mapping via Integrating Multi-Domain Heterogeneous Data

    As a promising tool for dissecting the genetic basis of common diseases, expression quantitative trait loci (eQTL) study has attracted increasing research interest. Traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to biological pathways. This thesis studies the problem of identifying group-wise associations in eQTL mapping. Building on the intuition of group-wise association, we examine how integrating heterogeneous prior knowledge on the correlation structures among SNPs and among genes can improve the robustness and interpretability of eQTL mapping. To obtain a more accurate knowledge base of the interactions among SNPs and genes, we developed a robust and flexible approach that can incorporate multiple data sources and automatically identify noisy sources. Extensive experiments demonstrate the effectiveness of the proposed algorithms. (Doctor of Philosophy)

    Sparse regression models for unraveling group and individual associations in eQTL mapping

    Background: As a promising tool for dissecting the genetic basis of common diseases, expression quantitative trait loci (eQTL) study has attracted increasing research interest. Traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to biological pathways. Results: To alleviate this limitation, we propose geQTL, a sparse regression method that can detect both group-wise and individual associations between SNPs and expression traits. geQTL can also correct for the effects of potential confounders. The method employs a computationally efficient technique, so it can handle large-scale studies. Moreover, it automatically infers the appropriate number of group-wise associations. We perform extensive experiments on both simulated and yeast datasets to demonstrate the effectiveness and efficiency of the proposed method. The results show that geQTL effectively detects both individual and group-wise signals and outperforms state-of-the-art methods by a large margin. Conclusions: This paper illustrates that decoupling individual and group-wise associations improves eQTL mapping accuracy and makes it possible to infer both kinds of association. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0986-9) contains supplementary material, which is available to authorized users.

    Fast and robust group-wise eQTL mapping using sparse graphical models

    Background: Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool for understanding the genetic basis of gene expression and complex traits. Traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to hidden biological pathways. Results: We introduce a new approach to identify novel group-wise associations between sets of SNPs and sets of genes. Such associations are captured by hidden variables connecting SNPs and genes. The model is linear-Gaussian and uses two types of hidden variables: one captures the set associations between SNPs and genes, and the other captures confounders. We develop an efficient optimization procedure that makes this approach suitable for large-scale studies. Extensive experimental evaluations on both simulated and real datasets demonstrate that the proposed methods effectively capture both individual and group-wise signals that cannot be identified by state-of-the-art eQTL mapping methods. Conclusions: Considering group-wise associations significantly improves the accuracy of eQTL mapping, and the multi-layer regression model opens a new avenue for understanding how multiple SNPs interact to jointly affect the expression level of a group of genes.
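To make the two kinds of hidden variables concrete, here is a hedged simulation sketch of a linear-Gaussian model of this shape: one hidden variable links a set of SNPs to a set of genes, while independent confounders add structured noise to every gene. The layer sizes, effect sizes, and the function name `simulate_groupwise_eqtl` are illustrative assumptions; the sketch only generates data and does not reproduce the authors' optimization procedure.

```python
import numpy as np

def simulate_groupwise_eqtl(n=500, p=20, q=10, seed=0):
    """Simulate data from a hypothetical two-layer linear-Gaussian model:
    SNPs -> group hidden variable -> genes, plus independent confounders."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
    # One group-wise association: SNPs 0-2 jointly drive hidden variable z.
    a = np.zeros(p)
    a[:3] = 1.0
    z = X @ a + 0.5 * rng.standard_normal(n)
    # Genes 0-3 load on z; every gene also loads on two confounders.
    b = np.zeros(q)
    b[:4] = 1.0
    H = rng.standard_normal((n, 2))          # hidden confounders
    C = 0.5 * rng.standard_normal((2, q))    # confounder loadings per gene
    Y = np.outer(z, b) + H @ C + rng.standard_normal((n, q))
    return X, Y
```

In this toy model each individual SNP carries only part of the variance of z, so in expectation the summed SNP set correlates more strongly with the gene set than any single SNP does.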

    STATISTICAL METHODS IN GENETIC STUDIES

    This dissertation comprises three chapters, summarized as follows. In Chapter 1, we propose a new method, MF-TOWmuT, for genome-wide association studies with multiple genetic variants and multiple phenotypes using family samples. MF-TOWmuT uses a kinship matrix to account for sample relatedness. Notably, our simulations included hidden polygenic effects, and we varied the proportion of phenotypic variance they contribute. Simulation studies show that MF-TOWmuT preserves the type I error rate and is more powerful than several existing methods across different simulation scenarios. MF-TOWmuT is also quite robust to the proportion of variance explained by hidden polygenic effects and to the direction of the genetic variants' effects. In Chapter 2, we propose LORSEN, a fast and efficient low-rank penalized regression with the Elastic Net penalty for eQTL mapping. By using the Elastic Net penalty instead of the L1 penalty, our method overcomes two crucial drawbacks of the L1 penalty, and it outperforms two commonly used eQTL mapping methods, LORS and FastLORS, in many simulation scenarios in terms of average area under the curve (AUC). In Chapter 3, we propose BiNetPeR, a bipartite network-based penalized regression model for eQTL mapping. This method uses SNP-gene marginal association evidence to construct a SNP-gene bipartite network, then projects that bipartite network to obtain a SNP network. Based on the normalized Laplacian matrix of the projected SNP network, we then formulate eQTL mapping as a penalized regression model. Our simulation results show that the proposed method maintains an appropriate false positive rate and outperforms two competing eQTL mapping methods, FastLORS and mtLasso2G.
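The network construction described for Chapter 3 can be sketched under stated assumptions: a SNP-gene edge is drawn whenever the absolute marginal correlation exceeds a hypothetical `threshold`, two SNPs in the projected network are linked with weight equal to the number of genes they share, and the normalized Laplacian is L = I - D^{-1/2} W D^{-1/2}. The abstract does not specify these choices, so `projected_snp_laplacian` is illustrative only and assumes no genotype or expression column is constant.

```python
import numpy as np

def projected_snp_laplacian(X, Y, threshold=0.3):
    """Normalized Laplacian of a SNP network projected from a SNP-gene
    bipartite network (hypothetical construction).
    X: n x p genotypes, Y: n x q expression; no constant columns assumed."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    corr = Xs.T @ Ys / n                          # p x q marginal correlations
    B = (np.abs(corr) > threshold).astype(float)  # bipartite SNP-gene edges
    W = B @ B.T                                   # shared-gene counts between SNPs
    np.fill_diagonal(W, 0.0)                      # no self-loops
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    # L = I - D^{-1/2} W D^{-1/2}; isolated SNPs (degree 0) keep L_ii = 1.
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
```

One common way such a matrix enters a penalized regression is through a quadratic term β^T L β, which encourages SNPs linked in the projected network to have similar degree-scaled effects; the exact penalty used by BiNetPeR may differ.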

    Expression QTLs Mapping and Analysis: A Bayesian Perspective.

    The aim of expression quantitative trait locus (eQTL) mapping is to identify DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association analyses (GWAS), eQTL mapping has become a useful tool for understanding the functional context in which these variants operate and, eventually, for narrowing down functional gene targets for disease. Despite the extensive application of eQTL mapping to complex (polygenic) traits and disease, most eQTL studies still rely on univariate modeling strategies, i.e., testing each transcript-marker pair for association separately. However, these "one-at-a-time" strategies (1) cannot control the number of false positives when an intricate linkage disequilibrium structure is present and (2) are often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss the distinctive features of multivariate eQTL mapping approaches with respect to detection of polygenic effects, accuracy, and interpretability of the results.
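The univariate strategy criticized here amounts to a marginal scan over all marker-transcript pairs. A minimal sketch in Python with NumPy, assuming standardized continuous data and a normal approximation to the t distribution for p-values (reasonable only for large n); the function name `univariate_scan` and the toy data are hypothetical.

```python
import math
import numpy as np

def univariate_scan(X, Y):
    """Test every marker (column of X) against every transcript (column of Y)
    one at a time. Returns Pearson r, t statistics and approximate p-values."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    r = Xs.T @ Ys / n                            # p x q correlations
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))    # t statistic for each pair
    # Two-sided p-values, erfc(|t| / sqrt(2)), from a normal approximation.
    pvals = np.vectorize(math.erfc)(np.abs(t) / math.sqrt(2.0))
    return r, t, pvals

# Toy data: marker 1 is a near copy of marker 0 (strong LD), and only
# marker 0 affects transcript 0.
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 50))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(300)
Y = rng.standard_normal((300, 20))
Y[:, 0] += X[:, 0]
r, t, pvals = univariate_scan(X, Y)
bonf = 0.05 / pvals.size                         # Bonferroni over p * q tests
hits = np.argwhere(pvals < bonf)                 # (marker, transcript) pairs
```

Both marker 0 and its non-causal LD partner, marker 1, appear in `hits` for transcript 0, which shows how marginal testing reports correlated proxies alongside the causal variant.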

    Network‐based feature selection reveals substructures of gene modules responding to salt stress in rice

    Rice, an important food resource, is highly sensitive to salt stress, which is directly related to food security. Although many studies have identified physiological mechanisms that confer tolerance to the osmotic effects of salinity, the link between rice genotype and salt tolerance remains unclear. Associating a gene co‐expression network with rice phenotypic data under stress has the potential to identify stress‐responsive genes, but there is no standard method for associating stress phenotypes with a gene co‐expression network. We developed a novel method for integrating a gene co‐expression network with stress phenotype data to conduct a systems analysis linking genotype to phenotype. We applied a LASSO‐based method to the gene co‐expression network of rice under salt stress to discover key genes, and their interactions, for salt tolerance‐related phenotypes. Submodules within the gene modules identified from the co‐expression network were selected by LASSO regression, which establishes a linear relationship between gene expression profiles and physiological responses, that is, sodium/potassium concentrations under salt stress. Genes in these submodules have functions related to ion transport, osmotic adjustment, and oxidative tolerance. We argue that these submodule genes are biologically meaningful and useful for studies of rice salt tolerance. This method can be applied in other studies to efficiently and reliably integrate co‐expression networks and phenotypic data.
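The LASSO step can be sketched as a regression of one physiological response on the expression of the genes in a single co‐expression module, assuming LASSO is fitted within each module. This is a minimal, hypothetical illustration using cyclic coordinate descent in NumPy; `lasso_cd` and the penalty value are assumptions, and the sketch omits the co‐expression network construction and module detection.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    X: n x g expression of the genes in one module (columns centred);
    y: centred phenotype, e.g. a sodium/potassium measurement. No intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()               # residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of gene j with the partial residual (gene j added back).
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * (new - b[j])       # keep residual in sync
            b[j] = new
    return b
```

Genes with nonzero coefficients form the selected submodule; increasing `lam` shrinks that set.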