310 research outputs found
A multi-marker test based on family data in genome-wide association study
<p>Abstract</p> <p>Background</p> <p>Complex diseases are believed to be the results of many genes and environmental factors. Hence, multi-marker methods that can use the information of markers from different genes are appropriate for mapping complex disease genes. There already have been several multi-marker methods proposed for case-control studies. In this article, we propose a multi-marker test called a Multi-marker Pedigree Disequilibrium Test (MPDT) to analyze family data from genome-wide association studies. If the parental phenotypes are available, we also propose a two-stage test in which a genomic screening test is used to select SNPs, and then the MPDT is used to test the association of the selected SNPs.</p> <p>Results</p> <p>We use simulation studies to evaluate the performance of the MPDT and the two-stage approach. The results show that the MPDT constantly outperforms the single marker transmission/disequilibrium test (TDT) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Comparing the power of the two-stage approach with that of the one-stage approach, which approach is more powerful depends on the value of the prevalence; when the prevalence is no less than 10%, the two-stage approach may be more powerful than the one-stage approach. Otherwise, the one-stage approach is more powerful.</p> <p>Conclusion</p> <p>The proposed MPDT, is more powerful than the single marker TDT. When the parental phenotypes are available and the prevalence is no less than 10%, the proposed two-stage approach is more powerful than the one-stage approach.</p
Detecting susceptibility genes for rheumatoid arthritis based on a novel sliding-window approach
With the recent rapid improvements in high-throughout genotyping techniques, researchers are facing a very challenging task of large-scale genetic association analysis, especially at the whole-genome level, without an optimal solution. In this study, we propose a new approach for genetic association analysis based on a variable-sized sliding-window framework. This approach employs principal component analysis to find the optimal window size. Using the bisection algorithm in window size searching, the proposed method tackles the exhaustive computation problem. It is more efficient and effective than currently available approaches. We conduct the genome-wide association study in Genetic Analysis Workshop 16 (GAW16) Problem 1 data using the proposed method. Our method successfully identified several susceptibility genes that have been reported by other researchers and additional candidate genes for follow-up studies
A method dealing with a large number of correlated traits in a linkage genome scan
We propose a method to perform linkage genome scans for many correlated traits in the Genetic Analysis Workshop 15 (GAW15) data. The proposed method has two steps: first, we use a clustering method to find the tight clusters of the traits and use the first principal component (PC) of the traits in each cluster to represent the cluster; second, we perform a linkage scan for each cluster by using the representative trait of the cluster. The results of applying the method to the GAW15 Problem 1 data indicate that most of the traits in the same cluster have the same regulators, and the representative trait measure, the first PC, can explain a large part of the total variation of all the traits in each cluster. Furthermore, considering one cluster of traits at a time may yield more linkage signals than considering traits individually
A computationally efficient clustering linear combination approach to jointly analyze multiple phenotypes for GWAS
There has been an increasing interest in joint analysis of multiple phenotypes in genome-wide association studies (GWAS) because jointly analyzing multiple phenotypes may increase statistical power to detect genetic variants associated with complex diseases or traits. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes in genetic association studies, including the Clustering Linear Combination (CLC) method. The CLC method works particularly well with phenotypes that have natural groupings, but due to the unknown number of clusters for a given data, the final test statistic of CLC method is the minimum p-value among all p-values of the CLC test statistics obtained from each possible number of clusters. Therefore, a simulation procedure needs to be used to evaluate the p-value of the final test statistic. This makes the CLC method computationally demanding. We develop a new method called computationally efficient CLC (ceCLC) to test the association between multiple phenotypes and a genetic variant. Instead of using the minimum p-value as the test statistic in the CLC method, ceCLC uses the Cauchy combination test to combine all p-values of the CLC test statistics obtained from each possible number of clusters. The test statistic of ceCLC approximately follows a standard Cauchy distribution, so the p-value can be obtained from the cumulative density function without the need for the simulation procedure. Through extensive simulation studies and application on the COPDGene data, the results demonstrate that the type I error rates of ceCLC are effectively controlled in different simulation settings and ceCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared
Gene-Based Association Tests Using New Polygenic Risk Scores and Incorporating Gene Expression Data
Recently, gene-based association studies have shown that integrating genome-wide association studies (GWAS) with expression quantitative trait locus (eQTL) data can boost statistical power and that the genetic liability of traits can be captured by polygenic risk scores (PRSs). In this paper, we propose a new gene-based statistical method that leverages gene-expression measure-ments and new PRSs to identify genes that are associated with phenotypes of interest. We used a generalized linear model to associate phenotypes with gene expression and PRSs and used a score-test statistic to test the association between phenotypes and genes. Our simulation studies show that the newly developed method has correct type I error rates and can boost statistical power compared with other methods that use either gene expression or PRS in association tests. A real data analysis Figurebased on UK Biobank data for asthma shows that the proposed method is applicable to GWAS
A novel method for multiple phenotype association studies based on genotype and phenotype network
Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new insight to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed by a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new insight to investigate the genetic architecture among different types of phenotypes. Multiple phenotypes association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy
A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics
There is strong evidence showing that joint analysis of multiple phenotypes in genome-wide association studies (GWAS) can increase statistical power when detecting the association between genetic variants and human complex diseases. We previously developed the Clustering Linear Combination (CLC) method and a computationally efficient CLC (ceCLC) method to test the association between multiple phenotypes and a genetic variant, which perform very well. However, both of these methods require individual-level genotypes and phenotypes that are often not easily accessible. In this research, we develop a novel method called sCLC for association studies of multiple phenotypes and a genetic variant based on GWAS summary statistics. We use the LD score regression to estimate the correlation matrix among phenotypes. The test statistic of sCLC is constructed by GWAS summary statistics and has an approximate Cauchy distribution. We perform a variety of simulation studies and compare sCLC with other commonly used methods for multiple phenotype association studies using GWAS summary statistics. Simulation results show that sCLC can control Type I error rates well and has the highest power in most scenarios. Moreover, we apply the newly developed method to the UK Biobank GWAS summary statistics from the XIII category with 70 related musculoskeletal system and connective tissue phenotypes. The results demonstrate that sCLC detects the most number of significant SNPs, and most of these identified SNPs can be matched to genes that have been reported in the GWAS catalog to be associated with those phenotypes. Furthermore, sCLC also identifies some novel signals that were missed by standard GWAS, which provide new insight into the potential genetic factors of the musculoskeletal system and connective tissue phenotypes
- …