Network inference in matrix-variate Gaussian models with non-independent noise
Inferring a graphical model or network from observational data on a large number of variables is a well-studied problem in machine learning and computational statistics. In this paper we consider a version of this problem that is relevant to the analysis of multiple phenotypes collected in genetic studies. In such datasets we expect correlations both between phenotypes and between individuals. We model observations as a sum of two matrix normal variates such that the joint covariance function is a sum of Kronecker products. This model, which generalizes the Graphical Lasso, assumes that observations are correlated owing to known genetic relationships and are corrupted by non-independent noise. We have developed a computationally efficient EM algorithm to fit this model. On simulated datasets we show that allowing for a general noise distribution substantially improves network reconstruction.
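A minimal sketch of the covariance structure described above. The notation is an assumption, since the abstract gives none. Y is an N × P matrix (N individuals, P phenotypes). R is a known N × N relatedness matrix, C is a P × P genetic covariance between phenotypes, and Σ is a P × P noise covariance. The noise is taken to be correlated across phenotypes but independent across individuals. With column-stacking vec, the sum of the two matrix normal variates then has covariance C ⊗ R + Σ ⊗ I_N. The function names and example matrices are hypothetical. The code only simulates from the model and is not the authors' EM algorithm.

```python
import numpy as np


def kron_sum_cov(R, C, Sigma):
    """Covariance of vec(Y) for Y = Z + E, Z ~ MN(0, R, C), E ~ MN(0, I_N, Sigma).

    Column-stacking vec convention: vec(Z) ~ N(0, kron(C, R)) and
    vec(E) ~ N(0, kron(Sigma, I_N)), so the covariances add.
    """
    N = R.shape[0]
    return np.kron(C, R) + np.kron(Sigma, np.eye(N))


def sample_Y(R, C, Sigma, rng):
    """Draw one N x P matrix from the sum-of-Kronecker-products model."""
    N, P = R.shape[0], C.shape[0]
    cov = kron_sum_cov(R, C, Sigma)
    v = rng.multivariate_normal(np.zeros(N * P), cov)
    # Undo column-stacking: each consecutive block of N entries is one column.
    return v.reshape((N, P), order="F")


rng = np.random.default_rng(0)
R = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])                 # hypothetical kinship, 3 individuals
C = np.array([[1.0, 0.3], [0.3, 1.0]])          # hypothetical genetic covariance, 2 traits
Sigma = np.array([[0.5, 0.2], [0.2, 0.5]])      # hypothetical correlated noise
Y = sample_Y(R, C, Sigma, rng)
```

The Graphical Lasso estimates a sparse precision matrix from data like Y. That estimation step is what the EM algorithm above addresses, and it is not attempted in this sketch.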
Multicohort analysis of the maternal age effect on recombination
Several studies have reported that the number of crossovers increases with maternal age in humans, but others have found the opposite. Resolving the true effect has implications for understanding the maternal age effect on aneuploidies. Here, we revisit this question in the largest sample to date using single nucleotide polymorphism (SNP)-chip data, comprising over 6,000 meioses from nine cohorts. We develop and fit a hierarchical model to allow for differences between cohorts and between mothers. We estimate that over 10 years, the expected number of maternal crossovers increases by 2.1% (95% credible interval (0.98%, 3.3%)). Our results are not consistent with the larger positive and negative effects previously reported in smaller cohorts. We observe heterogeneity between cohorts that is likely due to chance effects in smaller samples, or possibly to confounders, emphasizing that care should be taken when interpreting results from any specific cohort about the effect of maternal age on recombination.
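The abstract does not state the likelihood of the hierarchical model. The sketch below therefore assumes a Poisson log-linear model with cohort and mother random effects. Under that assumption, a per-year slope β gives a 10-year percent change of 100·(exp(10β) − 1). The baseline of 42 crossovers per meiosis, the random-effect scales, and all function names are hypothetical and chosen only for illustration.

```python
import math

import numpy as np


def pct_change(beta, years=10.0):
    """Percent change in expected crossovers after `years`, assuming a log link."""
    return 100.0 * (math.exp(beta * years) - 1.0)


def beta_from_pct(pct, years=10.0):
    """Per-year log-scale slope that yields `pct` percent change over `years`."""
    return math.log(1.0 + pct / 100.0) / years


def simulate(n_cohorts, mothers_per_cohort, meioses_per_mother,
             mu, beta, sd_cohort, sd_mother, rng):
    """Simulate (cohort, mother, age, crossovers) rows from a hierarchical Poisson model."""
    rows = []
    for c in range(n_cohorts):
        a_cohort = rng.normal(0.0, sd_cohort)          # cohort-level random effect
        for m in range(mothers_per_cohort):
            a_mother = rng.normal(0.0, sd_mother)      # mother-level random effect
            ages = rng.uniform(20.0, 40.0, size=meioses_per_mother)
            # Log-linear rate, with maternal age centred at 30 years.
            rates = np.exp(mu + a_cohort + a_mother + beta * (ages - 30.0))
            counts = rng.poisson(rates)
            rows.extend((c, m, float(a), int(k)) for a, k in zip(ages, counts))
    return rows


beta = beta_from_pct(2.1)  # slope matching the reported 2.1% per 10 years
rng = np.random.default_rng(1)
rows = simulate(9, 50, 2, mu=math.log(42.0), beta=beta,
                sd_cohort=0.02, sd_mother=0.05, rng=rng)
```

Fitting such a model, rather than simulating from it, would typically use a Bayesian sampler. That step is not shown here.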
High throughput analysis of epistasis in genome-wide association studies with BiForce
Motivation: Gene–gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS. Results: We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case–control) traits. BiForce achieves high computational efficiency by using memory-efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, >4500 individuals) and two disease traits in another (>340 000 SNPs, >1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits. Availability and implementation: The software is free and can be downloaded from http://bioinfo.utu.fi/BiForce/. Supplementary information: Supplementary data are available at Bioinformatics online.
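The abstract credits part of BiForce's speed to Boolean bitwise operations. The sketch below illustrates that general idea and is not BiForce's actual data structure or test. Each SNP is stored as three bitmasks, one per genotype. The 3 × 3 table of genotype-combination counts for a SNP pair then comes from bitwise AND followed by a population count. Python integers stand in for the machine words a compiled implementation would use. The function names are hypothetical. `bonferroni_pairwise` shows only the simplest correction over all pairs, and this may differ from the thresholds BiForce applies.

```python
def encode(genotypes):
    """Three bitmasks: bit i of masks[g] is set when individual i has genotype g (0/1/2)."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def joint_counts(masks_a, masks_b, subset=None):
    """3 x 3 genotype-combination counts, optionally restricted to a subset mask (e.g. cases)."""
    table = []
    for a in masks_a:
        row = []
        for b in masks_b:
            both = a & b
            if subset is not None:
                both &= subset
            row.append(popcount(both))
        table.append(row)
    return table


def bonferroni_pairwise(n_snps, alpha=0.05):
    """Simplest Bonferroni threshold over all n_snps * (n_snps - 1) / 2 SNP pairs."""
    return alpha / (n_snps * (n_snps - 1) // 2)


snp1 = [0, 1, 2, 1, 0]
snp2 = [0, 0, 2, 1, 1]
cases = sum(1 << i for i in (0, 2, 3))   # hypothetical: individuals 0, 2, 3 are cases
all_table = joint_counts(encode(snp1), encode(snp2))               # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
case_table = joint_counts(encode(snp1), encode(snp2), subset=cases)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Counting cases and all individuals separately yields the contingency tables that a case–control interaction test needs. For the 323 697-SNP cohort, a full pairwise scan covers about 5.2 × 10^10 pairs.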
Two-stage two-locus models in genome-wide association
Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Our extensive simulations suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
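A minimal sketch contrasting the exhaustive and two-stage searches described above on simulated genotypes. It makes several assumptions. The single-locus model fits one trait mean per genotype (3 classes). The full two-locus model fits one mean per genotype combination (9 classes, so it includes interaction). Both are compared to the null with a one-way ANOVA F statistic. Stage 1 keeps a fixed number k of SNPs rather than applying a significance threshold. Ranking by F rather than by p-value is valid only because every model of a given type has the same degrees of freedom here. All function names and the simulated effect are hypothetical.

```python
import itertools

import numpy as np


def f_stat(y, groups):
    """One-way ANOVA F statistic for a model with a separate mean per group label."""
    labels = np.unique(groups)
    n, k = len(y), len(labels)
    rss0 = float(((y - y.mean()) ** 2).sum())
    rss1 = sum(float(((y[groups == g] - y[groups == g].mean()) ** 2).sum())
               for g in labels)
    return ((rss0 - rss1) / (k - 1)) / (rss1 / (n - k))


def single_locus(G, y):
    """Single-locus (3 genotype classes) F statistic for every SNP column of G."""
    return np.array([f_stat(y, G[:, j]) for j in range(G.shape[1])])


def two_locus(G, y, pairs):
    """Full two-locus (9 genotype classes, interaction included) F statistic per SNP pair."""
    return {(i, j): f_stat(y, 3 * G[:, i] + G[:, j]) for i, j in pairs}


def exhaustive(G, y):
    """Test every pair of SNPs with the full two-locus model."""
    return two_locus(G, y, itertools.combinations(range(G.shape[1]), 2))


def two_stage(G, y, k):
    """Stage 1: keep the k SNPs with the largest single-locus F; stage 2: test pairs among them."""
    keep = sorted(np.argsort(single_locus(G, y))[::-1][:k].tolist())
    return two_locus(G, y, itertools.combinations(keep, 2))


rng = np.random.default_rng(2)
n, m = 500, 6
G = rng.integers(0, 3, size=(n, m))
# Hypothetical interaction: only genotype combination (2, 2) at SNPs 0 and 1 shifts the trait.
y = 2.0 * ((G[:, 0] == 2) & (G[:, 1] == 2)) + rng.normal(size=n)
full = exhaustive(G, y)
staged = two_stage(G, y, k=3)
```

With m SNPs the exhaustive search fits m(m − 1)/2 two-locus models. The two-stage search fits only k(k − 1)/2 two-locus models, but it misses any interacting SNP whose marginal signal does not place it in the top k.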
An imputation platform to enhance integration of rice genetic resources
As sequencing and genotyping technologies evolve, crop genetics researchers accumulate increasing numbers of genomic data sets from various genotyping platforms on different germplasm panels. Imputation is an effective approach to increase marker density of existing data sets toward the goal of integrating resources for downstream applications. While a number of imputation software packages are available, their use in the rice community is limited by high computational demand and the lack of a reference panel. To address these challenges, we develop the Rice Imputation Server, a publicly available web application leveraging genetic information from a globally diverse rice reference panel assembled here. This resource allows researchers to benefit from increased marker density without needing to perform imputation on their own machines. We demonstrate improvements that imputed data provide to rice genome-wide association (GWA) results of grain amylose content and show that the major functional nucleotide polymorphism is tagged only in the imputed data set.
A Genomics England haplotype reference panel and imputation of UK Biobank
We built a reference panel with 342 million autosomal variants using 78,195 individuals from the Genomics England (GEL) dataset, achieving a phasing switch error rate of 0.18% for European samples and imputation quality of r² = 0.75 for variants with minor allele frequencies as low as 2 × 10⁻⁴ in white British samples. The GEL-imputed UK Biobank genome-wide association analysis identified 70% of associations found by direct exome sequencing (P < 2.18 × 10⁻¹¹), while extending testing of rare variants to the entire genome. Coding variants dominated the rare-variant genome-wide association results, implying less disruptive effects of rare non-coding variants.
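The abstract reports a phasing switch error rate and an imputation r² without defining them. The sketch below uses common definitions, which are assumptions here. Switch error is computed over one individual's heterozygous sites: it is the fraction of adjacent site pairs where the inferred phase flips relative to the truth. Imputation r² is the squared Pearson correlation between true genotypes and imputed dosages. Benchmarks often aggregate this r² within minor-allele-frequency bins, and that step is not shown. The function names are hypothetical.

```python
import numpy as np


def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs where the inferred phase switches.

    Inputs are alleles (0/1) on one haplotype at the individual's heterozygous
    sites, in genomic order. The result is unchanged if the inferred haplotype
    labels are globally swapped.
    """
    agree = np.asarray(true_hap) == np.asarray(inferred_hap)
    switches = int(np.count_nonzero(agree[1:] != agree[:-1]))
    return switches / (len(agree) - 1)


def imputation_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages."""
    r = np.corrcoef(np.asarray(true_genotypes, float), np.asarray(dosages, float))[0, 1]
    return float(r ** 2)


ser = switch_error_rate([0, 1, 1, 0, 1], [0, 1, 0, 1, 0])   # one switch over 4 pairs -> 0.25
r2 = imputation_r2([0, 1, 2, 1, 0], [0.1, 0.9, 1.8, 1.2, 0.0])
```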
Rare deleterious coding variants in CHRNB3 from diverse ancestries confer protection from heavy smoking
Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes
Although several lung cancer susceptibility loci have been identified, much of the heritability for lung cancer remains unexplained. Here 14,803 cases and 12,262 controls of European descent were genotyped on the OncoArray and combined with existing data for an aggregated genome-wide association study (GWAS) analysis of lung cancer in 29,266 cases and 56,450 controls. We identified 18 susceptibility loci achieving genome-wide significance, including 10 new loci. The new loci highlight the striking heterogeneity in genetic susceptibility across the histological subtypes of lung cancer, with four loci associated with lung cancer overall and six loci associated with lung adenocarcinoma. Gene expression quantitative trait locus (eQTL) analysis in 1,425 normal lung tissue samples highlights RNASET2, SECISBP2L and NRG1 as candidate genes. Other loci include genes such as a cholinergic nicotinic receptor, CHRNA2, and the telomere-related genes OBFC1 and RTEL1. Further exploration of the target genes will continue to provide new insights into the etiology of lung cancer.
