    A Quadratically Regularized Functional Canonical Correlation Analysis for Identifying the Global Structure of Pleiotropy with NGS Data

    Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for understanding the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotypes and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high dimensional genotype and phenotype data. To explore multiple levels of representations of genetic variants, learn their internal patterns involved in disease development, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new framework for association analysis, referred to as quadratically regularized functional CCA (QRFCCA), which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has much higher power than the nine competing statistics while retaining appropriate type 1 error rates. To further evaluate performance, the QRFCCA and nine other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the nine other statistics. Comment: 64 pages including 12 figures
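
    As a rough illustration of the CCA component named in this abstract, the sketch below computes the first canonical correlation between two data blocks with a simple ridge penalty on each covariance matrix. It is not the authors' QRFCCA: the function name `regularized_cca`, the penalty `reg`, and the toy data are assumptions made for illustration, and the matrix factorization and functional data analysis steps are omitted.

```python
import numpy as np

def regularized_cca(X, Y, reg=1e-3):
    """Return the first canonical correlation and weight vectors.

    A ridge term `reg` is added to each within-block covariance
    (an illustrative choice, not the penalty used in the paper).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Whiten both blocks with Cholesky factors: K = Lx^{-1} Sxy Ly^{-T}
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    # Singular values of K are the (regularized) canonical correlations
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    a = np.linalg.solve(Lx.T, U[:, 0])  # weights for X
    b = np.linalg.solve(Ly.T, Vt[0])    # weights for Y
    return s[0], a, b

# Toy data: both blocks are noisy copies of one shared latent variable
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = z[:, None] + 0.5 * rng.normal(size=(n, 5))  # stand-in for genotype features
Y = z[:, None] + 0.5 * rng.normal(size=(n, 3))  # stand-in for phenotypes
rho, a, b = regularized_cca(X, Y)
print(f"first canonical correlation: {rho:.3f}")
```

    With a shared latent variable the first canonical correlation should be high; for independent blocks it would shrink toward the level expected from sampling noise.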

    Benchmarking of univariate pleiotropy detection methods, with an application to epilepsy phenotypes

    Over the past decades, various methods have been used to scan the human genome to identify genetic variations associated with diseases, in particular with common, complex disorders. One such approach is the genome-wide association study (GWAS), which compares genetic variation between affected and healthy individuals to find genomic variants in the DNA sequence associated with a trait. GWAS are usually conducted separately for individual traits, and the same single nucleotide polymorphisms (SNPs)/loci are associated with different traits in independent studies [7-10]. These findings buttress the knowledge that most complex traits are correlated and have a shared genetic architecture, and therefore share the same heritable risk factors [11]. Knowledge of the genetic risk factors can directly or indirectly contribute to improvements in risk assessment, drug target development, and ultimately in providing effective therapies to the affected individuals. Pleiotropy is the phenomenon of a hereditary unit affecting more than one trait, and the earliest reported evidence was provided by Mendel when he noted that certain sets of features were always observed together in a plant. Although this example could have been purely due to linkage and would now be regarded as spurious pleiotropy, it opened up more discussion and research into pleiotropy, which has since been an active area of research [12]. In this work, I focused on complex epilepsies and the overlap in the genetic factors impacting their phenotypes. Epilepsy is a brain disorder comprising monogenic and common/complex forms characterized by recurrent partial or generalized seizures. However, the extent to which genetic variants contribute to the disorder and how much of the genetic contribution is shared between the different phenotypes is not yet fully understood. 
This gap motivated the present project, in which I benchmarked available pleiotropy detection approaches to select the best performing method in terms of power and false-positive rate to detect true pleiotropy. Then, I applied the selected method to summary statistics of focal epilepsy (FE) and genetic generalized epilepsy (GGE), provided by the International League Against Epilepsy Consortium (ILAE) on complex epilepsies and the EPI25 collaborative, to identify shared genetic factors in both phenotypes of epilepsy. Identifying pleiotropic SNPs or genes is an active area of research with multiple proposed approaches, broadly categorized into univariate and multivariate methods. Multivariate approaches have the limitation that they require all phenotypes to be measured in the same individuals, together with the corresponding genotype data, which is often not the case since GWAS are usually performed per specific trait. However, various consortia studying complex traits readily share the summary statistics (effect sizes and p-values) from genome-wide association studies, making it easier to apply univariate pleiotropy detection approaches that combine these statistics to identify SNPs or loci with a concordant or discordant direction of effects. Therefore, in this project, I first compared the relative power and false-positive rate (FPR) of five univariate pleiotropy detection approaches, classic meta-analysis, cFDR, PLACO, ASSET, and CPBayes (see section 6.1), through simulation studies. After that, I applied the best-performing method to the analysis of phenotypes of epilepsy using actual data. The data simulation procedure was performed in three steps. First, a population of 1 million individuals of European ancestry was simulated via resampling using the HAPGEN2 software [13] and haplotypes of central Europeans from the 1000 Genomes Project [14]. 
In the second phase of the simulation, disease SNPs were randomly selected and used for the additive liability threshold model (ALTM) [15] to simulate multifactorial disease phenotypes from the simulated genetic data. As expected, the performance of the methods varied in terms of power and false positive rate (FPR). The variability between the methods is higher for FPR, while most methods are comparable in terms of power, especially for larger sample sizes and RR. Although the classical meta-analysis is very powerful, it also suffers from a very high false-positive rate, making it less suitable for identifying pleiotropic loci. While all the methods performed well in terms of power, the ASSET method gave a better trade-off between power and FPR for the different simulation approaches. Applying ASSET to the two phenotypes of epilepsy, GGE and FE, resulted in identifying a new putative locus, 17q21.32, while replicating locus 2q24.3, previously reported by the ILAE consortium [16]. Further, applying the ASSET method to summary statistics of larger samples of epilepsy phenotypes resulted in the identification of loci 2q24.3 and 9q21.13. These findings corroborate the result obtained by the ILAE consortium through mega- and meta-analysis. Classical meta-analysis (MA) is not recommended for pleiotropy detection, based on the simulation study results. Though MA demonstrated good power to detect pleiotropy, it also recorded a high FPR across all simulation scenarios. In contrast, the ASSET method is highly recommended as it kept the FPR low while demonstrating good power to detect pleiotropy. This study also contributed three new pleiotropic loci (2q24.3, 17q21.32, and 9q21.13) to understanding the relationship of genetic variation with epilepsy phenotypes and the inter-relationship between these phenotypes. Although the locus 17q21.32 could not be replicated in the larger sample set, it is not necessarily a false positive discovery. 
The locus was genome-wide significant for GGE but only marginally significant for FE, which confirmed the trend observed in the FE cases in the EPI25 collaborative dataset, where no genome-wide significant result was found. Therefore, replication in an independent sample is desirable. One limitation of the univariate pleiotropy detection approaches, as seen with the classical MA, is that one trait with a very low P-value could drive the observed pleiotropic association. Also, methods like cFDR and PLACO can only accommodate two traits, though this was not a challenge in this project. Despite these limitations, the presented work established a benchmark of the relative performance of the assessed methods and could also guide researchers in related fields in their future work. This study also contributed to understanding the shared genetic factors between GGE and FE, with the expectation that larger sample sizes will lead to more discoveries.
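
The phenotype simulation step described in this abstract (randomly chosen disease SNPs fed into an additive liability threshold model) can be sketched roughly as follows. This is a minimal illustration, not the thesis pipeline: genotypes are drawn as independent binomials rather than resampled with HAPGEN2, and the sample size, number of causal SNPs, heritability `h2`, and `prevalence` are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy settings (not taken from the thesis)
n_individuals, n_snps, n_causal = 20_000, 200, 20
h2, prevalence = 0.5, 0.05

# Independent SNP genotypes coded 0/1/2 (a simplification: no LD structure)
maf = rng.uniform(0.05, 0.5, size=n_snps)
genotypes = rng.binomial(2, maf, size=(n_individuals, n_snps))

# Randomly select disease SNPs and draw additive effects for them
causal = rng.choice(n_snps, size=n_causal, replace=False)
beta = rng.normal(size=n_causal)

# Genetic liability scaled to variance h2, plus environmental noise (1 - h2)
g = genotypes[:, causal] @ beta
g = (g - g.mean()) / g.std() * np.sqrt(h2)
e = rng.normal(scale=np.sqrt(1 - h2), size=n_individuals)
liability = g + e

# Individuals whose liability exceeds the threshold are cases
threshold = np.quantile(liability, 1 - prevalence)
case = liability > threshold
print(f"cases: {case.sum()} of {n_individuals}")
```

Setting the threshold from the empirical quantile fixes the simulated prevalence directly; a case-control sample for a GWAS-style analysis could then be drawn from `case` and its complement.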

    Effects of urban living environments on mental health in adults

    Urban-living individuals are exposed to many environmental factors that may combine and interact to influence mental health. While individual factors of an urban environment have been investigated in isolation, no attempt has been made to model how complex, real-life exposure to living in the city relates to brain and mental health, and how this is moderated by genetic factors. Using the data of 156,075 participants from the UK Biobank, we carried out sparse canonical correlation analyses to investigate the relationships between urban environments and psychiatric symptoms. We found an environmental profile of social deprivation, air pollution, street network and urban land-use density that was positively correlated with an affective symptom group (r = 0.22, P_perm < 0.001), mediated by brain volume differences consistent with reward processing, and moderated by genes enriched for stress response, including CRHR1, explaining 2.01% of the variance in brain volume differences. Protective factors such as greenness and generous destination accessibility were negatively correlated with an anxiety symptom group (r = 0.10, P_perm < 0.001), mediated by brain regions necessary for emotion regulation and moderated by EXD3, explaining 1.65% of the variance. The third urban environmental profile was correlated with an emotional instability symptom group (r = 0.03, P_perm < 0.001). Our findings suggest that different environmental profiles of urban living may influence specific psychiatric symptom groups through distinct neurobiological pathways.

    The GTEx Consortium atlas of genetic regulatory effects across human tissues


    Genetic Interactions Affect Lung Function in Patients with Systemic Sclerosis.

    Scleroderma, or systemic sclerosis (SSc), is an autoimmune disease characterized by progressive fibrosis of the skin and internal organs. The most common cause of death in people with SSc is lung disease, but the pathogenesis of lung disease in SSc is insufficiently understood to devise specific treatment strategies. Developing targeted treatments requires not only the identification of molecular processes involved in SSc-associated lung disease, but also understanding of how these processes interact to drive pathology. One potentially powerful approach is to identify alleles that interact genetically to influence lung outcomes in patients with SSc. Analysis of interactions, rather than individual allele effects, has the potential to delineate molecular interactions that are important in SSc-related lung pathology. However, detecting genetic interactions, or epistasis, in human cohorts is challenging. Large numbers of variants with low minor allele frequencies, paired with heterogeneous disease presentation, reduce power to detect epistasis. Here we present an analysis that increases power to detect epistasis in human genome-wide association studies (GWAS). We tested for genetic interactions influencing lung function and autoantibody status in a cohort of 416 SSc patients. Using Matrix Epistasis to filter SNPs followed by the Combined Analysis of Pleiotropy and Epistasis (CAPE), we identified a network of interacting alleles influencing lung function in patients with SSc. In particular, we identified a three-gene network comprising WNT5A, RBMS3, and MSI2, which in combination influenced multiple pulmonary pathology measures. The associations of these genes with lung outcomes in SSc are novel and high-confidence. Furthermore, gene coexpression analysis suggested that the interactions we identified are tissue-specific, thus differentiating SSc-related pathogenic processes in lung from those in skin.