    Genz and Mendell-Elston Estimation of the High-Dimensional Multivariate Normal Distribution

    Statistical analysis of multinomial data in complex datasets often requires estimation of the multivariate normal (MVN) distribution for models in which the dimensionality can easily reach 10–1000 and higher. Few algorithms for estimating the MVN distribution can offer robust and efficient performance over such a range of dimensions. We report a simulation-based comparison of two algorithms for the MVN that are widely used in statistical genetic applications. The venerable Mendell-Elston approximation is fast, but execution time increases rapidly with the number of dimensions, estimates are generally biased, and an error bound is lacking. The correlation between variables significantly affects absolute error but not overall execution time. The Monte Carlo-based approach described by Genz returns unbiased and error-bounded estimates, but execution time is more sensitive to the correlation between variables. For ultra-high-dimensional problems, however, the Genz algorithm exhibits better scaling characteristics and greater time-weighted efficiency of estimation.
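
    To make the Monte Carlo approach concrete, the following is a minimal sketch of a sequential-conditioning estimator in the style of Genz's separation-of-variables scheme. It is not the implementation compared in the study: the function names (`cholesky`, `mvn_rectangle_prob`), the plain pseudo-random sampling, and the clamping constant are illustrative assumptions, and production code would add variable reordering and quasi-random points.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides cdf() and inv_cdf()


def cholesky(cov):
    """Lower-triangular Cholesky factor of a positive-definite matrix (list of lists)."""
    n = len(cov)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return chol


def mvn_rectangle_prob(lower, upper, cov, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(lower < X < upper) for X ~ N(0, cov).

    Hypothetical helper for illustration; returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    chol = cholesky(cov)
    n = len(cov)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        y = []          # sampled standard-normal coordinates so far
        weight = 1.0    # product of conditional interval probabilities
        for i in range(n):
            shift = sum(chol[i][j] * y[j] for j in range(i))
            d = _STD_NORMAL.cdf((lower[i] - shift) / chol[i][i])
            e = _STD_NORMAL.cdf((upper[i] - shift) / chol[i][i])
            weight *= e - d
            if weight <= 0.0:
                weight = 0.0
                break
            # Draw y_i from the standard normal truncated to the conditional interval.
            p = d + rng.random() * (e - d)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
            y.append(_STD_NORMAL.inv_cdf(p))
        total += weight
        total_sq += weight * weight
    mean = total / n_samples
    var = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(var / n_samples)


if __name__ == "__main__":
    rho = 0.5
    est, se = mvn_rectangle_prob([-math.inf, -math.inf], [0.0, 0.0],
                                 [[1.0, rho], [rho, 1.0]])
    # Closed form for the bivariate orthant probability: 1/4 + asin(rho) / (2*pi)
    print(est, se, 0.25 + math.asin(rho) / (2 * math.pi))
```

    The returned standard error is the kind of error bound the abstract attributes to the Monte Carlo approach. The bivariate case is used in the usage example only because its orthant probability has a closed form to check against.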

    Contribution of Inbred Singletons to Variance Component Estimation of Heritability and Linkage

    Objectives: An interesting consequence of consanguinity is that the inbred singleton becomes informative for genetic variance. We determine the contribution of an inbred singleton to variance component analysis of heritability and linkage. Methods: Statistical theory for the power of variance component analysis of quantitative traits is used to determine the expected contribution of an inbred singleton to likelihood-ratio tests of heritability and linkage. Results: In variance component models an inbred singleton contributes relatively little to a test of heritability, but can contribute substantively to a test of linkage. For small to moderate QTL effects and a level of inbreeding comparable to matings between first cousins (the preferred form of union in many human populations), an inbred singleton can carry nearly 25% of the information of a non-inbred sibpair. In more highly inbred contexts available with experimental animal populations, nonhuman primate colonies, and some human subpopulations, the contribution of an inbred singleton relative to a sibpair can exceed 50%. Conclusions: Inbred individuals, even in isolation from other members of a sample, can contribute to variance component estimation and tests of heritability and linkage. Under certain conditions the informativeness of the inbred singleton can approach that of a non-inbred sibpair.

    Dopamine perturbation of gene co-expression networks reveals differential response in schizophrenia for translational machinery.

    The dopaminergic hypothesis of schizophrenia (SZ) postulates that positive symptoms of SZ, in particular psychosis, are due to disturbed neurotransmission via the dopamine (DA) receptor D2 (DRD2). However, DA is a reactive molecule that yields various oxidative species, and thus has important non-receptor-mediated effects, with empirical evidence of cellular toxicity and neurodegeneration. Here we examine non-receptor-mediated effects of DA on gene co-expression networks and its potential role in SZ pathology. Transcriptomic profiles were measured by RNA-seq in B-cell transformed lymphoblastoid cell lines from 514 SZ cases and 690 controls, both before and after exposure to DA ex vivo (100 μM). Gene co-expression modules were identified using Weighted Gene Co-expression Network Analysis for both baseline and DA-stimulated conditions, with each module characterized for biological function and tested for association with SZ status and SNPs from a genome-wide panel. We identified seven co-expression modules under baseline, of which six were preserved in DA-stimulated data. One module shows significantly increased association with SZ after DA perturbation (baseline: P = 0.023; DA-stimulated: P = 7.8 × 10⁻⁵; ΔAIC = −10.5) and is highly enriched for genes related to ribosomal proteins and translation (FDR = 4 × 10⁻¹⁴¹), mitochondrial oxidative phosphorylation, and neurodegeneration. SNP association testing revealed tentative QTLs underlying module co-expression, notably at FASTKD2 (top P = 2.8 × 10⁻⁶), a gene involved in mitochondrial translation. These results substantiate the role of translational machinery in SZ pathogenesis, providing insights into a possible dopaminergic mechanism disrupting mitochondrial function, and demonstrate the utility of disease-relevant functional perturbation in the study of complex genetic etiologies.

    Whole genome sequence data implicate RBFOX1 in epilepsy risk in baboons

    Background: Baboons exhibit a genetic generalized epilepsy (GGE) that resembles juvenile myoclonic epilepsy and may represent a suitable genetic model for human epilepsy. The genetic underpinnings of epilepsy were investigated in a baboon colony at the Southwest National Primate Research Center (San Antonio, TX) through the analysis of whole-genome sequence (WGS) data. Methods: Baboon WGS data were obtained for 38 cases and 19 healthy controls from the NCBI Sequence Read Archive and, after standard QC filtering, two subsets of variants were examined: (1) 20,881 SNPs from baboon homologs of 19 candidate GGE genes; and (2) 36,169 protein-altering SNPs. Association tests were conducted in SOLAR, and gene set enrichment analyses (GSEA) and protein-protein interaction (PPI) network construction were performed on the top genome-wide association results (P<0.01; n=441 genes). Results: Heritability for epileptic seizure in the pedigreed baboon sample was estimated at 0.76 (SE=0.77; P=0.07). A significant association was detected for an intronic SNP in RBFOX1 (P=5.92 × 10⁻⁶; adjusted P=0.016). For protein-altering variants, GSEA revealed significant positive enrichment for genes involved in the extracellular matrix structure (ECM; FDR=0.0072) and collagen formation (FDR=0.017). Conclusions: SNP association results implicate RBFOX1 in baboon epilepsy, a gene that plays a key role in neuronal excitation and transcriptomic regulation, and has been previously linked to human epilepsy, both focal and generalized. Moreover, protein-damaging variants from across the baboon genome exhibit a wider pattern of association that links collagen-containing ECM to epilepsy risk. These findings suggest a shared genetic etiology between baboon and human forms of GGE.

    Whole Genome Sequence Data From Captive Baboons Implicate RBFOX1 in Epileptic Seizure Risk

    In this study, we investigate the genetic determinants that underlie epilepsy in a captive baboon pedigree and evaluate the potential suitability of this non-human primate model for understanding the genetic etiology of human epilepsy. Archived whole-genome sequence data were analyzed using both a candidate gene approach that targeted variants in baboon homologs of 19 genes (n = 20,881 SNPs) previously implicated in genetic generalized epilepsy (GGE) and a more agnostic approach that examined protein-altering mutations genome-wide as assessed by snpEff (n = 36,169). Measured genotype association tests for baboon cases of epileptic seizure were performed using SOLAR, as well as gene set enrichment analyses (GSEA) and protein–protein interaction (PPI) network construction of top association hits genome-wide (p < 0.01; n = 441 genes). The maximum likelihood estimate of heritability for epileptic seizure in the pedigreed baboon sample is 0.76 (SE = 0.77; p = 0.07). Among candidate genes for GGE, a significant association was detected for an intronic SNP in RBFOX1 (p = 5.92 × 10⁻⁶; adjusted p = 0.016). For protein-altering variants, no genome-wide significant results were observed for epilepsy status. However, GSEA revealed significant positive enrichment for genes involved in the extracellular matrix structure (ECM; FDR = 0.0072) and collagen formation (FDR = 0.017), which was reflected in a major PPI network cluster. This preliminary study highlights the potential role of RBFOX1 in the epileptic baboon, a protein involved in transcriptomic regulation of multiple epilepsy candidate genes in humans and itself previously implicated in human epilepsy, both focal and generalized. Moreover, protein-damaging variants from across the genome exhibit a pattern of association that links collagen-containing ECM to epilepsy risk. These findings suggest a shared genetic etiology between baboon and human forms of GGE and lay the foundation for follow-up research.

    Independent test assessment using the extreme value distribution theory

    The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by the overreduction of statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first tested method estimates the real number of effective independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis test that exploits prior computational predictions on the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation, and the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the necessity of developing alternative approaches to improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies.
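
    One way to picture the first method is the following minimal sketch, which rests on a standard fact rather than on the authors' exact procedure: the smallest of m independent Uniform(0, 1) p-values follows a Beta(1, m) distribution, so m can be estimated by maximum likelihood from minimum p-values collected over null phenotype simulations. The function names and the toy simulation with truly independent tests are assumptions made for illustration.

```python
import math
import random


def effective_tests_from_min_p(min_pvalues):
    """MLE of m when each minimum p-value follows Beta(1, m), i.e. the
    minimum of m independent Uniform(0, 1) p-values.

    The log-likelihood is k*log(m) + (m - 1) * sum(log(1 - p_i)),
    which is maximised at m = -k / sum(log(1 - p_i)).
    """
    log_sum = sum(math.log1p(-p) for p in min_pvalues)  # sum of log(1 - p)
    if log_sum == 0.0:
        raise ValueError("all minimum p-values are zero; cannot estimate m")
    return -len(min_pvalues) / log_sum


def sidak_threshold(alpha, m_eff):
    """Per-test threshold keeping the family-wise error rate at alpha
    for m_eff independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)


def simulate_min_p(n_tests, n_replicates, seed=0):
    """Toy null simulation: every test is independent (an assumption).
    Real genotype data are correlated, which is what lowers m_eff."""
    rng = random.Random(seed)
    return [min(rng.random() for _ in range(n_tests)) for _ in range(n_replicates)]


if __name__ == "__main__":
    min_p = simulate_min_p(n_tests=200, n_replicates=500, seed=1)
    m_eff = effective_tests_from_min_p(min_p)
    print(m_eff, sidak_threshold(0.05, m_eff))
```

    With correlated tests, as in pedigree data, the estimate falls below the nominal number of tests, and the resulting per-test threshold is correspondingly less stringent than a naïve correction.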

    Exome sequences of multiplex, multigenerational families reveal schizophrenia risk loci with potential implications for neurocognitive performance

    Get PDF
    Schizophrenia is a serious mental illness, involving disruptions in thought and behavior, with a worldwide prevalence of about one percent. Although highly heritable, much of the genetic liability of schizophrenia is yet to be explained. We searched for susceptibility loci in multiplex, multigenerational families affected by schizophrenia, targeting protein-altering variation with in silico predicted functional effects. Exome sequencing was performed on 136 samples from eight European-American families, including 23 individuals diagnosed with schizophrenia or schizoaffective disorder. In total, 11,878 non-synonymous variants from 6,396 genes were tested for their association with schizophrenia spectrum disorders. Pathway enrichment analyses were conducted on gene-based test results, protein-protein interaction (PPI) networks, and epistatic effects. Using a significance threshold of FDR<0.1, association was detected for rs10941112 (P=2.1×10⁻⁵; q-value=0.073) in AMACR, a gene involved in fatty acid metabolism and previously implicated in schizophrenia, with significant cis effects on gene expression (P=5.5×10⁻⁴), including brain tissue data from the Genotype-Tissue Expression project (minimum P=6.0×10⁻⁵). A second SNP, rs10378 located in TMEM176A, also shows risk effects in the exome data (P=2.8×10⁻⁵; q-value=0.073). Protein-protein interactions among our top gene-based association results (P<0.05; n=359 genes) reveal significant enrichment of genes involved in NCAM-mediated neurite outgrowth (P=3.0×10⁻⁵), while exome-wide SNP-SNP interaction effects for rs10941112 and rs10378 indicate a potential role for kinase-mediated signaling involved in memory and learning. In conclusion, these association results implicate AMACR and TMEM176A in schizophrenia risk, whose effects may be modulated by genes involved in synaptic plasticity and neurocognitive performance.
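
    For readers unfamiliar with FDR thresholds, the sketch below shows the Benjamini-Hochberg adjustment, one common way to turn raw p-values into FDR-adjusted values. The abstract does not state which q-value procedure was used, so this is an assumed stand-in for illustration, and the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The adjusted value for the p-value of rank r (1 = smallest) out of m is
    min over ranks s >= r of p_(s) * m / s.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values, not from the study
    adj = benjamini_hochberg(raw)
    # Variants whose adjusted value falls below 0.1 would pass an FDR < 0.1 threshold.
    print([a < 0.1 for a in adj])
```

    Because the adjustment takes a running minimum over larger ranks, several variants can end up sharing the same adjusted value.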

    Genotype phasing in pedigrees using whole-genome sequence data

    Phasing is the process of inferring haplotypes from genotype data. Efficient algorithms and associated software for accurate phasing in pedigrees are needed, especially for populations lacking reference panels of sequenced individuals. We present a novel method for phasing genotypes from whole-genome sequence data in pedigrees, called PULSAR (Phasing Using Lineage Specific Alleles/Rare variants). The method is based on the property that alleles specific to a single founding chromosome within a pedigree are highly informative for identifying haplotypes that are shared identical by descent. Simulation studies are used to assess the performance of PULSAR with various pedigree sizes and structures, and the effect of genotyping errors and the presence of nonsequenced individuals is investigated. In pedigrees with complete sequencing and realistic genotyping error rates, PULSAR correctly phases >99.9% of heterozygous genotypes, excluding sites at which all individuals are heterozygous, and does so with a switch error rate frequently below 10⁻⁴. PULSAR is highly accurate, capable of genotype error correction and imputation, and computationally competitive with alternative phasing software applicable to pedigrees. Our method has the significant advantage of not requiring reference panels that are essential for other population-based phasing algorithms. A software implementation of PULSAR is freely available.