    A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis

    Abstract
    Background: The success achieved by genome-wide association (GWA) studies in identifying candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability.
    Results: V-Bay provides a novel solution to the computational scaling constraints of most multiple locus methods and can complete a simultaneous analysis of a million genetic markers in a few hours on a desktop computer. Using a range of simulated genetic and GWA experimental scenarios, we demonstrate that V-Bay is highly accurate and reliably identifies associations that are too weak to be discovered by single-marker testing approaches. V-Bay can also outperform a multiple locus analysis method based on the lasso, which has similar scaling properties for large numbers of genetic markers. For demonstration purposes, we also use V-Bay to confirm associations with gene expression in cell lines derived from the HapMap Phase II individuals.
    Conclusions: V-Bay is a versatile, fast, and accurate multiple locus GWA analysis tool for the practitioner interested in identifying weaker associations without high false positive rates.
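
The abstract does not give V-Bay's model or update equations. As a rough illustration of what a variational Bayes multiple locus analysis involves, the Python sketch below runs coordinate-ascent mean-field updates for a generic spike-and-slab linear regression in which all markers are fitted jointly. The function name `spike_slab_cavi`, the fixed hyperparameters (`sigma2`, `sb2`, `pi`), and the simulated data are assumptions made for illustration; this is not V-Bay's implementation.

```python
import numpy as np


def spike_slab_cavi(X, y, sigma2=1.0, sb2=1.0, pi=0.05, n_iter=50):
    """Mean-field variational approximation for y = X @ beta + noise, where
    beta_j = gamma_j * b_j, gamma_j ~ Bernoulli(pi), b_j ~ N(0, sigma2 * sb2),
    and noise ~ N(0, sigma2). Hyperparameters are held fixed (an assumption)."""
    p = X.shape[1]
    d = np.einsum("ij,ij->j", X, X)          # x_j^T x_j for each marker
    xy = X.T @ y                              # x_j^T y for each marker
    s2 = sigma2 / (d + 1.0 / sb2)             # posterior variance of b_j given inclusion
    alpha = np.full(p, pi)                    # posterior inclusion probabilities
    mu = np.zeros(p)                          # posterior mean of b_j given inclusion
    Xr = X @ (alpha * mu)                     # current fitted values
    prior_logit = np.log(pi / (1.0 - pi))
    for _ in range(n_iter):
        for j in range(p):
            old = alpha[j] * mu[j]
            # x_j^T (residual with marker j's own contribution added back)
            xr_j = xy[j] - X[:, j] @ Xr + d[j] * old
            mu[j] = s2[j] / sigma2 * xr_j
            # Prior log-odds plus the log Bayes factor for including marker j
            logit = (prior_logit + 0.5 * np.log(s2[j] / (sigma2 * sb2))
                     + mu[j] ** 2 / (2.0 * s2[j]))
            alpha[j] = 0.5 * (1.0 + np.tanh(0.5 * logit))  # stable logistic function
            Xr += X[:, j] * (alpha[j] * mu[j] - old)
    return alpha, mu


# Simulated example: 50 markers, two of which have true effects
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
beta = np.zeros(50)
beta[3], beta[17] = 1.0, -0.8
y = X @ beta + rng.standard_normal(500)
alpha, mu = spike_slab_cavi(X, y)
print(np.round(alpha[[3, 17]], 3))
```

Each marker receives a posterior inclusion probability `alpha[j]`. Because every update conditions on the current fit of all other markers, effects are assessed jointly rather than one marker at a time, and a full sweep costs O(np), which is the kind of scaling that lets this family of methods handle very large marker sets.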

    Mouse obesity network reconstruction with a variational Bayes algorithm to employ aggressive false positive control

    Abstract
    Background: We propose a novel variational Bayes network reconstruction algorithm to extract the most relevant disease factors from high-throughput genomic data sets. Our algorithm is the only scalable method for regularized network recovery that employs Bayesian model averaging and that can internally estimate an appropriate level of sparsity, ensuring that few false positives enter the model without the need for cross-validation or a model selection criterion. We use our algorithm to characterize the effect of genetic markers and liver gene expression traits on mouse obesity-related phenotypes, including weight, cholesterol, glucose, and free fatty acid levels, in an experiment previously used for discovery and validation of network connections: an F2 intercross between the C57BL/6J and C3H/HeJ mouse strains on an apolipoprotein E-null background.
    Results: We identified eleven genes (Gch1, Zfp69, Dlgap1, Gna14, Yy1, Gabarapl1, Folr2, Fdft1, Cnr2, Slc24a3, and Ccl19) and a quantitative trait locus directly connected to weight, glucose, cholesterol, or free fatty acid levels in our network. None of these genes were identified by other network analyses of this mouse intercross data set, but all have been previously associated with obesity or related pathologies in independent studies. In addition, through both simulations and data analysis, we demonstrate that our algorithm achieves greater power and better type I error control than other network recovery algorithms that use the lasso and have bounds on type I error control.
    Conclusions: Our final network contains 118 previously associated and novel genes affecting weight, cholesterol, glucose, and free fatty acid levels that are excellent obesity risk candidates.

    Coordinated evolution of co-expressed gene clusters in the Drosophila transcriptome

    Abstract
    Background: Co-expression of genes that physically cluster together is a common characteristic of eukaryotic transcriptomes. This organization suggests that coordinated evolution of gene expression for clustered genes may also be common. Clusters in which the expression evolution of each gene is not independent of its neighbors are important units for understanding transcriptome evolution.
    Results: We used a common microarray platform to measure gene expression in seven closely related species in the Drosophila melanogaster subgroup, accounting for confounding effects of sequence divergence. To summarize the correlation structure among genes in a chromosomal region, we analyzed the fraction of variation along the first principal component of the correlation matrix. We analyzed the correlation for blocks of consecutive genes to assess patterns of correlation that may be manifest at different scales of coordinated expression. We find that expression of physically clustered genes does evolve in a coordinated manner in many locations throughout the genome. Our analysis shows that relatively few of these clusters are near heterochromatin regions and that these clusters tend to be over-dispersed relative to the rest of the genome. This suggests that these clusters are not the byproduct of local gene clustering. We also analyzed the pattern of co-expression among neighboring genes within a single Drosophila species, D. simulans. For the co-expression clusters identified within this species, we find an under-representation of genes displaying a signature of recurrent adaptive amino acid evolution, consistent with previous findings. However, clusters displaying co-evolution of expression among species are enriched for adaptively evolving genes. This finding points to a link between adaptive sequence evolution and evolution of the transcriptome.
    Conclusion: Our results demonstrate that co-evolution of expression in gene clusters is relatively common among species in the D. melanogaster subgroup. We consider the possibility that local regulation of expression in gene clusters may drive the connection between adaptive sequence evolution and coordinated gene expression evolution.
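
The abstract summarizes co-expression in a chromosomal region by the fraction of variation along the first principal component of the gene-by-gene correlation matrix. For a block of k genes this is the largest eigenvalue divided by the trace, and the trace of a correlation matrix is k. A minimal Python sketch, assuming a genes × samples expression matrix whose rows are in chromosomal order; the function names, the block size, and the simulated data are hypothetical.

```python
import numpy as np


def pc1_fraction(block):
    """Fraction of variation on the first principal component of the
    gene-by-gene correlation matrix for one block (rows = genes, columns = samples)."""
    corr = np.corrcoef(block)              # rows are treated as variables
    eigvals = np.linalg.eigvalsh(corr)     # eigenvalues in ascending order
    return eigvals[-1] / np.trace(corr)    # trace equals the number of genes


def block_scan(expr, block_size):
    """PC1 fraction for every run of `block_size` consecutive genes."""
    return np.array([pc1_fraction(expr[i:i + block_size])
                     for i in range(expr.shape[0] - block_size + 1)])


# Hypothetical data: 8 genes in chromosomal order, 7 samples (e.g. one per species);
# genes 0-3 share a common expression factor, genes 4-7 vary independently
rng = np.random.default_rng(1)
shared = rng.standard_normal(7)
coordinated = shared + 0.05 * rng.standard_normal((4, 7))
independent = rng.standard_normal((4, 7))
expr = np.vstack([coordinated, independent])
scores = block_scan(expr, block_size=4)
print(np.round(scores, 2))
```

The score ranges from 1/k, when the genes in a block are uncorrelated, to 1, when they are perfectly correlated; scanning several block sizes is one way to look for coordination at different scales.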

    Adaptive Gene Expression Divergence Inferred from Population Genomics

    Detailed studies of individual genes have shown that gene expression divergence often results from adaptive evolution of regulatory sequence. Genome-wide analyses, however, have yet to unite patterns of gene expression with polymorphism and divergence to infer the population genetic mechanisms underlying expression evolution. Here, we combined genomic expression data, analyzed in a phylogenetic context, with whole-genome light-shotgun sequence data from six Drosophila simulans lines and reference sequences from D. melanogaster and D. yakuba. These data allowed us to use molecular population genetics to test for neutral versus adaptive gene expression divergence on a genomic scale. We identified recent and recurrent adaptive evolution along the D. simulans lineage by contrasting sequence polymorphism within D. simulans to divergence from D. melanogaster and D. yakuba. Genes that evolved higher levels of expression in D. simulans have experienced adaptive evolution of the associated 3′ flanking and amino acid sequence. Concomitantly, these genes are also decelerating in their rates of protein evolution, consistent with the finding that highly expressed genes evolve slowly. Interestingly, adaptive evolution in 5′ cis-regulatory regions did not correspond strongly with expression evolution. Our results provide a genomic view of the intimate link between selection acting on a phenotype and associated genic evolution.
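
Contrasting polymorphism within a species with divergence between species is the logic of the McDonald-Kreitman (MK) framework. The abstract does not state which statistics were computed, so the Python sketch below shows only the textbook MK estimate of the proportion of adaptive substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps). The function name, the choice of site classes, and the counts are assumptions for illustration.

```python
def mk_alpha(pn, ps, dn, ds):
    """McDonald-Kreitman estimate of the proportion of adaptive substitutions.

    pn, dn: polymorphic and divergent sites in the putatively selected class
            (e.g. amino acid or 3' flanking sites).
    ps, ds: polymorphic and divergent sites in a putatively neutral class
            (e.g. synonymous sites).
    Under neutrality pn/ps is expected to equal dn/ds, giving alpha near 0;
    an excess of divergence in the selected class pushes alpha towards 1."""
    if ps == 0 or dn == 0:
        raise ValueError("ps and dn must be positive")
    neutrality_index = (pn * ds) / (ps * dn)   # equals (pn/ps) / (dn/ds)
    return 1.0 - neutrality_index


# Hypothetical counts for one gene: few selected-class polymorphisms,
# many selected-class fixed differences
print(mk_alpha(pn=2, ps=20, dn=10, ds=10))
```

A significance test on the same 2×2 table of counts (for example Fisher's exact test) would normally accompany the point estimate.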

    PCAdmix: Principal Components-Based Assignment of Ancestry along Each Chromosome in Individuals with Admixed Ancestry from Two or More Populations

    Identifying ancestry along each chromosome in admixed individuals provides a wealth of information for understanding the population genetic history of admixture events and is valuable for admixture mapping and identifying recent targets of selection. We present PCAdmix (available at https://sites.google.com/site/pcadmix/home), a Principal Components-based algorithm for determining ancestry along each chromosome from a high-density, genome-wide set of phased single-nucleotide polymorphism (SNP) genotypes of admixed individuals. We compare our method to HAPMIX on simulated data from two ancestral populations, and we find high concordance between the methods. Our method also has better accuracy than LAMP when applied to three-population admixture, a situation as yet unaddressed by HAPMIX. Finally, we apply our method to a data set of four Latino populations with European, African, and Native American ancestry. We find evidence of assortative mating in each of the four populations, and we identify regions of shared ancestry that may be recent targets of selection and could serve as candidate regions for admixture-based association mapping.
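
To make the principal-components idea concrete, the Python sketch below splits the chromosome into windows of SNPs, runs PCA on the reference haplotypes in each window, projects an admixed haplotype onto the leading components, and assigns the window to the closest reference-population centroid. This is a deliberately simplified nearest-centroid scheme, not PCAdmix's actual algorithm (a full method would, among other things, smooth calls across neighboring windows); the function name, window size, and simulated panels are hypothetical.

```python
import numpy as np


def assign_windows(ref_haps, ref_labels, query_hap, window=20, n_pcs=2):
    """ref_haps: (n_ref, n_snps) phased 0/1 haplotypes from reference populations.
    ref_labels: population label for each reference haplotype.
    query_hap: (n_snps,) phased 0/1 haplotype from an admixed individual.
    Returns one ancestry call per non-overlapping window of SNPs."""
    labels = np.asarray(ref_labels)
    pops = np.unique(labels)
    calls = []
    for start in range(0, ref_haps.shape[1], window):
        R = ref_haps[:, start:start + window].astype(float)
        q = query_hap[start:start + window].astype(float)
        mean = R.mean(axis=0)
        # PCA of the centred reference haplotypes via SVD
        _, _, vt = np.linalg.svd(R - mean, full_matrices=False)
        axes = vt[:n_pcs]
        ref_scores = (R - mean) @ axes.T
        q_score = (q - mean) @ axes.T
        # Closest population centroid in principal-component space
        dists = [np.linalg.norm(q_score - ref_scores[labels == pop].mean(axis=0))
                 for pop in pops]
        calls.append(str(pops[int(np.argmin(dists))]))
    return calls


# Hypothetical reference panels: allele frequency 0.9 in population "A", 0.1 in "B"
rng = np.random.default_rng(2)
n_snps = 100
ref_a = (rng.random((30, n_snps)) < 0.9).astype(int)
ref_b = (rng.random((30, n_snps)) < 0.1).astype(int)
ref_haps = np.vstack([ref_a, ref_b])
ref_labels = ["A"] * 30 + ["B"] * 30
# Admixed haplotype: first 40 SNPs drawn with "A"-like frequencies, the rest "B"-like
freqs = np.where(np.arange(n_snps) < 40, 0.9, 0.1)
query = (rng.random(n_snps) < freqs).astype(int)
calls = assign_windows(ref_haps, ref_labels, query, window=20)
print(calls)
```

Because the PCA is recomputed in each window, the approach extends naturally to three or more reference populations: each additional population simply contributes another centroid.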

    Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives

    Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92–99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical-by-descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
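
Approaches that compute total IBD sharing ultimately map a sharing estimate to a relatedness degree. The Python sketch below shows the simplest such mapping, based on the textbook expectation that degree-d relatives share about 2^-d of their genomes IBD; it is not ERSA or any specific benchmarked method, and the function names, the log-scale cut-offs, and the `max_degree` limit are illustrative assumptions. The second function mirrors the looser "within one degree" criterion reported in the abstract.

```python
import math


def degree_from_ibd(ibd_fraction, max_degree=7):
    """Infer relatedness degree from the proportion of the genome shared IBD,
    computed as IBD1/2 + IBD2 (equivalently, twice the kinship coefficient).

    For degree-d relatives the expected proportion is 2**-d (0.5 for
    parent-offspring and full siblings, 0.25 for second degree, ...).
    The value is rounded on a log2 scale, so degree d covers the interval
    (2**-(d + 0.5), 2**-(d - 0.5)]. Returns None (unrelated) beyond max_degree."""
    if ibd_fraction <= 0:
        return None
    degree = max(0, math.floor(-math.log2(ibd_fraction) + 0.5))
    return degree if degree <= max_degree else None


def within_one_degree(inferred, true_degree):
    """Looser accuracy criterion: the inferred degree is at most one away from the truth."""
    return inferred is not None and abs(inferred - true_degree) <= 1


for frac in (0.5, 0.3, 0.26, 2 ** -7, 0.001):
    print(frac, degree_from_ibd(frac))
```

The realized sharing of distant relatives scatters widely around 2^-d, so a fixed cut-off like this misclassifies many distant pairs while still often landing within one degree of the truth.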

    IQCB1 and PDE6B Mutations Cause Similar Early Onset Retinal Degenerations in Two Closely Related Terrier Dog Breeds

    Purpose: To identify the causative mutations in two early-onset canine retinal degenerations, crd1 and crd2, segregating in the American Staffordshire terrier and the Pit Bull Terrier breeds, respectively.
    Methods: Retinal morphology of crd1- and crd2-affected dogs was evaluated by light microscopy. DNA was extracted from affected dogs and related unaffected controls. Association analysis was undertaken using the Illumina Canine SNP array and PLINK (crd1 study), or the Affymetrix Version 2 Canine array, the “MAGIC” genotype algorithm, and Fisher's exact test for association (crd2 study). Positional candidate genes were evaluated for each disease.
    Results: Structural photoreceptor abnormalities were observed in crd1-affected dogs as young as 11 weeks old. Rod and cone inner segments (IS) and outer segments (OS) were abnormal in size, shape, and number. In crd2-affected dogs, rod and cone IS and OS were abnormal as early as 3 weeks of age, progressing with age to severe loss of the OS and thinning of the outer nuclear layer (ONL) by 12 weeks of age. A genome-wide association study (GWAS) identified association at the telomeric end of CFA3 in crd1-affected dogs and on CFA33 in crd2-affected dogs. Candidate gene evaluation identified a three-base deletion in exon 21 of PDE6B in crd1-affected dogs and a cytosine insertion in exon 10 of IQCB1 in crd2-affected dogs.
    Conclusions: Identification of the mutations responsible for these two early-onset retinal degenerations provides new large animal models for comparative disease studies and for evaluation of potential therapeutic approaches for the homologous human diseases.
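
The crd2 study tested for association with Fisher's exact test. In a case-control design one common layout is a 2×2 table of allele counts (cases vs. controls by allele). The standard-library Python sketch below computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one. The table layout, the function name, and the example counts are assumptions; a real analysis would normally rely on an established implementation such as `scipy.stats.fisher_exact`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = cases / controls, columns = counts of allele 1 / allele 2."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x given the margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical allele counts: the risk allele appears 30 times in 40 case
# chromosomes and 12 times in 40 control chromosomes
print(fisher_exact_two_sided(30, 10, 12, 28))
```

Exact enumeration avoids the large-sample approximation of a chi-square test, which matters when, as in many canine studies, the numbers of affected animals are small.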

    RNA-Seq quantification of the human small airway epithelium transcriptome

    Abstract
    Background: The small airway epithelium (SAE), the cell population that covers the human airway surface from the 6th generation of airway branching to the alveoli, is the major site of lung disease caused by smoking. The focus of this study is to provide a quantitative assessment of the SAE transcriptome in the resting state and in response to chronic cigarette smoking using massively parallel mRNA sequencing (RNA-Seq).
    Results: The data demonstrate that 48% of SAE-expressed genes are ubiquitous, shared with many tissues, while 52% are enriched in this cell population. The most highly expressed gene, SCGB1A1, is characteristic of Clara cells, the cell type unique to the human SAE. Among other genes expressed by the SAE are those related to Clara cell differentiation, secretory mucosal defense, and mucociliary differentiation. The high sensitivity of RNA-Seq permitted quantification of gene expression related to infrequent cell populations such as neuroendocrine cells and epithelial stem/progenitor cells. Quantification of the absolute smoking-induced changes in SAE gene expression revealed that, compared to ubiquitous genes, more SAE-enriched genes responded to smoking with up-regulation, and those with the highest basal expression levels showed the most dramatic changes. Smoking had no effect on SAE gene splicing but was associated with a shift in molecular pattern from the Clara cell-associated pathway towards the mucus-secreting cell differentiation pathway, with multiple features of a cancer-associated molecular phenotype.
    Conclusions: These observations provide insights into the unique biology of the human SAE by providing a quantitative assessment of the global transcriptome under physiological conditions and in response to the stress of chronic cigarette smoking.