496 research outputs found

    Modeling dependency structures in 450k DNA methylation data

    Get PDF
    Motivation: DNA methylation has been shown to be spatially dependent across chromosomes. Previous studies have focused on the influence of genomic context on the dependency structure, while not considering differences in dependency structure between individuals. Results: We modeled spatial dependency with a flexible framework to quantify the dependency structure, focusing on inter-individual differences by exploring the association between dependency parameters and technical and biological variables. The model was applied to a subset of the Finnish Twin Cohort study (N = 1611 individuals). The estimates of the dependency parameters varied considerably across individuals, but were generally consistent across chromosomes within individuals. The variation in dependency parameters was associated with bisulfite conversion plate, zygosity, sex and age. The age differences presumably reflect accumulated environmental exposures and/or accumulated small methylation differences caused by stochastic mitotic events, establishing recognizable, individual patterns more strongly seen in older individuals.Peer reviewe

    Identification of copy number variants from exome sequence data

    Get PDF
    Background With advances in next generation sequencing technologies and genomic capture techniques, exome sequencing has become a cost-effective approach for mutation detection in genetic diseases. However, computational prediction of copy number variants (CNVs) from exome sequence data is a challenging task. Whilst numerous programs are available, they have different sensitivities, and have low sensitivity to detect smaller CNVs (1–4 exons). Additionally, exonic CNV discovery using standard aCGH has limitations due to the low probe density over exonic regions. The goal of our study was to develop a protocol to detect exonic CNVs (including shorter CNVs that cover 1–4 exons), combining computational prediction algorithms and a high-resolution custom CGH array. Results We used six published CNV prediction programs (ExomeCNV, CONTRA, ExomeCopy, ExomeDepth, CoNIFER, XHMM) and an in-house modification to ExomeCopy and ExomeDepth (ExCopyDepth) for computational CNV prediction on 30 exomes from the 1000 genomes project and 9 exomes from primary immunodeficiency patients. CNV predictions were tested using a custom CGH array designed to capture all exons (exaCGH). After this validation, we next evaluated the computational prediction of shorter CNVs. ExomeCopy and the in-house modified algorithm, ExCopyDepth, showed the highest capability in detecting shorter CNVs. Finally, the performance of each computational program was assessed by calculating the sensitivity and false positive rate. Conclusions In this paper, we assessed the ability of 6 computational programs to predict CNVs, focussing on short (1–4 exon) CNVs. We also tested these predictions using a custom array targeting exons. Based on these results, we propose a protocol to identify and confirm shorter exonic CNVs combining computational prediction algorithms and custom aCGH experiments

    Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes

    Get PDF
    Inter-individual differences in gene expression are likely to account for an important fraction of phenotypic differences, including susceptibility to common disorders. Recent studies have shown extensive variation in gene expression levels in humans and other organisms, and that a fraction of this variation is under genetic control. We investigated the patterns of gene expression variation in a 25 Mb region of human chromosome 21, which has been associated with many Down syndrome (DS) phenotypes. Taqman real-time PCR was used to measure expression variation of 41 genes in lymphoblastoid cells of 40 unrelated individuals. For 25 genes found to be differentially expressed, additional analysis was performed in 10 CEPH families to determine heritabilities and map loci harboring regulatory variation. Seventy-six percent of the differentially expressed genes had significant heritabilities, and genomewide linkage analysis led to the identification of significant eQTLs for nine genes. Most eQTLs were in trans, with the best result (P=7.46×10−8) obtained for TMEM1 on chromosome 12q24.33. A cis-eQTL identified for CCT8 was validated by performing an association study in 60 individuals from the HapMap project. SNP rs965951 located within CCT8 was found to be significantly associated with its expression levels (P=2.5×10−5) confirming cis-regulatory variation. The results of our study provide a representative view of expression variation of chromosome 21 genes, identify loci involved in their regulation and suggest that genes, for which expression differences are significantly larger than 1.5-fold in control samples, are unlikely to be involved in DS-phenotypes present in all affected individual

    Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.

    Full text link
    A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes

    Association of protein function-altering variants with cardiometabolic traits:the strong heart study

    Get PDF
    Clinical and biomarker phenotypic associations for carriers of protein function-altering variants may help to elucidate gene function and health effects in populations. We genotyped 1127 Strong Heart Family Study participants for protein function-altering single nucleotide variants (SNV) and indels selected from a low coverage whole exome sequencing of American Indians. We tested the association of each SNV/indel with 35 cardiometabolic traits. Among 1206 variants (average minor allele count = 20, range of 1 to 1064), similar to 43% were not present in publicly available repositories. We identified seven SNV-trait significant associations including a missense SNV at ABCA10 (rs779392624, p= 8 x 10(-9)) associated with fasting triglycerides, which gene product is involved in macrophage lipid homeostasis. Among non-diabetic individuals, missense SNVs at four genes were associated with fasting insulin adjusted for BMI (PHIL, chr6:79,650,711, p= 2.1 x 10(-6); TRPM3, rs760461668, p= 5 x10(-8); SPTY2D1, rs756851199, p= 1.6 x 10(-8); and TSPO, rs566547284, p= 2.4 x 10(-6)). PHIL encoded protein is involved in pancreatic beta-cell proliferation and survival, and TRPM3 protein mediates calcium signaling in pancreatic beta-cells in response to glucose. A genetic risk score combining increasing insulin risk alleles of these four genes was associated with 53% (95% confidence interval 1.09, 2.15) increased odds of incident diabetes and 83% (95% confidence interval 1.35, 2.48) increased odds of impaired fasting glucose at follow-up. Our study uncovered novel gene-trait associations through the study of protein-coding variants and demonstrates the advantages of association screenings targeting diverse and high-risk populations to study variants absent in publicly available repositories

    Genome-wide associations of gene expression variation in humans

    Get PDF
    The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level
    corecore