39 research outputs found

    InPhaDel: integrative shotgun and proximity-ligation sequencing to phase deletions with single nucleotide polymorphisms.

    Phasing single nucleotide variants (SNVs) and structural variants into chromosome-wide haplotypes in humans has been challenging, requiring either trio sequencing or restriction to population-based haplotypes. Selvaraj et al. demonstrated that single-individual SNV phasing is possible with proximity-ligation (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data is noisy and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes, withheld from training, were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top-ranked deletions phased by HiC alone were validated using long-range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity-ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/

    Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq

    Background: The MHC and KIR loci are clinically relevant regions of the genome. Typing the sequence of these loci has a wide range of applications, including organ transplantation, drug discovery, pharmacogenomics and furthering fundamental research in immune genetics. Rapid advances in biochemical and next-generation sequencing (NGS) technologies have enabled several strategies for precise genotyping and phasing of candidate HLA alleles. Nonetheless, as typing of candidate HLA alleles alone reveals limited aspects of the genetics of the MHC region, it is insufficient for the comprehensive utility of the aforementioned applications. For this reason, we believe phasing the entire MHC and KIR loci onto a single locus-spanning haplotype can be a critical improvement for better understanding transplantation biology. Results: Generating long-range (>1 Mb) phase information is traditionally very challenging. As proximity-ligation-based methods of DNA sequencing preserve chromosome-span phase information, we have used this principle to generate full-length phasing of the MHC and KIR loci in human samples. We accurately (~99%) reconstruct the complete haplotypes for over 90% of sequence variants (coding and non-coding) within these two loci, which collectively span 4 megabases. Conclusions: By haplotyping a majority of coding and non-coding alleles at the MHC and KIR loci in a single assay, this method has the potential to assist transplantation matching and facilitate investigation of the genetic basis of human immunity and disease.

    Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression.

    While genetic variation at chromatin loops is relevant for human disease, the relationships between contact propensity (the probability that loci at loops physically interact), genetics, and gene regulation are unclear. We quantitatively interrogate these relationships by comparing Hi-C and molecular phenotype data across cell types and haplotypes. While chromatin loops consistently form across different cell types, they have subtle quantitative differences in contact frequency that are associated with larger changes in gene expression and H3K27ac. For the vast majority of loci with quantitative differences in contact frequency across haplotypes, the changes in magnitude are smaller than those across cell types; however, the proportional relationships between contact propensity, gene expression, and H3K27ac are consistent. These findings suggest that subtle changes in contact propensity have a biologically meaningful role in gene regulation and could be a mechanism by which regulatory genetic variants in loop anchors mediate effects on expression.

    Analysis of 3D genome organization and gene regulation in mammalian cells

    The three-dimensional structure of the genome plays a key role in gene regulation. For example, while highly compacted heterochromatin drives gene silencing, open euchromatin facilitates gene activation. Nevertheless, how chromatin folds within these structures, and consequently how it controls access to genomic content, is poorly understood. Recent advances in high-throughput sequencing have provided valuable tools, such as Hi-C, for the study of chromatin structure. Using Hi-C datasets, I developed a hidden Markov model-based algorithm to identify self-interacting patterns of chromatin structure termed topological domains. These megabase-sized domains are pervasive throughout the genome and are highly conserved between human and mouse. At a higher resolution, topological domains encompass individual chromatin interactions between regulatory elements and their target genes. Therefore, in order to mechanistically understand gene regulation, it is essential to elucidate the functional relationship among regulatory elements and their target genes. By exploiting the sequence diversity between homologous chromosomes, it is possible to delineate this relationship. However, this requires knowledge of haplotypes, which has traditionally been difficult to obtain. As the Hi-C protocol preferentially recovers DNA variants on the same chromosome, I invented HaploSeq to reconstruct chromosome-scale haplotypes. HaploSeq can generate haplotypes with ~99.5% accuracy for >95% of alleles in mouse and 98% accuracy for ~81% of alleles in humans, thus solving a long-standing problem in genetics. By integrating the knowledge of haplotypes, we queried the relationship between regulatory elements and gene expression in human embryonic stem cells and a panel of differentiated cell types. Across the 5 cell lineages examined, I identified a total of 24% of genes that showed allelic bias in gene expression. While most of the allelic genes had a correlating allelic promoter chromatin state, ~29% of genes were exceptions, suggesting other mechanisms of gene regulation. Accordingly, I then analyzed histone acetylation marks to identify 1589 allelic enhancers. By predicting chromatin interactions using Hi-C, we observed allelic enhancers to be spatially proximal to allelic genes, suggesting cooperative activity among genome sequence, structure, and function. Taken together, our studies suggest that gene regulation is facilitated and coordinated by genome structure.

    L'Écho : grand quotidien d'information du Centre Ouest

    9 December 1940. 1940/12/09 (A69)-1940/12/10. Part of the document collection: PoitouCh

    Additional file 7: Figure S7. of Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq

    Targeted HaploSeq generates high-quality phasing of heterozygous genes. Over 92% of exonic heterozygous variants are phased at an accuracy of 99%. (TIFF 8219 kb)

    Additional file 6: Figure S6. of Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq

    Targeted HaploSeq generates a single (complete) haplotype structure across the MHC/KIR locus. The performance of the Targeted HaploSeq protocol is measured by completeness (span of the haplotype block), resolution (fraction of heterozygous alleles resolved), and accuracy. While each of these metrics was defined after performing read-based as well as population-based haplotyping, seed resolution is estimated based only on read-based haplotyping. The overall resolution is defined as the weighted average among all alleles across the MHC and KIR loci together. We observe an over 50% decrease in error rate, from 2.3% to 1.06%, after correcting for potentially incorrect local haplotypes from parent-trio data. (TIFF 8219 kb)

    Additional file 3: Figure S3. of Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq

    Targeted HaploSeq data has a large pool of long-insert fragments. a) Insert-size distribution of targeted HaploSeq (green) and b) HaploSeq (purple) in GM12878 LCLs. Both datasets have similar amounts of long-insert fragments, which are critical for long-range haplotyping. (TIFF 8219 kb)