39 research outputs found
InPhaDel: integrative shotgun and proximity-ligation sequencing to phase deletions with single nucleotide polymorphisms.
Phasing of single nucleotide variants (SNVs) and structural variants into chromosome-wide haplotypes in humans has been challenging, and has required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al. demonstrated that single-individual SNV phasing is possible with proximity-ligation (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data are noisy and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes, withheld from training, were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. In addition, eight of eight top-ranked deletions phased by HiC alone were validated using long-range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity-ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/
Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq
Background: The MHC and KIR loci are clinically relevant regions of the genome. Typing the sequence of these loci has a wide range of applications, including organ transplantation, drug discovery, pharmacogenomics, and furthering fundamental research in immune genetics. Rapid advances in biochemical and next-generation sequencing (NGS) technologies have enabled several strategies for precise genotyping and phasing of candidate HLA alleles. Nonetheless, because typing of candidate HLA alleles alone reveals limited aspects of the genetics of the MHC region, it is insufficient to fully serve the aforementioned applications. For this reason, we believe that phasing the entire MHC and KIR loci onto locus-spanning haplotypes can be a critical improvement for better understanding transplantation biology. Results: Generating long-range (>1 Mb) phase information is traditionally very challenging. Because proximity-ligation-based DNA sequencing methods preserve chromosome-spanning phase information, we used this principle to generate full-length phasing of the MHC and KIR loci in human samples. We accurately (~99%) reconstruct the complete haplotypes for over 90% of sequence variants (coding and non-coding) within these two loci, which collectively span 4 megabases. Conclusions: By haplotyping a majority of coding and non-coding alleles at the MHC and KIR loci in a single assay, this method has the potential to assist transplantation matching and facilitate investigation of the genetic basis of human immunity and disease.
Bayesian Inference of Spatial Organizations of Chromosomes
Knowledge of spatial chromosomal organizations is critical for the study of transcriptional regulation and other nuclear processes in the cell. Recently, chromosome conformation capture (3C) based technologies, such as Hi-C and TCC, have been developed to provide a genome-wide, three-dimensional (3D) view of chromatin organization. Appropriate methods for analyzing these data and fully characterizing the 3D chromosomal structure and its structural variations are still under development. Here we describe a novel Bayesian probabilistic approach, denoted as “Bayesian 3D constructor for Hi-C data” (BACH), to infer the consensus 3D chromosomal structure. In addition, we describe a variant algorithm BACH-MIX to study the structural variations of chromatin in a cell population. Applying BACH and BACH-MIX to a high resolution Hi-C dataset generated from mouse embryonic stem cells, we found that most local genomic regions exhibit homogeneous 3D chromosomal structures. We further constructed a model for the spatial arrangement of chromatin, which reveals structural properties associated with euchromatic and heterochromatic regions in the genome. We observed strong associations between structural properties and several genomic and epigenetic features of the chromosome. Using BACH-MIX, we further found that the structural variations of chromatin are correlated with these genomic and epigenetic features. Our results demonstrate that BACH and BACH-MIX have the potential to provide new insights into the chromosomal architecture of mammalian cells.
Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression.
While genetic variation at chromatin loops is relevant for human disease, the relationships between contact propensity (the probability that loci at loops physically interact), genetics, and gene regulation are unclear. We quantitatively interrogate these relationships by comparing Hi-C and molecular phenotype data across cell types and haplotypes. While chromatin loops consistently form across different cell types, they have subtle quantitative differences in contact frequency that are associated with larger changes in gene expression and H3K27ac. For the vast majority of loci with quantitative differences in contact frequency across haplotypes, the changes in magnitude are smaller than those across cell types; however, the proportional relationships between contact propensity, gene expression, and H3K27ac are consistent. These findings suggest that subtle changes in contact propensity have a biologically meaningful role in gene regulation and could be a mechanism by which regulatory genetic variants in loop anchors mediate effects on expression.
Analysis of 3D genome organization and gene regulation in mammalian cells
The three-dimensional structure of the genome plays a key role in gene regulation. For example, while highly compacted heterochromatin drives gene silencing, open euchromatin facilitates gene activation. Nevertheless, how chromatin folds within these structures, and consequently how it controls access to genomic content, is poorly understood. Recent advances in high-throughput sequencing have provided valuable tools, such as Hi-C, for the study of chromatin structure. Using Hi-C datasets, I developed a hidden Markov model-based algorithm to identify self-interacting patterns of chromatin structure termed topological domains. These megabase-sized domains are pervasive throughout the genome and are highly conserved between human and mouse. At a higher resolution, topological domains encompass individual chromatin interactions between regulatory elements and their target genes. Therefore, in order to mechanistically understand gene regulation, it is essential to elucidate the functional relationship between regulatory elements and their target genes. By exploiting the sequence diversity between homologous chromosomes, it is possible to delineate this relationship. However, this requires knowledge of haplotypes, which has traditionally been difficult to obtain. As the Hi-C protocol preferentially recovers DNA variants on the same chromosome, I invented HaploSeq to reconstruct chromosome-scale haplotypes. HaploSeq can generate haplotypes with ~99.5% accuracy for >95% of alleles in mouse and 98% accuracy for ~81% of alleles in humans, thus solving a long-standing problem in genetics. By integrating the knowledge of haplotypes, we queried the relationship between regulatory elements and gene expression in human embryonic stem cells and a panel of differentiated cell types. Across the 5 cell lineages examined, I identified a total of 24% of genes that showed allelic bias in gene expression.
While most of the allelic genes had a correlating allelic promoter chromatin state, ~29% of genes were exceptions, suggesting other mechanisms of gene regulation. Accordingly, I then analyzed histone acetylation marks to identify 1589 allelic enhancers. By predicting chromatin interactions using Hi-C, we observed allelic enhancers to be spatially proximal to allelic genes, suggesting cooperative activity among genome sequence, structure, and function. Taken together, our studies suggest that gene regulation is facilitated and coordinated by genome structure.
L'Écho : grand quotidien d'information du Centre Ouest
9 December 1940. 1940/12/09 (A69)-1940/12/10. Part of the document collection: PoitouCh
Additional file 7: Figure S7. of Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq
Targeted HaploSeq generates high-quality phasing of heterozygous genes. Over 92% of exonic het. variants are phased at an accuracy of 99%. (TIFF 8219 kb)
Additional file 6: Figure S6. of Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq
Targeted HaploSeq generates a single (complete) haplotype structure across the MHC/KIR locus. The performance of the Targeted HaploSeq protocol is measured by completeness (span of the haplotype block), resolution (fraction of het. alleles resolved), and accuracy. While each of these metrics was defined after performing read-based as well as population-based haplotyping, seed resolution is estimated from read-based haplotyping only. The overall resolution is defined as the weighted average among all alleles across the MHC and KIR loci together. We observe an over 50% decrease in error rate, from 2.3% to 1.06%, after correcting for potentially incorrect local haplotypes from parent-trio data. (TIFF 8219 kb)
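The "over 50 % decrease in error rate" quoted in the caption above can be checked with a short calculation. The sketch below is only an arithmetic illustration using the two error rates stated in the caption; the function name `relative_reduction` is a hypothetical helper and is not part of the HaploSeq software.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Error rates (in percent) as quoted in the figure caption.
error_before = 2.3
error_after = 1.06

# Hypothetical helper applied to the caption's numbers.
reduction = relative_reduction(error_before, error_after)
print(f"Relative reduction in error rate: {reduction:.1%}")
```

With these inputs the relative reduction is about 54%, consistent with the caption's "over 50 %".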
Additional file 3: Figure S3. of Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq
Targeted HaploSeq data has a large pool of long-insert fragments. a) Insert-size distribution of targeted HaploSeq (green) and b) HaploSeq (purple) in GM12878 LCLs. Both datasets have similar amounts of long-insert fragments, which are critical for long-range haplotyping. (TIFF 8219 kb)