76 research outputs found
Replication of alpha-satellite DNA arrays in endogenous human centromeric regions and in human artificial chromosome
In human chromosomes, centromeric regions comprise megabase-size arrays of 171 bp alpha-satellite DNA monomers. The large distances spanned by these arrays preclude their replication from external sites and imply that the repetitive monomers contain replication origins. However, replication within these arrays has not previously been profiled and the role of alpha-satellite DNA in initiation of DNA replication has not yet been demonstrated. Here, replication of alpha-satellite DNA in endogenous human centromeric regions and in de novo formed Human Artificial Chromosome (HAC) was analyzed. We showed that alpha-satellite monomers could function as origins of DNA replication and that replication of alphoid arrays organized into centrochromatin occurred earlier than those organized into heterochromatin. The distribution of inter-origin distances within centromeric alphoid arrays was comparable to the distribution of inter-origin distances on randomly selected non-centromeric chromosomal regions. Depletion of CENP-B, a kinetochore protein that binds directly to a 17 bp CENP-B box motif common to alpha-satellite DNA, resulted in enrichment of alpha-satellite sequences for proteins of the ORC complex, suggesting that CENP-B may have a role in regulating the replication of centromeric regions. Mapping of replication initiation sites in the HAC revealed that replication preferentially initiated in transcriptionally active regions
Nanopore sequencing and assembly of a human genome with ultra-long reads
We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ~30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ~3 Mb). Next, we developed a protocol to generate ultra-long reads (N50 > 100kb, up to 882 kb). Incorporating an additional 5×-coverage of these data more than doubled the assembly contiguity (NG50 ~6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4 Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length and closure of gaps in the reference human genome assembly GRCh38
Recommended from our members
Centromere studies in the era of ‘telomere-to-telomere’ genomics
We are entering into an exciting era of genomics where truly complete, high-quality assemblies of human chromosomes are available end-to-end, or from 'telomere-to-telomere' (T2T). This technological advance offers a new opportunity to include endogenous human centromeric regions in high-resolution, sequence-based studies. These emerging reference maps are expected to reveal a new functional landscape in the human genome, where centromere proteins, transcriptional regulation, and spatial organization can be examined with base-level resolution across different stages of development and disease. Such studies will depend on innovative assembly methods of extremely long tandem repeats (ETRs), or satellite DNAs, paired with the development of new, orthogonal validation methods to ensure accuracy and completeness. This review reflects the progress in centromere genomics, credited by recent advancements in long-read sequencing and assembly methods. In doing so, I will discuss the challenges that remain and the promise for a new period of scientific discovery for satellite DNA biology and centromere function
Centromeric Satellite DNAs: Hidden Sequence Variation in the Human Population
The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often so large that it is detected cytogenetically, yet due to the lack of a reference assembly and informatics tools to measure this variability, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease support that these regions present a substantial and biologically important fraction of human sequence variation. Therefore, there is a pressing and unmet need to detect and incorporate this uncharacterized sequence variation into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease
Recommended from our members
Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans
- …