38 research outputs found

Increased mutation and gene conversion within human segmental duplications

Author: DeWitt William S.
Dishuck Philip C.
Guitart Xavi
Harvey William T.
Marco Sola Santiago
Vollger Mitchell R.
Publication venue: Nature Research
Publication date: 01/01/2023

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have ‘relocated’ on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
We thank T. Brown for help in editing this manuscript, P. Green for valuable suggestions, and R. Seroussi and his staff for their generous donation of time and resources. This work was supported in part by grants from the US National Institutes of Health (NIH 5R01HG002385, 5U01HG010971 and 1U01HG010973 to E.E.E.; K99HG011041 to P.H.; and F31AI150163 to W.S.D.). W.S.D. was supported in part by a Fellowship in Understanding Dynamic and Multi-scale Systems from the James S. McDonnell Foundation. E.E.E. is an investigator of the Howard Hughes Medical Institute (HHMI). This article is subject to HHMI’s Open Access to Publications policy. HHMI laboratory heads have previously granted a nonexclusive CC BY 4.0 licence to the public and a sublicensable licence to HHMI in their research articles. Pursuant to those licences, the author-accepted manuscript of this article can be made freely available under a CC BY 4.0 licence immediately on publication.
Peer Reviewed
"Article signed by 19 authors: Mitchell R. Vollger, Philip C. Dishuck, William T. Harvey, William S. DeWitt, Xavi Guitart, Michael E. Goldberg, Allison N. Rozanski, Julian Lucas, Mobin Asri, Human Pangenome Reference Consortium, Katherine M. Munson, Alexandra P. Lewis, Kendra Hoekzema, Glennis A. Logsdon, David Porubsky, Benedict Paten, Kelley Harris, PingHsun Hsieh & Evan E. Eichler"
Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Genomic inversions and GOLGA core duplicons underlie disease instability at the 15q25 locus.

Author: Accadia Maria
Antonacci Francesca
Cantsilieris Stuart
Carella Massimo
Coe Bradley P
D'Addabbo Pietro
Dumont Beth L
Eichler Evan E
Maggiolini Flavia A M
Manganelli Michele
Palumbo Orazio
Palumbo Pietro
Pang Andy Wing Chun
Sanders Ashley D
Vollger Mitchell R
Publication venue: The Mouseion at the JAXlibrary
Publication date: 27/03/2019

Human chromosome 15q25 is involved in several disease-associated structural rearrangements, including microdeletions and chromosomal markers with inverted duplications. Using comparative fluorescence in situ hybridization, strand-sequencing, single-molecule, real-time sequencing and Bionano optical mapping analyses, we investigated the organization of the 15q25 region in human and nonhuman primates. We found that two independent inversions occurred in this region after the fission event that gave rise to phylogenetic chromosomes XIV and XV in humans and great apes. One of these inversions is still polymorphic in the human population today and may confer differential susceptibility to 15q25 microdeletions and inverted duplications. The inversion breakpoints map within segmental duplications containing core duplicons of the GOLGA gene family and correspond to the site of an ancestral centromere, which became inactivated about 25 million years ago. The inactivation of this centromere likely released segmental duplications from recombination repression typical of centromeric regions. We hypothesize that this increased the frequency of ectopic recombination creating a hotspot of hominid inversions where dispersed GOLGA core elements now predispose this region to recurrent genomic rearrangements associated with disease

The Jackson Laboratory: The Mouseion at the JAXlibrary

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

Author: Audano Peter A.
Baker Carl
Concepcion Gregory T.
Eichler Evan E.
Hunkapiller Michael W.
Kronenberg Zev N.
Lansdorp Peter M.
Logsdon Glennis A.
Munson Katherine M.
Peluso Paul
Porubsky David
Sanders Ashley D.
Spierings Diana C. J.
Sulovari Arvis
Surti Urvashi
Vollger Mitchell R.
Wenger Aaron M.
Publication venue: Wiley
Publication date: 01/03/2020

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

MDC Repository

Dissertations of the University of Groningen

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

Author: Audano Peter A
Chaisson Mark J P
Devine Scott E
Ebert Peter
Ebler Jana
Eichler Evan E
Ghareghani Maryam
Harvey William T
Haukness Marina
Korbel Jan O
Lansdorp Peter M
Lee Charles
Marijon Pierre
Marschall Tobias
Munson Katherine M
Paten Benedict
Porubsky David
Sanders Ashley D
Sorensen Melanie
Human Genome Structural Variation Consortium
Sulovari Arvis
Vollger Mitchell R
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

MDC Repository

MPG.PuRe

Human-specific tandem repeat expansion and differential gene expression during primate evolution

Author: Audano Peter A.
Chaisson Mark J. P.
Eichler Evan E.
Li Ruiyang
Logsdon Glennis A.
Pollen Alex A.
Porubsky David
Sulovari Arvis
Vollger Mitchell R.
Warren Wesley C.
Publication venue
Publication date: 06/09/2019

Short tandem repeats (STRs) and variable number tandem repeats (VNTRs) are important sources of natural and disease-causing variation, yet they have been problematic to resolve in reference genomes and genotype with short-read technology. We created a framework to model the evolution and instability of STRs and VNTRs in apes. We phased and assembled 3 ape genomes (chimpanzee, gorilla, and orangutan) using long-read and 10x Genomics linked-read sequence data for 21,442 human tandem repeats discovered in 6 haplotype-resolved assemblies of Yoruban, Chinese, and Puerto Rican origin. We define a set of 1,584 STRs/VNTRs expanded specifically in humans, including large tandem repeats affecting coding and noncoding portions of genes (e.g., MUC3A, CACNA1C). We show that short interspersed nuclear element-VNTR-Alu (SVA) retrotransposition is the main mechanism for distributing GC-rich human-specific tandem repeat expansions throughout the genome but with a bias against genes. In contrast, we observe that VNTRs not originating from retrotransposons have a propensity to cluster near genes, especially in the subtelomere. Using tissue-specific expression from human and chimpanzee brains, we identify genes where transcript isoform usage differs significantly, likely caused by cryptic splicing variation within VNTRs. Using single-cell expression from cerebral organoids, we observe a strong effect for genes associated with transcription profiles analogous to intermediate progenitor cells. Finally, we compare the sequence composition of some of the largest human-specific repeat expansions and identify 52 STRs/VNTRs with at least 40 uninterrupted pure tracts as candidates for genetically unstable regions associated with disease

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

MDC Repository

Dissertations of the University of Groningen

Genomic inversions and GOLGA core duplicons underlie disease instability at the 15q25 locus

Author: Accadia Maria
Antonacci Francesca
Cantsilieris Stuart
Carella Massimo
Coe Bradley P.
Dumont Beth L.
D’Addabbo Pietro
Eichler Evan E.
Maggiolini Flavia A. M.
Manganelli Michele
Palumbo Orazio
Palumbo Pietro
Sanders Ashley D.
Vollger Mitchell R.
Wing Chun Pang Andy
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2019

Human chromosome 15q25 is involved in several disease-associated structural rearrangements, including microdeletions and chromosomal markers with inverted duplications. Using comparative fluorescence in situ hybridization, strand-sequencing, single-molecule, real-time sequencing and Bionano optical mapping analyses, we investigated the organization of the 15q25 region in human and nonhuman primates. We found that two independent inversions occurred in this region after the fission event that gave rise to phylogenetic chromosomes XIV and XV in humans and great apes. One of these inversions is still polymorphic in the human population today and may confer differential susceptibility to 15q25 microdeletions and inverted duplications. The inversion breakpoints map within segmental duplications containing core duplicons of the GOLGA gene family and correspond to the site of an ancestral centromere, which became inactivated about 25 million years ago. The inactivation of this centromere likely released segmental duplications from recombination repression typical of centromeric regions. We hypothesize that this increased the frequency of ectopic recombination creating a hotspot of hominid inversions where dispersed GOLGA core elements now predispose this region to recurrent genomic rearrangements associated with disease.

The Jackson Laboratory: The Mouseion at the JAXlibrary

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Bari

Archivio istituzionale della ricerca - Università di Brescia

MDC Repository

FigShare

Genomic inversions and GOLGA core duplicons underlie disease instability at the 15q25 locus.

Author: Accadia Maria
Antonacci Francesca
Cantsilieris Stuart
Carella Massimo
Coe Bradley P.
Dumont Beth L.
D’Addabbo Pietro
Eichler Evan E.
Maggiolini Flavia A. M.
Manganelli Michele
Palumbo Orazio
Palumbo Pietro
Sanders Ashley D.
Vollger Mitchell R.
Wing Chun Pang Andy
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Archivio istituzionale della ricerca - Università di Brescia

Characterization of large-scale genomic differences in the first complete human genome

Author: Dan Meng
Dylan J. Taylor
Evan E. Eichler
Glennis A. Logsdon
Junfeng Shi
Lianting Fu
Manying Xia
Michael C. Schatz
Mitchell R. Vollger
Nae-Chyun Chen
Qing Lu
Rajiv C. McCoy
Shilong Zhang
Weidong Li
William T. Harvey
Xiangyu Yang
Xuankai Wang
Yafei Mao
Yawen Zou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/07/2023

Background: The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies are not characterized in detail yet.
Results: Here, in addition to the previously reported “non-syntenic” regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp) excluding telomeric and centromeric regions are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution.
Conclusion: Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies

Directory of Open Access Journals

Epigenetic patterns in a complete human genome.

Author: Altemose Nicolas
Caldas Gina V
Eichler Evan E
Gershman Ariel
Guitart Xavi
Hook Paul W
Hoyt Savannah J
Jain Miten
Koren Sergey
Logsdon Glennis A
Miga Karen H
O'Neill Rachel J
Phillippy Adam M
Razaghi Roham
Rhie Arang
Sauria Michael EG
Schatz Michael C
Shumate Alaina
Timp Winston
Vollger Mitchell R
Publication venue: eScholarship, University of California
Publication date: 01/04/2022

The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation

eScholarship - University of California

Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes

Author: Antonacci Francesca
Baker Carl
Blanché Hélène
Cantsilieris Stuart
Chiatante Giorgia
Dang Vy
Deleuze Jean-François
Eichler Evan E
Hoekzema Kendra
Hsieh PingHsun
Kronenberg Zev N
Lewis Alexandra P
Maggiolini Flavia Angela Maria
Munson Katherine M
Murali Shwetha
Nelson Bradley J
Porubsky David
Sorensen Melanie
Underwood Jason G
Vollger Mitchell R
Publication venue: American Association for the Advancement of Science (AAAS)
Publication date: 01/01/2019

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation

Archivio istituzionale della ricerca - Università di Bari

HAL-CEA