
1,002 research outputs found

Multi-platform discovery of haplotype-resolved structural variation in human genomes

Author: Ding Li
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019

BISER: Fast Characterization of Segmental Duplication Structure in Multiple Genome Assemblies

Author: Alkan Can
Hach Faraz
Numanagić Ibrahim
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)
Publication date: 01/01/2021

The increasing availability of high-quality genome assemblies has raised interest in the characterization of genomic architecture. Major architectural parts, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure. However, optimal computation of SDs through standard local alignment algorithms is impractical due to the size of most genomes. A cross-genome evolutionary analysis of SDs is even harder, as one needs to characterize SDs in multiple genomes and find relations between those SDs and unique segments in other genomes. Thus, there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today. Here we introduce a new tool, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves on earlier tools by (i) scaling the detection of SDs with low homology (75%) to multiple genomes while introducing further 8-24x speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications as far back as 90 million years.

Dagstuhl Research Online Publication Server

Short tandem repeats, segmental duplications, gene deletion, and genomic instability in a rapidly diversified immune gene family

Crossref

Using population admixture to help complete maps of the human genome

Author: A Kong
A Sırmacı
AG Hinch
AL Price
Alkes L Price
Amelia M Lindgren
AP Reiner
Bogdan Pasaniuc
C Alkan
CA Winkler
Cynthia C Morton
D Botstein
D Reich
D Wegmann
DA Benson
David Reich
DM Church
DP Ryan
EE Eichler
ES Lander
G Golfier
Giulio Genovese
H Donis-Keller
H Lango Allen
H Li
H Stefansson
HA Taylor Jr.
HC Mefford
Heng Li
J Christiansen
J Martin
J Weissenbach
J Zhang
JA Bailey
James G Wilson
JC Venter
JI Kim
JK Pickrell
JM Kidd
JM Korn
JT Robinson
K Musunuru
Kimberly Chambert
M Guipponi
M Ruault
MA DePristo
Martin R Pollak
MF Seldin
MM Mahtani
MY Dennis
N Brunetti-Pierri
NA Doggett
Nicolas Altemose
PH Sudmant
R Li
R Lyle
RE Handsaker
Robert E Handsaker
RV Samonte
S Gnerre
S Kirsch
S Levy
Steven A McCarroll
X She
YS Ju
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2013

Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces by utilizing the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning four million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified eight large novel inter-chromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed in RNA and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California

The University of Manchester - Institutional Repository

Semi-automated assembly of high-quality diploid human reference genomes

Author: Cody Sarah
Fulton Lucinda L
Fulton Robert S
Jarvis Erich D
Li Daofeng
Lindsay Tina
Stitziel Nathan O
Wang Ting
et al.
Publication venue: Digital Commons@Becker
Publication date: 19/10/2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Digital Commons@Becker

Inversion variants in human and primate genomes

Author: Antonacci Francesca
Archidiacono Nicoletta
Bitonto Miriana
Capozzi Oronzo
Catacchio Claudia Rita
D'Addabbo Pietro
Eichler Evan E
Maggiolini Flavia Angela Maria
Miroballo Mattia
Signorile Martina Lepore
Ventura Mario
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2018

For many years, inversions have been proposed to be a direct driving force in speciation since they suppress recombination when heterozygous. Inversions are the most common large-scale differences among humans and great apes. Nevertheless, they represent large events easily distinguishable by classical cytogenetics, whose resolution, however, is limited. Here, we performed a genome-wide comparison between human, great ape, and macaque genomes using the net alignments for the most recent releases of genome assemblies. We identified a total of 156 putative inversions, between 103 kb and 91 Mb, corresponding to 136 human loci. Combining literature, sequence, and experimental analyses, we analyzed 109 of these loci and found 67 regions inverted in one or multiple primates, including 28 newly identified inversions. These events overlap with 81 human genes at their breakpoints, and seven correspond to sites of recurrent rearrangements associated with human disease. This work doubles the number of validated primate inversions larger than 100 kb, beyond what was previously documented. We identified 74 sites of errors, where the sequence has been assembled in the wrong orientation, in the reference genomes analyzed. Our data serve two purposes: First, we generated a map of evolutionary inversions in these genomes representing a resource for interrogating differences among these species at a functional level; second, we provide a list of misassembled regions in these primate genomes, involving over 300 Mb of DNA and 1978 human genes. Accurately annotating these regions in the genome references has immediate applications for evolutionary and biomedical studies on primates

Archivio istituzionale della ricerca - Università di Bari

Single haplotype assembly of the human genome from a hydatidiform mole

Author: Agarwala Richa
Church Deanna M.
Eichler Evan E.
Fulton Robert S.
Graves-Lindsay Tina A.
Huddleston John
Meltz Steinberg Karyn
Morgulis Aleksandr
Schneider Valerie A.
Shiryev Sergey A.
Surti Urvashi
Warren Wesley C.
Wilson Richard K.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

Crossref

Digital Commons@Becker

PubMed Central


OMMA enables population-scale analysis of complex genomic features and phylogenomic relationships from nanochannel-based optical maps

Author: Chan Ting-Fung
Chu Catherine
Ho Pak-Leung
Kwok Pui-Yan
Lai Yvonne Yuk-Yin
Leung Alden King-Yung
Li Le
Liu Melissa Chun-Jiao
Yip Kevin Y
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Background: Optical mapping is an emerging technology that complements sequencing-based methods in genome analysis. It is widely used in improving genome assemblies and detecting structural variations by providing information over much longer (up to 1 Mb) reads. Current standards in optical mapping analysis involve assembling optical maps into contigs and aligning them to a reference, which is limited to pairwise comparison and becomes bias-prone when analyzing multiple samples. Findings: We present a new method, OMMA, that extends optical mapping to the study of complex genomic features by simultaneously interrogating optical maps across many samples in a reference-independent manner. OMMA captures and characterizes complex genomic features, e.g., multiple haplotypes, copy number variations, and subtelomeric structures when applied to 154 human samples across the 26 populations sequenced in the 1000 Genomes Project. For small genomes such as pathogenic bacteria, OMMA accurately reconstructs the phylogenomic relationships and identifies functional elements across 21 Acinetobacter baumannii strains. Conclusions: With the increasing data throughput of optical mapping systems, the use of this technology in comparative genome analysis across many samples will become feasible. OMMA is a timely solution that can address such computational needs. The OMMA software is available at https://github.com/TF-Chan-Lab/OMTools

eScholarship - University of California

Stepwise evolution of a butterfly supergene via duplication and inversion

Author: De-Kayne Rishi
Ffrench-Constant Richard H
Gordon Ian J.
Kim Kang-Wook
Martin Simon H
Martins Dino J.
Saitoti Omufwoko Kennedy
Publication venue: 'The Royal Society'
Publication date: 13/06/2022

Supergenes maintain adaptive clusters of alleles in the face of genetic mixing. Although usually attributed to inversions, supergenes can be complex, and reconstructing the precise processes that led to recombination suppression and their timing is challenging. We investigated the origin of the BC supergene, which controls variation in warning coloration in the African monarch butterfly, Danaus chrysippus. By generating chromosome-scale assemblies for all three alleles, we identified multiple structural differences. Most strikingly, we find that a region of more than 1 million bp underwent several segmental duplications at least 7.5 Ma. The resulting duplicated fragments appear to have triggered four inversions in surrounding parts of the chromosome, resulting in stepwise growth of the region of suppressed recombination. Phylogenies for the inversions are incongruent with the species tree and suggest that structural polymorphisms have persisted for at least 4.1 Myr. In addition to the role of duplications in triggering inversions, our results suggest a previously undescribed mechanism of recombination suppression through independent losses of divergent duplicated tracts. Overall, our findings add support for a stepwise model of supergene evolution involving a variety of structural changes. This article is part of the theme issue ‘Genomic architecture of supergenes: causes and evolutionary consequences’

PubMed Central

Edinburgh Research Explorer

eScholarship - University of California