Search CORE

6,693 research outputs found

Discovery and genotyping of novel sequence insertions in many sequenced individuals

Author: Alkan C.
Asghari H.
Güngör T.
Hach F.
Kavak P.
Lin Y.-Y.
Numanagić I.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017
Field of study

Motivation: Despite recent advances in algorithms design to characterize structural variation using high-throughput short read sequencing (HTS) data, characterization of novel sequence insertions longer than the average read length remains a challenging task. This is mainly due to both computational difficulties and the complexities imposed by genomic repeats in generating reliable assemblies to accurately detect both the sequence content and the exact location of such insertions. Additionally, de novo genome assembly algorithms typically require a very high depth of coverage, which may be a limiting factor for most genome studies. Therefore, characterization of novel sequence insertions is not a routine part of most sequencing projects. There are only a handful of algorithms that are specifically developed for novel sequence insertion discovery that can bypass the need for the whole genome de novo assembly. Still, most such algorithms rely on high depth of coverage, and to our knowledge there is only one method (PopIns) that can use multi-sample data to "collectively" obtain a very high coverage dataset to accurately find insertions common in a given population. Result: Here, we present Pamir, a new algorithm to efficiently and accurately discover and genotype novel sequence insertions using either single or multiple genome sequencing datasets. Pamir is able to detect breakpoint locations of the insertions and calculate their zygosity (i.e. heterozygous versus homozygous) by analyzing multiple sequence signatures, matching one-end-anchored sequences to small-scale de novo assemblies of unmapped reads, and conducting strand-aware local assembly. We test the efficacy of Pamir on both simulated and real data, and demonstrate its potential use in accurate and routine identification of novel sequence insertions in genome projects. © 2017 The Author. Published by Oxford University Press. All rights reserved

Bilkent University Institutional Repository

Discovery and genotyping of structural variation from long-read haploid genome sequence data

Author: Boitano Matthew
Chaisson Mark J.P.
Chin Chen-Shin
Eichler Evan E
Gordon David
Graves-Lindsay Tina A
Hoekzema Kendra
Huddleston John
Korlach Jonas
Kronenberg Zev N
Munson Katherine M
Peluso Paul
Steinberg Karyn Meltz
Vives Laura
Warren Wes
Wilson Richard K
Publication venue: Digital Commons@Becker
Publication date: 01/01/2016
Field of study

Digital Commons@Becker

Recommended from our members

Kevlar: A Mapping-Free Framework for Accurate Discovery of De Novo Variants.

Author: Brown C Titus
Hormozdiari Fereydoun
Standage Daniel S
Publication venue: eScholarship, University of California
Publication date: 01/08/2019
Field of study

De novo genetic variants are an important source of causative variation in complex genetic disorders. Many methods for variant discovery rely on mapping reads to a reference genome, detecting numerous inherited variants irrelevant to the phenotype of interest. To distinguish between inherited and de novo variation, sequencing of families (parents and siblings) is commonly pursued. However, standard mapping-based approaches tend to have a high false-discovery rate for de novo variant prediction. Kevlar is a mapping-free method for de novo variant discovery, based on direct comparison of sequences between related individuals. Kevlar identifies high-abundance k-mers unique to the individual of interest. Reads containing these k-mers are partitioned into disjoint sets by shared k-mer content for variant calling, and preliminary variant predictions are sorted using a probabilistic score. We evaluated Kevlar on simulated and real datasets, demonstrating its ability to detect both de novo single-nucleotide variants and indels with high accuracy

eScholarship - University of California

Recommended from our members

Mapping Copy Number Variation by Population Scale Genome Sequencing

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.Organismic and Evolutionary Biolog

Harvard University - DASH

Identification of Structural Variation in Chimpanzees Using Optical Mapping and Nanopore Sequencing.

Author: Andrés Aida M
Dennis Megan Y
Kaya Gulhan
Mastoras Mira
Sahasrabudhe Ruta
Schmidt Joshua M
Shew Colin
Soto Daniela C
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Recent efforts to comprehensively characterize great ape genetic diversity using short-read sequencing and single-nucleotide variants have led to important discoveries related to selection within species, demographic history, and lineage-specific traits. Structural variants (SVs), including deletions and inversions, comprise a larger proportion of genetic differences between and within species, making them an important yet understudied source of trait divergence. Here, we used a combination of long-read and -range sequencing approaches to characterize the structural variant landscape of two additional Pan troglodytes verus individuals, one of whom carries 13% admixture from Pan troglodytes troglodytes. We performed optical mapping of both individuals followed by nanopore sequencing of one individual. Filtering for larger variants (>10 kbp) and combined with genotyping of SVs using short-read data from the Great Ape Genome Project, we identified 425 deletions and 59 inversions, of which 88 and 36, respectively, were novel. Compared with gene expression in humans, we found a significant enrichment of chimpanzee genes with differential expression in lymphoblastoid cell lines and induced pluripotent stem cells, both within deletions and near inversion breakpoints. We examined chromatin-conformation maps from human and chimpanzee using these same cell types and observed alterations in genomic interactions at SV breakpoints. Finally, we focused on 56 genes impacted by SVs in >90% of chimpanzees and absent in humans and gorillas, which may contribute to chimpanzee-specific features. Sequencing a greater set of individuals from diverse subspecies will be critical to establish the complete landscape of genetic variation in chimpanzees

UCL Discovery

eScholarship - University of California

Multi-platform discovery of haplotype-resolved structural variation in human genomes

Author: Ding Li
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019
Field of study

Digital Commons@Becker

Paragraph: A graph-based structural variant genotyper for short-read sequence data

Author: Bentley D. R.
Chen S.
Dolzhenko E.
Eberle M. A.
Kirsche M.
Krusche P.
Petrovski R.
Schatz M. C.
Schlesinger F.
Sedlazeck F. J.
Sherman R. M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/12/2019
Field of study

Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, an accurate genotyper that models SVs using sequence graphs and SV annotations. We demonstrate the accuracy of Paragraph on whole-genome sequence data from three samples using long-read SV calls as the truth set, and then apply Paragraph at scale to a cohort of 100 short-read sequenced samples of diverse ancestry. Our analysis shows that Paragraph has better accuracy than other existing genotypers and can be applied to population-scale studies. © 2019 The Author(s)

Cold Spring Harbor Laboratory Institutional Repository