Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

Author: Albracht Derek
Fulton Robert S
Graves-Lindsay Tina
Kremitzki Milinn
Magrini Vincent
Markovic Chris
McGrath Sean
Steinberg Karyn Meltz
Wilson Richard K
et al.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

Building and Improving Reference Genome Assemblies: This paper reviews the problems and algorithms of assembling a complete genome from millions of short DNA sequencing reads

Author: Alkan C.
Church D.M.
Meltz Steinberg K.
Montague M.J.
Schneider V.A.
Warren W.C.
Wilson R.K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

A genome sequence assembly provides the foundation for studies of genotypic and phenotypic variation, genome structure, and evolution of the target organism. In the past four decades, there has been a surge of new sequencing technologies, and with these developments, computational scientists have developed new algorithms to improve genome assembly. Here we discuss the relationship between sequencing technology improvements and assembly algorithm development and how these are applied to extend and improve human and nonhuman genome assemblies.

Bilkent University Institutional Repository

Discovery and genotyping of structural variation from long-read haploid genome sequence data

Author: Boitano Matthew
Chaisson Mark J.P.
Chin Chen-Shin
Eichler Evan E
Gordon David
Graves-Lindsay Tina A
Hoekzema Kendra
Huddleston John
Korlach Jonas
Kronenberg Zev N
Munson Katherine M
Peluso Paul
Steinberg Karyn Meltz
Vives Laura
Warren Wes
Wilson Richard K
Publication venue: Digital Commons@Becker
Publication date: 01/01/2016

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.

Crossref

Digital Commons@Becker

Single haplotype assembly of the human genome from a hydatidiform mole

Author: Agarwala Richa
Church Deanna M.
Eichler Evan E.
Fulton Robert S.
Graves-Lindsay Tina A.
Huddleston John
Meltz Steinberg Karyn
Morgulis Aleksandr
Schneider Valerie A.
Shiryev Sergey A.
Surti Urvashi
Warren Wesley C.
Wilson Richard K.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly

Crossref

Digital Commons@Becker

PubMed Central

Exome sequencing of Finnish isolates enhances rare-variant association power.

Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power

eScholarship - University of California

De novo single-nucleotide and copy number variation in discordant monozygotic twins reveals disease-related genes

Author: A Al-Chalabi
A Cecchinato
A Kong
A McKenna
Alan Pittman
B Bertelsen
BS Petersen
C Lavedan
CD Campbell
Charles Lee
Chengsheng Zhang
D Freed
D Mataix-Cols
D Nickles
D Vitucci
Deborah Hughes
DF Levinson
E Colvert
EA Ehli
EHM Wong
Eliza Cerveira
Elliott Rees
EV Davydov
F Antonacci
F Magne
G Kuhlenbäumer
George Kirov
GM Dal
H Higashida
IA Adzhubei
J Chen
J Dongen van
J Fallon
J Tang
Jamal Nasir
JB Potash
JM Schwarz
John Hardy
K Meltz Steinberg
K Ohi
K Wang
K Wang
Kerra Pearce
L Cai
L Vadlamudi
L Yuan
LC Francioli
M Florio
Mark Kristiansen
ME Ketelaar
Michael Simpson
MJ Lindhurst
MY Dennis
Niranjanan Nirmalananthan
Nirmal Vadgama
P Kumar
Peter De Rijk
Qihui Zhu
R Acuna-Hidalgo
R Hashimoto
R Hilker
R Pamphlett
Robin Murray
RP Ebstein
S Akbarian
S Beicht
S Petrovski
S Schuster
SE Baranzini
SP Robertson
Takeo Yoshikawa
Tomas Fitzgerald
V Labrie
YL Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Recent studies have demonstrated genetic differences between monozygotic (MZ) twins. To test the hypothesis that early post-twinning mutational events associate with phenotypic discordance, we investigated a cohort of 13 twin pairs (n = 26) discordant for various clinical phenotypes using whole-exome sequencing and screened for copy number variation (CNV). We identified a de novo variant in PLCB1, a gene involved in the hydrolysis of lipid phosphorus in milk from dairy cows, associated with lactase non-persistence, and a variant in the mitochondrial complex I gene MT-ND5 associated with amyotrophic lateral sclerosis (ALS). We also found somatic variants in multiple genes (TMEM225B, KBTBD3, TUBGCP4, TFIP11) in another MZ twin pair discordant for ALS. Based on the assumption that discordance between twins could be explained by a common variant with variable penetrance or expressivity, we screened the twin samples for known pathogenic variants that are shared and identified a rare deletion overlapping ARHGAP11B, in the twin pair manifesting with either schizotypal personality disorder or schizophrenia. Parent-offspring trio analysis was implemented for two twin pairs to assess potential association of variants of parental origin with susceptibility to disease. We identified a de novo variant in RASD2 shared by 8-year-old male twins with a suspected diagnosis of autism spectrum disorder (ASD) manifesting as different traits. A de novo CNV duplication was also identified in these twins overlapping CD38, a gene previously implicated in ASD. In twins discordant for Tourette's syndrome, a paternally inherited stop loss variant was detected in AADAC, a known candidate gene for the disorder

Crossref

Online Research @ Cardiff

The Jackson Laboratory: The Mouseion at the JAXlibrary

University of Northampton's Research Explorer

UCL Discovery

Institutional Repository Universiteit Antwerpen

King's Research Portal

St George's Online Research Archive

NECTAR