
94 research outputs found

ParMap, an algorithm for the identification of small genomic insertions and deletions in nextgen sequencing data

Author: A Gnirke
Adolfo A Ferrando
AV Dalca
Hossein Khiabanian
J Shendure
JD McPherson
KJ McKernan
N Homer
P Medvedev
P Van Vlierberghe
Pieter Van Vlierberghe
Raul Rabadan
RM Kuhn
SM Rumble
Teresa Palomero
Publication venue: BioMed Central
Publication date: 01/05/2010

Abstract Background Next-generation sequencing produces high-throughput data, albeit with greater error and shorter reads than traditional Sanger sequencing methods. This complicates the detection of genomic variations, especially small insertions and deletions. Findings Here we describe ParMap, a statistical algorithm for the identification of complex genetic variants, such as small insertions and deletions, using partially mapped reads in nextgen sequencing data. Conclusions We report ParMap's successful application to the mutation analysis of chromosome X exome-captured leukemia DNA samples.

Crossref

Directory of Open Access Journals

PubMed Central

Structural Alterations from Multiple Displacement Amplification of a Human Genome Revealed by Mate-Pair Sequencing

Author: AJ Iafrate
C Tanabe
CA Klein
Christian Tellgren-Roth
FB Dean
H Telenius
Jonathan Mangion
JR Nelson
Jörg D. Hoheisel
KJ McKernan
L Lovmar
L Zhang
Liqun He
Magnus Rosenlund
PJ Campbell
PJ Stephens
RS Lasken
S Volik
Sean D. Hooper
T Sjöblom
Tobias Sjöblom
Xiang Jiao
Yutao Fu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Comprehensive identification of the acquired mutations that cause common cancers will require genomic analyses of large sets of tumor samples. Typically, the tissue material available from tumor specimens is limited, which creates a demand for accurate template amplification. We therefore evaluated whether phi29-mediated whole genome amplification introduces false positive structural mutations by massive mate-pair sequencing of a normal human genome before and after such amplification. Multiple displacement amplification led to a decrease in clone coverage and an increase by two orders of magnitude in the prevalence of inversions, but did not increase the prevalence of translocations. While multiple strand displacement amplification may find uses in translocation analyses, it is likely that alternative amplification strategies need to be developed to meet the demands of cancer genomics

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Publikationer från Uppsala Universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

A novel and well-defined benchmarking method for second generation read mapping

Author: A Döring
A Valouev
Anne-Katrin Emde
B Langmead
C Alkan
C Amid
D Weese
DA Wheeler
David Weese
DR Bentley
ER Mardis
G Myers
G Navarro
H Li
J Deng
J Dohm
J Qin
KJ McKernan
Knut Reinert
M Holtgrewe
Manuel Holtgrewe
P Sanders
R Guigó
R Li
SB Ng
Publication venue: BioMed Central
Publication date: 01/01/2011

Background Second generation sequencing technologies yield DNA sequence data at ultra high-throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not been easy to compare different read mapping approaches in a unified way and to determine which program is the best for what task. Results We present a new benchmark method, called Rabema (Read Alignment BEnchMArk), for read mappers. It consists of a strict definition of the read mapping problem and of tools to evaluate the result of arbitrary read mappers supporting the SAM output format. Conclusions We show the usefulness of the benchmark program by performing a comparison of popular read mappers. The tools supporting the benchmark are licensed under the GPL and available from http://www.seqan.de/projects/rabema.html

Institutional Repository of the Freie Universität Berlin

Crossref

Springer - Publisher Connector

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

Author: Andreas Wilke
AR Quinlan
B Ewing
B Niu
C Quince
C Quince
C Quince
C von Mering
DH Huson
EA Dinsdale
F Meyer
Folker Meyer
HC Bravo
J Reeder
Jared Wilkening
JC Dohm
JG Caporaso
Kevin P. Keegan
KJ Hoff
KJ McKernan
M Margulies
Mark D'Souza
MJ Pallen
MP Cox
PJ Cock
R Seshadri
RA Freitas
RC Edgar
Scott Markel
SG Tringe
SM Huse
SM Huse
TD Harris
Travis Harrison
V Gomez-Alvarez
V Kunin
VM Markowitz
WC Kao
William L. Trimble
Y Sun
Publication venue: Public Library of Science
Publication date: 07/06/2012

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing

Author: A Bashir
AA Hoffmann
AJ Iafrate
AM Hillmer
AW Pang
B Zeitouni
C Alkan
CB Krimbas
DC Richter
E Tuzun
F Hormozdiari
F Hormozdiari
H Li
H Stefansson
J Cao
J Sebat
J Wang
JC Roach
JM Kidd
JM Kidd
JO Korbel
JO Korbel
José Ignacio Lucas Lledó
K Chen
KF Manly
KJ McKernan
L Feuk
M Onishi-Seebacher
Mario Cáceres
P Medvedev
PJ Campbell
PJ Stephens
R Xi
S Suzuki
SM Ahn
SS Sindi
T Rausch
Y Jiang
ZD Zhang
Zhanjiang Liu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions -SVDetect, GRIAL, and VariationHunter-, identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Directory of Open Access Journals

PubMed Central

Diposit Digital de Documents de la UAB

Genome sequence and global sequence variation map with 5.5 million SNPs in Chinese rhesus macaque

Author: A Hamosh
AG Jegga
AM Trichel
AW Chan
B Ferguson
B Ling
Bing Su
DS Wishart
G Loots
J Rogers
J Satkoski
J Wang
J Wang
JP Overington
Jun Wang
Kaixiong Ye
KJ McKernan
LD Stein
Lixin Yang
M Goodman
M Raveendran
Ming Li
ML Metzker
P Tong
R Li
RA Gibbs
RA Weiss
RC Edgar
RD Finn
RD Hernandez
RJ Colman
RS Malhi
Rui Zhang
S Chun
S Griffiths-Jones
S Khouangsathiene
S Levy
SH Yang
TJ Hubbard
TR Disotell
Xiaodong Fang
Xiaosen Guo
Yanfeng Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

Author: A Kowarsch
AV Zimin
B Hayes
BL Golden
C Alkan
C Alkan
C Camacho
C Xie
CG Elsik
CP Van Tassell
DQ Nguyen
DQ Nguyen
EW Sayers
FL Houghton
GE Liu
GE Shook
HY Yuan
J Fadista
Jennifer M Sumner-Thomson
JM Kidd
JR Grant
JS Bae
Jung-Woo Choi
KJ McKernan
M Ashburner
M Golik
ME Goddard
P Flicek
Paul Stothard
PD Stenson
RA Gibbs
SH Eck
Stephen S Moore
UniProt Consortium
Urmila Basu
Xiaoping Liao
Y Liu
Yan Meng
Z Du
Z Zhang
ZL Hu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results: The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75% respectively are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs).
Ten randomly selected CNVs, five genic and five non-genic, were successfully validated using quantitative real-time PCR. The CNVs are enriched for immune system genes and include genes that may contribute to lactation capacity. The majority of the CNVs (69%) were detected as regions with higher abundance in the Holstein bull. Conclusions: Substantial genetic differences exist between the Black Angus and Holstein animals sequenced in this work and the Hereford reference sequence, and some of this variation is predicted to affect evolutionarily conserved amino acids or gene copy number. The deeply annotated SNPs and CNVs identified in this resequencing study can serve as useful genetic tools, and as candidates in searches for phenotype-altering DNA differences

Crossref

Springer - Publisher Connector

PubMed Central

University of Queensland eSpace

Differentially expressed alternatively spliced genes in Malignant Pleural Mesothelioma identified using massively parallel transcriptome sequencing

Author: Assunta De Rienzo
B Modrek
C Lecomte
D Thierry-Mieg
David J Sugarbaker
DJ Sugarbaker
E Buratti
E Sakhinia
F Faul
Gavin J Gordon
GY Li
J Helleman
JD Hoffman
JK Cowell
JM Coulson
JM Johnson
JP Venables
JP Venables
K Honda
K Thorsen
KJ McKernan
Lingsheng Dong
M Margulies
M Roy
MA Garcia-Blanco
MB Watson
P Dammeyer
PY Gasdaska
Q Tanko
R Klinck
Raphael Bueno
Roderick V Jensen
S Gupta
SW Blain
T Miwa
T Papp
U Peters
WN Venables
Y Chen
Yanlong Xu
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background Analyses of Expressed Sequence Tags (ESTs) databases suggest that most human genes have multiple alternative splice variants. The alternative splicing of pre-mRNA is tightly regulated during development and in different tissue types. Changes in splicing patterns have been described in disease states. Recently, we used whole-transcriptome shotgun pyrosequencing to characterize 4 malignant pleural mesothelioma (MPM) tumors, 1 lung adenocarcinoma and 1 normal lung. We hypothesized that alternative splicing profiles might be detected in the sequencing data for the expressed genes in these samples. Methods We developed a software pipeline to map the transcriptome read sequences of the 4 MPM samples and 1 normal lung sample onto known exon junction sequences in the comprehensive AceView database of expressed sequences and to count how many reads map to each junction. 13,274,187 transcriptome reads generated by the Roche/454 sequencing platform for 5 samples were compared with 151,486 exon junctions from the AceView database. The exon junction expression index (EJEI) was calculated for each exon junction in each sample to measure the differential expression of alternative splicing events. Top ten exon junctions with the largest EJEI difference between the 4 mesothelioma and the normal lung sample were then examined for differential expression using Quantitative Real Time PCR (qRT-PCR) in the 5 sequenced samples. Two of the differentially expressed exon junctions (ACTG2.aAug05 and CDK4.aAug05) were further examined with qRT-PCR in additional 18 MPM and 18 normal lung specimens. Results We found 70,953 exon junctions covered by at least one sequence read in at least one of the 5 samples. All 10 identified most differentially expressed exon junctions were validated as present by RT-PCR, and 8 were differentially expressed exactly as predicted by the sequence analysis.
The differential expression of the AceView exon junctions for the ACTG2 and CDK4 genes was also observed to be statistically significant in an additional 18 MPM and 18 normal lung samples examined using qRT-PCR. The differential expression of these two junctions was shown to successfully classify these mesothelioma and normal lung specimens with high sensitivity (89% and 78%, respectively). Conclusion Whole-transcriptome shotgun sequencing, combined with a downstream bioinformatics pipeline, provides powerful tools for the identification of differentially expressed exon junctions resulting from alternative splice variants. The alternatively spliced genes discovered in the study could serve as useful diagnostic markers as well as potential therapeutic targets for MPM.

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line

Author: Ascia Eskin
Barry Merriman
Brian D. O'Connor
DA Wheeler
DR Bentley
DW Collins
ES Lander
G Dennis Jr
H Lee
H Li
H Li
Hane Lee
J Ponten
J Wang
JA Squire
JC Venter
JI Kim
K Yamane
KJ McKernan
M Krzywinski
Marshall S. Horwitz
ME Law
Michael James Clark
MS Taylor
N Homer
N Homer
Nils Homer
P Hupe
PA Futreal
R Beroukhim
R Stupp
RE Mills
S Bamford
S Levy
SM Ahn
ST Sherry
Stanley F. Nelson
TJ Ley
W Huang da
X Gu
Y Lee
Zugen Chen
Publication venue: Public Library of Science
Publication date: 29/01/2010

U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date

Public Library of Science (PLOS)

Crossref

PubMed Central

Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries

Author: A Morgulis
A Untergasser
Addie Vereijken
AJ Sharp
B Daines
B Ewing
BE Stranger
Bert W Dibbits
BM Skinner
D Wright
DF Conrad
DK Griffin
DR Bentley
E Tuzun
EJ Hollox
F Zhang
G Benson
GK Wong
H Li
H Megens
H Stefansson
Hindrik HD Kerstens
JA Lee
JM Kidd
JO Korbel
JS Mattick
K Chen
KJ McKernan
KK Wong
Martien AM Groenen
MG Elferink
NP Carter
PJ Campbell
Q Xia
R Li
R Redon
Richard PMA Crooijmans
Ron Okimoto
SA McCarroll
T Hori
TA Graubert
TJP Hubbard
TL Newman
V Guryev
W Chen
WE Stumph
Z Bao
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract Background Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken. Results We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, and we also found SVs in the coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals and computational mapping of DNA sequences from a reference genome. Conclusion We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications