
29 research outputs found

Estimation of Sequencing Error Rates in Short Reads

Author: Blades Natalie
Ding Jie
Parmigiani Giovanni
Sultana Razvan
Wang Xin Victoria
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: Short-read data from next-generation sequencing technologies are now being generated across a range of research projects. The fidelity of this data can be affected by several factors and it is important to have simple and reliable approaches for monitoring it at the level of individual experiments. Results: We developed a fast, scalable and accurate approach to estimating error rates in short reads, which has the added advantage of not requiring a reference genome. We build on the fundamental observation that there is a linear relationship between the copy number for a given read and the number of erroneous reads that differ from the read of interest by one or two bases. The slope of this relationship can be transformed to give an estimate of the error rate, both by read and by position. We present simulation studies as well as analyses of real data sets illustrating the precision and accuracy of this method, and we show that it is more accurate than alternatives that count the difference between the sample of interest and a reference genome. We show how this methodology led to the detection of mutations in the genome of the PhiX strain used for calibration of Illumina data. The proposed method is implemented in an R package, which can be downloaded from http://bcb.dfci.harvard.edu/~vwang/shadowRegression.html. Conclusions: The proposed method can be used to monitor the quality of sequencing pipelines at the level of individual experiments without the use of reference genomes. Furthermore, having an estimate of the error rates gives one the opportunity to improve analyses and inferences in many applications of next-generation sequencing data.

Crossref

Harvard University - DASH

Springer - Publisher Connector


Next-generation sequencing of dsRNA is greatly improved by treatment with the inexpensive denaturing reagent DMSO.

Author: Delwart Eric
Díaz-Muñoz Samuel L
Wilcox Alexander H
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

dsRNA is the genetic material of important viruses and a key component of RNA interference-based immunity in eukaryotes. Previous studies have noted difficulties in determining the sequence of dsRNA molecules that have affected studies of immune function and estimates of viral diversity in nature. DMSO has been used to denature dsRNA prior to the reverse-transcription stage to improve reverse transcriptase PCR and Sanger sequencing. We systematically tested the utility of DMSO to improve the sequencing yield of a dsRNA virus (Φ6) in a short-read next-generation sequencing platform. DMSO treatment improved sequencing read recovery by over two orders of magnitude, even when RNA and cDNA concentrations were below the limit of detection. We also tested the effects of DMSO on a mock eukaryotic viral community and found that dsRNA virus reads increased with DMSO treatment. Furthermore, we provide evidence that DMSO treatment does not adversely affect recovery of reads from a ssRNA viral genome (influenza A/California/07/2009). We suggest that up to 50 % DMSO treatment be used prior to cDNA synthesis when samples of interest are composed of or may contain dsRNA

eScholarship - University of California

DRISEE overestimates errors in metagenomic sequencing data

Author: Eren A. Murat
Huse Susan M.
Morrison Hilary G.
Sogin Mitchell L.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

© The Author(s), 2013. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Briefings in Bioinformatics 15 (2014): 783-787, doi:10.1093/bib/bbt010. The extremely high error rates reported by Keegan et al. in ‘A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE’ (PLoS Comput Biol 2012;8:e1002541) for many next-generation sequencing datasets prompted us to re-examine their results. Our analysis reveals that the presence of conserved artificial sequences, e.g. Illumina adapters, and other naturally occurring sequence motifs accounts for most of the reported errors. We conclude that DRISEE reports inflated levels of sequencing error, particularly for Illumina data. Tools offered for evaluating large datasets need scrupulous review before they are implemented. National Institutes of Health [1UH2DK083993 to M.L.S.]; National Science Foundation [BDI- 096026 to S.M.H.]

CiteSeerX

Woods Hole Open Access Server

PubMed Central

Next-generation sequencing : an eye-opener for the surveillance of antiviral resistance in influenza

Author: Roosens Nancy H
Saelens Xavier
Thomas Isabelle
Van Poelvoorde Laura
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

Next-generation sequencing (NGS) can enable a more effective response to a wide range of communicable disease threats, such as influenza, which is one of the leading causes of human morbidity and mortality worldwide. After vaccination, antivirals are the second line of defense against influenza. The use of currently available antivirals can lead to antiviral resistance mutations in the entire influenza genome. Therefore, the methods to detect these mutations should be developed and implemented. In this Opinion, we assess how NGS could be implemented to detect drug resistance mutations in clinical influenza virus isolates

Sciensano Publications Repository

Ghent University Academic Bibliography

Maximize Resolution or Minimize Error? Using Genotyping-By-Sequencing to Investigate the Recent Diversification of Helianthemum (Cistaceae)

Author: Aparicio Martínez Abelardo
Fernández Mazuecos Mario
González Albaladejo Rafael
Martín Hernánz Sara
Olangua Corral María
Reyes Betancort J. Alfredo
Rubio Pérez Encarnación
Santos Guerra Arnoldo
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2019

A robust phylogenetic framework, in terms of extensive geographical and taxonomic sampling, well-resolved species relationships and high certainty of tree topologies and branch length estimations, is critical in the study of macroevolutionary patterns. Whereas Sanger sequencing-based methods usually recover insufficient phylogenetic signal, especially in recently diversified lineages, reduced-representation sequencing methods tend to provide well-supported phylogenetic relationships, but usually entail remarkable bioinformatic challenges due to the inherent trade-off between the number of SNPs and the magnitude of associated error rates. The genus Helianthemum (Cistaceae) is a species-rich and taxonomically complex Palearctic group of plants that diversified mainly since the Upper Miocene. It is a challenging case study since previous attempts using Sanger sequencing were unable to resolve the intrageneric phylogenetic relationships. Aiming to obtain a robust phylogenetic reconstruction based on genotyping-by-sequencing (GBS), we established a rigorous methodological workflow in which we i) explored how variable settings during dataset assembly have an impact on error rates and on the degree of resolution under concatenation and coalescent approaches, ii) assessed the effect of two extreme parameter configurations (minimizing error rates vs. maximizing phylogenetic resolution) on tree topology and branch lengths, and iii) evaluated the effects of these two configurations on estimates of divergence times and diversification rates. Our analyses produced highly supported topologically congruent phylogenetic trees for both configurations. However, minimizing error rates did produce more reliable branch lengths, critically affecting the accuracy of downstream analyses (i.e. divergence times and diversification rates). 
In addition to recommending a revision of intrageneric systematics, our results enabled us to identify three highly diversified lineages in Helianthemum in contrasting geographical areas and ecological conditions, which started radiating in the Upper Miocene. Spain, MINECO grants CGL2014-52459-P and CGL2017-82465-P. Spain, Ministerio de Economía, Industria y Competitividad, reference IJCI-2015-2345.

Docta Complutense

idUS. Depósito de Investigación Universidad de Sevilla

Clinical Genetic Testing in Children with Kidney Disease

Author: Beom Hee Lee
Eungu Kang
Publication venue: Korean Society of Pediatric Nephrology
Publication date: 01/06/2021

Chronic kidney disease, the presence of structural and functional abnormalities in the kidneys, is associated with a lower quality of life and increased morbidity and mortality in children. Genetic etiologies account for a substantial proportion of pediatric chronic kidney disease. With recent advances in genetic testing techniques, an increasing number of genetic causes of kidney disease continue to be found. Genetic testing is recommended in children with steroid-resistant nephrotic syndrome, congenital malformations of the kidney and urinary tract, cystic disease, or kidney disease with extrarenal manifestations. Diagnostic yields differ according to the category of clinical diagnosis and the choice of test. Here, we review the characteristics of genetic testing modalities and the implications of genetic testing in clinical genetic diagnostics

Directory of Open Access Journals

Population-Sequencing as a Biomarker for Sample Characterization

Author
Publication venue: 'Hindawi Limited'
Publication date

Crossref

The detection of high-qualified indels in exomes and their effect on cognition

Author: Younis Nadine
Publication venue
Publication date: 01/12/2021

Genetic insertions/deletions (indels) have been linked to many neurodevelopmental disorders (NDDs) such as autism spectrum disorder (ASD) and intellectual disability (ID). However, although they are the second most common type of genetic variant, they remain to this day difficult to identify and verify, presenting a high number of false positives. We sought to find a method that would appropriately identify high-quality indels that are likely to be true positives. We built an indel “truth set” using indels from two diagnosis-based family cohorts that were filtered according to a set of threshold values and called by several variant calling tools in order to train three machine learning models to identify the highest quality indels. The two best performing models were then used to identify high quality indels in a general population cohort that was called using only one variant calling technology. The machine learning models were able to identify higher quality indels that showed an association with IQ, although the effect size was small. The indels predicted by the models also affected a much smaller number of genes per individual than those predicted through using minimum thresholds alone. The models tend to show an overall improvement in the quality of the indels but would require further work to see if they could predict indels with a noticeable and significant effect on IQ.

Dépôt Institutionnel Numérique

Quality control of microbiota metagenomics by k-mer analysis

Author: A Cotillard
B Langmead
B Yang
C Juste
Catherine Juste
CE Shannon
CJ Adler
Cyrielle Fougeroux
D Williams
Doriane Gouas
E Chatelier Le
ER Mardis
EV Koonin
Florence Levenez
Florian Plaza Onate
G Biesbroek
Guy Gorochov
J Li
J Qin
J Schroder
J Ward
Jean-Michel Batto
Jehane Fadlallah
JJ Godon
Joel Dore
JP McCutcheon
JT Simpson
KP Keegan
L Gao
M Arumugam
MA Dillies
Martin Larsen
N Kamada
Nicolas Pons
PJ Turnbaugh
RA Edwards
RC Edgar
RM Leggett
S Dusko Ehrlich
Sean Kennedy
T Ding
T Yatsunenko
TC Glenn
XV Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref