59 research outputs found

    NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

    Next-generation sequencing (NGS) technologies provide a high-throughput means to generate large amounts of sequence data. However, quality control (QC) of the sequence data generated by these technologies is extremely important for meaningful downstream analysis, and highly efficient, fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality checking and filtering of high-quality data. This toolkit is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application are implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated on the Roche 454 and Illumina platforms, additional tools to aid QC (sequence format converters and trimming tools), and analysis tools (statistics tools). A variety of options are provided to facilitate QC with user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data and to facilitate better downstream analysis.
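    The core filtering step such a toolkit performs can be sketched as follows. This is a minimal illustration, not NGS QC Toolkit's actual Perl implementation; the Phred+33 encoding and the quality cutoff of 20 are assumptions for the example.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Sanger/Illumina 1.8+ encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def filter_fastq(records, min_mean_q=20):
    """Yield (header, seq, qual) records whose mean quality meets the cutoff."""
    for header, seq, qual in records:
        if mean_phred(qual) >= min_mean_q:
            yield header, seq, qual

reads = [
    ("@read1", "ACGTACGT", "IIIIIIII"),   # Q40 at every cycle
    ("@read2", "ACGTACGT", '""""""""'),   # Q1 at every cycle
]
kept = list(filter_fastq(reads))
print([h for h, _, _ in kept])  # only the high-quality read survives
```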

    Mining for Structural Variations in Next-Generation Sequencing Data

    Genomic structural variations (SVs) are genetic alterations that result in duplications, insertions, deletions, inversions, and translocations of segments of DNA covering 50 or more base pairs. By changing the organization of DNA, SVs can contribute to phenotypic variation or cause pathological consequences such as neurobehavioral disorders, autoimmune diseases, obesity, and cancers. SVs were first examined using classic cytogenetic methods, revealing changes down to 3 Mb. Later techniques for SV detection were based on array comparative genome hybridization (aCGH) and single-nucleotide polymorphism (SNP) arrays. Next-generation sequencing (NGS) approaches enabled precise characterization of the breakpoints of SVs of various types and sizes at a genome-wide scale. Dissecting SVs from NGS data presents a substantial challenge due to the relatively short sequence reads and the large volume of the data. Benign variants and reference errors in the genome add another dimension of complexity. Even though a wide range of tools is available, the use of SV callers in routine molecular diagnostics is still limited. SV detection algorithms rely on different properties of the underlying data and vary in accuracy and sensitivity; therefore, the SV detection process usually utilizes multiple variant callers. This chapter summarizes the strengths and limitations of different tools for effective NGS SV calling.
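    One of the read-based properties such callers exploit is the read-pair signal: mate pairs that map much farther apart than the library insert size suggest a deletion between them, and much closer suggests an insertion. The sketch below illustrates only that one signal with invented coordinates and an assumed 4-standard-deviation threshold; it is not any specific caller's algorithm.

```python
def discordant_pairs(pairs, mean_insert=350, sd=50, n_sd=4):
    """Flag read pairs whose mapped span deviates far from the expected insert size.

    pairs: list of (name, leftmost_pos, rightmost_end) mappings.
    Span much larger than expected -> deletion signal; much smaller -> insertion signal.
    """
    calls = []
    for name, left, right in pairs:
        span = right - left
        if abs(span - mean_insert) > n_sd * sd:
            calls.append((name, "DEL" if span > mean_insert else "INS"))
    return calls

pairs = [("p1", 1000, 1350), ("p2", 2000, 3600), ("p3", 5000, 5120)]
print(discordant_pairs(pairs))  # p2 spans ~1600 bp -> DEL signal; p3 spans 120 bp -> INS signal
```

    Real callers combine this signal with split reads, read depth, and assembly, which is why the chapter recommends running multiple callers.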

    EvoPipes.net: Bioinformatic Tools for Ecological and Evolutionary Genomics

    Recent increases in the production of genomic data are yielding new opportunities and challenges for biologists. Among the chief problems posed by next-generation sequencing are the assembly and analysis of these large data sets. Here we present an online server, http://EvoPipes.net, that provides access to a wide range of tools for bioinformatic analyses of genomic data oriented toward ecological and evolutionary biologists. The EvoPipes.net server includes a basic tool kit for analyses of genomic data, including a next-generation sequence cleaning pipeline (SnoWhite), scaffolded assembly software (SCARF), a reciprocal best BLAST hit ortholog pipeline (RBH Orthologs), a pipeline for reference protein-based translation and identification of reading frame in transcriptome and genomic DNA (TransPipe), a pipeline to identify gene families and summarize the history of gene duplications (DupPipe), and a tool for developing SSRs or microsatellites from a transcriptome or genomic coding sequence collection (findSSR). EvoPipes.net also provides links to other software developed for evolutionary and ecological genomics, including chromEvol and NU-IN, as well as a forum for discussions of issues relating to genomic analyses and interpretation of results. Overall, these applications provide a basic bioinformatic tool kit that will enable ecologists and evolutionary biologists with relatively little experience and limited computational resources to take advantage of the opportunities provided by next-generation sequencing in their systems.
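    To make the SSR-mining step concrete, a microsatellite finder of the kind findSSR represents boils down to scanning for short motifs repeated in tandem. The motif lengths (2-6 bp) and minimum repeat count below are assumptions for illustration, not findSSR's actual defaults.

```python
import re

def find_ssrs(seq, min_repeats=4):
    """Find perfect tandem repeats of 2-6 bp motifs repeated >= min_repeats times.

    Returns (start_position, motif, repeat_count) tuples.
    """
    hits = []
    for motif_len in range(2, 7):
        # a captured motif followed by (min_repeats - 1) back-referenced copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(1), len(m.group(0)) // motif_len))
    return hits

seq = "TTAA" + "CAG" * 5 + "TTAA"   # a (CAG)5 trinucleotide repeat embedded in flanking sequence
print(find_ssrs(seq))  # [(4, 'CAG', 5)]
```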

    Addressing challenges in the production and analysis of Illumina sequencing data

    Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that require either consideration during project design or attention during data analysis. Specialist skills, at both the laboratory and the computational stages of project design and analysis, are crucial to the generation of high-quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings, and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the ends of short-insert molecules, as well as increased error rates and short read lengths, complicates many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing them. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.
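    One routinely applied piece of such good practice is inspecting run quality cycle by cycle, since Illumina error rates typically rise toward the end of a read. The sketch below computes a per-cycle mean-quality profile; the Phred+33 encoding and the toy quality strings are assumptions for the example.

```python
def per_cycle_quality(qual_strings, offset=33):
    """Mean Phred quality at each cycle (read position) across a set of reads."""
    n_cycles = max(len(q) for q in qual_strings)
    totals, counts = [0] * n_cycles, [0] * n_cycles
    for q in qual_strings:
        for i, c in enumerate(q):
            totals[i] += ord(c) - offset
            counts[i] += 1
    return [t / n for t, n in zip(totals, counts)]

quals = ["IIII?", "IIII?", "IIHH?"]   # quality sags at the final cycle
print(per_cycle_quality(quals))
```

    A sharp dip at one cycle across many reads points to an instrument or chemistry problem in that cycle rather than to bad libraries.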

    AdapterRemoval: easy cleaning of next generation sequencing reads

    BACKGROUND: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed, as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. FINDINGS: We present a tool called AdapterRemoval which is able to pre-process both single-end and paired-end data. The program locates and removes adapter residues from the reads, can combine paired reads if they overlap, and can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence in both the 5′ and 3′ ends of the reads. This is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. CONCLUSIONS: AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects, and it is possible to extend it further to accommodate application-specific biases in the data.
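    The hard case the abstract alludes to is a partial adapter at the 3′ end: when the insert is shorter than the read, only a prefix of the adapter is sequenced. The toy below handles that case by exact prefix matching; AdapterRemoval itself uses alignment with mismatches, and the adapter sequence shown is an arbitrary example, not the tool's default.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Trim a full adapter, or an adapter prefix at the read's 3' end (exact matches only)."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]                      # full adapter present inside the read
    # otherwise look for a prefix of the adapter hanging off the 3' end
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

ADAPTER = "AGATCGGAAGAGC"
print(trim_adapter("ACGTACGTAGATCGGAAGAGC", ADAPTER))  # full adapter removed -> ACGTACGT
print(trim_adapter("ACGTACGTAGATC", ADAPTER))          # partial 3' adapter removed -> ACGTACGT
```

    The `min_overlap` floor exists because very short suffix matches occur by chance and would over-trim genuine sequence.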

    Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching

    Template switching (TS) is an inherent property of reverse transcriptase, which has been exploited in several transcriptome analysis methods, such as CAGE, RNA-Seq and short RNA sequencing. TS is an attractive option given the simplicity of the protocol, which does not require an adaptor-mediated step and thus minimizes sample loss. As such, it has been used in several studies that deal with limited amounts of RNA, such as single-cell studies. Additionally, TS has also been used to introduce DNA barcodes or indexes into different samples, cells or molecules. This labeling allows one to pool several samples into one sequencing flow cell, increasing the data throughput of sequencing and taking advantage of the increasing throughput of current sequencers. Here, we report TS artifacts that form owing to a process called strand invasion. Due to the way in which barcodes/indexes are introduced by TS, strand invasion becomes more problematic by introducing unsystematic biases. We describe a strategy that eliminates these artifacts in silico and propose an experimental solution that suppresses biases from TS.
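    The in-silico idea can be sketched as follows: if the genomic sequence immediately upstream of a read's mapped start closely resembles the 3′ end of the TS oligo, the read may have arisen by strand invasion rather than genuine template switching, and is discarded. The mismatch threshold and sequences below are illustrative assumptions, not the published cutoffs.

```python
def is_strand_invasion(genome, read_start, ts_oligo_end, max_mismatch=1):
    """Flag a read whose upstream genomic context resembles the TS oligo 3' end."""
    k = len(ts_oligo_end)
    if read_start < k:
        return False                      # not enough upstream context to judge
    upstream = genome[read_start - k:read_start]
    mismatches = sum(a != b for a, b in zip(upstream, ts_oligo_end))
    return mismatches <= max_mismatch

genome = "CCCTTTGGGACGTACGT"
print(is_strand_invasion(genome, 9, "TGG"))  # upstream "GGG" ~ oligo end "TGG" -> artifact
print(is_strand_invasion(genome, 3, "TGG"))  # upstream "CCC" -> kept as genuine
```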

    SSP: An interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads

    Recent advances in sequencing technologies have provided a wealth of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts at low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp.
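    To convey the flavor of the optimization (without reproducing SSP's actual ILP formulation), the toy below solves a related combinatorial problem by brute force: choose the smallest set of candidate isoforms whose splice junctions together explain every observed junction. The candidate isoforms and junction coordinates are invented for the example.

```python
from itertools import combinations

def min_isoform_cover(candidates, junctions):
    """Smallest subset of candidate isoforms whose junctions cover all observed ones.

    candidates: dict mapping isoform name -> set of (donor, acceptor) junctions.
    junctions: set of observed (donor, acceptor) junctions to explain.
    """
    names = list(candidates)
    for r in range(1, len(names) + 1):        # try smaller explanations first
        for subset in combinations(names, r):
            covered = set().union(*(candidates[s] for s in subset))
            if junctions <= covered:
                return set(subset)
    return None                                # no subset explains the data

candidates = {
    "iso1": {(100, 200), (200, 300)},
    "iso2": {(100, 200), (250, 300)},
    "iso3": {(200, 300)},
}
observed = {(100, 200), (200, 300), (250, 300)}
print(sorted(min_isoform_cover(candidates, observed)))  # ['iso1', 'iso2']
```

    An ILP solver reaches the same kind of answer with binary selection variables per isoform, which scales far better than enumeration; SSP additionally models abundances, which this sketch omits.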