645 research outputs found

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

Author: A Bankevich
A Dayarian
A Gurevich
AC English
AV Zimin
B Chevreux
CS Chin
DA Rasko
DR Zerbino
FJ Ribeiro
JT Simpson
KF Au
L Salmela
M Boetzer
Marten Boetzer
MJ Chaisson
R Li
S Boisvert
S Koren
SF Altschul
SL Salzberg
SM Goldberg
V Deshpande
Walter Pirovano
X Jiao
Publication venue: 'Springer Science and Business Media LLC'

gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Author: Auvinen Petri
Jernvall Jukka
Kammonen Juhana I.
Koskinen Patrik
Laine Pia
Paulin Lars
Pereira Pedro A. B.
Smolander Olli-Pekka
Publication date: 01/01/2019

Unknown sequences, or gaps, are present in many published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially in large genomes. The gap filling problem is nontrivial and while there are many computational tools partially solving the problem, several have shortcomings as to the reliability and correctness of the output, i.e. the gap filled draft genome. SSPACE-LongRead is a scaffolding tool that utilizes long reads from multiple third-generation sequencing platforms in finding links between contigs and combining them. The long reads potentially contain sequence information to fill the gaps created in the scaffolding, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher to process SSPACE-LongRead output to fill gaps after the scaffolding. gapFinisher is based on the controlled use of a previously published gap filling tool FGAP and works on all standard Linux/UNIX command lines. We compare the performance of gapFinisher against two other published gap filling tools PBJelly and GMcloser. We conclude that gapFinisher can fill gaps in draft genomes quickly and reliably. In addition, the serial design of gapFinisher makes it scale well from prokaryote genomes to larger genomes with no increase in the computational footprint.

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto

A comprehensive evaluation of assembly scaffolding tools

Author: Chris Newbold
Martin Hunt
Matthew Berriman
Thomas D Otto
Publication venue: Springer Nature
Publication date: 01/01/2014

Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.

Crossref

Springer - Publisher Connector

PubMed Central

Enlighten

OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees

Author: Burton K. H. Chia
Denis Bertrand
Niranjan Nagarajan
Song Gao
Publication venue: Springer Nature
Publication date: 01/01/2016

DOI: 10.1186/s13059-016-0951-y. Genome Biology 17110

Crossref

Springer - Publisher Connector

PubMed Central

ScholarBank@NUS

BESST - Efficient scaffolding of large fragmented assemblies

Publication venue: BioMed Central

Springer - Publisher Connector

Assembly, quantification, and downstream analysis for high throughput sequencing data

Author: Mandric Igor
Publication venue: ScholarWorks @ Georgia State University
Publication date: 07/08/2018

Next Generation Sequencing is a set of relatively recent but already well-established technologies with a wide range of applications in life sciences. Although these technologies are constantly being improved, multiple challenging problems still exist in the analysis of high throughput sequencing data. In particular, genome assembly still suffers from the inability of the technologies to overcome issues related to structural properties of genomes such as single nucleotide polymorphisms and repeats, not to mention drawbacks of the technologies themselves, such as sequencing errors, which also hinder the reconstruction of the true reference genomes. Other types of issues arise in transcriptome quantification and differential gene expression analysis. Processing millions of reads requires sophisticated algorithms that can compute gene expression with high precision and in a reasonable amount of time. Following downstream analysis, the utmost computational task is to infer the activity of biological pathways (e.g., metabolic). With many overlapping pathways, the challenge is to infer the role of each gene in the activity of a given pathway. Assigning the products of a gene to the wrong pathway may result in misleading differential activity analysis and thus wrong scientific conclusions. In this dissertation I present several algorithmic solutions to some of the problems enumerated above. In particular, I designed a scaffolding algorithm for genome assembly and created new tools for differential gene and biological pathway expression analysis.

ScholarWorks @ Georgia State University