Scaffolder - software for manual genome scaffolding

Author: A Dayarian
CM Fraser
D Chelimsky
D Gordon
DC Richter
E Branscomb
F Zhao
Hazel A Barton
IJ Tsai
J Parkhill
JR Miller
M Boetzer
M Pop
Michael D Barton
N Goto
N Nagarajan
S Assefa
S Koren
Z Mulyukov
Publication venue: 'Springer Science and Business Media LLC'

Scaffolder - Software for Reproducible Genome Scaffolding.

Author: Hazel A. Barton
Michael D. Barton
Publication date: 14/03/2011

Background: Assembly of short-read sequencing data can result in a fragmented non-contiguous series of genomic sequences. Therefore a common step in a genome project is to join neighboring sequence regions together and fill gaps in the assembly using additional sequences. This scaffolding step, however, is non-trivial and requires manually editing large blocks of nucleotide sequence. Joining these sequences together also hides the source of each region in the final genome sequence. Taken together, these considerations may make reproducing or editing an existing genome build difficult.

Methods: The software outlined here, “Scaffolder,” is implemented in the Ruby programming language and can be installed via the RubyGems software management system. Genome scaffolds are defined using YAML, a data format that is both human- and machine-readable. Command line binaries and extensive documentation are available.

Results: This software allows a genome build to be defined in terms of the constituent sequences using a relatively simple syntax to define the scaffold. This syntax further allows unknown regions to be defined, and adds additional sequences to fill gaps in the scaffold. Defining the genome construction in a file makes the scaffolding process reproducible and easier to edit compared with FASTA nucleotide sequence.

Conclusions: Scaffolder is easy-to-use genome scaffolding software. This tool promotes reproducibility and continuous development in a genome project. Scaffolder can be found at http://next.gs
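
The idea described in the abstract above, building a genome from a file that lists its constituent sequences, unknown regions, and gap-filling inserts, can be sketched in a few lines. This is a minimal illustration in Python, not Scaffolder's own code (which is written in Ruby). The entry layout and every field name used here (`sequence`, `unresolved`, `source`, `reverse`, `inserts`, `open`, `close`) are assumptions made for the example and may not match the tool's documented YAML schema. The scaffold definition is given as an already-parsed list of dictionaries so that the sketch needs only the standard library.

```python
# Hypothetical scaffold builder. Field names are illustrative assumptions,
# not Scaffolder's documented schema.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(entries, contigs):
    """Concatenate scaffold entries into one sequence.

    entries: list of dicts, each with either a "sequence" or an "unresolved" key.
    contigs: dict mapping a source name to its nucleotide string.
    Insert coordinates ("open", "close") are assumed 1-based and inclusive.
    """
    parts = []
    for entry in entries:
        if "unresolved" in entry:
            # An unknown region is written as a run of N characters.
            parts.append("N" * entry["unresolved"]["length"])
            continue
        spec = entry["sequence"]
        seq = contigs[spec["source"]]
        # Apply inserts from right to left so earlier coordinates stay valid.
        inserts = sorted(spec.get("inserts", []), key=lambda i: i["open"], reverse=True)
        for ins in inserts:
            fill = contigs[ins["source"]]
            seq = seq[: ins["open"] - 1] + fill + seq[ins["close"]:]
        if spec.get("reverse", False):
            seq = reverse_complement(seq)
        parts.append(seq)
    return "".join(parts)


contigs = {"c1": "AAAANNNNCCCC", "c2": "GGTT", "fill": "TTTT"}
entries = [
    {"sequence": {"source": "c1",
                  "inserts": [{"source": "fill", "open": 5, "close": 8}]}},
    {"unresolved": {"length": 3}},
    {"sequence": {"source": "c2", "reverse": True}},
]
print(build_scaffold(entries, contigs))  # AAAATTTTCCCCNNNAACC
```

Because the build is regenerated from the definition each time, editing the scaffold means changing one entry in the list rather than editing the concatenated nucleotide sequence by hand.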

Nature Precedings

gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Author: Auvinen Petri
Jernvall Jukka
Kammonen Juhana I.
Koskinen Patrik
Laine Pia
Paulin Lars
Pereira Pedro A. B.
Smolander Olli-Pekka
Publication date: 01/01/2019

Unknown sequences, or gaps, are present in many published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially in large genomes. The gap filling problem is nontrivial, and while there are many computational tools partially solving the problem, several have shortcomings as to the reliability and correctness of the output, i.e. the gap-filled draft genome. SSPACE-LongRead is a scaffolding tool that utilizes long reads from multiple third-generation sequencing platforms in finding links between contigs and combining them. The long reads potentially contain sequence information to fill the gaps created in the scaffolding, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher to process SSPACE-LongRead output to fill gaps after the scaffolding. gapFinisher is based on the controlled use of a previously published gap filling tool, FGAP, and works on all standard Linux/UNIX command lines. We compare the performance of gapFinisher against two other published gap filling tools, PBJelly and GMcloser. We conclude that gapFinisher can fill gaps in draft genomes quickly and reliably. In addition, the serial design of gapFinisher makes it scale well from prokaryote genomes to larger genomes with no increase in the computational footprint. Peer reviewed
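
Gaps in a draft genome are conventionally written as runs of the character N. As a rough illustration of what a gap filler has to locate before it can fill anything, the sketch below lists the gaps in a scaffold sequence. It is a standalone Python example, not part of gapFinisher, FGAP, or SSPACE-LongRead; the function name `find_gaps` and its `min_length` parameter are hypothetical.

```python
import re


def find_gaps(sequence, min_length=1):
    """Return (start, end, length) for each run of N or n in the sequence.

    Coordinates are 1-based and inclusive. Runs shorter than min_length
    are ignored.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            gaps.append((match.start() + 1, match.end(), length))
    return gaps


print(find_gaps("ACGTNNNNACGTnnAC"))  # [(5, 8, 4), (13, 14, 2)]
```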

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

Author: A Bankevich
A Dayarian
A Gurevich
AC English
AV Zimin
B Chevreux
CS Chin
DA Rasko
DR Zerbino
FJ Ribeiro
JT Simpson
KF Au
L Salmela
M Boetzer
Marten Boetzer
MJ Chaisson
R Li
S Boisvert
S Koren
SF Altschul
SL Salzberg
SM Goldberg
V Deshpande
Walter Pirovano
X Jiao
Publication venue: 'Springer Science and Business Media LLC'

Crossref

ExplorePipolin: a pipeline for identification and exploration of pipolins, novel mobile genetic elements widespread among bacteria

Author: Chuprikova Liubov
Publication date: 01/06/2020

Master's thesis in Bioinformatics and Computational Biology. Pipolins constitute a new group of self-synthesizing or self-replicating mobile genetic elements (MGEs), encoding their own replicative DNA polymerase B. These elements have been found mostly integrated into the genomes of bacteria from diverse phyla and also present as circular plasmids in mitochondria. Since only a small number of pipolins have been identified and described so far, their origin and role remain unknown, and there is little evidence of their horizontal transfer. Bioinformatics software capable of automatically identifying and analyzing pipolins in bacterial genomes could speed the accumulation of knowledge about these mobile genetic elements. Therefore, the main goal of the current project was to design and implement a pilot version of a pipeline for the identification and analysis of pipolins from Escherichia coli genomes. The pipeline should be flexible enough to be extended easily to other bacteria in the future. As a sub-goal, it was decided to perform a detailed analysis of pipolins of E. coli strains and isolates available from the NCBI database and from the Spanish E. coli Reference Laboratory (LREC) collection.

Biblos-e Archivo

MetAMOS: A modular and open source metagenomic assembly and analysis pipeline

Author: Astrovskaya I
Darling AE
Koren S
Liu B
Ondov B
Phillippy AM
Pop M
Sommer DD
Treangen TJ
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

© 2013 Treangen et al. We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS

Crossref

Springer - Publisher Connector

OPUS - University of Technology Sydney

PubMed Central

eScholarship - University of California

Digital Repository at the University of Maryland

Computational Biology and Chemistry

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

The use of computers and software tools in biochemistry (biology) has led to a deep revolution in basic sciences and medicine. Bioinformatics and systems biology are the direct results of this revolution. With the involvement of computers, software tools, and internet services in scientific disciplines comprising biology and chemistry, new terms, technologies, and methodologies have appeared and become established. Bioinformatic software tools, versatile databases, and easy internet access have given rise to computational biology and chemistry. Today, we have new types of surveys and laboratories, including “in silico studies” and “dry labs”, in which bioinformaticians conduct their investigations to gain invaluable outcomes. These features have led to three-dimensional illustrations of different molecules and complexes to get a better understanding of nature.

Directory of Open Access Books (DOAB)

Genome Assembly: Novel Applications by Harnessing Emerging Sequencing Technologies and Graph Algorithms

Author: Koren Sergey
Publication date: 01/01/2012

Genome assembly is a critical first step for biological discovery. All current sequencing technologies share the fundamental limitation that segments read from a genome are much shorter than even the smallest genomes. Traditionally, whole-genome shotgun (WGS) sequencing over-samples a single clonal (or inbred) target chromosome with segments from random positions. The amount of over-sampling is known as the coverage. Assembly software then reconstructs the target. So-called next-generation (or second-generation) sequencing has reduced the cost and increased throughput exponentially over first-generation sequencing. Unfortunately, next-generation sequences present their own challenges to genome assembly: (1) they require amplification of source DNA prior to sequencing, leading to artifacts and biased coverage of the genome; (2) they produce relatively short reads: 100bp-700bp; (3) the sizeable runtime of most second-generation instruments is prohibitive for applications requiring rapid analysis, with an Illumina HiSeq 2000 instrument requiring 11 days for the sequencing reaction. Recently, successors to the second-generation instruments (third-generation) have become available. These instruments promise to alleviate many of the downsides of second-generation sequencing and can generate multi-kilobase sequences. The long sequences have the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of these reads is challenging and has limited their use. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. Our approach achieves over 99% read accuracy and produces substantially better assemblies than current sequencing strategies. The availability of cheaper sequencing has made new sequencing targets, such as multiple displacement amplified (MDA) single-cells and metagenomes, popular.
Current algorithms assume assembly of a single clonal target, an assumption that is violated in these sequencing projects. We developed Bambus 2, a new scaffolder that works for metagenomics and single cell datasets. It can accurately detect repeats without assumptions about the taxonomic composition of a dataset. It can also identify biological variations present in a sample. We have developed a novel end-to-end analysis pipeline leveraging Bambus 2. Due to its modular nature, it is applicable to clonal, metagenomic, and MDA single-cell targets and allows a user to rapidly go from sequences to assembly, annotation, genes, and taxonomic info. We have incorporated a novel viewer, allowing a user to interactively explore the variation present in a genomic project on a laptop. Together, these developments make genome assembly applicable to novel targets while utilizing emerging sequencing technologies. As genome assembly is critical for all aspects of bioinformatics, these developments will enable novel biological discovery.
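
The abstract above defines coverage as the amount of over-sampling of the target. Under the usual convention, this is the total number of sequenced bases divided by the genome size. The short Python sketch below works through that arithmetic; the function name `coverage` and the example read counts are illustrative and are not taken from the thesis.

```python
def coverage(read_lengths, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Illustrative numbers: 50,000 reads of 100 bp over a 1 Mbp genome.
reads = [100] * 50_000
print(coverage(reads, 1_000_000))  # 5.0
```

The same number of bases spread over a genome ten times larger gives a tenth of the coverage, which is why read counts have to scale with genome size to keep the over-sampling constant.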

Digital Repository at the University of Maryland

Unravelling the genome of Holy basil: an “incomparable” “elixir of life” of traditional Indian medicine

Author: A Bankevich
A Dayarian
A Mclysaght
A Shukla
AD Zimmer
AE Vinogradov
Ajit Kumar Shasany
AK Gupta
Alok Kalra
Anil Kumar Tripathi
Anonymous
AR Quinlan
B Langmead
C Feuillet
Chellappa Gopalakrishnan
D Hernandez
DA Petrov
DK Yi
EN Moriyama
EV Leushkin
Feroz Khan
G Hao
Gopalakrishna Ramaswamy
H Ogata
HH Darrah
HX Zhao
J Qian
J Yu
JD Palmer
JD Thompson
JF Wendel
JL Bennetzen
JM Comeron
JS Sena
K Carovic-Stanko
K Tamura
KH Wolfe
KJ Kim
L Salmela
L Yang
M Ashburner
M Bhasin
M Boetzer
M David
M Deutsch
M Lohse
M Lynch
M Punta
M Stanke
MA Kelm
N Carels
N Sierro
N Singh
NM Krishnan
P Pattanayak
P Prakash
P Uma Devi
PK Gupta
R Luo
R Mariotti
R Teich
Raj Kishori Lal
RK Lal
RR Fall
S Gupta
S Lal
S Rastogi
S Shishodia
SF Altschul
Shubhra Rastogi
SK Gupta
SK Wyman
Sriram Parameswaran
T Zhang
TAG Initiative
Vikrant Gupta
WJ Kent
World Health Organization
Y Iijima
Y Moriya
Publication venue: 'Springer Science and Business Media LLC'

Crossref

BIOINFORMATIC TOOLS FOR NEXT GENERATION GENOMICS

Author: M. Chiara
Publication venue: Università degli Studi di Milano
Publication date: 20/04/2012

New sequencing strategies have redefined the concept of “high-throughput sequencing” and many companies, researchers, and recent reviews use the term “Next-Generation Sequencing” (NGS) instead of high-throughput sequencing. These advances have introduced a new era in genomics and bioinformatics. During my years as a PhD student I developed various software, algorithms and procedures for the analysis of Next Generation Sequencing data required for distinct biological research projects and collaborations in which our research group was involved. The tools and algorithms are thus presented in their appropriate biological contexts. Initially I dedicated myself to the development of scripts and pipelines which were used to assemble and annotate the mitochondrial genome of the model plant Vitis vinifera. The sequence was subsequently used as a reference to study the RNA editing of mitochondrial transcripts, using data produced by the Illumina and SOLiD platforms. I subsequently developed a new approach and a new software package for the detection of relatively small indels between a donor and a reference genome, using NGS paired-end (PE) data and machine learning algorithms. I was able to show that suitable paired-end data, contrary to previous assertions, can be used to detect, with high confidence, very small indels in low-complexity genomic contexts. Finally, I participated in a project aimed at the reconstruction of the genomic sequences of 2 distinct strains of the biotechnologically relevant fungus Fusarium. In this context I performed the sequence assembly to obtain the initial contigs and devised and implemented a new scaffolding algorithm which has proved to be particularly efficient.

AIR Universita degli studi di Milano