6 research outputs found

STELLAR: fast and exact local alignments

Author: A Döring
A Gogol-Döring
A Marzal
A Mortazavi
AE Darling
AN Arslan
B Langmead
B Paten
B Raphael
Birte Kehr
D Weese
David Weese
H Jiang
H Li
I Dubchak
Knut Reinert
KR Rasmussen
M Blanchette
MS Waterman
P Jokinen
PH Sellers
R Li
S Burkhardt
S Karlin
S Rumble
S Schwartz
S Tweedie
SF Altschul
TF Smith
TW Lam
WJ Kent
WR Pearson
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results: We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.

Institutional Repository of the Freie Universität Berlin

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.

Author: Gorrie Claire L
Holt Kathryn E
Judd Louise M
Wick Ryan R
Publication venue: Public Library of Science (PLoS)
Publication date: 01/06/2017

The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.

Crossref

LSHTM Research Online

Directory of Open Access Journals

University of Melbourne Institutional Repository

FigShare

Visual programming for next-generation sequencing data analytics

Author: A Bottaro
A Doring
A Gogol-Döring
A Grada
A Nordell Markovits
AD Beggs
AN Desai
AR Wattam
B Berger
B Giardine
B Yu
C Shyr
CE Mason
DC Koboldt
DW Barnett
E Evans
EL Dijk van
F Butler
F Finotello
F James
F Milicchio
F Sanger
FM Facio
Franco Milicchio
G Berger
G Gremme
H Hauswedell
H Li
H Mangalam
H Ohashi
HJ Liu
HP Buermans
J Busby
J Dutheil
J Goecks
J Nilsson
J Plieskatt
J Xuan
Jae Min
JE Stajich
Jiang Bian
K Mensaert
K Okonechnikov
M Bahassi el
M Baker
M Kearse
M MacLaurin
M Vyverman
Mattia Prosperi
MH Schulz
ML Metzker
N Goto
N Pataki
N Shu
O Golosova
P Rice
PJ Cock
R Bao
R Brooks
R Jain
R Rahn
RC Holland
RD Hawkins
Rebecca Rose
RF Service
RK Madduri
TP Niedringhaus
TRG Green
V Makinen
W Huber
WR Pitt
Y Feng
YL Ying
Publication venue: Springer Science and Business Media LLC

Crossref

Dissecting multiple sequence alignment methods: the analysis, design and development of generic multiple sequence alignment components in SeqAn

Author: Rausch T.
Publication venue: Freie Universität
Publication date: 11/05/2010

Multiple sequence alignments are an indispensable tool in bioinformatics. Many applications rely on accurate multiple alignments, including protein structure prediction, phylogeny and the modeling of binding sites. In this thesis we dissected and analyzed the crucial algorithms and data structures required to construct such a multiple alignment. Based upon that dissection, we present a novel graph-based multiple sequence alignment program and a new method for multi-read alignments occurring in assembly projects. The advantage of the graph-based alignment is that a single vertex can represent a single character, a large segment or even an abstract entity such as a gene. This gives rise to the opportunity to apply the consistency-based progressive alignment paradigm to alignments of genomic sequences. The proposed multi-read alignment method outperforms similar methods in terms of alignment quality and it is apparently one of the first methods that can readily be used for insert sequencing. An important aspect of this thesis was the design, the development and the integration of the essential multiple sequence alignment components in the SeqAn library. SeqAn is a software library for sequence analysis that provides the core algorithmic components required to analyze large-scale sequence data. SeqAn aims at bridging the current gap between algorithm theory and available practical implementations in bioinformatics. Hence, alongside the theoretical development of the methods, we always describe the actual implementation of the data structures and algorithms in order to strengthen the use of SeqAn as an experimental platform for rapidly developing and testing applications. All presented methods are part of the open source SeqAn library that can be downloaded from our website, www.seqan.de.

MPG.PuRe

Biological Sequence Analysis using the SeqAn C++ Library

Author: Gogol-Döring A.
Reinert K.
Publication venue: Informa UK Limited
Publication date: 01/01/2010

Before the SeqAn project, there was clearly a lack of available implementations in sequence analysis, even for standard tasks. Implementations of needed algorithmic components were either unavailable or hard to access in third-party monolithic software products. Addressing these concerns, the developers of SeqAn created a comprehensive, easy-to-use, open source C++ library of efficient algorithms and data structures for the analysis of biological sequences. Written by the founders of this project, Biological Sequence Analysis Using the SeqAn C++ Library covers the SeqAn library, its documentation, and the supporting infrastructure. The first part of the book describes the general library design. It introduces biological sequence analysis problems, discusses the benefit of using software libraries, summarizes the design principles and goals of SeqAn, details the main programming techniques used in SeqAn, and demonstrates the application of these techniques in various examples. Focusing on the components provided by SeqAn, the second part explores basic functionality, sequence data structures, alignments, pattern and motif searching, string indices, and graphs. The last part illustrates applications of SeqAn to genome alignment, consensus sequence in assembly projects, suffix array construction, and more. This handy book describes a user-friendly library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn enables not only the implementation of new algorithms, but also the sound analysis and comparison of existing algorithms.

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)