Sensitive Long-Indel-Aware Alignment of Sequencing Reads

Author: Marschall Tobias
Schönhuth Alexander
Publication date: 01/01/2013

The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing, as performed in the 1000 Genomes project and the Genome of the Netherlands project, possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e. finding all (reasonable) alignments; (2) ability to find (long) indels; (3) statistically sound alignment scores; and (4) runtime fast enough to be applied to whole-genome data. We compare performance to BWA, bowtie2, and stampy and find that our method is especially advantageous on reads containing larger indels.

arXiv.org e-Print Archive

Publications at Bielefeld University

Bacterial microevolution and the Pangenome

Author: A Bankevich
AE Darling
AJ Page
AO Kislyuk
B Charlesworth
C Buckee
C Collins
C Wiuf
CM Thomas
CS Pepperell
DJ Wilson
DR Zerbino
E Jacox
F Lassalle
GE Sims
GJ Szollosi
H Ochman
IJ Wilson
J Hedge
J Lawrence
JB Joy
JFC Kingman
KAA Jolley
KE Dingle
KT Konstantinidis
L Li
L Petersen
M Csurös
M Nordborg
M Pagel
M Steinegger
M Touchon
M Vos
MJ Ward
MTG Holden
NA Rosenberg
NJ Croucher
P Donnelly
PAP Moran
R Griffiths
RC Griffiths
RG Everitt
RK Aziz
S Castillo-Ramírez
S Kurtz
S Wright
SF Altschul
SK Sheppard
SS Abby
SV Angiuoli
T Ohta
T Seemann
TG Vaughan
WP Maddison
X Didelot
Z Yang
Z Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2020

The comparison of multiple genome sequences sampled from a bacterial population reveals considerable diversity in both the core and the accessory parts of the pangenome. This diversity can be analysed in terms of microevolutionary events that took place since the genomes shared a common ancestor, especially deletion, duplication, and recombination. We review the basic modelling ingredients used implicitly or explicitly when performing such a pangenome analysis. In particular, we describe a basic neutral phylogenetic framework of bacterial pangenome microevolution, which is not incompatible with evaluating the role of natural selection. We survey the different ways in which pangenome data is summarised in order to be included in microevolutionary models, as well as the main methodological approaches that have been proposed to reconstruct pangenome microevolutionary history.

Crossref

Warwick Research Archives Portal Repository

Parameter estimation in pair hidden Markov models

Author: Baum L.
Cover T. M.
Csiszar I.
Durbin R.
Ibragimov I. A.
Kingman J. F. C.
Publication venue: 'Wiley'
Publication date: 05/10/2005

This paper deals with parameter estimation in pair hidden Markov models (pair-HMMs). We first provide a rigorous formalism for these models and discuss possible definitions of likelihoods. Since the model is biologically motivated, some restrictions with respect to the full parameter space naturally occur. The existence of two different information divergence rates is established, and the divergence property (namely positivity at values different from the true one) is shown under additional assumptions. This yields consistency for the parameter in parametrization schemes for which the divergence property holds. Simulations illustrate different cases which are not covered by our results.
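To make the notion of a pair-HMM likelihood concrete, here is a minimal sketch of the forward recursion for the textbook three-state pair-HMM (match, insert-in-x, insert-in-y). It is one possible likelihood definition, not necessarily the one analysed in the paper. The function name, the parameter names (`delta` for gap opening, `eps` for gap extension) and all numeric values are illustrative assumptions, and no end state is modelled.

```python
def pair_hmm_likelihood(x, y, delta=0.1, eps=0.3,
                        p_match=0.19, p_mismatch=0.02, q=0.25):
    """Sum the probabilities of all alignments of x and y (forward algorithm).

    Assumed toy parameters: 4 * p_match + 12 * p_mismatch = 1 over DNA pairs,
    and q is a uniform single-base emission in the two gap states.
    """
    n, m = len(x), len(y)
    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]  # last column is a match
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]  # last column emits x only
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]  # last column emits y only
    f_m[0][0] = 1.0  # the start behaves like the match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                emit = p_match if x[i - 1] == y[j - 1] else p_mismatch
                f_m[i][j] = emit * (
                    (1 - 2 * delta) * f_m[i - 1][j - 1]
                    + (1 - eps) * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:
                f_x[i][j] = q * (delta * f_m[i - 1][j] + eps * f_x[i - 1][j])
            if j > 0:
                f_y[i][j] = q * (delta * f_m[i][j - 1] + eps * f_y[i][j - 1])

    return f_m[n][m] + f_x[n][m] + f_y[n][m]


likelihood = pair_hmm_likelihood("ACGT", "AGT")
```

Maximising such a likelihood over `delta` and `eps` is the kind of estimation problem the abstract refers to. The restriction that the two gap states cannot follow each other directly is an example of a biologically motivated constraint on the parameter space.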

arXiv.org e-Print Archive

CiteSeerX

Crossref

Research Papers in Economics

CLEVER: Clique-Enumerating Variant Finder

Author: Bauer Markus
Canzar Stefan
Costa Ivan
Klau Gunnar
Marschall Tobias
Schliep Alexander
Schönhuth Alexander
Publication date: 01/01/2012

Next-generation sequencing techniques have facilitated large-scale analysis of human genetic variation. Despite the advances in sequencing speeds, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals. Here we present a novel internal-segment-size-based approach, which organizes all reads, including concordant ones, into a read alignment graph where max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present various relevant performance statistics. We achieve superior performance rates in particular on indels of sizes 20-100, which have been exposed as a current major challenge in the SV discovery literature and where prior insert-size-based approaches have limitations. In that size range, we outperform even split-read aligners. We also achieve good results on real data, where we make a substantial number of correct predictions that no other tool makes, complementing the predictions of split-read aligners. CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com. Comment: 30 pages, 8 figures
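As a rough illustration of what enumerating max-cliques in a read alignment graph involves, the sketch below runs the generic Bron-Kerbosch algorithm with pivoting on a toy graph. This is not CLEVER's specifically engineered algorithm. The node names `a1` to `a4` and the edges are hypothetical, with an edge standing for "these two alignments do not contradict each other".

```python
from typing import Dict, Iterator, Set


def maximal_cliques(adj: Dict[str, Set[str]]) -> Iterator[Set[str]]:
    """Yield every maximal clique of an undirected graph (Bron-Kerbosch)."""

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> Iterator[Set[str]]:
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            yield set(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical toy graph: a1, a2, a3 are mutually compatible; a4 only fits a3.
toy_graph = {
    "a1": {"a2", "a3"},
    "a2": {"a1", "a3"},
    "a3": {"a1", "a2", "a4"},
    "a4": {"a3"},
}
cliques = sorted(sorted(c) for c in maximal_cliques(toy_graph))
```

In this toy graph the maximal cliques are `{a1, a2, a3}` and `{a3, a4}`. In the setting of the abstract, each such group would then be tested statistically for evidence of an indel.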

arXiv.org e-Print Archive

CiteSeerX

VU Research Portal

CWI's Institutional Repository

Publikationsserver der RWTH Aachen University

Publications at Bielefeld University

Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

Author: BENTLEY J.L.
BUCHBERGER B.
David Eppstein
DURAN B. S.
GOTOH O.
MATIAS Y.
SUPOWIT K.J.
YIANILOS P.N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/1998

We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics. Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs
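For orientation, the sketch below shows the dynamic closest pair interface (insert, delete, query) using a simple baseline in which every object remembers its current nearest neighbour. It is not one of the data structures developed in the paper and does not achieve the bounds quoted above: a deletion here can cost quadratic time in the worst case. The class and method names are assumptions chosen for illustration.

```python
class NeighborClosestPair:
    """Maintain the closest pair of a changing set under any distance function."""

    def __init__(self, dist):
        self.dist = dist
        self.nearest = {}  # object -> (distance, nearest other object) or None

    def _recompute(self, obj):
        # Linear scan over all other objects.
        others = [(self.dist(obj, o), o) for o in self.nearest if o != obj]
        self.nearest[obj] = min(others, key=lambda t: t[0]) if others else None

    def insert(self, obj):
        self.nearest[obj] = None
        self._recompute(obj)
        # The new object may be closer to existing objects than their neighbours.
        for other in self.nearest:
            if other == obj:
                continue
            d = self.dist(other, obj)
            current = self.nearest[other]
            if current is None or d < current[0]:
                self.nearest[other] = (d, obj)

    def delete(self, obj):
        del self.nearest[obj]
        # Only objects that pointed at the deleted one need a fresh scan.
        for other, current in list(self.nearest.items()):
            if current is not None and current[1] == obj:
                self._recompute(other)

    def closest_pair(self):
        """Return (distance, a, b) for the closest pair, or None if fewer than two objects."""
        best = None
        for obj, current in self.nearest.items():
            if current is not None and (best is None or current[0] < best[0]):
                best = (current[0], obj, current[1])
        return best


# The distance is an arbitrary callable; absolute difference is just an example.
pairs = NeighborClosestPair(lambda a, b: abs(a - b))
for value in (0, 10, 11, 30):
    pairs.insert(value)
first = pairs.closest_pair()
pairs.delete(10)
second = pairs.closest_pair()
```

Agglomerative clustering fits this interface directly: repeatedly query the closest pair, delete both objects, and insert their merged cluster with a suitable cluster-to-cluster distance.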

arXiv.org e-Print Archive

CiteSeerX

Crossref

Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition

Author: Ocampo Sepúlveda Jorge Carlos
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 11/01/2012

A human-computer interface is developed to provide services of computer assisted machine translation (CAT) and computer assisted transcription of handwritten text images (CATTI). The back-end machine translation (MT) and handwritten text recognition (HTR) systems are provided by the Pattern Recognition and Human Language Technology (PRHLT) research group. The idea is to provide users with easy-to-use tools that make interactive translation and transcription feasible tasks. The assisted service is provided by remote servers with CAT or CATTI capabilities. The interface supplies the user with tools for efficient local editing: deletion, insertion and substitution. Ocampo Sepúlveda, JC. (2009). Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition. http://hdl.handle.net/10251/14318

RiuNet

An FPGA-based Web server for high performance biological sequence alignment

Author: Benkrid Abdsamad
Benkrid K.
Kasap S.
Liu Ying
Publication date: 01/01/2009

Portsmouth University Research Portal (Pure)