74 research outputs found

Sensitive Long-Indel-Aware Alignment of Sequencing Reads

Author: Marschall Tobias
Schönhuth Alexander
Publication date: 01/01/2013

The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing, as performed in the 1000 Genomes project and the Genome of the Netherlands project, possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e. finding all (reasonable) alignments; (2) the ability to find (long) indels; (3) statistically sound alignment scores; and (4) a runtime fast enough to be applied to whole-genome data. We compare performance to BWA, bowtie2, and stampy and find that our method is especially advantageous on reads containing larger indels.

arXiv.org e-Print Archive

Publications at Bielefeld University

Next Generation Cluster Editing

Author: Bellitto Thomas
Klau Gunnar W.
Marschall Tobias
Schönhuth Alexander
Publication date: 01/01/2013

This work aims to improve the quality of structural variant prediction from the mapped reads of a sequenced genome. We suggest a new model based on cluster editing in weighted graphs and introduce a new heuristic algorithm that solves this problem quickly and with a good approximation on the huge graphs that arise from biological datasets.
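As a rough illustration of the objective behind weighted cluster editing, the sketch below scores a candidate partition of the nodes. It assumes the common formulation in which a positive pair weight is the cost of deleting an existing edge and a negative weight is the cost of inserting a missing one; the abstract does not spell out the model, and the function name `editing_cost` and the toy weights are hypothetical. It only evaluates a given clustering and is not the heuristic the authors introduce.

```python
from itertools import combinations

def editing_cost(weights, clusters):
    """Cost of turning the weighted graph into the given disjoint cliques.

    Assumed convention (not taken from the abstract): weight > 0 means the
    edge exists and costs `weight` to delete; weight < 0 means the edge is
    absent and costs `-weight` to insert; unlisted pairs cost nothing.
    """
    label = {v: k for k, members in enumerate(clusters) for v in members}
    cost = 0.0
    for u, v in combinations(sorted(label), 2):
        w = weights.get((u, v), weights.get((v, u), 0.0))
        same_cluster = label[u] == label[v]
        if same_cluster and w < 0:
            cost += -w   # insert a missing edge inside a cluster
        elif not same_cluster and w > 0:
            cost += w    # delete an edge that crosses clusters
    return cost

# Toy instance with made-up weights.
weights = {("a", "b"): 3.0, ("b", "c"): 2.0, ("a", "c"): -1.0, ("c", "d"): 0.5}
cost_one = editing_cost(weights, [{"a", "b", "c"}, {"d"}])    # 1.5
cost_two = editing_cost(weights, [{"a", "b"}, {"c", "d"}])    # 2.0
```

Under these assumptions the first partition is cheaper, so a solver minimizing this cost would prefer it.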

arXiv.org e-Print Archive

CWI's Institutional Repository

Eliminating Blind Spots in Genetic Variant Discovery

Author: Marschall T. (Tobias)
Schönhuth A. (Alexander)
Publication date: 01/01/2016

CWI's Institutional Repository

CLEVER: Clique-Enumerating Variant Finder

Author: Bauer Markus
Canzar Stefan
Costa Ivan
Klau Gunnar
Marschall Tobias
Schliep Alexander
Schönhuth Alexander
Publication date: 01/01/2012

Next-generation sequencing techniques have facilitated large-scale analysis of human genetic variation. Despite the advances in sequencing speeds, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals. Here we present a novel approach based on internal segment size, which organizes all reads, including concordant ones, into a read alignment graph where max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present various relevant performance statistics. We achieve superior performance rates in particular on indels of sizes 20–100, which have been exposed as a current major challenge in the SV discovery literature and where prior insert-size-based approaches have limitations. In that size range, we outperform even split-read aligners. We also achieve good results on real data, where we make a substantial number of correct predictions that no other tool makes and that complement the predictions of split-read aligners. CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com. Comment: 30 pages, 8 figures
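The idea of grouping alignments into maximal cliques can be sketched on a toy graph. The example below is a simplified illustration, not CLEVER's engineered enumeration algorithm or its statistical test: the edge criterion `compatible` (overlapping internal segments of similar length), its threshold, and the sample coordinates are all assumptions made for this sketch, and the enumeration uses the textbook Bron-Kerbosch algorithm with pivoting.

```python
from itertools import combinations

def compatible(a, b, max_len_diff=30):
    """Hypothetical edge rule: segments overlap and have similar length."""
    (s1, e1), (s2, e2) = a, b
    overlap = min(e1, e2) - max(s1, s2)
    return overlap > 0 and abs((e1 - s1) - (e2 - s2)) <= max_len_diff

def build_graph(segments):
    """Adjacency sets over alignment indices; an edge means 'no contradiction'."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if compatible(segments[i], segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

# Made-up internal segments (start, end) of five read-pair alignments.
segments = [(100, 400), (120, 410), (150, 430), (300, 900), (320, 910)]
groups = maximal_cliques(build_graph(segments))  # [[0, 1, 2], [3, 4]]
```

In this toy input, the second group has much longer internal segments than the first, which is the kind of signal a downstream statistical evaluation could inspect for a possible deletion.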

arXiv.org e-Print Archive

CiteSeerX

VU Research Portal

Crossref

CWI's Institutional Repository

Publikationsserver der RWTH Aachen University

Publications at Bielefeld University

Repeat- and Error-Aware Comparison of Deletions

Author: Maekinen V. (Veli)
Marschall T. (Tobias)
Schönhuth A. (Alexander)
Wittler R.
Publication venue: Oxford U.P.
Publication date: 01/09/2015

Motivation: The number of reported genetic variants is rapidly growing, empowered by ever faster accumulation of next-generation sequencing data. A major issue is comparability. Standards that address the combined problem of inaccurately predicted breakpoints and repeat-induced ambiguities are missing. This decisively lowers the quality of ‘consensus’ callsets and hampers the removal of duplicate entries in variant databases, which can have deleterious effects in downstream analyses. Results: We introduce a sound framework for comparison of deletions that captures both tool-induced inaccuracies and repeat-induced ambiguities. We present a maximum matching algorithm that outputs virtual duplicates among two sets of predictions/annotations. We demonstrate that our approach is clearly superior to ad hoc criteria, like overlap, and that it can reduce the redundancy among callsets substantially. We also identify large numbers of duplicate entries in the Database of Genomic Variants, which points out the immediate relevance of our approach. Availability and implementation: Implementation is open source and available from https://bitbucket.org/readdi/readd
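To make the maximum matching step concrete, here is a minimal sketch that pairs up deletions from two callsets. The similarity rule `similar` (breakpoint shift and length difference within fixed tolerances) is a placeholder assumption and does not model the repeat-induced ambiguities the paper's framework captures; the matching itself uses the standard augmenting-path method for bipartite graphs, and all names and coordinates are hypothetical.

```python
def similar(d1, d2, max_shift=50, max_len_diff=20):
    """Placeholder criterion: start positions and lengths roughly agree."""
    (s1, e1), (s2, e2) = d1, d2
    return abs(s1 - s2) <= max_shift and abs((e1 - s1) - (e2 - s2)) <= max_len_diff

def maximum_matching(calls_a, calls_b):
    """Maximum bipartite matching via augmenting paths.

    Returns sorted (index_in_a, index_in_b) pairs; each call is used at most once.
    """
    match_b = {}  # index in calls_b -> index in calls_a

    def try_assign(i, seen):
        for j, candidate in enumerate(calls_b):
            if j in seen or not similar(calls_a[i], candidate):
                continue
            seen.add(j)
            # Take j if it is free, or if its current partner can move elsewhere.
            if j not in match_b or try_assign(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    for i in range(len(calls_a)):
        try_assign(i, set())
    return sorted((i, j) for j, i in match_b.items())

# Made-up deletion calls as (start, end) on one chromosome.
calls_a = [(1040, 1340), (1000, 1300), (5000, 5800)]
calls_b = [(1030, 1335), (1080, 1385), (9000, 9100)]
pairs = maximum_matching(calls_a, calls_b)  # [(0, 1), (1, 0)]
```

A greedy first-come pairing would give `calls_b[0]` to `calls_a[0]` and leave `calls_a[1]` unmatched; the augmenting step reassigns `calls_a[0]` to `calls_b[1]` so that both near-duplicates are paired.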

CWI's Institutional Repository

Repeat- and error-aware comparison of deletions

Author: Makinen Veli
Marschall Tobias
Schönhuth Alexander
Wittler Roland
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015

Wittler R, Marschall T, Schönhuth A, Makinen V. Repeat- and error-aware comparison of deletions. Bioinformatics. 2015;31(18):2947-2954

Publications at Bielefeld University

SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines

Author: Falquet Laurent
Leung Wai Yi
Maoz Tiffanie Yael
Marschall Tobias
Mei Hailiang
Paudel Yogesh
Schönhuth Alexander
Publication date: 01/01/2015

Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans for other species as well. Objectives of this work included: (a) creating an automated, standardized pipeline for SV prediction; (b) identifying the best tool(s) for SV prediction through benchmarking; and (c) providing a statistically sound method for merging SV calls.

Crossref

Springer - Publisher Connector

CWI's Institutional Repository

PubMed Central

Wageningen University & Research Publications

Publications at Bielefeld University

RERO DOC Digital Library

MPG.PuRe

Discovering motifs that induce sequencing errors

Author: Allhoff M.C. (Manuel)
Costa I.G.
Marschall T. (Tobias)
Martin M. (Marcel)
Rahmann S. (Sven)
Schönhuth A. (Alexander)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

CWI's Institutional Repository

Characteristics of de novo structural changes in the human genome

Author: Guryev V. (Victor)
Kloosterman W.P. (Wigard)
Marschall T. (Tobias)
Schönhuth A. (Alexander)
et al.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2015

Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1–20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.

CWI's Institutional Repository
