Analytical model of peptide mass cluster centres with applications

Author: Emde Anne-Katrin
Farrow Malcolm
Lalowski Maciej
Lehrach Hans
Reinert Knut
Wolski Witold E
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The elemental composition of peptides results in the formation of distinct, equidistantly spaced clusters across the mass range. This property of peptide mass clustering is used to calibrate peptide mass lists, to identify and remove non-peptide peaks, and for data reduction. RESULTS: We developed an analytical model of the peptide mass cluster centres. Inputs to the model included the amino acid frequencies in the sequence database, the average length of the proteins in the database, the cleavage specificity of the proteolytic enzyme used, and the cleavage probability. We examined the accuracy of our model by comparing it with a model based on an in silico sequence database digest. To identify the crucial parameters, we analysed how the cluster centre location depends on the inputs. The distance to the nearest cluster was used to calibrate mass spectrometric peptide peak-lists and to identify non-peptide peaks. CONCLUSION: The model introduced here enables us to predict the location of the peptide mass cluster centres. It explains how the location of the cluster centres depends on the input parameters. Fast and efficient calibration and filtering of non-peptide peaks are achieved by a distance measure suggested by Wool and Smilansky.

Springer - Publisher Connector

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

MDC Repository

MPG.PuRe

A novel and well-defined benchmarking method for second generation read mapping

Author: A Döring
A Valouev
Anne-Katrin Emde
B Langmead
C Alkan
C Amid
D Weese
DA Wheeler
David Weese
DR Bentley
ER Mardis
G Myers
G Navarro
H Li
J Deng
J Dohm
J Qin
KJ McKernan
Knut Reinert
M Holtgrewe
Manuel Holtgrewe
P Sanders
R Guigó
R Li
SB Ng
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Second generation sequencing technologies yield DNA sequence data at ultra high-throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not been easy to compare different read mapping approaches in a unified way and to determine which program is best for which task. Results: We present a new benchmark method, called Rabema (Read Alignment BEnchMArk), for read mappers. It consists of a strict definition of the read mapping problem and of tools to evaluate the result of arbitrary read mappers supporting the SAM output format. Conclusions: We show the usefulness of the benchmark program by performing a comparison of popular read mappers. The tools supporting the benchmark are licensed under the GPL and available from http://www.seqan.de/projects/rabema.html

Institutional Repository of the Freie Universität Berlin

Crossref

Springer - Publisher Connector

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions

Author: A Italiano
A McKenna
A Rose Brannon
Agnes Viale
Andrea Cercek
Anne-Katrin Emde
Brooke E Sylvester
CA Pratilas
CR Boland
CT Saunders
D Santini
David B Solit
Dayna M Oschwald
E Cerami
E Domingo
E Vakiani
Efsevia Vakiani
ER Fearon
Gregory McDermott
H Li
H Li
H Peradziryi
HH Won
Jinru Shia
JS Vermaat
K Cibulskis
Krishan Kania
LA Diaz Jr
Leonard B Saltz
M Gerlinger
Martin R Weiser
MH Voss
Michael F Berger
Michael I D’Angelica
N Knijn
Nancy E Kemeny
O Lavi
PL Bedard
R Firestein
Rona Yaeger
Ronak H Shah
S Jones
S Misale
S Yachida
SA Forbes
Sasinya N Scott
Vladimir Vacic
Publication venue: Springer Science and Business Media LLC

Crossref

From read mapping to the detection of genomic variations

Author: Emde Anne-Katrin
Publication date: 01/01/2013

Next-Generation-Sequencing (NGS) has brought about a revolution in sequence analysis with its broad spectrum of applications ranging from genome resequencing to transcriptomics or metagenomics, and from fundamental research to diagnostics. The tremendous amounts of data necessitate highly efficient computational analysis tools for the wide variety of NGS applications. This thesis addresses a broad range of key computational aspects of resequencing applications, where a reference genome sequence is known and heavily used for interpretation of the newly sequenced sample. It presents tools for read mapping and benchmarking, for partial read mapping of small RNA reads and for structural variant/indel detection, and finally tools for detecting and genotyping SNVs and short indels. Our tools efficiently scale to large NGS data sets and are well-suited for advances in sequencing technology, since their generic algorithm design allows handling of arbitrary read lengths and variable error rates. Furthermore, they are implemented within the robust C++ library SeqAn, making them open-source, easily available, and potentially adaptable for the bioinformatics community. Among other applications, our tools have been integrated into a large-scale analysis pipeline and have been applied to large datasets, leading to interesting discoveries of human retrocopy variants and insights into the genetic causes of X-linked intellectual disabilities.
The latest DNA sequencing technologies (NGS technologies for short) enable revolutionary new applications, ranging from genome resequencing through transcriptome sequencing to metagenomics, and from fundamental research to diagnostics. The resulting flood of data poses a major challenge for bioinformatics. Highly efficient analysis software is of enormous importance for the broad spectrum of NGS applications.
This thesis addresses several key aspects of the analysis of resequencing data, in which an already sequenced reference genome serves as the basis for interpreting a newly sequenced data set. It presents algorithms and programs for the so-called read mapping problem and for evaluating the quality of its solution, for partial read mapping, which is used in miRNA studies and in the search for structural variations, and finally for detecting and genotyping base mutations and short insertions/deletions in the genome. The algorithms presented are efficient and designed so that they remain applicable and scalable as sequencing technologies advance. In addition, they are implemented in the robust C++ library SeqAn, which makes them easily accessible and adaptable. Among other uses, our tools were integrated into a high-throughput analysis pipeline and applied to large data sets, yielding interesting biological insights (above all in connection with X-chromosome-linked intellectual disability).

Institutional Repository of the Freie Universität Berlin

Segment-based multiple sequence alignment

Author: Döring Andreas
Emde Anne-Katrin
Notredame Cedric
Rausch Tobias
Reinert Knut
Weese David
Publication date: 01/01/2008

Motivation: Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and widespread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far. Results: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast, and (3) the consistency idea can be extended to align multiple genomic sequences.

CiteSeerX

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

MPG.PuRe

Robust consensus computation

Author: A Döring
Anne-Katrin Emde
C Notredame
Knut Reinert
M Schatz
T Rausch
Tobias Rausch
Publication venue: Springer Science and Business Media LLC

Crossref

Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS

Author: Albers
Alkan
Ameur
Anne-Katrin Emde
Au
Barski
Bentley
Burkhardt
Chen
Chen
David Weese
Durbin
Döring
Eid
Green
Holtgrewe
Homer
Iafrate
Johnston
Kalscheuer
Knut Reinert
Korbel
Krawitz
Lee
Lenski
Li
Marcel H. Schulz
Martin Vingron
McKenna
McKernan
Medvedev
Metzker
Mills
Mills
Mullaney
Myers
Ng
Pinkel
Rasmussen
Ruping Sun
Sherry
Stankiewicz
Stefan A. Haas
Stenson
Stratton
Vera M. Kalscheuer
Wang
Wang
Weese
Wheeler
Wu
Xie
Ye
Yoon
Zeitouni
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

MOTIVATION: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map. RESULTS: Here we present a method for 'split' read mapping, where the prefix and suffix matches of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant. AVAILABILITY: SplazerS is available from http://www.seqan.de/projects/splazers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Crossref

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

MPG.PuRe

Data S1. Human PGBD5 DNA transposase promotes site-specific oncogenic mutations in rhabdoid tumors

Supplementary data S1 for Henssen et al., "Human PGBD5 DNA transposase promotes site-specific oncogenic mutations in rhabdoid tumors"

ZENODO

The Francis Crick Institute