
    P-value based visualization of codon usage data

    Two important and as yet unsolved problems in bacterial genome research are the identification of horizontally transferred genes and the prediction of gene expression levels. Both problems can be addressed by multivariate analysis of codon usage data. In particular, dimensionality reduction methods for the visualization of multivariate data have been shown to be effective tools for codon usage analysis. Here we propose a multidimensional scaling approach using a novel similarity measure for codon usage tables. Our probabilistic similarity measure is based on P-values derived from the well-known chi-square test for comparing two distributions. Experimental results on four microbial genomes indicate that the new method is well suited for the analysis of horizontal gene transfer and translational selection. Compared with the widely used correspondence analysis, our method did not suffer from outlier sensitivity and showed better clustering of putative alien genes in most cases.
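    The core idea can be sketched in a few lines: compute a chi-square P-value for every pair of codon usage tables, turn it into a dissimilarity, and embed the genes with multidimensional scaling. The sketch below is an assumption-laden illustration (toy codon counts, a 1 − P dissimilarity, scikit-learn's metric MDS), not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): a chi-square P-value
# dissimilarity between codon usage count tables, embedded with metric MDS.
# The toy data and the 1 - P dissimilarity choice are illustrative assumptions.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.manifold import MDS

def pvalue_dissimilarity(counts_a, counts_b):
    """Chi-square test of homogeneity between two codon count vectors.
    Returns 1 - P so that similar usage (high P) maps to a small distance."""
    table = np.vstack([counts_a, counts_b])
    table = table[:, table.sum(axis=0) > 0]   # drop codons absent from both genes
    _, p, _, _ = chi2_contingency(table)
    return 1.0 - p

# toy data: rows = genes, columns = 64 codon counts
rng = np.random.default_rng(0)
codon_counts = rng.integers(0, 30, size=(10, 64))

n = codon_counts.shape[0]
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = pvalue_dissimilarity(codon_counts[i], codon_counts[j])

# 2-D embedding of the precomputed dissimilarities for visualization
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
print(coords[:3])
```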

    Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics

    Background: High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies in the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from separation processes usually involved in peptide analysis, such as retention time, which is readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes, but they usually require large amounts of training data. Results: We introduce a new kernel function which can be applied in combination with support vector machines to a wide range of computational proteomics problems. We show the performance of this new approach by applying it to the prediction of peptide adsorption/elution behavior in strong anion-exchange solid-phase extraction (SAX-SPE) and ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC). Furthermore, the predicted retention times are used to improve spectrum identifications by a p-value-based filtering approach. The approach was tested on a number of different datasets and shows excellent performance while requiring only very small training sets (about 40 peptides instead of thousands). Using the retention time predictor in our retention time filter significantly improves the fraction of correctly identified peptide mass spectra. Conclusion: The proposed kernel function is well suited for the prediction of chromatographic separation in computational proteomics and requires only a limited amount of training data. The performance of this new method is demonstrated by applying it to peptide retention time prediction in IP-RP-HPLC and prediction of peptide sample fractionation in SAX-SPE. Finally, we incorporate the predicted chromatographic behavior in a p-value-based filter to improve peptide identifications based on liquid chromatography-tandem mass spectrometry.
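    As an illustration of the general pattern rather than the paper's kernel, the following sketch plugs a precomputed peptide kernel into a support vector regressor for retention time prediction. The amino-acid-composition kernel, the peptides and the retention times are all made-up stand-ins.

```python
# Hedged sketch only: the paper's kernel is not reproduced here. This shows how
# a custom, precomputed peptide kernel can be combined with an SVM regressor
# for retention time prediction; the composition kernel is a simple stand-in
# and the training data are synthetic.
import numpy as np
from sklearn.svm import SVR

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """Length-normalized amino acid composition vector (20-dim)."""
    counts = np.array([peptide.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(peptide), 1)

def peptide_kernel(peps_a, peps_b):
    """Linear kernel on composition features; a placeholder for a sequence kernel."""
    A = np.array([composition(p) for p in peps_a])
    B = np.array([composition(p) for p in peps_b])
    return A @ B.T

train_peps = ["ACDK", "LLGYK", "STVNR", "WFDEK", "GGGAK", "PLIVR"]
train_rt = np.array([12.1, 25.3, 17.8, 30.2, 9.4, 27.5])   # minutes, synthetic

svr = SVR(kernel="precomputed", C=10.0)
svr.fit(peptide_kernel(train_peps, train_peps), train_rt)

test_peps = ["ACDR", "WFLEK"]
predicted = svr.predict(peptide_kernel(test_peps, train_peps))
print(predicted)

# Identifications whose measured retention time deviates strongly from the
# prediction could then be down-weighted or filtered, e.g. by a residual cutoff.
```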

    Gene prediction in metagenomic fragments: A large scale machine learning approach

    Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast, while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of about 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that differ from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that an open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large-scale training, our method provides fast single-fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, the method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large-scale machine learning methods are well suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided (see Availability and requirements section).
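    A minimal two-stage sketch of the described idea, on synthetic data and with illustrative feature dimensions and labels: linear discriminants condense codon-usage features into scores, and a small neural network combines those scores with ORF length and GC content into a coding probability. This is not the authors' trained model.

```python
# Two-stage sketch on synthetic data (illustrative assumptions throughout):
# stage 1 compresses codon-usage features with linear discriminants,
# stage 2 combines the discriminant scores with ORF length and GC content
# in a small neural network that outputs a coding probability.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n = 400
monocodon = rng.random((n, 64))     # stand-in monocodon usage frequencies
dicodon   = rng.random((n, 4096))   # stand-in dicodon usage frequencies
orf_len   = rng.integers(90, 1500, n).reshape(-1, 1).astype(float)
gc        = rng.uniform(0.3, 0.7, n).reshape(-1, 1)
is_coding = rng.integers(0, 2, n)   # synthetic labels: 1 = coding ORF

# Stage 1: one linear discriminant score per feature group
lda_mono = LinearDiscriminantAnalysis().fit(monocodon, is_coding)
lda_di   = LinearDiscriminantAnalysis().fit(dicodon, is_coding)
score_mono = lda_mono.decision_function(monocodon).reshape(-1, 1)
score_di   = lda_di.decision_function(dicodon).reshape(-1, 1)

# Stage 2: neural network combines discriminant scores with length and GC content
features = np.hstack([score_mono, score_di, orf_len, gc])
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=1)
net.fit(features, is_coding)

# Probability that a candidate ORF encodes a protein, used to score candidates
print(net.predict_proba(features[:3])[:, 1])
```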

    Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error

    Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, also in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross-validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and NBC ran significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short-read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys.
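    The leave-one-out protocol used to score genus-level accuracy can be illustrated as follows; a simple k-mer nearest-neighbour classifier stands in for BLAST + MEGAN, NBC and SAP, and the reference profiles and genus labels are synthetic.

```python
# Illustrative sketch of the leave-one-out evaluation protocol only: each
# reference sequence is withheld, classified against the remaining references,
# and the predicted genus is compared to the known label. The nearest-neighbour
# classifier and the data are stand-ins, not the tools tested in the study.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
n_refs, n_kmers = 60, 256
kmer_profiles = rng.random((n_refs, n_kmers))   # stand-in k-mer frequency profiles
genus_labels = rng.integers(0, 6, n_refs)       # six hypothetical genera

correct = 0
for train_idx, test_idx in LeaveOneOut().split(kmer_profiles):
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(kmer_profiles[train_idx], genus_labels[train_idx])
    correct += int(clf.predict(kmer_profiles[test_idx])[0] == genus_labels[test_idx][0])

print(f"genus-level LOO accuracy: {correct / n_refs:.2f}")
```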

    Critical Assessment of Metagenome Interpretation: A benchmark of metagenomics software

    In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones, and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performance, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions.
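    As a hypothetical illustration of the kind of per-bin metric such benchmarks rely on (not CAMI's own evaluation code), the snippet below computes purity and completeness of a genome binning against a simulated ground-truth assignment of contigs to genomes.

```python
# Not CAMI's evaluation code: a small illustration of per-bin purity and
# completeness, computed against a known (simulated) contig-to-genome mapping.
from collections import Counter, defaultdict

# hypothetical ground truth and predicted bins: contig id -> genome / bin id
truth = {"c1": "gA", "c2": "gA", "c3": "gB", "c4": "gB", "c5": "gB", "c6": "gC"}
bins  = {"c1": "b1", "c2": "b1", "c3": "b1", "c4": "b2", "c5": "b2", "c6": "b2"}

contigs_per_bin = defaultdict(list)
for contig, b in bins.items():
    contigs_per_bin[b].append(truth[contig])

genome_sizes = Counter(truth.values())
for b, genomes in contigs_per_bin.items():
    dominant, hits = Counter(genomes).most_common(1)[0]
    purity = hits / len(genomes)                    # fraction of bin from dominant genome
    completeness = hits / genome_sizes[dominant]    # fraction of that genome recovered
    print(f"{b}: dominant={dominant} purity={purity:.2f} completeness={completeness:.2f}")
```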