MPG.PuRe

DIALIGN-TX and multiple protein alignment using secondary structure information at GOBICS

Author: A. R. Subramanian
B. Morgenstern
Brudno
Do
E. Corel
Edgar
Edgar
Edgar
Feng
Heringa
Lenhof
Montgomerie
Morgenstern
Morgenstern
Morgenstern
P. Meinicke
Pohler
R. Steinkamp
S. Hiran
Subramanian
Subramanian
Taylor
Thompson
Wong
Publication venue: Oxford University Press
Publication date: 01/01/2010

We introduce web interfaces for two recent extensions of the multiple-alignment program DIALIGN. DIALIGN-TX combines the greedy heuristic previously used in DIALIGN with a more traditional ‘progressive’ approach for improved performance on locally and globally related sequence sets. In addition, we offer a version of DIALIGN that uses predicted protein secondary structures together with primary sequence information to construct multiple protein alignments. Both programs are available through the ‘Göttingen Bioinformatics Compute Server’ (GOBICS).

CiteSeerX

Word correlation matrices for protein sequence analysis and remote homology detection

Author: A Ben-Hur
A Krogh
AG Murzin
C Leslie
C Leslie
CS Leslie
G Cohen
H Rangwala
H Saigo
J Park
L Liao
O Chapelle
Peter Meinicke
QW Dong
R Finn
R Kuang
SF Altschul
T Jaakkola
T Lingner
TF Smith
Thomas Lingner
UniProtConsortium
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods, discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences usually are computationally expensive.
Results: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely-used benchmark setup for protein remote homology detection.
Conclusion: Our word correlation approach provides highly competitive performance as compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.
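
The idea of a kernel built from average word similarity can be illustrated with a minimal sketch. The abstract does not define the word similarity measure, so `word_similarity` below (fraction of matching positions) and the word length `k` are assumptions made for illustration only, not the authors' definition.

```python
from itertools import product


def words(seq, k):
    """Return all overlapping words (k-mers) of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_similarity(u, v):
    """Assumed similarity: fraction of positions at which two words agree."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


def average_word_kernel(s, t, k=3):
    """Average word similarity over all pairs of words from s and t."""
    ws, wt = words(s, k), words(t, k)
    if not ws or not wt:
        # A sequence shorter than k has no words to compare.
        return 0.0
    total = sum(word_similarity(u, v) for u, v in product(ws, wt))
    return total / (len(ws) * len(wt))
```

Because the value is an average over word pairs, it can be read as an inner product between the mean word representations of the two sequences, which is one way a kernel of this kind can expose per-word contributions.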

Directory of Open Access Journals

Incremental Feature Model Synthesis for Clone-and-Own Software Systems in MATLAB/Simulink

Author: Acher M.
Alalfi M.
Alalfi M. H.
Andersen N.
Botterweck G.
Bürdek J.
Czarnecki K.
Deissenboeck F.
Deissenboeck F.
Dhungana D.
Dubinsky Y.
El-Sharkawy S.
Engels G.
Fenske W.
Font J.
Font J.
Font J.
Haber A.
Holthusen S.
Kehrer T.
Krueger C.W.
Mazo R.
Meinicke J.
Merschen D.
Metzger A.
Nešić D.
Nieke M.
Pohl R.
Reicherdt R.
Riva C.
Rosiak K.
Rubin J.
Rubin J.
Ryssel U.
Sanen F.
Schlie A.
Schlie A.
Schlie A.
Schlie A.
Schroeter J.
Seidl C.
She S.
Software Productivity Consortium Services Corporation.
Thum T.
Thum T.
Wehling K.
Wille D.
Wille D.
Wille D.
Wąsowski A.
Xue Y.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/10/2020

The IT University of Copenhagen's Repository

Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics

Author: A Frank
A Frank
A Zien
AA Klammer
Andreas Leinenbach
AV Gorshkov
B Schölkopf
C Igel
C Leslie
C Oh
C Schley
CC Chang
Christian G Huber
CJC Burges
CT Mant
DN Perkins
EF Strittmatter
G Rätsch
G Rätsch
H Toll
JA Taylor
JK Eng
JL Meek
JP Dworzanski
JP Vert
K Petritis
K Petritis
LY Geer
M Sturm
MJ MacCoss
Nico Pfeifer
O Kohlbacher
O Krokhin
Oliver Kohlbacher
OV Krokhin
P Meinicke
R Craig
R Kaliszan
RE Moore
S Henikoff
S Sonnenburg
T Lingner
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies within the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from separation processes usually involved in peptide analysis, such as retention time information, which is readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes but they usually require large amounts of training data.
Results: We introduce a new kernel function which can be applied in combination with support vector machines to a wide range of computational proteomics problems. We show the performance of this new approach by applying it to the prediction of peptide adsorption/elution behavior in strong anion-exchange solid-phase extraction (SAX-SPE) and ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC). Furthermore, the predicted retention times are used to improve spectrum identifications by a p-value-based filtering approach. The approach was tested on a number of different datasets and shows excellent performance while requiring only very small training sets (about 40 peptides instead of thousands). Using the retention time predictor in our retention time filter improves the fraction of correctly identified peptide mass spectra significantly.
Conclusion: The proposed kernel function is well-suited for the prediction of chromatographic separation in computational proteomics and requires only a limited amount of training data. The performance of this new method is demonstrated by applying it to peptide retention time prediction in IP-RP-HPLC and prediction of peptide sample fractionation in SAX-SPE. Finally, we incorporate the predicted chromatographic behavior in a p-value-based filter to improve peptide identifications based on liquid chromatography-tandem mass spectrometry.
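
A p-value-based retention time filter of the kind mentioned in this abstract might look roughly like the sketch below. The abstract does not state how the p-value is computed, so the Gaussian error model, the parameter `sigma` (standard deviation of prediction errors, assumed to be estimated on training peptides), the threshold `alpha`, and the dictionary keys are all assumptions.

```python
import math


def rt_p_value(observed, predicted, sigma):
    """Two-sided p-value of a retention time deviation.

    Assumes prediction errors are normally distributed with
    standard deviation sigma (an assumption of this sketch).
    """
    z = abs(observed - predicted) / sigma
    return math.erfc(z / math.sqrt(2.0))


def filter_identifications(candidates, sigma, alpha=0.05):
    """Keep candidates whose measured retention time is plausible."""
    return [
        c for c in candidates
        if rt_p_value(c["observed_rt"], c["predicted_rt"], sigma) >= alpha
    ]
```

An identification whose measured retention time lies far from the predicted one receives a small p-value and is discarded, which is how a retention time predictor can reduce false positive spectrum identifications.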

Directory of Open Access Journals

Metabolite-based clustering and visualization of mass spectrometry data using one-dimensional self-organizing maps

Author: A Aharoni
A Schilmiller
AK Jain
Alexander Kaever
B von Malek
Burkhard Morgenstern
C Delker
C Guy
C Wasternack
C Wasternack
Cornelia Göbel
D Jiang
DH Sanchez
E Grata
E Pohjanen
G Glauser
GR Gray
H Weber
I Stenzel
Ivo Feussner
J Leon
JD Gibbons
K Dettmer
Kirstin Feussner
L Tarpley
M Steinfath
O Fiehn
O Fiehn
O Miersch
P Reymond
Peter Meinicke
Petr Karlovsky
R Bhalla
S Wiklund
T Graepel
T Heskes
T Kohonen
The Arabidopsis Genome Initiative
Thomas Lingner
V Shulaev
Publication venue: BioMed Central
Publication date: 26/06/2008

Gene prediction in metagenomic fragments: A large scale machine learning approach

Author: A Lukashin
AL Delcher
BE Suzek
Burkhard Morgenstern
CJ van Rijsbergen
CM Bishop
CS Riesenfeld
D Frishman
DA Benson
DJC MacKay
F Sanger
F Wilcoxon
GW Tyson
H Noguchi
HY Ou
IT Nabney
J Besemer
J Handelsman
JC Venter
K Chen
Katharina J Hoff
KE Rudd
L Krause
M Ronaghi
M Tech
M Tech
Maike Tech
MS Rappe
P Hugenholtz
P Nielson
Peter Meinicke
R Amann
R Daniel
R Daniel
R Development Core Team
RA Edwards
Rolf Daniel
S Altschul
S Voget
SG Tringe
T Hastie
T Jarvie
Thomas Lingner
V Torsvik
VB Bajic
W Streit
Publication venue: BioMed Central
Publication date: 01/04/2008

Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions.
Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability.
Conclusion: Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided (see Availability and requirements section).
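
The two-stage structure described in this abstract can be sketched as follows. This is an illustration only: the function names are hypothetical, all weights are placeholders that would have to be learned from training data, and a single logistic unit stands in for the artificial neural network, whose architecture the abstract does not give.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so frequency vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(orf):
    """Relative frequency of each codon, reading the ORF in frame."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    n = max(len(codons), 1)
    return [counts[c] / n for c in CODONS]


def gc_content(fragment):
    """Fraction of G and C bases in the fragment."""
    return sum(b in "GC" for b in fragment) / max(len(fragment), 1)


def linear_score(features, weights, bias=0.0):
    """Stage one: a linear discriminant reduces a feature vector to one score."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def coding_probability(stage_one_scores, orf_length, gc, weights, bias=0.0):
    """Stage two stand-in: one logistic unit over the combined inputs."""
    inputs = list(stage_one_scores) + [orf_length, gc]
    z = linear_score(inputs, weights, bias)
    return 1.0 / (1.0 + math.exp(-z))
```

In this sketch, monocodon usage would be scored by `linear_score(codon_frequencies(orf), w_mono)` with a learned weight vector `w_mono`; dicodon usage and translation initiation site scores would be computed analogously and passed in `stage_one_scores`.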

Directory of Open Access Journals

Genetic sequence-based prediction of long-range chromatin interactions suggests a potential role of short tandem repeat sequences in genome organization

Author: A Malaspina
A Sanyal
A Tanay
BE Boser
C Elkan
C Widmer
E de Wit
E Lieberman-Aiden
F Ay
G Rätsch
G Rätsch
H Hamada
J Dekker
J Dekker
J Dostie
J Harrow
JO Yáñez-Cuna
JR Dixon
JR Hughes
KJ Brookes
L Jacob
M Simonis
MJ Fullwood
MJ Zeitz
N Cope
N Heidari
N Varoquaux
Nico Pfeifer
P Meinicke
P Vogt
R Edgar
S Ramamoorthy
Sarvesh Nikumbh
SSP Rao
T Evgeniou
T Evgeniou
T Lingner
TD Schneider
WA Bickmore
Z Zhao
Publication venue: Springer Science and Business Media LLC
Publication date

arXiv.org e-Print Archive

Learning a peptide-protein binding affinity predictor with kernel ridge regression

Author: A Dömling
AJ Bordner
AJ Bordner
AJ Smola
Alexandre Drouin
AR Ortiz
B Hoffmann
B Peters
B Schölkopf
C Rasmussen
CS Leslie
Dana-Farber Cancer Institute
François Laviolette
G Rätsch
H Saigo
J Qiu
J Robinson
J Shawe-Taylor
J Swets
J Wells
Jacques Corbeil
JL Faulon
JM Perez-De-Vega
L Costantino
L Jacob
L Jacob
L Zhang
M Hue
M Nielsen
M Nielsen
M Takarabe
Mario Marchand
N Nagamine
N Toussaint
P Meinicke
P Vanhee
P Vanhee
P Zhou
PL Toogood
R Albert
Sébastien Giguère
Publication venue: Springer Science and Business Media LLC
Publication date: 31/07/2012

We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of accurately predicting the binding affinity of any peptide to any protein. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. On all benchmarks, our method significantly (p-value < 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. The method should be of value to a large segment of the research community with the potential to accelerate peptide-based drug and vaccine development.
Comment: 22 pages, 4 figures, 5 tables
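
A string kernel that weighs substring pairs by both relative position and physico-chemical similarity might be sketched as below. The abstract gives no formula, so the Gaussian form of both penalties, the parameter names (`max_len`, `sigma_p`, `sigma_c`), and the descriptor table `props` are assumptions; this naive version also makes no attempt at the low complexity dynamic programming algorithm the authors describe.

```python
import math


def gs_like_kernel(x, y, props, max_len=2, sigma_p=1.0, sigma_c=1.0):
    """Compare all substrings of x and y up to length max_len.

    props maps each residue to a tuple of numeric descriptors
    (supplied by the caller; hypothetical in this sketch).
    sigma_p penalizes differences in substring position,
    sigma_c penalizes differences in residue descriptors.
    """
    total = 0.0
    for length in range(1, max_len + 1):
        for i in range(len(x) - length + 1):
            for j in range(len(y) - length + 1):
                # Squared distance between the descriptor encodings
                # of the two substrings, summed residue by residue.
                d2 = sum(
                    (a - b) ** 2
                    for u, v in zip(x[i:i + length], y[j:j + length])
                    for a, b in zip(props[u], props[v])
                )
                position = math.exp(-((i - j) ** 2) / (2 * sigma_p ** 2))
                chemistry = math.exp(-d2 / (2 * sigma_c ** 2))
                total += position * chemistry
    return total
```

With a very large `sigma_p` and a very small `sigma_c`, this sketch reduces to counting shared substrings regardless of position, in the manner of a Blended Spectrum kernel, which illustrates how one parameterized kernel can cover several existing kernels as special cases.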