
211 research outputs found

RNA secondary structure prediction using large margin methods

Author: De Bona F.
Ong C.
Rätsch G.
Zien A.
Publication date: 01/07/2007

The secondary structure of RNA is essential for its biological role. Recently, Do, Woods, and Batzoglou (ISMB 2006) proposed a probabilistic approach that generalizes stochastic context-free grammars (SCFGs), using conditional maximum likelihood to estimate the model parameters. We propose an alternative approach to parameter estimation that is based on an SVM-like large margin method.

MPG.PuRe

Kernel Methods for Predictive Sequence Analysis

Author: Ong C.
Rätsch G.
Publication date: 01/09/2006

This tutorial is meant for a broad audience: students, researchers, biologists, and computer scientists interested in (a) an overview of general and efficient algorithms for statistical learning used in computational biology, and (b) sequence kernels for problems such as promoter or splice site detection. No specific prior knowledge is required, since the tutorial is self-contained and most fundamental concepts are introduced during the course.

MPG.PuRe

Towards the Inference of Graphs on Ordered Vertexes

Author: Ong C.
Rätsch G.
Zien A.
Publication venue: Max Planck Institute for Biological Cybernetics
Publication date: 01/08/2006

We propose novel methods for machine learning of structured output spaces. Specifically, we consider outputs that are graphs whose vertices have a natural order. We consider the usual adjacency matrix representation of graphs, as well as two other representations for such a graph: (a) decomposing the graph into a set of paths, and (b) converting the graph into a single sequence of nodes with labeled edges. For each of the three representations, we propose an encoding and decoding scheme. We also propose an evaluation measure for comparing two graphs.

MPG.PuRe

Asymmetric Totally-corrective Boosting for Real-time Object Detection

Author: A. Demiriz
C. Zhu
G. Rätsch
J. Friedman
J. Wu
P. Viola
S. Boyd
S.Z. Li
Publication date: 01/01/2010

Real-time object detection is one of the core problems in computer vision. The cascade boosting framework proposed by Viola and Jones has become the standard for this problem. In this framework, the learning goal for each node is asymmetric: it must achieve a high detection rate with only a moderate false positive rate. We develop new boosting algorithms to address this asymmetric learning problem. We show that our methods explicitly optimize asymmetric loss objectives in a totally corrective fashion. The methods are totally corrective in the sense that the coefficients of all selected weak classifiers are updated at each iteration. In contrast, conventional boosting such as AdaBoost is stage-wise, in that only the current weak classifier's coefficient is updated. At the heart of totally corrective boosting is the column generation technique. Experiments on face detection show that our methods outperform state-of-the-art asymmetric boosting methods. Comment: 14 pages, published in Asian Conf. Computer Vision 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Adelaide Research & Scholarship

The Australian National University

PALMA: Perfect Alignments using Large Margin Algorithms

Author: Hepp B.
Ong C.
Rätsch G.
Schulze U.
Publication date: 01/09/2006

Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing, and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm, called PALMA, tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results, and a stand-alone alignment tool, see http://www.fml.mpg.de/raetsch/projects/palma

Fraunhofer-ePrints

MPG.PuRe

Exploiting physico-chemical properties in string kernels

Author: B Peters
B Shen
C Leslie
Christian Widmer
CS Ong
CW Tung
G Rätsch
G Schweikert
Gunnar Rätsch
H Rangwala
H Saigo
J Weston
L Jacob
M Röttig
M Venkatarajan
N Pfeifer
Nora C Toussaint
Oliver Kohlbacher
R Kuang
RM Clark
S Henikoff
S Kawashima
S Sonnenburg
SJ Schultheiss
V Roth
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: String kernels are commonly used for the classification of biological sequences, both nucleotide and amino acid sequences. Although string kernels are already very powerful, they have a major shortcoming when it comes to amino acids: they ignore an important piece of information when comparing amino acids, namely physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data is less abundant. So far, only very few approaches have aimed at combining these two ideas. Results: We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with those of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position-specific kernels, and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that incorporating amino acid properties in string kernels yields improved performance compared to standard string kernels and to previously proposed non-substring kernels. Conclusions: In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference. Availability: Data sets, code, and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask. Implementations of the developed kernels are available as part of the Shogun toolbox.
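As a point of reference for the "substring spectrum" mentioned in the abstract above, the following is a minimal sketch of a plain spectrum kernel, the kind of standard string kernel the paper uses as a baseline. It is not the authors' proposed kernel: it ignores physico-chemical properties entirely, and the function names and the default `k` are illustrative assumptions.

```python
from collections import Counter


def spectrum(seq, k):
    """Count every length-k substring (k-mer) occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b.

    Hypothetical helper for illustration only; sequences shorter than k
    have an empty spectrum and therefore a kernel value of 0.
    """
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Counter returns 0 for k-mers absent from sb, so only shared k-mers contribute.
    return sum(count * sb[kmer] for kmer, count in sa.items())
```

Because this kernel only counts exact k-mer matches, two peptides that differ by a chemically similar substitution share no credit for that position, which is the gap the abstract describes.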

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Inferring latent task structure for Multitask Learning by Multiple Kernel Learning

Author: B Schölkopf
C Chang
C Leslie
Christian Widmer
F Bach
G Rätsch
G Schweikert
Gunnar Rätsch
H Daumé
H Daumé III
J Blitzer
J Robinson
L Bottou
L Jacob
M Kloft
Nora C Toussaint
P Gehler
R Caruana
S Sonnenburg
Schuller Ben-David
T Evgeniou
T Joachims
V Vapnik
Y Xue
Yasemin Altun
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics, many problems can be cast into the Multitask Learning scenario by incorporating data from several organisms. However, combining information from several tasks requires careful consideration of the degree of similarity between tasks. Our proposed method simultaneously learns or refines the similarity between tasks along with the Multitask Learning classifier. This is done by formulating the Multitask Learning problem as Multiple Kernel Learning, using the recently published q-Norm MKL algorithm. Results: We demonstrate the performance of our method on two problems from Computational Biology. First, we show that our method is able to improve performance on a splice site dataset with a given hierarchical task structure by refining the task relationships. Second, we consider an MHC-I dataset, for which we assume no knowledge about the degree of task relatedness. Here, we are able to learn the task similarities ab initio along with the Multitask classifiers. In both cases, we outperform the baseline methods that we compare against. Conclusions: We present a novel approach to Multitask Learning that is capable of learning task similarity along with the classifiers. The framework is very general: it allows prior knowledge about task relationships to be incorporated if available, but it is also able to identify task similarities in the absence of such prior information. Both variants show promising results in applications from Computational Biology.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

The Feature Importance Ranking Measure

Author: A. Graf
B. Schölkopf
B. Üstün
C. Strobl
G. Rätsch
G.R.G. Lanckriet
J. Friedman
J. Schäfer
K. Bennett
M. Laan van der
R. Tibshirani
S. Sonnenburg
Publication date: 01/01/2009

The most accurate predictions are typically obtained by learning machines with complex feature spaces (e.g., as induced by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often resorts to linear models in combination with variable selection, thereby sacrificing some predictive power for presumptive interpretability. Here, we introduce the Feature Importance Ranking Measure (FIRM), which, by retrospective analysis of arbitrary learning machines, makes it possible to achieve both excellent predictive performance and superior interpretation. In contrast to standard raw feature weighting, FIRM takes the underlying correlation structure of the features into account. It is thereby able to discover the most relevant features, even if their appearance in the training data is entirely prevented by noise. The desirable properties of FIRM are investigated analytically and illustrated in simulations. Comment: 15 pages, 3 figures. To appear in the Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 200

arXiv.org e-Print Archive

Probabilistic Clustering of Time-Evolving Distance Data

Author: AK Jain
AY Ng
C Leslie
CP Robert
D Blei
DD Lee
DM Blei
Gunnar Rätsch
H Saigo
J Pitman
Julia E. Vogt
M Bilodeau
Marius Kloft
MB Eisen
MS Srivastava
P McCullagh
RM Neal
S Sonnenburg
Sandhya Prabhakaran
SN MacEachern
Stefan Stark
Sudhir S. Raman
SVN Vishwanathan
TS Ferguson
TW Anderson
Volker Roth
WJ Ewens
Publication date: 01/01/2015

We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method uses the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and the objects do not need to be identified across time points. Further, the model does not require the number of clusters to be specified in advance; it is instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data, showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.
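To illustrate how a Dirichlet process prior lets the number of clusters emerge from the data instead of being fixed beforehand, here is a minimal sketch that samples a partition from the Chinese restaurant process, a standard constructive view of the Dirichlet process. It is not the model from the paper: there are no distances and no time points, and the function name, the concentration parameter `alpha`, and the seed are illustrative assumptions.

```python
import random


def sample_crp(n_objects, alpha, seed=0):
    """Sample cluster labels for n_objects from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (assumed name): larger values
    tend to produce more clusters. Returns one integer label per object.
    """
    rng = random.Random(seed)
    counts = []  # counts[c] = number of objects currently in cluster c
    labels = []
    for _ in range(n_objects):
        # Join existing cluster c with weight counts[c],
        # or open a new cluster with weight alpha.
        weights = counts + [alpha]
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(counts):
            counts.append(1)  # a new cluster is created
        else:
            counts[c] += 1
        labels.append(c)
    return labels
```

The number of distinct labels returned is random and grows slowly with `n_objects`, which is the property that removes the need to choose the cluster count in advance.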

arXiv.org e-Print Archive

Crossref

edoc

Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics

Author: A Frank
A Zien
AA Klammer
Andreas Leinenbach
AV Gorshkov
B Schölkopf
C Igel
C Leslie
C Oh
C Schley
CC Chang
Christian G Huber
CJC Burges
CT Mant
DN Perkins
EF Strittmatter
G Rätsch
H Toll
JA Taylor
JK Eng
JL Meek
JP Dworzanski
JP Vert
K Petritis
LY Geer
M Sturm
MJ MacCoss
Nico Pfeifer
O Kohlbacher
O Krokhin
Oliver Kohlbacher
OV Krokhin
P Meinicke
R Craig
R Kaliszan
RE Moore
S Henikoff
S Sonnenburg
T Lingner
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies in the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from the separation processes usually involved in peptide analysis, such as retention time information, which is readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes, but they usually require large amounts of training data. Results: We introduce a new kernel function that can be applied in combination with support vector machines to a wide range of computational proteomics problems. We show the performance of this new approach by applying it to the prediction of peptide adsorption/elution behavior in strong anion-exchange solid-phase extraction (SAX-SPE) and ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC). Furthermore, the predicted retention times are used to improve spectrum identifications by a p-value-based filtering approach. The approach was tested on a number of different datasets and shows excellent performance while requiring only very small training sets (about 40 peptides instead of thousands). Using the retention time predictor in our retention time filter significantly improves the fraction of correctly identified peptide mass spectra. Conclusion: The proposed kernel function is well-suited for the prediction of chromatographic separation in computational proteomics and requires only a limited amount of training data. The performance of this new method is demonstrated by applying it to peptide retention time prediction in IP-RP-HPLC and prediction of peptide sample fractionation in SAX-SPE. Finally, we incorporate the predicted chromatographic behavior in a p-value-based filter to improve peptide identifications based on liquid chromatography-tandem mass spectrometry.
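The idea of filtering identifications by comparing measured and predicted retention times can be sketched as follows. This is only an illustration under stated assumptions, not the filter from the paper: it assumes prediction errors are zero-mean Gaussian with a known standard deviation `sigma`, and the dictionary keys, function names, and the 0.05 threshold are all hypothetical.

```python
import math


def retention_time_p_value(observed, predicted, sigma):
    """Two-sided p-value of the retention time residual.

    Assumes (for illustration) that prediction errors are zero-mean
    Gaussian with standard deviation sigma.
    """
    z = abs(observed - predicted) / sigma
    return math.erfc(z / math.sqrt(2.0))


def filter_identifications(candidates, sigma, threshold=0.05):
    """Keep candidates whose observed retention time is plausible.

    Each candidate is a dict with hypothetical keys 'observed_rt' and
    'predicted_rt'; candidates with a p-value below threshold are dropped.
    """
    return [
        c for c in candidates
        if retention_time_p_value(c["observed_rt"], c["predicted_rt"], sigma) >= threshold
    ]
```

In practice `sigma` would have to be estimated from the predictor's errors on held-out peptides; here it is simply passed in.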

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central