58 research outputs found

Partitional clustering of protein sequences - An inductive logic programming approach

Author: A. Conesa
C. Notredame
D.J. Hand
F. Ronquist
F. Zelezný
L. Ralaivola
N.A. Fonseca
P. Rice
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

We present a novel approach to clustering sets of protein sequences, based on Inductive Logic Programming (ILP). Preliminary results show that the proposed method produces understandable descriptions/explanations of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.

Crossref

Repositório Aberto da Universidade do Porto

Interpreting linear support vector machine models with heat map molecule coloring

Author: A Bender
Andreas Jahn
Andreas Zell
B Schölkopf
C Steinbeck
D Bossemeyer
D Fourches
D Rogers
D Weininger
G Hinselmann
Georg Hinselmann
H Kubinyi
I Guyon
J Bajorath
J Kazius
J Mohr
J Orts
K Hasegawa
KD Freeman-Cook
KH Bleicher
L Han
L Prade
L Ralaivola
Lars Rosenbaum
MS Buchanan
N Fechner
P Jonathan
RE Fan
SG Rohrer
SJ Swamidass
SM Free
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract
Background: Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides strong performance, the interpretability of a machine learning model is a desired property to guide the optimization of a compound in later drug discovery stages. Linear support vector machines have been shown to perform convincingly on large-scale data sets. The goal of this study is to present a heat map molecule coloring technique to interpret linear support vector machine models. Based on the weights of a linear model, the visualization approach colors each atom and bond of a compound according to its importance for activity.
Results: We evaluated our approach on a toxicity data set, a chromosome aberration data set, and the maximum unbiased validation data sets. The experiments show that our method sensibly visualizes structure-property and structure-activity relationships of a linear support vector machine model. The coloring of ligands in the binding pocket of several crystal structures of a maximum unbiased validation data set target indicates that our approach helps to determine the correct ligand orientation in the binding pocket. Additionally, the heat map coloring enables the identification of substructures important for the binding of an inhibitor.
Conclusions: In combination with heat map coloring, linear support vector machine models can help to guide the modification of a compound in later stages of drug discovery. In particular, substructures identified as important by our method might be a starting point for optimization of a lead compound. The heat map coloring should be considered complementary to structure-based modeling approaches. As such, it helps to give a better understanding of the binding mode of an inhibitor.
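
The idea of coloring atoms by linear model weights can be illustrated with a minimal sketch. It assumes that each feature of the linear model is a substructure with a known set of atom indices, and that an atom's score is the sum of the weights of the features covering it. The feature names, the weights, and the red/white/blue color scale are hypothetical and are not taken from the paper.

```python
def atom_scores(n_atoms, feature_atoms, weights):
    """Sum, for every atom, the weights of all model features that cover it.

    feature_atoms maps a feature name to a list of atom-index tuples
    (one tuple per occurrence of the feature in the molecule).
    """
    scores = [0.0] * n_atoms
    for name, occurrences in feature_atoms.items():
        w = weights.get(name, 0.0)  # features unknown to the model contribute nothing
        for atoms in occurrences:
            for a in atoms:
                scores[a] += w
    return scores


def to_color(score, max_abs):
    """Map a score to an RGB triple: white at 0, red for positive, blue for negative."""
    if max_abs == 0:
        return (255, 255, 255)
    t = max(-1.0, min(1.0, score / max_abs))
    fade = round(255 * (1 - abs(t)))
    return (255, fade, fade) if t >= 0 else (fade, fade, 255)


# Hypothetical four-atom molecule with two weighted substructure features.
feature_atoms = {"C=O": [(1, 2)], "O-H": [(2, 3)]}
weights = {"C=O": 0.8, "O-H": -0.3}

scores = atom_scores(4, feature_atoms, weights)
scale = max(abs(s) for s in scores)
colors = [to_color(s, scale) for s in scores]
```

Atoms covered by several features accumulate their weights, so an atom shared by a positively and a negatively weighted substructure ends up with an intermediate color.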

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization

Author: A Ceroni
A Dalby
C Steinbeck
CA James
Christopher JF Cameron
CJF Cameron
DE Rumelhart
Eddie YT Ma
EY Ma
F Fountaine
GW Kauffman
H Fang
H Gohlke
H Li
I Walsh
JA Mohr
JJ Sutherland
L Ralaivola
LS Gold
M Bohm
R Guha
RM Blair
SA Depriest
SE Stein
Stefan C Kremer
W Tong
WS Branham
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new datasets and the previously studied datasets. This methodology is rigorous and statistically grounded, and culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work.
Background: Our work can be viewed as an alternative to existing methods for solving the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology.
Results: Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems.
Conclusions: We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains.
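
A Wilcoxon test of the kind mentioned in the abstract compares two methods on paired results. The following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test in plain Python, assuming paired scores on a small number of datasets. The score values are invented for illustration and are not results from the paper, and the abstract does not state which Wilcoxon variant the authors used.

```python
from itertools import product


def signed_ranks(x, y):
    """Return the non-zero paired differences and the average ranks of their magnitudes."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied magnitudes.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return diffs, ranks


def wilcoxon_exact(x, y):
    """Exact two-sided signed-rank test; enumerates all sign patterns, so keep n small."""
    diffs, ranks = signed_ranks(x, y)
    total = sum(ranks)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    stat = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for s, r in zip(signs, ranks) if s)
        if min(wp, total - wp) <= stat:
            hits += 1
    return stat, hits / 2 ** len(ranks)


# Invented accuracy percentages for two methods on six datasets.
method_a = [82, 75, 91, 68, 77, 85]
method_b = [80, 71, 90, 69, 72, 79]
statistic, p_value = wilcoxon_exact(method_a, method_b)
```

The exhaustive enumeration is only practical for a handful of paired results; larger comparisons would normally rely on a normal approximation or a statistics library.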

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Unconfused ultraconservative multiclass algorithms

Author: A Daniely
FR Bach
G Blanchard
H Block
K Crammer
L Devroye
L Valiant
Liva Ralaivola
M Minsky
MJ Kearns
N Cristianini
P Drineas
Ugo Louche
Y Freund
Publication venue: Springer Science and Business Media LLC

Crossref

A constructive approach for discovering new drug leads: Using a kernel methodology for the inverse-QSAR problem

Author: A Tatsuya
AC Good
B Mak
BB Masek
C Steinbeck
CA Azencott
CJ Churchwell
DB Reitz
FJ Burkowski
Forbes J Burkowski
GH Bakir
HC Huang
J Shawe-Taylor
JJ Sutherland
JL Faulon
JTY Kwok
JW Robin
K-R Müller
KA Sharp
L Ralaivola
LB Kier
LH Hall
MI Skvortsova
N Brown
P Chavatte
P Mahe
PA Pevzner
R Todeschini
RA Lewis
RC Glenn
RP Sheridan
S Mika
SJ Swamidass
V Kvasnicka
V Venkatasubramanian
VJ Gillet
William WL Wong
X Leval
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract
Background: The inverse-QSAR problem seeks to find a new molecular descriptor from which one can recover the structure of a molecule that possesses a desired activity or property. Surprisingly, there are very few papers providing solutions to this problem. It is a difficult problem because the molecular descriptors involved with the inverse-QSAR algorithm must adequately address the forward QSAR problem for a given biological activity if the subsequent recovery phase is to be meaningful. In addition, one should be able to construct a feasible molecule from such a descriptor. The difficulty of recovering the molecule from its descriptor is the major limitation of most inverse-QSAR methods.
Results: In this paper, we describe the reversibility of our previously reported descriptor, the vector space model molecular descriptor (VSMMD), which is based on a vector space model that is suitable for kernel studies in QSAR modeling. Our inverse-QSAR approach can be described using five steps: (1) generate the VSMMD for the compounds in the training set; (2) map the VSMMD in the input space to the kernel feature space using an appropriate kernel function; (3) design or generate a new point in the kernel feature space using a kernel feature space algorithm; (4) map the feature space point back to the input space of descriptors using a pre-image approximation algorithm; (5) build the molecular structure template using our VSMMD molecule recovery algorithm.
Conclusion: The empirical results reported in this paper show that our strategy of using kernel methodology for an inverse Quantitative Structure-Activity Relationship is sufficiently powerful to find a meaningful solution for practical problems.
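
Step (4), mapping a feature-space point back to descriptor space, can be sketched as follows. The abstract does not say which pre-image approximation the authors use, so this example substitutes a simple neighbor-averaging scheme: the target point is assumed to be a weighted combination of training images, feature-space distances are computed with the kernel trick, and the pre-image is taken as the mean of the k closest training descriptors. The RBF kernel, the `gamma` value, and the toy descriptors are all assumptions.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def feature_space_sq_dists(train, alpha, kernel):
    """Squared feature-space distance from sum_i alpha_i * phi(x_i) to each phi(x_j)."""
    n = len(train)
    K = [[kernel(train[i], train[j]) for j in range(n)] for i in range(n)]
    self_term = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))
    return [
        self_term - 2 * sum(alpha[i] * K[i][j] for i in range(n)) + K[j][j]
        for j in range(n)
    ]


def approx_preimage(train, alpha, kernel=rbf, k=2):
    """Average the k training descriptors whose images lie closest to the target point."""
    d2 = feature_space_sq_dists(train, alpha, kernel)
    nearest = sorted(range(len(train)), key=lambda j: d2[j])[:k]
    dim = len(train[0])
    return [sum(train[j][c] for j in nearest) / k for c in range(dim)]


# Toy two-dimensional descriptors; the target is the midpoint of the first two images.
train = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
preimage = approx_preimage(train, alpha=[0.5, 0.5, 0.0], k=2)
```

The recovered vector is only a descriptor, not a molecule; turning it into a structure is the job of step (5), which this sketch does not attempt.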

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Structure-activity models of oral clearance, cytotoxicity, and LD50: a screen for promising anticancer compounds

Author: A Hoskuldsson
AE Soffers
AK Saxena
AL Boulesteix
CW Andrews
D Thai
D Zmuidinavicius
DM Hawkins
DV Nguyen
F Yoshida
G Fort
G Lou
G Wang
H Gonzalez-Diaz
H Wold
HJ Pieniaszek Jr.
IO Juranic
J Ghasemi
J Tunkel
J Wegelin
JC Boik
JC Madden
John C Boik
JR Votano
JV Turner
K Yu
L Eriksson
L Ralaivola
M Ashton
M Momma
M Olah
M Pintore
M Zahouily
MA Perez
MD Wessel
N Brown
O Isayev
P Buchwald
R Burgos-Vargas
R Caruana
R Rosipal
RK Ando
Robert A Newman
S Ben-David
S Rannar
SJ Swamidass
T Evgeniou
T Hou
T Niwa
T Wajima
TE Yen
TI Oprea
W Deng
W Halle
WJ Hunter
Y Xue
YH Zhao
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

gBoost: a mathematical programming approach to graph classification and regression

Author: A. Demiriz
A. Inokuchi
B. Schölkopf
C. Helma
D. G. Luenberger
E. Frank
G. Rätsch
H. Fröhrich
H. Hong
H. Kashima
H. Zou
Hiroto Saigo
J. Gasteiger
J. Kazius
J. L. Duran
J. R. Quinlan
K. M. Borgwardt
Koji Tsuda
L. Cai
L. M. Shi
L. Ralaivola
M. Hamada
O. du Merle
P. Mahé
Q. V. Le
R. Durbin
R. Kohavi
R. Tibshrani
S. Abiteboul
S. Boyd
S. Nijssen
Sebastian Nowozin
T. Gärtner
T. Horváth
T. Kudo
Tadashi Kadowaki
Taku Kudo
W. W. Cohen
X. Yan
Y. Freund
Publication venue: Springer Science and Business Media LLC

Crossref

Graph Kernels for Chemical Informatics

Author: Baldi P.
Ralaivola L.
Saigo H.
Swamidass J.
Publication venue: Elsevier BV
Publication date: 01/10/2005

Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures of variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
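
The path-counting fingerprints and two of the kernels named in the abstract can be sketched in a few lines. This example assumes a molecule is given as a list of atom labels plus a dictionary of labeled bonds, that only simple paths (no repeated atoms) are counted, and that the Tanimoto kernel is evaluated on the sets of distinct paths while MinMax is evaluated on path counts. The graph encoding and function names are hypothetical, and the paper's exact path definitions may differ.

```python
from collections import Counter


def labeled_paths(atoms, bonds, max_depth):
    """Count labeled simple paths with up to max_depth bonds, via DFS from every atom."""
    adj = {i: [] for i in range(len(atoms))}
    for (a, b), bond_label in bonds.items():
        adj[a].append((b, bond_label))
        adj[b].append((a, bond_label))
    counts = Counter()

    def dfs(node, visited, label, depth):
        counts[label] += 1
        if depth == max_depth:
            return
        for nxt, bond_label in adj[node]:
            if nxt not in visited:  # simple paths only
                dfs(nxt, visited | {nxt}, label + bond_label + atoms[nxt], depth + 1)

    for start in range(len(atoms)):
        dfs(start, {start}, atoms[start], 0)
    return counts


def tanimoto(fp_a, fp_b):
    """Shared distinct paths divided by all distinct paths of the two molecules."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def minmax(fp_a, fp_b):
    """Sum of per-path minimum counts divided by the sum of per-path maximum counts."""
    keys = set(fp_a) | set(fp_b)
    den = sum(max(fp_a[k], fp_b[k]) for k in keys)
    return sum(min(fp_a[k], fp_b[k]) for k in keys) / den if den else 1.0


# Heavy-atom skeletons of ethanol (C-C-O) and methanol (C-O), paths of up to two bonds.
ethanol = labeled_paths(["C", "C", "O"], {(0, 1): "-", (1, 2): "-"}, max_depth=2)
methanol = labeled_paths(["C", "O"], {(0, 1): "-"}, max_depth=2)
similarity_t = tanimoto(ethanol, methanol)
similarity_m = minmax(ethanol, methanol)
```

Because MinMax uses counts rather than presence, it distinguishes molecules that share the same set of paths but contain them a different number of times, which the set-based Tanimoto value cannot do.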

MPG.PuRe

Regression learning with non-identically and non-independently sampling

Author: Billingsley P.
Bousquet O.
Hongwei Sun
Meijian Zhang
Ralaivola L.
Publication venue: World Scientific Pub Co Pte Lt

Crossref

Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity

Author: J. Bruand
J. Chen
L. Ralaivola
P. Baldi
P. Phung
S. J. Swamidass
Publication venue: Oxford University Press (OUP)

Crossref