118,248 research outputs found

Kernel learning for ligand-based virtual screening: discovery of a new PPARgamma agonist

Author: Hansen Katja
Müller Klaus-Robert
Proschak Ewgenij
Rau Oliver
Rupp Matthias
Schneider Gisbert (Prof. Dr.)
Schroeter Timon
Schubert-Zsilavecz Manfred (Prof. Dr.)
Steri Ramona
Zettl Heiko
Publication date: 01/01/2010

Poster presentation at the 5th German Conference on Cheminformatics: 23. CIC-Workshop, Goslar, Germany, 8-10 November 2009. We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening by successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor gamma (PPARgamma) [1]. PPARgamma is a nuclear receptor involved in lipid and glucose metabolism, and related to type-2 diabetes and dyslipidemia. Applied methods included a graph kernel designed for molecular similarity analysis [2], kernel principal component analysis [3], multiple kernel learning [4], and Gaussian process regression [5]. In the machine learning approach to ligand-based virtual screening, one uses the similarity principle [6] to identify potentially active compounds based on their similarity to known reference ligands. Kernel-based machine learning [7] uses the "kernel trick", a systematic approach to the derivation of non-linear versions of linear algorithms like separating hyperplanes and regression. Prerequisites for kernel learning are similarity measures with the mathematical property of positive semidefiniteness (kernels). The iterative similarity optimal assignment graph kernel (ISOAK) [2] is defined directly on the annotated structure graph, and was designed specifically for the comparison of small molecules. In our virtual screening study, its use improved results, e.g., in principal component analysis-based visualization and Gaussian process regression. Following a thorough retrospective validation using a data set of 176 published PPARgamma agonists [8], we screened a vendor library for novel agonists. Subsequent testing of 15 compounds in a cell-based transactivation assay [9] yielded four active compounds. The most interesting hit, a natural product derivative with a cyclobutane scaffold, is a selective full PPARgamma agonist (EC50 = 10 ± 0.2 microM, inactive on PPARalpha and PPARbeta/delta at 10 microM). We demonstrate how the interplay of several modern kernel-based machine learning approaches can successfully improve ligand-based virtual screening results.
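The abstract above names positive semidefiniteness as the prerequisite for a similarity measure to serve as a kernel. The following is a minimal sketch of what that requirement means in practice, not the authors' ISOAK graph kernel: it builds a Gram matrix with a Gaussian (RBF) kernel on made-up 2-D points and tests it with a Cholesky factorization. The function names, the points, and the `gamma` value are all assumptions for illustration. Note that Cholesky tests strict positive definiteness, so a singular but still positive semidefinite matrix would be rejected unless a small value is added to the diagonal.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    """Matrix of pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def cholesky(matrix):
    """Return lower-triangular L with L L^T = matrix.

    Raises ValueError if the matrix is not positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def is_positive_definite(matrix):
    """True if the Cholesky factorization succeeds."""
    try:
        cholesky(matrix)
        return True
    except ValueError:
        return False

# Hypothetical descriptor vectors; real inputs would be molecular representations.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
K = gram_matrix(points, rbf_kernel)

# A symmetric "similarity" matrix that is not a valid kernel matrix.
not_a_kernel = [[1.0, 2.0], [2.0, 1.0]]

print(is_positive_definite(K))
print(is_positive_definite(not_a_kernel))
```

A similarity measure that fails this kind of check on some data set cannot be plugged directly into kernel methods such as kernel PCA or Gaussian process regression.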

Hochschulschriftenserver - Universität Frankfurt am Main

Prediction of Atomization Energy Using Graph Kernel and Active Learning

Author: de Jong Wibe A.
Tang Yu-Hang
Publication venue: 'AIP Publishing'
Publication date: 01/01/2019

Data-driven prediction of molecular properties presents unique challenges to the design of machine learning methods concerning data structure/dimensionality, symmetry adaptation, and confidence management. In this paper, we present a kernel-based pipeline that can learn and predict the atomization energy of molecules with high accuracy. The framework employs Gaussian process regression to perform predictions based on the similarity between molecules, which is computed using the marginalized graph kernel. To apply the marginalized graph kernel, a spatial adjacency rule is first employed to convert molecules into graphs whose vertices and edges are labeled by elements and interatomic distances, respectively. We then derive formulas for the efficient evaluation of the kernel. Specific functional components for the marginalized graph kernel are proposed, and the effects of the associated hyperparameters on accuracy and predictive confidence are examined. We show that the graph kernel is particularly suitable for predicting extensive properties because its convolutional structure coincides with that of the covariance formula between sums of random variables. Using an active learning procedure, we demonstrate that the proposed method can achieve a mean absolute error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7 data set.
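The combination of Gaussian process regression and active learning described above can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's pipeline: a squared-exponential kernel on scalars stands in for the marginalized graph kernel, the `oracle` function stands in for a reference energy calculation, and the selection rule (label the pool item with the largest predictive variance) is one common active learning criterion that the abstract does not specify.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel on scalars (stand-in for a graph kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def cholesky(m):
    """Lower-triangular L with L L^T = m, for symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def forward_sub(low, b):
    """Solve low @ y = b for lower-triangular low."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

def backward_sub(low, y):
    """Solve low^T @ x = y for lower-triangular low."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x

def gp_posterior(train_x, train_y, query_x, noise=1e-6):
    """Return (mean, variance) of the GP posterior at each query point."""
    n = len(train_x)
    gram = [[rbf(train_x[i], train_x[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    alpha = backward_sub(low, forward_sub(low, train_y))
    out = []
    for q in query_x:
        k_star = [rbf(q, x) for x in train_x]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = forward_sub(low, k_star)
        var = max(rbf(q, q) - sum(t * t for t in v), 0.0)
        out.append((mean, var))
    return out

def most_uncertain(train_x, train_y, pool):
    """Index of the pool item with the largest predictive variance."""
    post = gp_posterior(train_x, train_y, pool)
    return max(range(len(pool)), key=lambda i: post[i][1])

def oracle(x):
    """Hypothetical labelling function standing in for a reference calculation."""
    return math.sin(x)

train_x = [0.0, 1.0, 2.0]
train_y = [oracle(x) for x in train_x]
pool = [0.5, 1.5, 5.0, 6.0]

# Two active learning rounds: label the most uncertain pool item each time.
for _ in range(2):
    idx = most_uncertain(train_x, train_y, pool)
    x_new = pool.pop(idx)
    train_x.append(x_new)
    train_y.append(oracle(x_new))

print(train_x, pool)
```

In this toy run the two pool items far from the initial training data are selected first, since the predictive variance grows with distance from labelled examples under this kernel.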

arXiv.org e-Print Archive

eScholarship - University of California

Industry Dynamics and the Distribution of Firm Sizes: A Non-Parametric Approach

Author: Enrico Santarelli
Francesca Lotti

The aim of this paper is to analyze the evolution of the size distribution of young firms within some selected industries, trying to assess the empirical implications of different models of industry dynamics: the model of passive learning (Jovanovic, 1982), the model of active learning (Ericson and Pakes, 1995), and the evolutionary model (Audretsch, 1995). We use a non-parametric technique, the kernel density estimator, applied to a data set from the Italian National Institute for Social Security (INPS), consisting of 12 cohorts of new manufacturing firms followed for 6 years. Since the patterns of convergence to the limit distribution are different between industries, we conclude that the model of passive learning is consistent with some of them, the active exploration model with others, the evolutionary model with all of them.
Keywords: Cohorts; Gibrat's Law; Kernel; Industry Dynamics; Non-parametric; Shakeouts.
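For readers unfamiliar with the kernel density estimator mentioned above, here is a minimal sketch using a Gaussian kernel: the estimate at x is the average of kernel bumps centred on each observation, scaled by the bandwidth. The sample values (imagined as log firm sizes) and the bandwidth are invented for illustration; the paper's kernel, bandwidth choice, and data are not given in the abstract.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function f(x) = 1/(n*h) * sum_i phi((x - x_i) / h),
    where phi is the standard normal density."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical log firm sizes for one cohort (not data from the paper).
log_sizes = [1.1, 1.3, 1.4, 2.0, 2.2, 2.9, 3.5, 3.6]
density = gaussian_kde(log_sizes, bandwidth=0.5)

# Evaluate the estimated density on a coarse grid.
for x in (1.0, 2.0, 3.0, 4.0):
    print(x, round(density(x), 4))
```

Comparing such estimated densities across cohort ages is one way to visualise whether a size distribution is converging toward a limit shape.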

Research Papers in Economics

Active learning with kernel machines

Author: Brinker Klaus

Klaus Brinker. Paderborn, Univ., Diss., 200

Universität Paderborn - Digitale Sammlungen

Multiple kernel active learning for image classification

Author: Duan Lingyu
Gao Wen
Li Yuanning
Tian Yonghong
Yang Jingjing
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

Recently, multiple kernel learning (MKL) methods have shown promising performance in image classification. As a form of supervised learning, training MKL-based classifiers relies on selecting and annotating an extensive dataset. In general, a large number of samples must be labeled manually to obtain satisfactory MKL-based classifiers. Moreover, MKL also incurs a great computational cost for kernel computation and parameter optimization. In this paper, we propose a local adaptive active learning (LA-AL) method to reduce the labeling and computational cost by selecting the most informative training samples. LA-AL adopts a top-down (or global-local) strategy for locating and searching informative samples. Uncertain samples are first clustered into groups, and informative samples are then selected via inter-group and intra-group competitions. Experiments on COREL-5K show that the proposed LA-AL method significantly reduces the demand for sample labeling and achieves state-of-the-art performance. ©2009 IEEE.

Crossref

Evaluation of machine-learning methods for ligand-based virtual screening

Author: A Bender
A Ormerod
AE Klon
AM Capelli
AR Leach
B Chen
Beining Chen
C Williams
D Hand
D Rogers
D Wilton
DA Cosgrove
David J. Wood
DB Kitchen
DE Clark
DJ Hand
DJ Wilton
DM Hawkins
E Parzen
FL Stahura
G Harper
G Redl
G Schneider
George Papadatos
H Eckert
H Kubinyi
HM Berman
J Aitchison
J Bajorath
J Delaney
J Hert
JC Saeh
L Hodes
M Congreve
M Glick
M Wagener
M Whittle
N Christianini
N Nikolova
Nikolaus Stiefl
P Constans
P Domingos
P Willett
Paulette Greenidge
Peter Willett
Q Zhang
R P Sheridan
RD Brown
RD Cramer
RE Carhart
RO Duda
Robert F. Harrison
S Anzali
TJ McNeany
TM Mitchell
Xiao Qing Lewell
XY Xia
YC Martin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it differs little in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training set contains only very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
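The group fusion idea mentioned above can be illustrated with a short sketch: each library molecule is scored against every known active, and the per-active similarities are fused into one score used for ranking. The sketch assumes Tanimoto similarity on binary fingerprints (represented as sets of on-bit indices) and the MAX fusion rule; the abstract does not state which similarity coefficient or fusion rule the authors used, and the fingerprints and molecule names below are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def group_fusion_scores(actives, library, fuse=max):
    """Score each library molecule by fusing its similarities to all actives.

    fuse=max implements the MAX rule (an assumption); another callable such
    as a mean could be passed instead.
    """
    return {
        name: fuse(tanimoto(fp, ref) for ref in actives)
        for name, fp in library.items()
    }

def rank(scores):
    """Molecule names sorted from highest to lowest fused score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)

# Hypothetical reference actives and screening library.
actives = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
library = {
    "cand_a": frozenset({1, 2, 3, 5}),
    "cand_b": frozenset({10, 11, 12, 13}),
    "cand_c": frozenset({20, 21}),
}

scores = group_fusion_scores(actives, library)
ranking = rank(scores)
print(scores)
print(ranking)
```

Because the score depends only on the nearest reference active, this approach needs no trained model, which is one reason it can remain usable when only a handful of actives are known.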

Crossref

White Rose Research Online