589 research outputs found

Estimation of the applicability domain of kernel-based machine learning models for virtual screening

Author: Fechner Nikolas
Hinselmann Georg
Jahn Andreas
Zell Andreas
Publication venue: BioMed Central
Publication date: 01/01/2010

Springer - Publisher Connector

PubMed Central

A Survey on Graph Kernels

Author: Johansson Fredrik D.
Kriege Nils M.
Morris Christopher
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Graph kernels have become an established and widely used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as a result. Finally, based on our experimental results, we derive a practitioner's guide to kernel-based graph classification.
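
The Gaussian RBF transformation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the survey's code: the function name `kernel_to_rbf`, the `gamma` value, and the toy feature vectors are assumptions chosen for the example. It relies on the standard identity d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y) for the distance that a kernel induces in its feature space.

```python
import numpy as np


def kernel_to_rbf(K, gamma=1.0):
    """Turn a precomputed kernel (Gram) matrix into a Gaussian RBF kernel matrix.

    Hypothetical helper for illustration; gamma is a free bandwidth parameter.
    """
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    # Squared distance induced by the kernel:
    # d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y)
    sq_dist = diag[:, None] + diag[None, :] - 2.0 * K
    # Clip tiny negative values that can appear through rounding.
    sq_dist = np.maximum(sq_dist, 0.0)
    return np.exp(-gamma * sq_dist)


# Toy Gram matrix from explicit feature vectors, standing in for
# the feature counts that a graph kernel would extract (assumed data).
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = features @ features.T
K_rbf = kernel_to_rbf(K, gamma=0.5)
```

The resulting matrix can be passed to any learner that accepts a precomputed kernel. In practice `gamma` would be tuned, for example by cross-validation.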

arXiv.org e-Print Archive

DSpace@MIT

Chalmers Research

7th German Conference on Chemoinformatics: 25 CIC-Workshop : Goslar, Germany, 6 - 8 November 2011 ; meeting abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Author: Engel Thomas
Fechner Uli
Oellien Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main

Active learning of compounds activity : towards scientifically sound simulation of drug candidates identification

Author: Czarnecki Wojciech
Jastrzębski Stanisław
Podlewska Sabina
Sieradzki Igor
Publication venue: ENGINE Center. Wroclaw University of Technology
Publication date: 01/01/2015

Abstract. Virtual screening is one of the vital elements of the modern drug design process. It is aimed at the identification of potential drug candidates in large datasets of chemical compounds. Many machine learning (ML) methods have been proposed to improve the efficiency and accuracy of this procedure, with Support Vector Machines among the most popular. Most commonly, performance in this task is evaluated in an offline manner, where a model is tested after training on a randomly chosen subset of data. This is in stark contrast to the practice of drug candidate selection, where a researcher iteratively chooses the next batches of compounds to test. This paper proposes to frame this problem as an active learning process, where we search for new drug candidates through exploration of the compound space simultaneously with the exploitation of current knowledge. We introduce a proof of concept of the simulation and evaluation of such a pipeline, together with novel solutions based on mixing clustering and a greedy k-batch active learning strategy.

CiteSeerX

Jagiellonian University Repository

Kernel-based estimation of the applicability domain of QSAR models

Author: Jahn Andreas
Zell Andreas
Hinselmann Georg
Fechner Nikolas
Publication venue: Springer Nature
Publication date: 04/05/2010

Springer - Publisher Connector

PubMed Central

Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems

Author: Arnott
Bergstrom
Bhat
Box
Breiman
Brown
Brown
Cai
Cherkasov
Dudek
Fang
Fühner
Galton
Gedeck
Glick
Golbraikh
Helfert
Hermundstad
Hopfinger
Hughes
Jorgensen
Karatzoglou
Karthikeyan
Kubat
Kuhn
Kvålseth
Liaw
Lin
Lipinski
Lipinski
Liu
Llinàs
Lusci
Mardia
McDonagh
Mitchell
Muggleton
Nantasenamat
Needham
Nigsch
Noble
Oshiro
Palmer
Pao
Partalas
Ran
Rasmussen
Schroeter
Schwaighofer
Sebastiani
Spatola
Surowiecki
Svetnik
Team
Tesauro
Tipping
Tropsha
Walton
Williams
Williams
Yang
Publication venue: Wiley
Publication date: 01/09/2015

The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. The investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the ‘wisdom of crowds’ principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data pre-processing methodology was found to be crucial to performance of each method too.
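
The contrast between a weighted greedy ensemble and an equal-weight linear ensemble can be sketched as below. The abstract does not specify the authors' procedure, so this is an assumption: it uses one common form of greedy ensemble construction (forward selection with replacement, where a model's weight is the share of rounds in which it was picked). The function names, the toy predictions, and `n_rounds` are all hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def greedy_ensemble_weights(preds, y, n_rounds=4):
    """Greedy forward selection with replacement (an assumed variant).

    preds has shape (n_models, n_samples) and holds held-out predictions.
    Each round adds the model that minimises the RMSE of the running average.
    """
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)
    counts = np.zeros(preds.shape[0])
    running_sum = np.zeros_like(y)
    for step in range(1, n_rounds + 1):
        errors = [rmse(y, (running_sum + p) / step) for p in preds]
        best = int(np.argmin(errors))
        counts[best] += 1
        running_sum += preds[best]
    # Selection counts become the weighting factors.
    return counts / counts.sum()


# Toy data (assumed): two models with opposite small biases, one poor model.
y = np.array([1.0, 2.0, 3.0, 4.0])
preds = np.array([y + 1.0, y - 1.0, y + 5.0])

greedy_w = greedy_ensemble_weights(preds, y, n_rounds=4)
linear_w = np.full(len(preds), 1.0 / len(preds))  # equal weights

greedy_rmse = rmse(y, greedy_w @ preds)
linear_rmse = rmse(y, linear_w @ preds)
```

On this toy data the greedy procedure gives the poorly performing third model zero weight, while the equal-weight ensemble is pulled off target by it. Real use would fit the weights on held-out predictions to avoid overfitting.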

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

Author: Fechner Nikolas
Hinselmann Georg
Jahn Andreas
Rosenbaum Lars
Zell Andreas
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract
Background: The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, atom- and pharmacophore typing. Furthermore, it provides the functionality to combine, to compare, or to export the fingerprints into several formats.
Results: We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint that only includes the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al.
Conclusions: jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LGPL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics like benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Representation learning of drug and disease terms for drug repositioning

Author: Anand Ashish
Manchanda Sahil
Publication date: 20/05/2017

Drug repositioning (DR) refers to the identification of novel indications for approved drugs. The huge investment of time and money required, and the risk of failure in clinical trials, have led to a surge of interest in drug repositioning. DR exploits two major aspects associated with drugs and diseases: the existence of similarity among drugs and among diseases due to their shared genes or pathways or common biological effects. Existing methods of identifying drug-disease associations rely mainly on the information available in structured databases. On the other hand, the abundant information available as free text in biomedical research articles is not being fully exploited. Word embedding, that is, obtaining vector representations of words from large corpora of free text using neural network methods, has been shown to give significant performance for several natural language processing tasks. In this work we propose a novel way of representation learning to obtain features of drugs and diseases by combining complementary information available in unstructured texts and structured datasets. Next we use a matrix completion approach on these feature vectors to learn a projection matrix between drug and disease vector spaces. The proposed method has shown competitive performance with state-of-the-art methods. Further, case studies on Alzheimer's disease and hypertension have shown that the predicted associations match existing knowledge.
Comment: Accepted to appear in 3rd IEEE International Conference on Cybernetics (Spl Session: Deep Learning for Prediction and Estimation)

arXiv.org e-Print Archive

Crossref