Search CORE

3,305 research outputs found

Multi-task learning for pKa prediction

Authors: Hansen, Katja; Rupp, Matthias; Sanguinetti, Guido; Skolidis, Grigorios
Publication date: 18/06/2018

Many compound properties depend directly on the dissociation constants of their acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multi-task learning can improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low-sample-size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of coregionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties for which few measurements are available.

RERO DOC Digital Library
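
The intrinsic model of coregionalization (ICM) named in the abstract multiplies an input kernel by a task-covariance matrix B: K[(x, s), (x', t)] = B[s, t] * k(x, x'). The following is a minimal sketch, assuming a linear kernel k(x, x') = x . x' (matching the "linear" models in the abstract), a hypothetical noise variance of 0.1, a hypothetical rank-1-plus-diagonal B, and synthetic data. It uses a full Cholesky factorization; the incomplete Cholesky decomposition reported in the abstract is not reproduced. All function names are illustrative.

```python
import numpy as np


def linear_kernel(X1, X2):
    """Linear (dot-product) kernel k(x, x') = x . x'."""
    return X1 @ X2.T


def icm_kernel(X1, t1, X2, t2, B):
    """ICM kernel: K[(x, s), (x', t)] = B[s, t] * k(x, x')."""
    return B[np.ix_(t1, t2)] * linear_kernel(X1, X2)


def icm_predict(X, t, y, X_new, t_new, B, noise=0.1):
    """Posterior mean of a zero-mean multi-task GP with an ICM kernel.

    X: (n, d) descriptors, t: (n,) integer task (class) labels, y: (n,) targets.
    B: (T, T) positive semidefinite coregionalization matrix.
    """
    K = icm_kernel(X, t, X, t, B) + noise * np.eye(len(y))
    K_star = icm_kernel(X_new, t_new, X, t, B)
    # Full Cholesky factorization K = L L^T, then two triangular solves
    # give alpha = K^{-1} y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K_star @ alpha


# Hypothetical usage: 3 compound classes, 5 descriptors per compound.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
t = rng.integers(0, 3, size=30)
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=30)

W = rng.normal(size=(3, 1))        # hypothetical rank-1 shared structure between classes
B = W @ W.T + 0.1 * np.eye(3)      # plus a per-class diagonal term (positive definite)
pred = icm_predict(X, t, y, X[:5], t[:5], B)
```

Choosing B = I recovers separate single-task models and setting every entry of B to one recovers a single pooled model, so the ICM interpolates between the two baselines the abstract compares against.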

7th German Conference on Chemoinformatics: 25th CIC Workshop, Goslar, Germany, 6-8 November 2011. Meeting abstracts, edited by Frank Oellien, Uli Fechner and Thomas Engel

Authors: Engel, Thomas; Fechner, Uli; Oellien, Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main

Learning Large-Scale Bayesian Networks with the sparsebn Package

Authors: Aragam, Bryon; Gu, Jiaying; Zhou, Qing
Publication venue: Foundation for Open Access Statistics
Publication date: 10/03/2018

Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays, datasets often have thousands of variables (sometimes tens or hundreds of thousands) and far fewer samples. To meet this challenge, we have developed a new R package called sparsebn for learning the structure of large, sparse graphical models, with a focus on Bayesian networks. While there are many existing software packages for this task, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. Furthermore, in the presence of interventions, the methods implemented here achieve the goal of learning a causal network from data. Additionally, the sparsebn package is fully compatible with existing software packages for network analysis.

Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures

arXiv.org e-Print Archive

Journal of Statistical Software
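
The abstract does not describe the algorithm inside sparsebn, and sparsebn is an R package whose API is not reproduced here. As a loose illustration of what "learning a sparse Bayesian network" means, the Python sketch below uses a simpler, swapped-in technique that is not sparsebn's method: it assumes the topological ordering of the variables is already known (a strong assumption), regresses each variable on its predecessors with an L1 (lasso) penalty, and reads the nonzero coefficients as directed edges. The penalty lam = 0.2, the iteration count, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Proximal operator of lam * |b| (the lasso shrinkage step)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                          # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # add back coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]           # subtract the updated contribution
    return b


def learn_dag_given_order(data, order, lam=0.2):
    """Sparse weighted adjacency matrix A, where A[i, j] != 0 means edge i -> j.

    Each variable is lasso-regressed only on the variables that precede it
    in `order`, so the result is acyclic by construction.
    """
    p = data.shape[1]
    A = np.zeros((p, p))
    for pos, j in enumerate(order):
        parents = list(order[:pos])
        if parents:
            A[parents, j] = lasso_cd(data[:, parents], data[:, j], lam)
    return A


# Hypothetical data: x0 -> x2 <- x1, and x3 independent of everything.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = 0.8 * x0 - 0.8 * x1 + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3])
data = data - data.mean(axis=0)
A = learn_dag_given_order(data, order=[0, 1, 2, 3])
```

The L1 penalty is what makes the estimated graph sparse: coefficients whose correlation with the residual stays below lam are set exactly to zero, so spurious edges such as x1 -> x3 are dropped.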

Machine Learning Small Molecule Properties in Drug Discovery

Authors: Arroniz, Carlos; De Fabritiis, Gianni; Majewski, Maciej; Schapin, Nikolai; Varela, Alejandro
Publication date: 02/08/2023

Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss popular existing datasets, molecular descriptors, and embeddings, such as chemical fingerprints and graph-based neural networks. We also highlight the challenges of predicting and optimizing multiple properties during the hit-to-lead and lead optimization stages of drug discovery, and briefly explore multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, we assess techniques for understanding model predictions, especially for critical decision-making in drug discovery. Overall, this review provides insights into the landscape of ML models for small molecule property prediction in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models, and that there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons that can shed light on the differences between the many techniques and models.

Comment: 46 pages, 1 figure

arXiv.org e-Print Archive
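
The review abstract only mentions multi-objective optimization in passing. One common building block is filtering candidates to the Pareto front: a candidate is kept unless some other candidate is at least as good on every objective and strictly better on at least one. A minimal sketch, assuming every objective has already been oriented so that higher is better; the candidate scores below are hypothetical predicted values, not data from the review, and the quadratic all-pairs comparison is chosen for clarity rather than speed.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is >= b on every objective and > b on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(other, s) for j, other in enumerate(scores) if j != i)
    ]


# Hypothetical (predicted potency, predicted log-solubility) pairs; higher is better for both.
candidates = [(7.2, -3.1), (6.5, -2.0), (8.0, -4.5), (6.0, -3.5), (7.0, -2.5)]
front = pareto_front(candidates)
```

Here candidate 3 is dropped because candidate 0 is better on both objectives; the remaining candidates represent different trade-offs, and choosing among them is left to the medicinal chemist or a downstream scoring scheme.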

Using multitask classification methods to investigate the kinase-specific phosphorylation sites

Authors: Fang, Jianwen; Fang, Yaping; Gao, Shan; Xu, Shuo
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.

Methods: This study presents a multitask learning framework that learns four kinase families of phosphorylation sites simultaneously, instead of studying each kinase family separately. The framework includes two multitask classification methods: Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and Multi-Task Feature Selection (MT-Feat3).

Results: Using the multitask learning framework, we successfully identify 18 common features shared by the four kinase families of phosphorylation sites. The reliability of the selected features is demonstrated by their consistent performance in the two multitask learning methods.

Conclusions: The selected features can be used to build efficient multitask classifiers with good performance, suggesting that they are important to protein phosphorylation across the four kinase families.

Crossref

Directory of Open Access Journals

PubMed Central
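
The abstract names MTLS-SVMs but gives no formulation. The sketch below is a simplified stand-in, not the paper's method: a least-squares classifier fit to +/-1 labels (no bias term, linear kernel) whose kernel adds a shared component, weighted by a hypothetical parameter mu, to a task-specific component. This makes each task's effective weight vector a shared vector plus a task-specific offset, which is how learning the tasks jointly lets data from one kinase family inform another. The values gamma = 10 and mu = 1 and the toy data are assumptions.

```python
import numpy as np


def multitask_kernel(X1, t1, X2, t2, mu=1.0):
    """Linear kernel with a shared part (weight mu) plus a same-task-only part."""
    same_task = (t1[:, None] == t2[None, :]).astype(float)
    return (mu + same_task) * (X1 @ X2.T)


def fit_ls_classifier(X, t, y, gamma=10.0, mu=1.0):
    """Least-squares fit to +/-1 labels: solve (K + I / gamma) alpha = y."""
    K = multitask_kernel(X, t, X, t, mu)
    return np.linalg.solve(K + np.eye(len(y)) / gamma, y)


def predict(alpha, X, t, X_new, t_new, mu=1.0):
    """Sign of the kernel expansion sum_i alpha_i K((x_new, t_new), (x_i, t_i))."""
    scores = multitask_kernel(X_new, t_new, X, t, mu) @ alpha
    return np.where(scores >= 0, 1, -1)


# Hypothetical toy data: two tasks (e.g. two kinase families), 2-D features.
X = np.array([[1, 0], [2, 0], [-1, 0], [-2, 0],
              [1, 1], [2, 2], [-1, -1], [-2, -2]], dtype=float)
t = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)

alpha = fit_ls_classifier(X, t, y)
train_pred = predict(alpha, X, t, X, t)
```

Setting mu = 0 decouples the tasks into independent least-squares classifiers, while a large mu pushes all tasks toward one shared model.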

Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of neoHebbian Three-Factor Learning Rules

Authors: Brea, Johanni; Corneil, Dane; Gerstner, Wulfram; Lehmann, Marco; Liakoni, Vasiliki
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

Most elementary behaviors such as moving the arm to grasp an object or walking into the next room to explore a museum evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two different time scales. Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years. Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Directory of Open Access Journals

Frontiers - Publisher Connector
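
The three-factor mechanism described in the abstract can be sketched for a single synapse: each pre/post co-activation increments an eligibility trace e(t) that decays over seconds, and the weight changes only while a third factor M(t) is present, dw/dt = eta * e(t) * M(t). A minimal Euler simulation, assuming a hypothetical trace time constant tau_e = 2 s, a unit trace increment per co-activation, and the third factor modeled as a brief pulse of unit area; none of these values come from the reviewed experiments.

```python
def simulate_three_factor(coactivation_times, reward_times, T=10.0, dt=0.01,
                          tau_e=2.0, eta=1.0):
    """Return the total weight change of one synapse under a three-factor rule.

    coactivation_times: times (s) of pre/post co-activation (Hebbian factors).
    reward_times: times (s) at which the third factor (e.g. a neuromodulator) arrives.
    """
    n_steps = int(round(T / dt))
    co_steps = {int(round(tc / dt)) for tc in coactivation_times}
    reward_steps = {int(round(tr / dt)) for tr in reward_times}
    e, w = 0.0, 0.0
    for k in range(n_steps):
        if k in co_steps:
            e += 1.0                                 # co-activation sets the flag
        M = 1.0 / dt if k in reward_steps else 0.0   # third-factor pulse of unit area
        w += dt * eta * e * M                        # no third factor, no weight change
        e -= dt * e / tau_e                          # trace decays with time constant tau_e
    return w


w_short = simulate_three_factor([1.0], [2.0])   # third factor 1 s after co-activation
w_long = simulate_three_factor([1.0], [6.0])    # third factor 5 s after co-activation
w_none = simulate_three_factor([1.0], [])       # no third factor at all
```

With these assumptions the weight change is roughly exp(-delay / tau_e): about 0.61 for a 1 s delay, much smaller for a 5 s delay, and exactly zero without a third factor or when the third factor arrives before the co-activation.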