
139 research outputs found

NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

Author: MA CHAO
Publication date: 04/09/2012

Computer-aided drug design (CADD) has become an indispensable component of modern drug discovery projects. Predicting the physicochemical and pharmacological properties of candidate compounds increases the probability that drug candidates pass the later phases of clinical trials. Ligand-based virtual screening has advantages over structure-based drug design in terms of its wide applicability and high computational efficiency. The established chemical repositories and reported bioassays form a gigantic knowledge base from which to derive quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for data-mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), is reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity of cannabinoid receptor subtypes, and blood-brain-barrier (BBB) passage. LiCABEDS was implemented and integrated with a graphical user interface, data import/export, automated model training/prediction, and project management. In addition, a non-linear ligand classifier is proposed, using a novel Topomer kernel function in a support vector machine. With the emphasis on green high-performance computing, graphics processing units are alternative platforms for computationally expensive tasks. A novel GPU algorithm was designed and implemented to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm is reported for constructing a structurally diverse screening library in order to enhance hit rates in high-throughput screening.
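
As a rough illustration of the boosted-decision-stump idea named in the abstract above, the Python sketch below trains discrete AdaBoost over one-bit stumps on binary fingerprints. It is not the LiCABEDS implementation: the function names, the list-of-lists data layout, and the use of plain discrete AdaBoost are all assumptions made here for illustration.

```python
import math


def stump_output(x, bit, polarity):
    """A one-bit decision stump: `polarity` if the bit is set, otherwise -polarity."""
    return polarity if x[bit] else -polarity


def train_adaboost_stumps(X, y, n_rounds=10):
    """Discrete AdaBoost over one-bit stumps (illustrative, hypothetical helper).

    X: list of equal-length 0/1 fingerprint lists; y: list of +1/-1 labels.
    Returns a list of (bit_index, polarity, alpha) tuples.
    """
    n = len(X)
    n_bits = len(X[0])
    weights = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        best = None  # (weighted_error, bit_index, polarity)
        for bit in range(n_bits):
            for polarity in (1, -1):
                err = sum(w for w, x, t in zip(weights, X, y)
                          if stump_output(x, bit, polarity) != t)
                if best is None or err < best[0]:
                    best = (err, bit, polarity)
        err, bit, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp so alpha stays finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((bit, polarity, alpha))
        # Re-weight the training set: misclassified compounds gain weight.
        weights = [w * math.exp(-alpha * t * stump_output(x, bit, polarity))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_output(x, bit, polarity)
                for bit, polarity, alpha in model)
    return 1 if score >= 0 else -1
```

Each round picks the single fingerprint bit that best separates the currently weighted training set, so the final classifier is a weighted vote over individual bits.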

D-Scholarship@Pitt

Verifying the fully “Laplacianised” posterior Naïve Bayesian approach and more

Author: Glen Robert
Marcus David
Mitchell John B. O.
Mussa Hamse Yussuf
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/05/2015

Mussa and Glen would like to thank Unilever for financial support, whereas Mussa and Mitchell thank the BBSRC for funding this research through grant BB/I00596X/1. Mitchell thanks the Scottish Universities Life Sciences Alliance (SULSA) for financial support.
Background: In a recent paper, Mussa, Mitchell and Glen (MMG) mathematically demonstrated that the “Laplacian Corrected Modified Naïve Bayes” (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme in which the role played by the absence of compound features in assigning a compound to its appropriate class is ignored. MMG also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG’s work and introduces a new version of the SNB classifier: “Tapered Naïve Bayes” (TNB). TNB does not discard the role of the absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB.
Results: LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the “optimal” number of features for a given data set, SNB classifiers systematically gave better classification results than LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the “optimal” number of features for that data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the “optimal” number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.
Conclusions: The classification results obtained in this study concur with the mathematically based guidelines given in MMG’s paper: ignoring the role of the absence of a feature out of hand does not necessarily improve the classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results obtained also lend support to the rationale on which the TNB algorithm rests: handled judiciously, taking into account the absence of features can enhance (not impair) the discriminatory classification power of the SNB approach.
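
To make the presence/absence distinction in the abstract above concrete, here is a minimal Python sketch of a Bernoulli-style Naive Bayes score in which the log-terms for absent features can be kept or dropped. It is not the authors' SNB, LCMNB or TNB formulation: `absence_weight` is a hypothetical knob introduced only for this illustration (1.0 keeps the absence terms, 0.0 ignores them), and the add-one smoothing is a generic choice, not the Laplacian correction of the papers.

```python
import math


def fit_bernoulli_nb(X, y):
    """Estimate smoothed per-class bit probabilities from 0/1 fingerprints.

    Returns {class_label: (prior, [P(bit_j = 1 | class), ...])}.
    """
    n_bits = len(X[0])
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == c]
        prior = len(rows) / len(X)
        # Add-one smoothing keeps every probability strictly inside (0, 1).
        p = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
             for j in range(n_bits)]
        params[c] = (prior, p)
    return params


def log_score(params, x, c, absence_weight=1.0):
    """Log-posterior (up to a constant) of class c for fingerprint x.

    absence_weight is a hypothetical illustration parameter:
    1.0 counts absent features in full, 0.0 ignores them entirely.
    """
    prior, p = params[c]
    score = math.log(prior)
    for bit, pj in zip(x, p):
        if bit:
            score += math.log(pj)
        else:
            score += absence_weight * math.log(1 - pj)
    return score


def classify(params, x, absence_weight=1.0):
    """Return the class with the highest log-score."""
    return max(params, key=lambda c: log_score(params, x, c, absence_weight))
```

With long, sparse fingerprints most bits are absent, so the absence terms can dominate the score; comparing the two settings on the same data shows how much of a prediction comes from features a compound does not have.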

Crossref

PubMed Central

Spiral - Imperial College Digital Repository

University of St. Andrews - Pure

St Andrews Research Repository

Full "Laplacianised" posterior naive Bayesian algorithm

Author: Glen RC
Mitchell JBO
Mussa HY
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/08/2013

BACKGROUND: In the last decade the standard Naive Bayes (SNB) algorithm has been widely employed in multi-class classification problems in cheminformatics. This popularity is mainly due to the fact that the algorithm is simple to implement and in many cases yields respectable classification results. Using clever heuristic arguments “anchored” by insightful cheminformatics knowledge, Xia et al. have simplified the SNB algorithm further and termed it the Laplacian Corrected Modified Naive Bayes (LCMNB) approach, which has been widely used in cheminformatics since its publication. In this note we mathematically illustrate the conditions under which Xia et al.’s simplification holds. It is our hope that this clarification will help Naive Bayes practitioners decide when it is appropriate to employ the LCMNB algorithm to classify large chemical datasets. RESULTS: A general formulation that subsumes the simplified Naive Bayes version is presented. Unlike the widely used NB method, the Standard Naive Bayes description presented in this work is discriminative (not generative) in nature, which may lead to further applications of the SNB method. CONCLUSIONS: Starting from a standard Naive Bayes (SNB) algorithm, we have derived mathematically the relationship between Xia et al.’s ingenious, but heuristic, algorithm and the SNB approach. We have also demonstrated the conditions under which Xia et al.’s crucial assumptions hold. We therefore hope that the new insight and recommendations provided will be found useful by the cheminformatics community.

Crossref

Springer - Publisher Connector

PubMed Central

Spiral - Imperial College Digital Repository

University of St. Andrews - Pure

Large scale study of multiple-molecule queries

Author: Baldi Pierre F
Nasr Ramzi J
Swamidass S Joshua
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In ligand-based screening, as well as in other chemoinformatics applications, one seeks to effectively search large repositories of molecules in order to retrieve molecules that are similar, typically, to a single lead molecule. However, in some cases, multiple molecules from the same family are available to seed the query and search for other members of the same family. Multiple-molecule query methods have been less studied than single-molecule query methods. Furthermore, previous studies have relied on proprietary data and sometimes have not used proper cross-validation methods to assess the results. In contrast, here we develop and compare multiple-molecule query methods using several large publicly available data sets and background data sets. We also create a framework based on a strict cross-validation protocol to allow unbiased benchmarking for direct comparison in future studies across several performance metrics.
Results: Fourteen different multiple-molecule query methods were defined and benchmarked using: (1) 41 publicly available data sets of related molecules with similar biological activity; and (2) publicly available background data sets consisting of up to 175,000 molecules randomly extracted from the ChemDB database and other sources. Eight of the fourteen methods were parameter free, and six of them fit one or two free parameters to the data using a careful cross-validation protocol. All the methods were assessed and compared for their ability to retrieve members of the same family against the background data set by using several performance metrics including the Area Under the Accumulation Curve (AUAC), Area Under the Curve (AUC), F1-measure, and BEDROC metrics. Consistent with the previous literature, the best parameter-free methods are the MAX-SIM and MIN-RANK methods, which score a molecule to a family by the maximum similarity, or minimum ranking, obtained across the family. One new parameterized method introduced in this study and two previously defined methods, the Exponential Tanimoto Discriminant (ETD), the Tanimoto Power Discriminant (TPD), and the Binary Kernel Discriminant (BKD), outperform most other methods but are more complex, requiring one or two parameters to be fit to the data.
Conclusion: Fourteen methods for multiple-molecule querying of chemical databases, including novel methods, (ETD) and (TPD), are validated using publicly available data sets, standard cross-validation protocols, and established metrics. The best results are obtained with ETD, TPD, BKD, MAX-SIM, and MIN-RANK. These results can be replicated and compared with the results of future studies using data freely downloadable from http://cdb.ics.uci.edu/.
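
The two parameter-free scores described above (maximum similarity and minimum rank across the query family) can be sketched in Python as follows. This is a minimal illustration, not the benchmark code from the study: fingerprints are assumed to be Python sets of on-bit indices, Tanimoto is assumed as the similarity measure, and ties in a ranking are broken by database order.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def max_sim(candidate, family):
    """MAX-SIM style score: highest similarity of the candidate to any query molecule."""
    return max(tanimoto(candidate, q) for q in family)


def min_rank(database, family):
    """MIN-RANK style score for every database molecule.

    Each query molecule ranks the whole database (rank 1 = most similar);
    a molecule's score is its best (smallest) rank over all queries.
    """
    best = [len(database)] * len(database)
    for q in family:
        order = sorted(range(len(database)),
                       key=lambda i: tanimoto(database[i], q),
                       reverse=True)
        for rank, i in enumerate(order, start=1):
            best[i] = min(best[i], rank)
    return best
```

Higher is better for `max_sim`, lower is better for `min_rank`. Both reward a candidate that is very close to at least one family member, even if it is far from the others.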

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

MOFGalaxyNet: a social network analysis for predicting guest accessibility in metal–organic frameworks utilizing graph convolutional networks

Author: Jalali Mehrdad
Wonanke A. D. Dinga
Wöll Christof
Publication venue: SpringerOpen
Publication date: 02/11/2023

Metal–organic frameworks (MOFs) are porous crystalline structures comprising metal ions or clusters linked by organic entities, displaying topological diversity and ready chemical flexibility. These characteristics make them suitable for many applications, such as adsorption, separation, sensing, and catalysis. For the most part, the distinctive properties and prospective utility of MOFs are discerned after manufacture or by extrapolation from theoretically conceived models. For empirical researchers unfamiliar with hypothetical structure development, the meticulous crystal engineering of a high-performance MOF for a targeted application via a bottom-up approach resembles a gamble. For example, the precise pore limiting diameter (PLD), which determines the guest accessibility of any MOF, cannot be easily inferred from knowledge of the metal ion and organic ligand alone. This limitation in the bottom-up conceptual understanding of specific properties of the resultant MOF may contribute to the cautious industrial-scale adoption of MOFs. Consequently, in this study, we take a step towards circumventing this limitation by designing a new tool that predicts the guest accessibility, a key MOF performance indicator, of any given MOF from information on only the organic linkers and the metal ions. This new tool relies on clustering different MOFs in a galaxy-like social network, MOFGalaxyNet, combined with a Graph Convolutional Network (GCN) to predict the guest accessibility of any new entry in the social network. The proposed network and GCN results provide a robust approach for screening MOFs for various host–guest interaction studies.
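
For readers unfamiliar with graph convolution, the Python sketch below shows one propagation step of a commonly used GCN formulation: symmetric normalisation of the adjacency matrix with self-loops, followed by a linear map and ReLU. The abstract does not specify the architecture used in MOFGalaxyNet, so this form is an assumption, and the function name, matrix layout and toy inputs are all hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W).

    adj:    n x n adjacency matrix (nested lists), e.g. a similarity network
    feats:  n x f node-feature matrix H
    weight: f x o weight matrix W
    """
    n = len(adj)
    n_in = len(weight)
    n_out = len(weight[0])
    # Add self-loops so each node keeps a share of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features for every node.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_in)]
           for i in range(n)]
    # Linear map followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)]
            for i in range(n)]
```

In a network whose nodes are materials and whose edges encode similarity, a step like this lets each node's representation absorb information from its neighbours before a final prediction layer is applied.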

KITopen


Target Fishing: A Single-Label or Multi-Label Problem?

Author: Afzal AM
Bender A
Glen RC
Mussa HY
Turner RE
Publication venue: arXiv
Publication date: 23/11/2014

According to Cobanoglu et al. and Murphy, it is now widely acknowledged that the single target paradigm (one protein or target, one disease, one drug) that has been the dominant premise in drug development in the recent past is untenable. More often than not, a drug-like compound (ligand) can be promiscuous; that is, it can interact with more than one target protein. In recent years, in silico target prediction methods have approached the promiscuity issue computationally in different ways. In this study we confine attention to the so-called ligand-based target prediction machine learning approaches, commonly referred to as target-fishing. With a few exceptions, the target-fishing approaches that are currently ubiquitous in the cheminformatics literature can essentially be viewed as single-label multi-classification schemes; these approaches inherently bank on the single target paradigm assumption that a ligand can home in on one specific target. In order to address the ligand promiscuity issue, one might be able to cast target-fishing as a multi-label multi-class classification problem. For illustrative and comparison purposes, single-label and multi-label Naive Bayes classification models (denoted here by SMM and MMM, respectively) for target-fishing were implemented. The models were constructed and tested on 65,587 compounds and 308 targets retrieved from the ChEMBL17 database. SMM and MMM performed differently: for 16,344 test compounds, the MMM model returned recall and precision values of 0.8058 and 0.6622, respectively; the corresponding recall and precision values yielded by the SMM model were 0.7805 and 0.7596, respectively. However, a McNemar test (one degree of freedom, significance level 0.05) performed on the target prediction results returned by SMM and MMM for the 16,344 test ligands gave a chi-squared value of 15.656, in favour of the MMM approach.
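
The McNemar comparison mentioned above can be sketched in a few lines of Python. The statistic depends only on the two discordant counts: `b`, the number of test compounds that one model gets right and the other wrong, and `c`, the reverse. The abstract reports only the resulting chi-squared value, so the counts in the usage note are hypothetical, and whether the authors applied a continuity correction is not stated.

```python
# Critical value of the chi-squared distribution with one degree of freedom
# at a significance level of 0.05.
CHI2_CRIT_1DF_05 = 3.841


def mcnemar_chi2(b, c, continuity=False):
    """McNemar's chi-squared statistic from the two discordant counts.

    b: cases model 1 classifies correctly and model 2 incorrectly
    c: cases model 2 classifies correctly and model 1 incorrectly
    continuity: apply the optional continuity correction (|b - c| - 1)
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    return diff * diff / (b + c)


def is_significant(chi2, critical=CHI2_CRIT_1DF_05):
    """True when the statistic exceeds the chosen critical value."""
    return chi2 > critical
```

For example, with hypothetical counts `b = 30` and `c = 10`, `mcnemar_chi2(30, 10)` gives 10.0. The reported value of 15.656 likewise exceeds 3.841, which is why the difference between the two models is treated as significant at the 0.05 level.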

Apollo (Cambridge)

A Machine Learning Approach for the Identification of a Treatment against Chagas Disease

Author: Jiménez Rubén
Publication date: 01/01/2017

In this final degree project we have presented a machine learning approach to predict the biological activity of FDA-approved drugs against T. cruzi. We believe that the proposed methodology will advance the state of the art of machine learning in the Chagas disease drug discovery pipeline. We have obtained similar performance results with the work presented in but applied only to FDA-approved drugs as a repurposing strategy. A final contribution of this work is the biological evaluation provided by the metabolic pathway analysis. This evaluation allows us to map FDA-approved drugs onto T. cruzi metabolic pathways. This validation is useful because it incorporates important information on how the drugs target T. cruzi. Finding a subset of drugs that emerges from differently motivated experiments is promising. The fact that our results include drugs that have already been tested against Chagas disease is encouraging evidence that our approaches are able to produce reasonable candidates for drug repurposing. Additionally, the majority of the drugs present in our results have never been tested against T. cruzi, confirming the novelty of our approaches.
CONACYT – Consejo Nacional de Ciencia y Tecnología; PROCIENCI

Repositorio Institucional CONACYT

Analysis of coding principles in the olfactory system and their application in cheminformatics

Author: Schmuker Michael
Publication date: 05/03/2008

Our sense of smell conveys to us the perception of the chemical world. In the course of evolution, mechanisms have developed in our olfactory system that are probably optimally adapted to fulfilling this task. The analysis of these processing strategies promises insights into efficient algorithms for the encoding and processing of chemical information, the development and application of which lies at the core of cheminformatics. In this work we approach the deciphering of these mechanisms through computational modelling of functional units of the olfactory system. In doing so, we pursued an interdisciplinary approach that draws on the fields of chemistry, neurobiology, and machine learning.

Hochschulschriftenserver - Universität Frankfurt am Main

Verifying the fully “Laplacianised” posterior Naïve Bayesian approach and more

Publication venue: Springer

Springer - Publisher Connector