
    Current Mathematical Methods Used in QSAR/QSPR Studies

    This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QSAR/QSPR) studies. The mathematical methods applied to the regression of QSAR/QSPR models have recently been developing rapidly, and new methods such as Gene Expression Programming (GEP), Projection Pursuit Regression (PPR) and Local Lazy Regression (LLR) have appeared on the QSAR/QSPR stage. At the same time, earlier methods, including Multiple Linear Regression (MLR), Partial Least Squares (PLS), Neural Networks (NN) and Support Vector Machines (SVM), are being upgraded to improve their performance in QSAR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed to show their application potential in future QSAR/QSPR studies.

    Smooth Monotonic Networks

    Monotonicity constraints are powerful regularizers in statistical modelling. They can support fairness in computer-supported decision making and increase plausibility in data-driven scientific models. The seminal min-max (MM) neural network architecture ensures monotonicity, but often gets stuck in undesired local optima during training because of vanishing gradients. We propose a simple modification of the MM network using strictly increasing smooth non-linearities that alleviates this problem. The resulting smooth min-max (SMM) network module inherits the asymptotic approximation properties of the MM architecture. It can be used within larger deep learning systems trained end-to-end. The SMM module is considerably simpler and less computationally demanding than state-of-the-art neural networks for monotonic modelling. Still, in our experiments, it compared favorably to alternative neural and non-neural approaches in terms of generalization performance.
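The smooth min-max idea can be illustrated with a minimal numpy sketch (all names and the temperature `beta` are illustrative placeholders, not the paper's code): each unit is an affine function whose weights are made non-negative via `exp`, and the hard max/min of the classic MM architecture are replaced by smooth log-sum-exp surrogates, so the module stays monotonically increasing in every input while avoiding the vanishing gradients of hard max/min.

```python
import numpy as np

def soft_max(z, beta=10.0):
    # smooth, strictly increasing surrogate for max: (1/beta) * log(sum(exp(beta*z)))
    m = z.max()
    return m + np.log(np.exp(beta * (z - m)).sum()) / beta

def soft_min(z, beta=10.0):
    # smooth surrogate for min, via -soft_max(-z)
    return -soft_max(-z, beta)

def smooth_min_max(x, W, b, beta=10.0):
    # W: (groups, units, dim); exp(W) >= 0 makes every affine unit non-decreasing in x
    pre = np.einsum('gud,d->gu', np.exp(W), x) + b           # affine units
    group_vals = np.array([soft_max(g, beta) for g in pre])  # smooth max per group
    return soft_min(group_vals, beta)                        # smooth min over groups
```

Because a composition of non-decreasing maps is non-decreasing, increasing any input coordinate can only increase the output.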

    How to Explain Individual Classification Decisions

    After building a classifier with modern machine-learning tools, we typically have a black box at hand that predicts well for unseen data. Thus, we get an answer to the question of which label is most likely for a given unseen data point. However, most methods provide no answer to why the model predicted that particular label for a single instance, or which features were most influential for that instance. The only methods currently able to provide such explanations are decision trees. This paper proposes a procedure which (based on a set of assumptions) allows the decisions of any classification method to be explained.

    Regression or significance tests: What other choice is there?—An academic perspective

    Both the no-observed-effect concentration and its null hypothesis significance testing foundation have drawn steady criticism since their inceptions [1–5]. Many in our field reasonably advocate regression to avoid conventional null hypothesis significance testing shortcomings; however, regression is compromised under commonly encountered conditions (Green, present Perspective's Challenge). As the debate over whether to favor null hypothesis significance testing or regression methods continues into the 21st century, a sensible strategy might be to take a moment to ask: Are there now other choices? Our goal is to sketch out one such choice.

    Mathematical programming for piecewise linear regression analysis

    In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables by approximating their complex inner relationship. A large number of methods have been successfully proposed, based on various methodologies, including linear regression, support vector regression, neural networks and piecewise regression. In terms of piecewise regression, the existing methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piecewise linear regression method is introduced, based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming both the single partition feature and the number of regions are known, a mixed-integer linear model is proposed to simultaneously determine the locations of multiple break-points and the regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and the final number of break-points. Seven real-world problems covering several application domains have been used to demonstrate the efficiency of the proposed method. It is shown that the proposed piecewise regression method can be solved to global optimality for datasets of thousands of samples, and it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of easily interpretable if-then rules. Overall, this work proposes an efficient rule-based multivariate regression method based on piecewise functions that achieves better prediction performance than state-of-the-art approaches. This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
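The core idea of fitting one line per segment of a single partition variable can be illustrated with a brute-force sketch (purely illustrative; the paper replaces this enumeration with an exact MILP over break-point locations and coefficients, and here an ordinary least-squares fit stands in for the L1 fit per segment; function names are hypothetical):

```python
import numpy as np

def fit_line(x, y):
    # least-squares line as a simple proxy for the per-segment fit;
    # returns coefficients and the segment's total absolute error
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, np.abs(A @ coef - y).sum()

def piecewise_fit(x, y, min_seg=3):
    # enumerate every candidate break-point on the sorted partition variable
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_err, best_break = np.inf, None
    for k in range(min_seg, len(x) - min_seg):
        _, e1 = fit_line(x[:k], y[:k])
        _, e2 = fit_line(x[k:], y[k:])
        if e1 + e2 < best_err:
            best_err, best_break = e1 + e2, x[k]
    return best_err, best_break
```

On data generated from two line pieces meeting at zero, this search recovers a break-point near the true location with near-zero total error.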

    An Open Drug Discovery Competition: Experimental Validation of Predictive Models in a Series of Novel Antimalarials.

    The Open Source Malaria (OSM) consortium is developing compounds that kill the human malaria parasite, Plasmodium falciparum, by targeting PfATP4, an essential ion pump on the parasite surface. The structure of PfATP4 has not been determined. Here, we describe a public competition created to develop a predictive model for the identification of PfATP4 inhibitors, thereby reducing project costs associated with the synthesis of inactive compounds. Competition participants could see all entries as they were submitted. In the final round, featuring private-sector entrants specializing in machine learning methods, the best-performing models were used to predict novel inhibitors, several of which were synthesized and evaluated against the parasite. Half possessed biological activity, with one featuring a motif that the human chemists familiar with this series would have dismissed as "ill-advised". Since all data and participant interactions remain in the public domain, this research project "lives" and may be improved by others.

    Optimisation based approaches for machine learning

    Machine learning has attracted a lot of attention in recent years and has become an integral part of many commercial and research projects, with a wide range of applications. With current developments in technology, more data is generated and stored than ever before. Identifying patterns, trends and anomalies in these datasets, and summarising them with simple quantitative models, is a vital task. This thesis focuses on the development of machine learning algorithms based on mathematical programming for datasets that are relatively small in size.

    The first topic of this doctoral thesis is piecewise regression, where a dataset is partitioned into multiple regions and a regression model is fitted to each one. This work uses an existing algorithm from the literature and extends the mathematical formulation to include information criteria. The inclusion of such criteria aims to address overfitting, a common problem in supervised learning tasks, by finding a balance between predictive performance and model complexity. The improvement in overall performance is demonstrated by testing the proposed method against various algorithms from the literature on a range of regression datasets.

    Extending the topic of regression, a decision tree regressor is also proposed. Decision trees are powerful, easy-to-understand structures that can be used for both regression and classification. In this work, an optimisation model is used for the binary splitting of nodes. A statistical test is introduced to check whether the partitioning of nodes is statistically meaningful and, as a result, to control the tree generation process. Additionally, a novel mathematical formulation is proposed to perform feature selection and ultimately identify the appropriate variable for the splitting of nodes. The performance of the proposed algorithm is once again compared with a number of algorithms from the literature, and it is shown that the introduction of the variable selection model reduces the training time of the algorithm without major sacrifices in performance.

    Lastly, a novel decision tree classifier is proposed. This algorithm is based on a mathematical formulation that identifies the optimal splitting variable and break value, applies a linear transformation to the data and then assigns each sample to a class while minimising the number of misclassified samples. The introduction of the linear transformation step reduces the dimensionality of the examined dataset down to a single variable, aiding the classification accuracy of the algorithm on more complex datasets. Popular classifiers from the literature have been used to compare the accuracy of the proposed algorithm on both synthetic and publicly available classification datasets.
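The split-then-test idea behind the regression tree can be sketched in a toy form (purely illustrative: the thesis uses an optimisation model for the split and a proper statistical test, whereas this sketch uses exhaustive search and a fixed variance-reduction threshold; all names are made up):

```python
import numpy as np

def sse(y):
    # sum of squared errors around the segment mean
    return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0

def best_split(x, y, min_leaf=5):
    # exhaustive search for the break value minimising the children's SSE
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_err, best_break = np.inf, None
    for k in range(min_leaf, len(x) - min_leaf):
        err = sse(y[:k]) + sse(y[k:])
        if err < best_err:
            best_err, best_break = err, (x[k - 1] + x[k]) / 2
    return best_err, best_break

def accept_split(x, y, min_reduction=0.5):
    # stand-in for the thesis' statistical test: keep the split only if it
    # removes at least `min_reduction` of the parent node's variance
    child_err, break_value = best_split(x, y)
    return (1 - child_err / sse(y)) >= min_reduction, break_value
```

On a step-shaped response the search locates the jump and the acceptance rule keeps the split; on near-constant data the reduction would fall below the threshold and the node would stay a leaf.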

    Novel drug-target interactions via link prediction and network embedding

    BACKGROUND: As many interactions between the chemical and genomic space remain undiscovered, computational methods able to identify potential drug-target interactions (DTIs) are employed to accelerate drug discovery and reduce the required cost. Predicting new DTIs can leverage drug repurposing by identifying new targets for approved drugs. However, developing an accurate computational framework that can efficiently incorporate chemical and genomic spaces remains extremely demanding. A key issue is that most DTI predictions suffer from the lack of experimentally validated negative interactions or the limited availability of target 3D structures. RESULTS: We report DT2Vec, a pipeline for DTI prediction based on graph embedding and gradient-boosted tree classification. It maps drug-drug and protein–protein similarity networks to low-dimensional features, and DTI prediction is formulated as binary classification based on a strategy of concatenating the drug and target embedding vectors as input features. DT2Vec was compared with three top-performing graph similarity-based algorithms on a standard benchmark dataset and achieved competitive results. In order to explore credible novel DTIs, the model was applied to data from the ChEMBL repository that contain experimentally validated positive and negative interactions, yielding a strong predictive model. Then, the developed model was applied to all possible unknown DTIs to predict new interactions. The applicability of DT2Vec as an effective method for drug repurposing is discussed through case studies, and evaluation of some novel DTI predictions is undertaken using molecular docking. CONCLUSIONS: The proposed method was able to integrate and map chemical and genomic space into low-dimensional dense vectors and showed promising results in predicting novel DTIs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04650-w
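The feature-construction step described above — concatenating a drug embedding with a target embedding to form the classifier input — can be sketched as follows (the embeddings and IDs are made-up placeholders; in DT2Vec they come from graph embedding of the similarity networks, and the concatenated vectors are fed to a gradient-boosted tree classifier):

```python
import numpy as np

# made-up low-dimensional embeddings standing in for the learned ones
drug_emb = {"D1": np.array([0.1, 0.9]), "D2": np.array([0.7, 0.2])}
target_emb = {"T1": np.array([0.3, 0.4]), "T2": np.array([0.8, 0.1])}

def pair_features(pairs):
    # one row per candidate drug-target pair: [drug vector | target vector]
    return np.stack([np.concatenate([drug_emb[d], target_emb[t]])
                     for d, t in pairs])

X = pair_features([("D1", "T1"), ("D2", "T2")])
# X, together with 0/1 interaction labels, defines a standard
# binary classification problem
```

Each candidate pair thus becomes a fixed-length dense vector regardless of which similarity networks produced the embeddings.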