AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment
Abstract
Background: Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure-activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds, and the size of proprietary as well as public data sets is growing rapidly. Hence, there is a demand for computationally efficient machine learning algorithms that are easily available to researchers without extensive machine learning knowledge. Because they uphold the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source, state-of-the-art, high performance machine learning platform that interfaces multiple customized machine learning algorithms for both graphical programming and scripting, and that can be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community.
Results: This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange was developed specifically to support batch generation of QSAR models by providing the full workflow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated workflow relies on customization of the machine learning algorithms and on a generalized, automated process for selecting model hyper-parameters. Several high performance machine learning algorithms are interfaced so that the statistical method can be selected efficiently for each data set, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge, as flexible applications can be created not only at a scripting level but also in a graphical programming environment.
Conclusions: AZOrange is a step towards meeting the need for an Open Source high performance machine learning platform that supports the efficient development of highly accurate QSAR models fulfilling regulatory requirements.
Quantitative structure-pharmacokinetic relationship modelling: apparent volume of distribution
The purpose of this study was to develop a quantitative structure–activity relationship (QSAR) for the
prediction of the apparent volume of distribution (Vd) in man for a heterogeneous series of drugs.
The relationship of many computed, and some experimental, structural descriptors with Vd, and the
Vd corrected for protein binding (unbound Vd), was investigated. Models were constructed using
stepwise regression analysis for all 70 drugs in the dataset, as well as for acidic drugs and basic
drugs separately. The predictive power of the models was assessed using half the chemicals as a test
set, and revealed that the models for Vd yielded lower prediction errors than those constructed for
the unbound Vd (mean fold error of 2.01 for Vd compared with 2.28 for unbound Vd). Moreover, the
separation of the compounds into acids and bases did not reduce the prediction error significantly