Application of portfolio optimization to drug discovery
In this work, the problem of selecting a subset of molecules that are potential lead candidates for drug discovery is considered. This molecule subset selection problem is formulated as a portfolio optimization problem, which is well known and well studied in financial management. The financial return, more precisely the return rate, is interpreted as the return rate from a potential lead and is calculated as the product of gain and probability of success (the probability that a selected molecule becomes a lead), which is related to the performance of the molecule, in particular its (bio-)activity. The risk is associated with not finding active molecules and is related to the level of diversity of the molecules selected in the portfolio. Some molecules contribute to the diversity of the selected set and hence decrease the risk of the portfolio as a whole: even though such molecules look inefficient when considered in isolation, they are located in sparsely sampled regions of chemical space and differ from more promising molecules. One way of computing the diversity of a set is associated with a covariance matrix, and here it is represented by the Solow-Polasky measure. Several formulations of molecule portfolio optimization are considered, taking into account the limited budget available for buying molecules and the fixed size of the portfolio. The proposed approach is tested in experimental settings on three molecule datasets using exact and/or evolutionary approaches. The results obtained for these datasets look promising and encourage application of the proposed portfolio-based approach to molecule subset selection in real settings.
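The two quantities named in this abstract can be illustrated with a small sketch. It scores a subset by expected return (gain times probability of success) and by Solow-Polasky diversity, computed as the sum of the entries of the inverse of the matrix M with M_ij = exp(-theta * d_ij). Everything specific here is an assumption made for illustration: the coordinates, gains, probabilities, theta, the additive score with risk_weight, and all function names. The exhaustive search over fixed-size subsets merely stands in for the exact and evolutionary approaches mentioned above and is not the authors' formulation.

```python
import itertools
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def solow_polasky(points, theta=1.0):
    """Solow-Polasky diversity: sum of entries of M^-1, M_ij = exp(-theta * d_ij).

    Points must be distinct, otherwise M is singular.
    """
    m = [[math.exp(-theta * math.dist(p, q)) for q in points] for p in points]
    return sum(solve(m, [1.0] * len(points)))


def expected_return(gains, probs):
    """Return of a subset: sum over molecules of gain * probability of success."""
    return sum(g * p for g, p in zip(gains, probs))


def best_portfolio(points, gains, probs, size, risk_weight=1.0, theta=1.0):
    """Hypothetical scalarised objective: return + risk_weight * diversity."""
    best = None
    for subset in itertools.combinations(range(len(points)), size):
        ret = expected_return([gains[i] for i in subset], [probs[i] for i in subset])
        div = solow_polasky([points[i] for i in subset], theta)
        score = ret + risk_weight * div
        if best is None or score > best[0]:
            best = (score, subset)
    return best[1]
```

With two near-identical promising molecules and one less promising but distant molecule, a positive risk_weight makes the search pick one promising molecule together with the distant one, which mirrors the argument in the abstract about molecules from sparsely sampled regions.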
The development of the advanced web shop based on purchase history
The goal of the thesis is to develop a typical web shop application with some additional functionality. This functionality enables web shop customers to browse products more efficiently and thus makes the shop more profitable. For this purpose, we developed a mechanism that adapts product presentation to the customer.
First, we describe the technologies used for development. The programming language C# is presented briefly, together with frameworks (ASP.NET, Entity Framework), libraries (LINQ) and other web technologies (HTML, CSS, AJAX). For storing and manipulating data, a database with tables is created in MS SQL.
Furthermore, we look at the requirements, idea and logic of the solution. We present the solution design and show how specific functionality behaves for different user types. We present a solution analysis that includes a comparison with similar solutions and the results of user tests. Finally, we discuss problems encountered during development and possibilities for future improvements.
Proteochemometric modeling in a Bayesian framework
Proteochemometrics (PCM) is an approach for bioactivity predictive modeling which models the relationship between protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty in predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels on three diverse (and multispecies) PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with a number of small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we have gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with R₀² values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models, obtained with the Normalized Polynomial and radial kernels, provide confidence intervals for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
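One plausible way to check that predictive intervals agree with the cumulative Gaussian distribution is to compare the fraction of observations lying within k predicted standard deviations of the predicted mean against the Gaussian expectation erf(k / sqrt(2)). The sketch below assumes this reading; it is not the authors' procedure, and the function names and input lists are hypothetical.

```python
import math


def expected_coverage(k):
    """Probability that a Gaussian variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def observed_coverage(y_true, y_mean, y_std, k):
    """Fraction of observations inside the interval mean +/- k * std."""
    inside = sum(
        1 for y, m, s in zip(y_true, y_mean, y_std) if abs(y - m) <= k * s
    )
    return inside / len(y_true)


def calibration_table(y_true, y_mean, y_std, ks=(0.5, 1.0, 2.0)):
    """Rows of (k, expected coverage, observed coverage) for side-by-side comparison."""
    return [
        (k, expected_coverage(k), observed_coverage(y_true, y_mean, y_std, k))
        for k in ks
    ]
```

For a well-calibrated model, the observed column should track the expected column across the chosen values of k; large gaps suggest over- or under-confident intervals.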
Construction of balanced, chemically dissimilar training, validation and test sets for machine learning on molecular datasets
When preparing training, validation and test sets for machine learning on molecular datasets, it is desirable to combine two requirements: 1) robustness, i.e. making a test set that is chemically dissimilar from the training set; 2) data balance, i.e. ensuring that the proportion of data points and the distribution of data labels (categorical) / data values (continuous) are as homogeneous as possible among the sets, for each individual property to model, while partitioning the overall set of compounds as required. Recent literature shows that meeting both these requirements simultaneously is sometimes very difficult. This is especially true for multi-task learning, but also for single-task learning if one aims to balance the distribution of data labels or values, too. In this work we present a method that resolves this issue by first carrying out a chemistry-guided clustering of the initial dataset to ensure the separation of chemical matter, and subsequently applying linear programming to select the lists of clusters that – once assembled into the final sets – result in the best possible data balance.
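The cluster-selection step can be pictured with a toy sketch. Whole clusters are assigned to the test set so that both the share of compounds and the share of positive labels come close to a target fraction. Note the substitution: this sketch uses exhaustive enumeration, not linear programming, and it handles a single binary task and a two-way split only. The cost function, the input format and the function name are assumptions for illustration and are not taken from the paper.

```python
import itertools


def split_clusters(clusters, test_fraction=0.2):
    """Pick whole clusters for the test set.

    clusters: list of (n_compounds, n_positive) tuples, one per cluster.
    Returns the set of cluster indices assigned to the test set; the
    remaining clusters form the training set.
    """
    total_n = sum(n for n, _ in clusters)
    total_pos = sum(p for _, p in clusters)
    target_n = test_fraction * total_n
    target_pos = test_fraction * total_pos

    best_cost, best_subset = None, ()
    indices = range(len(clusters))
    # Enumerate every non-empty proper subset of clusters (toy sizes only).
    for r in range(1, len(clusters)):
        for subset in itertools.combinations(indices, r):
            n = sum(clusters[i][0] for i in subset)
            pos = sum(clusters[i][1] for i in subset)
            # Hypothetical cost: relative deviation in size plus in label balance.
            cost = abs(n - target_n) / total_n + abs(pos - target_pos) / max(total_pos, 1)
            if best_cost is None or cost < best_cost:
                best_cost, best_subset = cost, subset
    return set(best_subset)
```

Because clusters are never split, chemically similar compounds stay on the same side of the partition. Enumeration grows exponentially with the number of clusters, which is one reason an optimization formulation such as the linear programming described above is attractive for real datasets.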
Selecting an Optimal Number of Binding Site Waters To Improve Virtual Screening Enrichments Against the Adenosine A2A Receptor
A major challenge in structure-based virtual screening (VS) involves the treatment of explicit water molecules during docking in order to improve the enrichment of active compounds over decoys. Here we have investigated this in the context of the adenosine A2A receptor, where water molecules have previously been shown to be important for achieving high enrichment rates with docking, and where the positions of some binding site waters are known from a high-resolution crystal structure. The effect of these waters (both their presence and orientations) on VS enrichment was assessed using a carefully curated set of 299 high affinity A2A antagonists and 17,337 decoys. We show that including certain crystal waters greatly improves VS enrichment and that optimization of water hydrogen positions is needed in order to achieve the best results. We also show that waters derived from a molecular dynamics simulation, without any knowledge of crystallographic waters, can improve enrichments to a similar degree as the crystallographic waters, which makes this strategy applicable to structures without experimental knowledge of water positions. Finally, we used decision trees to select an ensemble of structures with different water molecule positions and orientations that outperforms any single structure with water molecules. The approach presented here is validated against independent test sets of A2A receptor antagonists and decoys from the literature. In general, this water optimization strategy could be applied to any target with water-mediated protein–ligand interactions.
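Enrichment of actives over decoys is commonly summarised by an enrichment factor: the hit rate among the top-ranked fraction of the screened library divided by the hit rate in the whole library. The sketch below assumes that common definition and assumes that lower docking scores are better by default; the abstract does not state which enrichment metric or cutoff the authors used, and the function and parameter names are hypothetical.

```python
def enrichment_factor(scores, is_active, fraction=0.01, higher_is_better=False):
    """Enrichment factor at the given top fraction of a ranked library.

    scores: docking score per compound.
    is_active: True for known actives, False for decoys (same order as scores).
    """
    ranked = sorted(zip(scores, is_active), key=lambda t: t[0], reverse=higher_is_better)
    n_total = len(ranked)
    n_top = max(1, int(round(fraction * n_total)))  # at least one compound
    actives_top = sum(1 for _, active in ranked[:n_top] if active)
    actives_total = sum(1 for active in is_active if active)
    if actives_total == 0:
        raise ValueError("at least one active compound is required")
    return (actives_top / n_top) / (actives_total / n_total)
```

A value of 1 corresponds to random ranking; the maximum attainable value depends on the ratio of actives to decoys and on the chosen fraction.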
Enhancing hit discovery in virtual screening through accurate calculation of absolute protein-ligand binding free energies
In the hit identification stage of drug discovery, a diverse chemical space needs to be explored to identify initial hits. In contrast to empirical scoring functions, absolute protein-ligand binding free energy perturbation (ABFEP) provides a theoretically more rigorous and accurate description of protein-ligand binding thermodynamics and could in principle greatly improve the hit rates in virtual screening. In this work, we describe an implementation of an accurate and reliable ABFEP method in FEP+. We validated the ABFEP method on eight congeneric compound series binding to eight protein receptors, including both neutral and charged ligands. For ligands with net charges, the alchemical ion approach is adopted to avoid artifacts in electrostatic potential energy calculations. The calculated binding free energies are highly correlated with experimental results, with a weighted average R² of 0.55 for the entire dataset and an overall RMSE of 1.1 kcal/mol when the protein reorganization effect upon ligand binding is accounted for. Through ABFEP calculations using apo versus holo protein structures, we demonstrated that the protein conformational and protonation state changes between the apo and holo proteins are the main physical factors contributing to the protein reorganization free energy, which manifests as an overestimation of the raw ABFEP calculated binding free energies when the holo structures of the proteins are used. Furthermore, we performed ABFEP calculations in three virtual screening applications for hit enrichment. ABFEP greatly improves the hit rates compared to docking scores or other methods such as metadynamics. The highly accurate ABFEP results demonstrated in this work position it as a useful tool to improve the hit rates in virtual screening, thus facilitating hit discovery.