269 research outputs found

    Kernel Target Alignment Parameter: A New Modelability Measure for Regression Tasks

    © 2015 American Chemical Society. In this paper, we demonstrate that the kernel target alignment (KTA) parameter can efficiently be used to estimate the relevance of molecular descriptors for QSAR modeling on a given data set, i.e., as a modelability measure. The efficiency of KTA in assessing modelability was demonstrated in two series of QSAR modeling studies, either varying the descriptor space for a single data set, or comparing various data sets within a single descriptor space. The data sets considered included 25 series of various GPCR binders with ChEMBL-reported pKi values, and a toxicity data set. The descriptor spaces employed covered more than 100 different ISIDA fragment descriptor types, and ChemAxon BCUT terms. Model performances (RMSE) were seen to anticorrelate consistently with the KTA parameter. Two other modelability measures were employed for benchmarking purposes: the Jaccard distance averaged over the data set (Div), and a measure related to the normalized mean absolute error (MAE) obtained in 1-nearest-neighbor calculations on the training set (Sim = 1 - MAE). Both Div and Sim were shown to perform similarly to KTA. However, a consensus index combining KTA, Div and Sim provides a more robust correlation with RMSE than any of the individual modelability measures.
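    The abstract does not spell out the KTA formula. As a rough illustration only, the sketch below uses the commonly cited form of kernel target alignment, the normalized Frobenius inner product between a kernel matrix K and the outer product of the mean-centered target vector. The centering step, the linear kernel, and the function names are assumptions made for this example and may differ from the authors' definition.

```python
import math


def linear_kernel(X):
    """Gram matrix of a linear kernel; X is a list of descriptor vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def kernel_target_alignment(K, y):
    """Alignment between kernel matrix K and the centered target vector y.

    KTA = <K, yy^T>_F / (||K||_F * ||yy^T||_F), where ||yy^T||_F = ||y||^2.
    Centering y is an assumption suited to regression targets.
    """
    n = len(y)
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    numerator = sum(K[i][j] * yc[i] * yc[j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    y_norm = sum(v * v for v in yc)
    if k_norm == 0.0 or y_norm == 0.0:
        return 0.0  # degenerate kernel or constant target: no alignment defined
    return numerator / (k_norm * y_norm)


# Toy data: one descriptor that tracks the property, one that does not.
y = [-2.0, 0.0, 2.0]
informative = linear_kernel([[-1.0], [0.0], [1.0]])
uninformative = linear_kernel([[1.0], [-2.0], [1.0]])
print(kernel_target_alignment(informative, y))
print(kernel_target_alignment(uninformative, y))
```

    In this toy case the descriptor proportional to the property gives a much higher alignment than the descriptor orthogonal to it, which is the behavior a modelability measure is expected to show.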

    Aspects of environmental impacts of seawater desalination : Cyprus as a case study

    Acknowledgements: The authors are grateful to the European Commission for supporting the activities carried out in the framework of the H2020 European project ZERO BRINE (project under grant agreement No. 730390). The authors would equally like to thank the TOTAL Foundation (Project “Diversity of brown algae in the Eastern Mediterranean”) and the UK Natural Environment Research Council for their support to FCK (program Oceans 2025 – WP 4.5 and grants NE/D521522/1 and NE/J023094/1). This work also received support from the Marine Alliance for Science and Technology for Scotland (MASTS) pooling initiative. MASTS is funded by the Scottish Funding Council (grant reference HR09011) and contributing institutions. The authors would also like to thank representatives from competent authorities in Cyprus for providing data, and specifically Nicoletta Kythreotou from the Department of Environment, George Ashikalis from the Transmission System Operator, and Dr. Dinos Poullis and Lia Georgiou from the Water Development Department. Peer reviewed.

    Seismicity and crustal structure of the southern main Ethiopian rift: new evidence from Lake Abaya

    The Main Ethiopian Rift (MER) has developed during the 18 Ma-Recent separation of the Nubian and Somalian plates. Extension in its central and northern sectors is associated with seismic activity and active magma intrusion, primarily within the rift, where shallow (<5 km) seismicity along magmatic centers is commonly caused by fluid flow through open fractures in hydrothermal systems. However, the extent to which similar magmatic rifting persists into the southern MER is unknown. Using data from a temporary network of five seismograph stations, we analyze patterns of seismicity and crustal structure in the Abaya region of the southern MER. Magnitudes range from 0.9 to 4.0; earthquake depths are 0–30 km. Vp/Vs ratios of ~1.69, estimated from Wadati diagram analysis, corroborate bulk-crustal Vp/Vs ratios determined via teleseismic P-to-S receiver function H-κ stacking and reveal a relative lack of mafic intrusion compared to the MER rift sectors to the north. There is a clear association of seismicity with the western border fault system of the MER everywhere in our study area, but earthquake depths are shallow near Duguna volcano, implying a shallowed geothermal gradient associated with rift valley silicic magmatism. This part of the MER is thus interpreted best as a young magmatic system that locally impacts the geothermal gradient but that has not yet significantly modified continental crustal composition via rift-axial magmatic rifting.
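    For readers unfamiliar with Wadati diagram analysis, the following is a minimal sketch of the textbook procedure, not the authors' processing code. For one earthquake, the S-minus-P time at each station is plotted against the P arrival time; under a constant velocity ratio the points fall on a line whose slope is (Vp/Vs - 1) and whose zero crossing is the origin time. The function name and the synthetic arrival times are assumptions made for illustration.

```python
def wadati_vp_vs(p_arrivals, s_arrivals):
    """Least-squares Wadati fit for a single event.

    p_arrivals, s_arrivals: P and S arrival times (s) at the same stations.
    Returns (vp_vs_ratio, origin_time).
    """
    n = len(p_arrivals)
    if n < 2 or n != len(s_arrivals):
        raise ValueError("need matching P and S picks from at least two stations")
    s_minus_p = [s - p for p, s in zip(p_arrivals, s_arrivals)]
    mean_p = sum(p_arrivals) / n
    mean_d = sum(s_minus_p) / n
    sxx = sum((p - mean_p) ** 2 for p in p_arrivals)
    if sxx == 0.0:
        raise ValueError("P arrival times must differ between stations")
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(p_arrivals, s_minus_p))
    slope = sxy / sxx                      # equals Vp/Vs - 1
    intercept = mean_d - slope * mean_p
    origin_time = -intercept / slope       # time at which S - P would be zero
    return 1.0 + slope, origin_time


# Synthetic picks: origin time 10 s, P travel times 2, 4, 6 s, Vp/Vs = 1.69.
p_times = [12.0, 14.0, 16.0]
s_times = [10.0 + 1.69 * t for t in (2.0, 4.0, 6.0)]
ratio, t0 = wadati_vp_vs(p_times, s_times)
print(round(ratio, 3), round(t0, 3))
```

    Real picks carry timing errors, so in practice the fit is usually made over many events and stations and reported with an uncertainty.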

    Mappability of drug-like space: Towards a polypharmacologically competent map of drug-relevant compounds

    © 2015 Springer International Publishing Switzerland. Intuitive, visual rendering - mapping - of high-dimensional chemical spaces (CS) is an important topic in chemoinformatics. Such maps were so far dedicated to specific compound collections - either limited series of known activities, or large, even exhaustive enumerations of molecules, but without associated property data. Typically, they were challenged to answer some classification problem with respect to those same molecules, admired for their aesthetic virtues and then forgotten - because they were set-specific constructs. This work addresses the question of whether a general, compound set-independent map can be generated, and the claim of "universality" quantitatively justified, with respect to all the structure-activity information available so far - or, more realistically, an exploitable but significant fraction thereof. The "universal" CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such a map should be able to discriminate actives from inactives, or even support quantitative neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent, without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure of such maps, based on generative topographic mapping, followed by the validation of their polypharmacological competence. Validation was achieved with respect to a maximum of exploitable structure-activity information, covering all of the Homo sapiens proteins of the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges for targets, and even in vivo properties independent from training data. They also withstood chemogenomics-related challenges, as cumulated responsibility vectors obtained by mapping of target-specific ligand collections were shown to represent validated target descriptors, complying with currently accepted target classification in biology. Therefore, they represent, in our opinion, a robust and well documented answer to the key question "What is a good CS map?"

    Stargate GTM: Bridging Descriptor and Activity Spaces

    © 2015 American Chemical Society. Predicting the activity profile of a molecule or discovering structures possessing a specific activity profile are two important goals in chemoinformatics, which could be achieved by bridging activity and molecular descriptor spaces. In this paper, we introduce the "Stargate" version of the Generative Topographic Mapping approach (S-GTM), in which two different multidimensional spaces (e.g., structural descriptor space and activity space) are linked through a common 2D latent space. In the S-GTM algorithm, the manifolds are trained simultaneously in the two initial spaces using the probabilities in the 2D latent space calculated as a weighted geometric mean of the probability distributions in both spaces. S-GTM has the following interesting features: (1) activities are involved during the training procedure; therefore, the method is supervised, unlike conventional GTM; (2) using molecular descriptors of a given compound as input, the model predicts a whole activity profile; and (3) using an activity profile as input, areas populated by relevant chemical structures can be detected. To assess the performance of S-GTM prediction models, a descriptor space (ISIDA descriptors) of a set of 1325 GPCR ligands was related to a B-dimensional (B = 1 or 8) activity space corresponding to pKi values for eight different targets. S-GTM outperforms conventional GTM for individual activities and performs similarly to the Lasso multitask learning algorithm, although it is still slightly less accurate than the Random Forest method.
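    The coupling step described above, a weighted geometric mean of the latent-space probabilities from the two spaces, can be sketched as follows. This is an assumption-laden illustration for a single compound: the weight parameter w, the renormalization over nodes, and the function name are hypothetical, and the abstract does not state how the authors choose or apply the weights.

```python
def combined_responsibilities(p_descriptor, p_activity, w=0.5):
    """Weighted geometric mean of two per-node probability vectors.

    p_descriptor, p_activity: probabilities over the same latent grid nodes,
    computed in descriptor space and activity space respectively.
    w: weight of the descriptor space (0 <= w <= 1); hypothetical parameter.
    Returns a vector renormalized to sum to 1.
    """
    if len(p_descriptor) != len(p_activity):
        raise ValueError("both vectors must cover the same latent nodes")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    unnormalized = [(a ** w) * (b ** (1.0 - w))
                    for a, b in zip(p_descriptor, p_activity)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("the two distributions share no supported node")
    return [u / total for u in unnormalized]


# Two nodes: the spaces disagree, and equal weights give an even split.
print(combined_responsibilities([0.8, 0.2], [0.2, 0.8], w=0.5))
```

    Setting w = 1 recovers the descriptor-space distribution alone, which corresponds to ignoring the activity space for that compound.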

    GTM-Based QSAR Models and Their Applicability Domains

    © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. In this paper we demonstrate that Generative Topographic Mapping (GTM), a machine learning method traditionally used for data visualisation, can be efficiently applied to QSAR modelling using probability distribution functions (PDF) computed in the latent 2-dimensional space. Several different scenarios of activity assessment were considered: (i) the "activity landscape" approach based on direct use of PDF, (ii) QSAR models involving GTM-generated descriptors derived from PDF, and (iii) the k-Nearest Neighbours approach in 2D latent space. Benchmarking calculations were performed on five different datasets: stability constants of complexes of the metal cations Ca2+, Gd3+ and Lu3+ with organic ligands in water, aqueous solubility, and activity of thrombin inhibitors. It has been shown that the performance of GTM-based regression models is similar to that obtained with some popular machine-learning methods (random forest, k-NN, M5P regression tree and PLS) and ISIDA fragment descriptors. By comparing GTM activity landscapes built on both predicted and experimental activities, we may visually assess the model's performance and identify the areas in the chemical space corresponding to reliable predictions. The applicability domain used in this work is based on data likelihood. Its application has significantly improved the model performances for 4 out of 5 datasets.
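    One plausible reading of the "activity landscape" scenario is sketched below: each latent node receives the responsibility-weighted mean activity of the training compounds, and a new compound is predicted as the responsibility-weighted mean of the node values. This is an assumption based on the abstract's wording, not the authors' published equations; the function names are hypothetical.

```python
def build_landscape(responsibilities, activities):
    """Per-node activity from training data.

    responsibilities: one row per training compound, one column per node.
    activities: experimental activity of each training compound.
    Nodes that no compound populates get None.
    """
    n_nodes = len(responsibilities[0])
    landscape = []
    for k in range(n_nodes):
        weight = sum(row[k] for row in responsibilities)
        if weight > 0.0:
            value = sum(row[k] * a for row, a in zip(responsibilities, activities))
            landscape.append(value / weight)
        else:
            landscape.append(None)
    return landscape


def predict_activity(new_responsibilities, landscape):
    """Responsibility-weighted mean over nodes that have a defined activity."""
    pairs = [(r, a) for r, a in zip(new_responsibilities, landscape) if a is not None]
    weight = sum(r for r, _ in pairs)
    if weight == 0.0:
        return None  # compound falls only on unpopulated nodes
    return sum(r * a for r, a in pairs) / weight


# Two training compounds, each sitting entirely on its own node.
landscape = build_landscape([[1.0, 0.0], [0.0, 1.0]], [2.0, 6.0])
print(predict_activity([0.25, 0.75], landscape))
```

    Returning None for compounds that land only on unpopulated nodes is a crude stand-in for an applicability domain; the abstract states that the authors' domain is based on data likelihood instead.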

    Chemical data visualization and analysis with incremental generative topographic mapping: Big data challenge

    © 2014 American Chemical Society. This paper is devoted to the analysis and visualization in 2-dimensional space of large data sets of millions of compounds using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program was applied to a database of more than 2 million compounds combining the data sets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions in the chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. The similarity of data sets in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.
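    The coverage and similarity measures named above can be illustrated with a short sketch. It assumes that a cumulated responsibility vector is the per-node sum of compound responsibilities, normalized to sum to 1, and it uses the standard definitions of normalized Shannon entropy and of the Bhattacharyya and (continuous) Tanimoto coefficients. The function names are hypothetical and the authors' exact normalizations may differ.

```python
import math


def cumulated_responsibilities(responsibilities):
    """Per-node sum over compounds, normalized to a probability vector."""
    n_nodes = len(responsibilities[0])
    sums = [sum(row[k] for row in responsibilities) for k in range(n_nodes)]
    total = sum(sums)
    return [s / total for s in sums]


def normalized_entropy(p):
    """Shannon entropy divided by log(number of nodes): 0 = one node, 1 = uniform."""
    entropy = -sum(v * math.log(v) for v in p if v > 0.0)
    return entropy / math.log(len(p))


def bhattacharyya_coefficient(p, q):
    """Overlap of two distributions: 1 for identical, 0 for disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def tanimoto_coefficient(p, q):
    """Continuous Tanimoto: p.q / (p.p + q.q - p.q)."""
    pq = sum(a * b for a, b in zip(p, q))
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    return pq / (pp + qq - pq)


# A library spread evenly over four nodes versus one concentrated on a single node.
spread = cumulated_responsibilities([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
focused = cumulated_responsibilities([[1, 0, 0, 0], [1, 0, 0, 0]])
print(normalized_entropy(spread), normalized_entropy(focused))
print(bhattacharyya_coefficient(spread, focused), tanimoto_coefficient(spread, focused))
```

    The evenly spread library reaches the maximum normalized entropy and the single-node library the minimum, while the two coefficients quantify how much of the map the two libraries share.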
