33 research outputs found

    Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models

    The study investigated how the method of splitting data into a training set and a test set influences the external predictivity of quantitative structure-activity and/or structure-property relationship (QSAR/QSPR) models. Six models of good quality were collected from the literature and then redeveloped and validated using five alternative splitting algorithms: (i) a commonly used algorithm ('Z:1'), in which the compounds are sorted in ascending order of the response value, y, and every zth (e.g. third) compound is assigned to the test set; (ii-iv) three variations of the Kennard-Stone algorithm; and (v) the duplex algorithm. The external validation statistics reported for each model served as the basis for the final comparison. We demonstrated that splitting techniques that use the values of the molecular descriptors alone (X), or in combination with the model response (y), always led to models with better external predictivity than models designed with methodologies based on the y-values only. Moreover, we showed that the external validation coefficient (Q2EXT) is more sensitive to the splitting technique than the root mean square error of prediction (RMSEP). This difference becomes especially important when the test set is relatively small (5 to 10 compounds). For models trained and validated with a small number of compounds, it is strongly recommended that both statistics (Q2EXT and RMSEP) be taken into account when evaluating external predictivity.
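The splitting schemes and validation statistics named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the Kennard-Stone routine is the basic X-only variant with Euclidean distances (the abstract does not say which three variations were used), and Q2EXT is assumed here to be 1 - PRESS/SS with SS taken about the training-set mean, which is only one of several definitions in use.

```python
import numpy as np

def z_to_one_split(y, z=3):
    """'Z:1' split: sort by response y (ascending), send every z-th compound to the test set."""
    order = np.argsort(np.asarray(y, dtype=float), kind="stable")
    test = order[z - 1::z]              # every z-th compound of the sorted list
    train = np.setdiff1d(order, test)   # remaining compounds, returned as sorted indices
    return train, test

def kennard_stone(X, n_train):
    """Basic Kennard-Stone selection on descriptors X (assumes n_train >= 2)."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)             # two most distant compounds
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # distance of each candidate to its closest already-selected compound
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]                # farthest such candidate
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)

def rmsep(y_obs, y_pred):
    """Root mean square error of prediction on the test set."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def q2_ext(y_obs, y_pred, y_train_mean):
    """External validation coefficient, assumed definition: 1 - PRESS / SS about the training mean."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    press = np.sum((y_obs - y_pred) ** 2)
    ss = np.sum((y_obs - y_train_mean) ** 2)
    return float(1.0 - press / ss)
```

With a test set of only a few compounds, the denominator of `q2_ext` depends heavily on which compounds happen to be selected, which is consistent with the abstract's advice to report both statistics.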

    Similarity of multicomponent nanomaterials in a safer-by-design context : the case of core–shell quantum dots

    Concepts of similarity, such as grouping, categorization, and read-across, enable a fast comparative screening of hazard, reducing animal testing. These concepts are established primarily for molecular substances. We demonstrate the development of multi-dimensional similarity assessment methods that can be applied to multicomponent nanomaterials (MCNMs) for the case of core–shell quantum dots (QDs). The term ‘multicomponent’ refers to their structural composition, which consists of up to four different heavy metals (cadmium, zinc, copper, indium) in different mass percentages, with different morphologies and surface chemistries. The development of concepts of similarity is also motivated by the increased need for comparison of innovative against conventional materials in the safe and sustainable by design (SSbD) context. This case study thus considers the industrial need for an informed balance of functionality and safety: we propose two different approaches to compare and rank the case study materials amongst themselves and against well-known benchmark materials, here ZnO NM110, BaSO4 NM220, TiO2 NM105, and CuO. Relative differences in the sample set are calibrated against the biologically relevant range. The choice of properties that are subjected to similarity assessment is guided by the integrated approaches to testing and assessment (IATA) for the inhalation hazard of simple nanomaterials, which recommends characterizing QDs by (i) dynamic dissolution in lung simulant fluids and (ii) the surface reactivity in the abiotic ferric reducing ability of serum (FRAS) assay. In addition, the similarity of fluorescence spectra was assessed as a measure of the QD performance for the intended functionality as a color converter. 
    We applied two approaches to evaluate the data matrix. In the first approach, specific descriptors for each assay (i.e., leachable mass (%) and mass-based biological oxidative damage (mBOD)) were selected based on expert knowledge and used as input data for the generation of similarity matrices. The second approach introduces the possibility of evaluating multidimensional raw data by a meaningful similarity analysis, without the need for predefined descriptors. We discuss the strengths and weaknesses of each of the two approaches. We anticipate that the similarity assessment approach is transferable to the assessment of further advanced materials (AdMa) that are composed of multiple components.

    Perspectives from the NanoSafety Modelling Cluster on the validation criteria for (Q)SAR models used in nanotechnology

    Nanotechnology and the production of nanomaterials have been expanding rapidly in recent years. Since many types of engineered nanoparticles are suspected to be toxic to living organisms and to have a negative impact on the environment, the process of designing new nanoparticles and their applications must be accompanied by a thorough exposure risk analysis. (Quantitative) Structure-Activity Relationship ([Q]SAR) modelling is a promising option among the methods available for risk assessment. These in silico models can be used to predict a variety of properties, including the toxicity of newly designed nanoparticles. However, (Q)SAR models must be appropriately validated to ensure the clarity, consistency and reliability of predictions. This paper is a joint initiative from recently completed European research projects focused on developing (Q)SAR methodology for nanomaterials. The aim was to interpret and expand the guidance for the well-known “OECD Principles for the Validation, for Regulatory Purposes, of (Q)SAR Models”, with reference to nano-(Q)SAR, and to present our opinions on the criteria to be fulfilled by models developed for nanoparticles.

    Computational nanotoxicology: challenges and perspectives


    A machine learning q-RASPR approach for efficient predictions of the specific surface area of perovskites

    In this study, the specific surface area of various perovskites was modeled using a novel quantitative read-across structure-property relationship (q-RASPR) approach, which combines Read-Across (RA) and quantitative structure-property relationship (QSPR) modeling. After optimization of the hyper-parameters, certain similarity-based error measures were obtained for each query compound. By combining some of these error-based measures with the previously selected features and the Read-Across prediction function, a number of machine learning models were developed using Partial Least Squares (PLS), ridge regression (RR), linear support vector regression (LSVR), and random forest (RF) regression. Based on the external prediction quality and interpretability, the PLS model was selected as the best predictor, which underscored the previously reported results. The finally selected model should efficiently predict the specific surface areas of other perovskites for their use in photocatalysis. The new q-RASPR method also appears promising for the prediction of several other property endpoints of interest in materials science.
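As a rough illustration of the read-across step described above, the sketch below computes a similarity-weighted prediction for one query compound, together with two similarity-based measures that could be appended to the selected features before fitting a regression model. Everything specific here is an assumption: the abstract does not state the similarity function, the number of neighbours, or which error measures were used, so the Gaussian kernel, `k`, `sigma`, and the function name `read_across_predict` are hypothetical choices.

```python
import numpy as np

def read_across_predict(X_train, y_train, x_query, k=3, sigma=1.0):
    """Similarity-weighted read-across prediction from the k closest training compounds.

    Returns (prediction, weighted SD of neighbour responses, maximum similarity).
    Assumes a Gaussian kernel on Euclidean distance; a query far from all training
    compounds would need a guard against near-zero similarities.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    dist = np.linalg.norm(X_train - x_query, axis=1)      # distance to each training compound
    nearest = np.argsort(dist, kind="stable")[:k]         # indices of the k closest compounds
    sim = np.exp(-dist[nearest] ** 2 / (2.0 * sigma ** 2))
    w = sim / sim.sum()                                   # normalised similarity weights

    y_near = y_train[nearest]
    pred = float(w @ y_near)                              # read-across prediction
    sd = float(np.sqrt(w @ (y_near - pred) ** 2))         # dispersion among the neighbours
    return pred, sd, float(sim.max())
```

In a q-RASPR-style workflow, values such as `pred`, `sd`, and the maximum similarity would be computed for every compound and stacked next to the original descriptors, and the combined matrix would then be passed to a learner such as PLS.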