2 research outputs found

    Characterisation of data resources for in silico modelling: benchmark datasets for ADME properties.

    Introduction: The cost of in vivo and in vitro screening of ADME properties of compounds has motivated efforts to develop a range of in silico models. At the heart of the development of any computational model are the data; high quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and type of chemical identifiers used, influence the modelability of the data. Areas covered: This review explores the usefulness of publicly available ADME datasets for researchers to use in the development of predictive models. More than 140 ADME datasets were collated from publicly available resources, and the modelability of 31 selected datasets was assessed using specific criteria derived in this study. Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size and available in a user-friendly format, with all chemical structures associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations for assessing dataset suitability for modelling and for publishing data in an appropriate format are discussed.

    An algorithm for data quality assessment in predictive toxicology

    Poor quality of information integrated from heterogeneous sources is an important issue in many scientific domains. In toxicology the issue matters even more, since the data are used in Quantitative Structure Activity Relationship (QSAR) modeling to predict the chemical toxicity of new compounds. Much work has been done on QSARs, but little attention has been paid to the quality of the data used, which points to the absence of a quality criteria framework in this domain. This paper reviews some of the existing data quality assessment methods in various domains and their relevance and possible application to predictive toxicology, highlights a number of data quality deficiencies found in experimental work on internal data, and proposes some quality metrics and an algorithm for assessing data quality derived from the results.