
    Analysis of bootstrap distribution of model parameters for bisphenol AF in ATG_ERa_TRANS_up assay.

    A-C: Values for the hill model log(AC50) (A), coefficient (B), and top (C) parameters for all 1000 bootstrap samples. D-F: Values for the gain loss model gain log(AC50) (D), gain coefficient (E), and top (F) for all 1000 bootstrap samples. G-I: Values for the winning model gain log(AC50) (G), gain coefficient (H), and top (I) for all 1000 bootstrap samples, colored by winning model (hill = red, gain loss = blue). J: Correlation plot of winning model top vs. winning model gain log(AC50), colored by winning model (hill = red, gain loss = blue). K: Normalized experimentally measured values (black circles) and winning model (gain loss, black curve). Subset of fitted bootstrap resamples, with winning hill (red lines) and gain loss (blue lines) models plotted. Horizontal black lines represent 3x bmad (dashed) and activity cutoff (solid). L: Comparison to results from other assays. Cumulative empirical distribution function of winning model gain log(AC50) value for all bisphenol AF samples in all assays where the experiment results were determined to be a positive hit. Curves are colored by assay source, with TOX21 black, NVS orange, ATG sky blue, OT bluish green, and ACEA yellow.
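
The two model families named in this caption can be sketched as simple response functions. This is a minimal illustration that assumes the parameterization commonly used for concentration-response fitting on a log10 concentration scale. The function names, argument names, and the loss-side parameters (`loss_log_ac50`, `loss_coef`) are assumptions for illustration and do not appear in the caption.

```python
def hill(logc, top, log_ac50, coef):
    """Hill model response at log10 concentration `logc` (assumed form)."""
    return top / (1.0 + 10.0 ** ((log_ac50 - logc) * coef))


def gain_loss(logc, top, gain_log_ac50, gain_coef, loss_log_ac50, loss_coef):
    """Gain-loss model: a rising Hill term multiplied by a falling Hill term.

    The loss-side parameters are hypothetical names; the caption only
    lists the gain log(AC50), gain coefficient, and top.
    """
    gain = 1.0 / (1.0 + 10.0 ** ((gain_log_ac50 - logc) * gain_coef))
    loss = 1.0 / (1.0 + 10.0 ** ((logc - loss_log_ac50) * loss_coef))
    return top * gain * loss


# At the AC50 the Hill response is half of the top.
half_response = hill(0.0, top=100.0, log_ac50=0.0, coef=1.0)
```

Well below the loss AC50 the loss term is close to 1, so the gain-loss curve tracks the Hill curve, which is why both models can compete as the winning fit for the same data.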

    Estrogen receptor model AUC values for chemicals with an AUC(Agonist) value > 0.1.

    Point estimates for agonist (red), antagonist (black), and pseudoreceptor (blue) values are marked by circles for all AUC values with an upper 95% confidence interval > 0.1. Error bars indicate the 95% confidence interval obtained by bootstrap resampling.
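
A 95% confidence interval from bootstrap resampling is often taken as the 2.5th and 97.5th percentiles of the bootstrap values. The sketch below assumes that percentile approach with linear interpolation. The caption does not say which interval method was used, and the function names are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1.0 - frac) + sorted_vals[upper] * frac


def percentile_interval(boot_values, level=0.95):
    """Percentile bootstrap interval (an assumed method, not confirmed by the source)."""
    ordered = sorted(boot_values)
    alpha = (1.0 - level) / 2.0
    return quantile(ordered, alpha), quantile(ordered, 1.0 - alpha)


# Toy bootstrap AUC values, evenly spaced for illustration only.
low, high = percentile_interval([i / 100.0 for i in range(101)])
```

With an interval in hand, the plotting filter described in the caption reduces to checking whether `high` exceeds 0.1.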

    Model selection in hit call probability for sixteen estrogen receptor agonist assays.

    For each plot, chemicals are ordered on the x-axis by their hit call probability. The y-axis indicates the percent of bootstrap resamples that were called a positive hit with a hill model (red), a positive hit with a gain loss model (blue), or a negative hit (black).
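
The quantities plotted here can be computed from a per-resample outcome label. The following is a minimal sketch that assumes each bootstrap resample has already been fitted and labeled. The label strings and the function name are hypothetical.

```python
from collections import Counter

LABELS = ("hill", "gain_loss", "negative")  # hypothetical outcome labels


def hit_call_summary(outcomes):
    """Return (hit call probability, percent of resamples per outcome)."""
    counts = Counter(outcomes)
    n = len(outcomes)
    percents = {label: 100.0 * counts[label] / n for label in LABELS}
    # A resample counts as a hit when either model wins with a positive call.
    hit_probability = (counts["hill"] + counts["gain_loss"]) / n
    return hit_probability, percents


# Toy example: 600 hill hits, 2 gain-loss hits, 398 negatives.
toy = ["hill"] * 600 + ["gain_loss"] * 2 + ["negative"] * 398
probability, percents = hit_call_summary(toy)
```

Sorting chemicals by `probability` gives the x-axis ordering, and `percents` gives the three colored series for each chemical.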

    Estrogen receptor model agonist AUC values for chemicals tested at least twice in the uterotrophic assay.

    Point estimates for agonist AUC are colored by the uterotrophic consensus result: positive (red), equivocal (blue), or negative (black). An equivocal result in the uterotrophic assay indicates that some tests were positive while others were negative. Error bars indicate the 95% confidence interval obtained by bootstrap resampling.

    Estrogen receptor assays included in this study.


    Nordihydroguaiaretic acid bootstrap curves.

    Each of the 18 ER assays is shown in a separate panel, with the assay cutoff indicated by a dashed horizontal line. Circles represent the pipeline-normalized concentration-response data, and the solid black line indicates the winning model fit to the data if the hit call was positive. TOX21_ERa_LUC_BG1_Antagonist was not a hit in the pipeline, so no black line is drawn. All bootstrap curves with a positive hit call are drawn, with hill and gain loss models colored red and blue, respectively. All assays had a 100% hit call in the bootstrap results except for ACEA_T47D_80hr_Positive, where 602 of the 1000 samples had a positive hit call, and ATG_ERa_TRANS_up and TOX21_ERa_LUC_BG1_Agonist, where a single bootstrap replicate in each assay was inactive.

    Applicability of bootstrap methods to assays with only one measurement per concentration, determination of a winning model, and calculating a hit call probability.


    Comparison of a normal distribution with standard deviation equal to bmad (orange line) and the empirical cumulative distribution function (ecdf) of the points used to calculate bmad (black line).

    For each assay, the bmad is calculated as the scaled median absolute deviation (MAD) of the response values at the lowest two concentrations of each chemical. Deviations between the ecdf and the normal distribution at higher response values can be attributed to highly potent chemicals that produce a biological response at the lowest two concentrations, as well as to noise sources that are not normally distributed.
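
The bmad calculation described above can be sketched as follows. This assumes the usual scaling constant of 1.4826, which makes the MAD comparable to a standard deviation for normally distributed data. The caption does not state the constant, and the data layout and function names are hypothetical.

```python
from statistics import median


def scaled_mad(values, constant=1.4826):
    """Median absolute deviation times a normal-consistency constant (assumed)."""
    center = median(values)
    return constant * median([abs(v - center) for v in values])


def baseline_mad(assay_data):
    """Pool responses at each chemical's two lowest concentrations, then take the scaled MAD.

    `assay_data` maps a chemical name to a list of (concentration, response) pairs.
    """
    pooled = []
    for points in assay_data.values():
        lowest_two = sorted({conc for conc, _ in points})[:2]
        pooled.extend(resp for conc, resp in points if conc in lowest_two)
    return scaled_mad(pooled)


# Toy assay with two chemicals; the high-concentration responses are ignored.
toy_assay = {
    "chem_a": [(0.1, 1.0), (0.3, -1.0), (1.0, 50.0)],
    "chem_b": [(0.1, 2.0), (0.3, 0.0), (1.0, 80.0)],
}
bmad = baseline_mad(toy_assay)
```

Because the median is robust, a few potent chemicals that respond at their lowest concentrations shift the bmad only slightly, even though they show up in the upper tail of the ecdf.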

    Smooth bootstrap resampling.

    Normalized experimental concentration-response points (cyan circles) and the corresponding hill model (cyan line) are shown. The smooth bootstrap resampled points (black circles) and fitted curves (black lines) are overlaid, highlighting the range of resampled response values and the resulting range of fitted hill models.
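
In general, a smooth bootstrap resamples the observations with replacement and then adds a small random perturbation to each draw. The sketch below assumes Gaussian noise with a caller-supplied standard deviation (for example, the bmad) and resampling within each concentration. The caption does not specify the noise model used in the study, and all names here are hypothetical.

```python
import random


def smooth_bootstrap_sample(replicates_by_conc, noise_sd, rng):
    """Draw one smooth bootstrap resample.

    For each concentration, replicates are drawn with replacement and
    Gaussian noise with standard deviation `noise_sd` is added to each draw.
    """
    sample = {}
    for conc, replicates in replicates_by_conc.items():
        sample[conc] = [
            rng.choice(replicates) + rng.gauss(0.0, noise_sd)
            for _ in replicates
        ]
    return sample


# Toy data: two replicates at the low concentration, one at the high one.
data = {0.1: [1.0, 3.0], 1.0: [50.0]}
rng = random.Random(0)  # fixed seed so the resamples are reproducible
resamples = [smooth_bootstrap_sample(data, noise_sd=2.0, rng=rng) for _ in range(1000)]
```

The added noise is what lets this approach generate varied resamples even when a concentration has a single measurement, where plain resampling with replacement would return the same point every time.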

    Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure–Activity Relationship and Machine Learning Methods

    There are thousands of environmental chemicals subject to regulatory decisions for endocrine disrupting potential. The ToxCast and Tox21 programs have tested ∼8200 chemicals in a broad screening panel of in vitro high-throughput screening (HTS) assays for estrogen receptor (ER) agonist and antagonist activity. The present work uses this large data set to develop in silico quantitative structure–activity relationship (QSAR) models using machine learning (ML) methods and a novel approach to manage the imbalanced data distribution. Training compounds from the ToxCast project were categorized as active or inactive (binding or nonbinding) classes based on a composite ER Interaction Score derived from a collection of 13 ER in vitro assays. A total of 1537 chemicals from ToxCast were used to derive and optimize the binary classification models while 5073 additional chemicals from the Tox21 project, evaluated in 2 of the 13 in vitro assays, were used to externally validate the model performance. In order to handle the imbalanced distribution of active and inactive chemicals, we developed a cluster-selection strategy to minimize information loss and increase predictive performance and compared this strategy to three currently popular techniques: cost-sensitive learning, oversampling of the minority class, and undersampling of the majority class. QSAR classification models were built to relate the molecular structures of chemicals to their ER activities using linear discriminant analysis (LDA), classification and regression trees (CART), and support vector machines (SVM) with 51 molecular descriptors from QikProp and 4328 bits of structural fingerprints as explanatory variables. A random forest (RF) feature selection method was employed to extract the structural features most relevant to the ER activity. 
    The best model was obtained using SVM in combination with a subset of descriptors identified from a large set via the RF algorithm. It classified active and inactive compounds with accuracies of 76.1% and 82.8%, respectively, for a total accuracy of 81.6% on the internal test set and 70.8% on the external test set. These results demonstrate that a combination of high-quality experimental data and ML methods can lead to robust models that achieve excellent predictive accuracy and are potentially useful for facilitating the virtual screening of chemicals for environmental risk assessment.
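
The per-class and total accuracies quoted above can be computed from true and predicted labels as in the sketch below. The 0/1 label encoding and the function name are assumptions for illustration, and the toy labels are not data from the study.

```python
def class_accuracies(y_true, y_pred):
    """Return (accuracy on actives, accuracy on inactives, total accuracy).

    Labels are assumed to be 1 for active and 0 for inactive.
    """
    pairs = list(zip(y_true, y_pred))
    active_preds = [p for t, p in pairs if t == 1]
    inactive_preds = [p for t, p in pairs if t == 0]
    acc_active = sum(p == 1 for p in active_preds) / len(active_preds)
    acc_inactive = sum(p == 0 for p in inactive_preds) / len(inactive_preds)
    total = sum(t == p for t, p in pairs) / len(pairs)
    return acc_active, acc_inactive, total


# Toy imbalanced set: 2 actives, 4 inactives.
acc_active, acc_inactive, total = class_accuracies(
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
)
```

On imbalanced data the total accuracy is dominated by the majority class, which is why the abstract reports the active and inactive accuracies separately.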