50 research outputs found

    DPRESS: Localizing estimates of predictive uncertainty

    Abstract

    Background: The need for a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not actually the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set, adjusted for its separation d from u in the descriptor space relative to the size of the training set.

    [Display formula not recovered: graphic 1758-2946-1-11-i1.gif]

    The predictive uncertainty factor γ_t* is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s_t* and γ_t* are characteristic of each training set compound contributing to the model of interest.

    Results: The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229-compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the distributions of predictive error observed and those predicted using DPRESS. Inclusion of the distance-dependent term was essential to getting good agreement between the estimated uncertainties and the observed distributions of predictive error. The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so.

    Conclusion: DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set based on their distance to the training set and the internal predictive uncertainty associated with its nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.

    Patterns of Plant Biomass Partitioning Depend on Nitrogen Source

    Nitrogen (N) availability is a strong determinant of plant biomass partitioning, but the role of different N sources in this process is unknown. Plants inhabiting low-productivity ecosystems typically partition a large share of total biomass to belowground structures. In these systems, organic N may often dominate plant-available N. With increasing productivity, plant biomass partitioning shifts to aboveground structures, along with a shift in available N to inorganic forms. We tested the hypothesis that the form of N taken up by plants is an important determinant of plant biomass partitioning by cultivating Arabidopsis thaliana on different N source mixtures. Plants grown on different N mixtures were similar in size, but those supplied with organic N displayed a significantly greater root fraction. 15N labelling suggested that, in this case, a larger share of absorbed organic N was retained in roots, and split-root experiments suggested this may depend on direct incorporation of absorbed amino acid N into roots. These results suggest that the form of N acquired affects plant biomass partitioning, and they add new information on the interaction between N and biomass partitioning in plants.

    Quantitative Structure Activity Relationship
