74 research outputs found

    DPRESS: Localizing estimates of predictive uncertainty

    Background: The need to have a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not actually the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set, adjusted for its separation d from u in the descriptor space relative to the size of the training set.
    [Display formula: 1758-2946-1-11-i1.gif]
    The predictive uncertainty factor γ_t* is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s_t* and γ_t* are characteristic of each training set compound contributing to the model of interest.
    Results: The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229-compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the distributions of predictive error observed and those predicted using DPRESS. Inclusion of the distance-dependent term was essential to getting good agreement between the estimated uncertainties and the observed distributions of predictive error. The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so.
    Conclusion: DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set based on their distance to the training set and the internal predictive uncertainty associated with its nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.

    Mimicking microbial 'education' of the immune system: a strategy to revert the epidemic trend of atopy and allergic asthma?

    Deficient microbial stimulation of the immune system, caused by hygiene, may underlie the atopy and allergic asthma epidemic we are currently experiencing. Consistent with this 'hygiene hypothesis', research on immunotherapy of allergic diseases also centres on bacteria-derived molecules (e.g. DNA immunostimulatory sequences) as adjuvants for allergen-specific type 1 immune responses. If we understood how certain microbes physiologically 'educate' our immune system to interact safely with environmental nonmicrobial antigens, we might be able to learn to mimic their beneficial actions. Programmed 'immunoeducation' would consist of safe administration, by the correct route, dose and timing, of those microbial stimuli that are necessary to 'train' the developing mucosal immune system and to maintain an appropriate homeostatic equilibrium between its components. Overall, this would result in a prevention of atopy that is not limited to certain specific allergens. Although such a strategy is far beyond our present potential, it may in principle revert the epidemic trend of atopy and allergic asthma without jeopardizing the fight against infectious diseases.

    Seasonal dynamics of active SAR11 ecotypes in the oligotrophic Northwest Mediterranean Sea

    A seven-year oceanographic time series in NW Mediterranean surface waters was combined with pyrosequencing of ribosomal RNA (16S rRNA) and ribosomal RNA gene copies (16S rDNA) to examine the environmental controls on SAR11 ecotype dynamics and potential activity. SAR11 diversity exhibited pronounced seasonal cycles remarkably similar to total bacterial diversity. The timing of diversity maxima was similar across narrow and broad phylogenetic clades and strongly associated with deep winter mixing. Diversity minima were associated with periods of stratification that were low in nutrients and phytoplankton biomass and characterised by intense phosphate limitation (turnover time […] 80%) by SAR11 Ia. A partial least squares (PLS) regression model was developed that could reliably predict sequence abundances of SAR11 ecotypes (Q² = 0.70) from measured environmental variables, of which mixed layer depth was quantitatively the most important. Comparison of clade-level SAR11 rRNA:rDNA signals with leucine incorporation enabled us to partially validate the use of these ratios as an in-situ activity measure. However, temporal trends in the activity of SAR11 ecotypes and their relationship to environmental variables were unclear. The strong and predictable temporal patterns observed in SAR11 sequence abundance were not linked to metabolic activity of different ecotypes at the phylogenetic and temporal resolution of our study.

    Natural environments, ancestral diets, and microbial ecology: is there a modern “paleo-deficit disorder”? Part II

