24 research outputs found

    Estimation of the entropy of a multivariate normal distribution

    Motivated by problems in molecular biosciences wherein the evaluation of entropy of a molecular system is important for understanding its thermodynamic properties, we consider the efficient estimation of entropy of a multivariate normal distribution having unknown mean vector and covariance matrix. Based on a random sample, we discuss the problem of estimating the entropy under the quadratic loss function. The best affine equivariant estimator is obtained and, interestingly, it also turns out to be an unbiased estimator and a generalized Bayes estimator. It is established that the best affine equivariant estimator is admissible in the class of estimators that depend on the determinant of the sample covariance matrix alone. The risk improvements of the best affine equivariant estimator over the maximum likelihood estimator (an estimator commonly used in molecular sciences) are obtained numerically and are found to be substantial in higher dimensions, which is commonly the case for atomic coordinates in macromolecules such as proteins. We further establish that even the best affine equivariant estimator is inadmissible and obtain Stein-type and Brewster–Zidek-type estimators dominating it. The Brewster–Zidek-type estimator is shown to be generalized Bayes.
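The contrast between the maximum likelihood estimator and an unbiased estimator described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's own code: it assumes the standard closed form for the entropy of a p-dimensional normal distribution, H = (1/2) ln det(2πeΣ), and the standard expectation of the log-determinant of a Wishart matrix to remove the bias. The function names (`digamma`, `log_det_spd`, `scatter_matrix`, `entropy_mle`, `entropy_unbiased`) and the toy sample are hypothetical, and the sample size n must exceed the dimension p.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:                      # shift x up until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(p))

def scatter_matrix(sample):
    """Sum of outer products of deviations from the sample mean (not divided by n)."""
    n, p = len(sample), len(sample[0])
    mean = [sum(row[j] for row in sample) / n for j in range(p)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in sample)
             for j in range(p)] for i in range(p)]

def entropy_mle(sample):
    """Plug-in estimate: the covariance is estimated by scatter / n."""
    n, p = len(sample), len(sample[0])
    log_det_a = log_det_spd(scatter_matrix(sample))
    return 0.5 * p * math.log(2 * math.pi * math.e) + 0.5 * (log_det_a - p * math.log(n))

def entropy_unbiased(sample):
    """Bias-corrected estimate, assuming the scatter matrix is Wishart with n - 1
    degrees of freedom, so E[ln det A] = ln det(Sigma) + p ln 2 + sum_i psi((n - i) / 2)."""
    n, p = len(sample), len(sample[0])
    log_det_a = log_det_spd(scatter_matrix(sample))
    correction = p * math.log(2) + sum(digamma((n - i) / 2) for i in range(1, p + 1))
    return 0.5 * p * math.log(2 * math.pi * math.e) + 0.5 * (log_det_a - correction)

# Toy data (hypothetical): five observations in two dimensions.
sample = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 3.0]]
print("MLE:", entropy_mle(sample))
print("Unbiased:", entropy_unbiased(sample))
```

The two estimates differ only by a constant that depends on n and p, which is why the gap between them grows with the dimension for a fixed sample size.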

    QSAR Study of Skin Sensitization Using Local Lymph Node Assay Data

    Allergic Contact Dermatitis (ACD) is a common work-related skin disease that often develops as a result of repetitive skin exposures to a sensitizing chemical agent. A variety of experimental tests have been suggested to assess the skin sensitization potential. We applied a method of Quantitative Structure-Activity Relationship (QSAR) to relate measured and calculated physical-chemical properties of chemical compounds to their sensitization potential. Using statistical methods, each of these properties, called molecular descriptors, was tested for its propensity to predict the sensitization potential. A few of the most informative descriptors were subsequently selected to build a model of skin sensitization. In this work sensitization data for the murine Local Lymph Node Assay (LLNA) were used. In principle, LLNA provides a standardized continuous scale suitable for quantitative assessment of skin sensitization. However, at present many LLNA results are still reported on a dichotomous scale, which is consistent with the scale of guinea pig tests, which were widely used in past years. Therefore, in this study only a dichotomous version of the LLNA data was used. For the statistical analysis, we relied on the logistic regression approach. This approach provides a statistical tool for investigating and predicting skin sensitization that is expressed only in categorical terms of activity and nonactivity. Based on the data of compounds used in this study, our results suggest a QSAR model of ACD that is based on the following descriptors: nDB (number of double bonds), C-003 (number of CHR3 molecular subfragments), GATS6M (autocorrelation coefficient) and HATS6m (GETAWAY descriptor), although the relevance of the identified descriptors to the continuous ACD QSAR has yet to be shown. The proposed QSAR model gives a percentage of positively predicted responses of 83% on the training set of compounds, and in cross validation it correctly identifies 79% of responses.

    Electrostatic potential on human leukocyte antigen: implications for putative mechanism of chronic beryllium disease.

    The pathobiology of chronic beryllium disease (CBD) involves the major histocompatibility complex class II human leukocyte antigen (HLA). Although occupational exposure to beryllium is the cause of CBD, molecular epidemiologic studies suggest that specific HLA-DPB1 alleles may be genetic susceptibility factors. We have studied three-dimensional structural models of HLA-DP proteins encoded by these genes. The extracellular domains of HLA-DPA1*0103/B1*1701, *1901, *0201, and *0401, and HLA-DPA1*0201/B1*1701, *1901, *0201, and *0401 were modeled from the X-ray coordinates of an HLA-DR template. Using these models, the electrostatic potential at the molecular surface of each HLA-DP was calculated and compared. These comparisons identify specific characteristics in the vicinity of the antigen-binding pocket that distinguish the different HLA-DP allotypes. Differences in electrostatics originate from the shape, specific disposition, and variation in the negatively charged groups around the pocket. The more negative the pocket potential, the greater the odds of developing CBD estimated from reported epidemiologic studies. Adverse impact is caused by charged substitutions in positions 55, 56, 69, 84, and 85, namely, the exact same loci identified as genetic markers of CBD susceptibility as well as cobalt-lung hard metal disease. These findings suggest that certain substitutions may promote an involuntary cation-binding site within a putatively metal-free peptide-binding pocket and therefore change the innate specificity of antigen recognition.

    A Statistical Model for Assessing Genetic Susceptibility as a Risk Factor in Multifactorial Diseases: Lessons from Occupational Asthma

    BACKGROUND: Incorporating the influence of genetic variation in the risk assessment process is often considered, but no generalized approach exists. Many common human diseases such as asthma, cancer, and cardiovascular disease are complex in nature, as they are influenced variably by environmental, physiologic, and genetic factors. The genetic components most responsible for differences in individual disease risk are thought to be DNA variants (polymorphisms) that influence the expression or function of mediators involved in the pathological processes. OBJECTIVE: The purpose of this study was to estimate the combinatorial contribution of multiple genetic variants to disease risk. METHODS: We used a logistic regression model to help estimate the joint contribution that multiple genetic variants would have on disease risk. This model was developed using data collected from molecular epidemiology studies of allergic asthma that examined variants in 16 susceptibility genes. RESULTS: Based on the product of single gene variant odds ratios, the risk of developing asthma was assigned to genotype profiles, and the frequency of each profile was estimated for the general population. Our model predicts that multiple disease variants broaden the risk distribution, facilitating the identification of susceptible populations. This model also allows for incorporation of exposure information as an independent variable, which will be important for risk variants associated with specific exposures. CONCLUSION: The present model provided an opportunity to estimate the relative change in risk associated with multiple genetic variants. This will facilitate identification of susceptible populations and help provide a framework to model the genetic contribution in probabilistic risk assessment.
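The RESULTS step above, assigning a risk to each genotype profile as a product of single-variant odds ratios and estimating how common each profile is, can be sketched in a few lines. This is only an illustration under stated assumptions: the three variants, their odds ratios, and their carrier frequencies are hypothetical placeholders (the study examined variants in 16 genes), carrier status is treated as binary, and variants are assumed to be inherited independently.

```python
from itertools import product

# Hypothetical inputs: variant name -> (odds ratio for carriers, carrier frequency).
variants = {
    "variant_A": (1.5, 0.30),
    "variant_B": (2.0, 0.10),
    "variant_C": (1.2, 0.50),
}

def genotype_profiles(variants):
    """Enumerate every carrier/non-carrier combination with its relative odds
    (product of odds ratios) and its population frequency (assuming independence)."""
    names = list(variants)
    profiles = []
    for carried in product([0, 1], repeat=len(names)):
        relative_odds, frequency = 1.0, 1.0
        for name, has_variant in zip(names, carried):
            odds_ratio, carrier_freq = variants[name]
            relative_odds *= odds_ratio if has_variant else 1.0
            frequency *= carrier_freq if has_variant else 1.0 - carrier_freq
        profiles.append((dict(zip(names, carried)), relative_odds, frequency))
    return profiles

# List profiles from highest to lowest relative odds.
for profile, relative_odds, frequency in sorted(
        genotype_profiles(variants), key=lambda item: item[1], reverse=True):
    print(profile, round(relative_odds, 2), round(frequency, 4))
```

With more variants the number of profiles doubles per variant and the spread of relative odds widens, which is the broadening of the risk distribution that the abstract describes.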

    Modeling Chemical Interaction Profiles: II. Molecular Docking, Spectral Data-Activity Relationship, and Structure-Activity Relationship Models for Potent and Weak Inhibitors of Cytochrome P450 CYP3A4 Isozyme

    Polypharmacy increasingly has become a topic of public health concern, particularly as the U.S. population ages. Drug labels often contain insufficient information to enable the clinician to safely use multiple drugs. Because many of the drugs are biotransformed by cytochrome P450 (CYP) enzymes, inhibition of CYP activity has long been associated with potentially adverse health effects. In an attempt to reduce the uncertainty pertaining to CYP-mediated drug-drug/chemical interactions, an interagency collaborative group developed a consensus approach to prioritizing information concerning CYP inhibition. The consensus involved computational molecular docking, spectral data-activity relationship (SDAR), and structure-activity relationship (SAR) models that addressed the clinical potency of CYP inhibition. The models were built upon chemicals that were categorized as either potent or weak inhibitors of the CYP3A4 isozyme. The categorization was carried out using information from clinical trials because currently available in vitro high-throughput screening data were not fully representative of the in vivo potency of inhibition. During categorization it was found that compounds that break the Lipinski rule of five by molecular weight were about twice as likely to be inhibitors of CYP3A4 as those that obey the rule. Similarly, among inhibitors that break the rule, potent inhibitors were 2–3 times more frequent. The molecular docking classification relied on logistic regression, by which the docking scores from different docking algorithms, CYP3A4 three-dimensional structures, and binding sites on them were combined in a unified probabilistic model. The SDAR models employed a multiple linear regression approach applied to binned 1D 13C-NMR and 1D 15N-NMR spectral descriptors. Structure-based and physical-chemical descriptors were used as the basis for developing SAR models by the decision forest method. Thirty-three potent inhibitors and 88 weak inhibitors of CYP3A4 were used to train the models. Using these models, a synthetic majority rules consensus classifier was implemented, while the confidence of estimation was assigned following the percent agreement strategy. The classifier was applied to a testing set of 120 inhibitors not included in the development of the models. Five compounds of the test set, including known strong inhibitors dalfopristin and tioconazole, were classified as probable potent inhibitors of CYP3A4. Other known strong inhibitors, such as lopinavir, oltipraz, quercetin, raloxifene, and troglitazone, were among 18 compounds classified as plausible potent inhibitors of CYP3A4. The consensus estimation of inhibition potency is expected to aid in the nomination of pharmaceuticals, dietary supplements, environmental pollutants, and occupational and other chemicals for in-depth evaluation of the CYP3A4 inhibitory activity. It may serve also as an estimate of chemical interactions via CYP3A4 metabolic pharmacokinetic pathways occurring through polypharmacy and nutritional and environmental exposures to chemical mixtures.
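The majority rules consensus with percent-agreement confidence mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the function name `consensus`, the label strings, the example votes, and the handling of ties are all assumptions, and the abstract does not say how agreement levels map to the "probable" and "plausible" categories, so that mapping is not modeled here.

```python
from collections import Counter

def consensus(votes):
    """Combine class labels from several models by majority rule.

    Returns (label, percent_agreement). The label is None when the top
    vote count is shared by more than one class (a tie, by assumption
    treated as "no consensus").
    """
    counts = Counter(votes)
    top_count = max(counts.values())
    winners = [label for label, count in counts.items() if count == top_count]
    agreement = 100.0 * top_count / len(votes)
    if len(winners) > 1:
        return None, agreement
    return winners[0], agreement

# Hypothetical votes from docking, SDAR, and SAR models for one compound.
label, agreement = consensus(["potent", "potent", "weak"])
print(label, round(agreement, 1))
```

Keeping the agreement percentage alongside the label lets a downstream step rank compounds by how strongly the individual models concur, rather than treating every majority call as equally certain.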

    Improving the Continuum Dielectric Approach to Calculating p K
