42 research outputs found

    Estimation of the entropy of a multivariate normal distribution

    Motivated by problems in molecular biosciences wherein the evaluation of entropy of a molecular system is important for understanding its thermodynamic properties, we consider the efficient estimation of entropy of a multivariate normal distribution having unknown mean vector and covariance matrix. Based on a random sample, we discuss the problem of estimating the entropy under the quadratic loss function. The best affine equivariant estimator is obtained and, interestingly, it also turns out to be an unbiased estimator and a generalized Bayes estimator. It is established that the best affine equivariant estimator is admissible in the class of estimators that depend on the determinant of the sample covariance matrix alone. The risk improvements of the best affine equivariant estimator over the maximum likelihood estimator (an estimator commonly used in molecular sciences) are obtained numerically and are found to be substantial in higher dimensions, which is commonly the case for atomic coordinates in macromolecules such as proteins. We further establish that even the best affine equivariant estimator is inadmissible and obtain Stein-type and Brewster–Zidek-type estimators dominating it. The Brewster–Zidek-type estimator is shown to be generalized Bayes.
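
As a rough illustration of the two estimators compared in this abstract, the sketch below computes the plug-in maximum likelihood estimate of the entropy and an unbiased estimate of the form "log-determinant plus constant". The bias correction is not quoted from the paper: it assumes the standard expectation of the log-determinant of a Wishart matrix. All function names and the toy data are hypothetical, and only the Python standard library is used.

```python
import math

def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:            # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                total += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return total

def scatter_matrix(rows):
    """Sum of outer products of deviations from the sample mean."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
             for j in range(p)] for i in range(p)]

def entropy_estimates(rows):
    """Return (mle, unbiased) entropy estimates in nats; requires n > p."""
    n, p = len(rows), len(rows[0])
    log_det_s = log_det_spd(scatter_matrix(rows))
    const = 0.5 * p * (1.0 + math.log(2.0 * math.pi))
    # MLE plugs in the covariance estimate S / n.
    mle = const + 0.5 * (log_det_s - p * math.log(n))
    # Assumed correction: E[ln|S|] = ln|Sigma| + p ln 2 + sum_i psi((n - i) / 2).
    bias_term = p * math.log(2.0) + sum(
        digamma((n - i) / 2.0) for i in range(1, p + 1))
    unbiased = const + 0.5 * (log_det_s - bias_term)
    return mle, unbiased

# Hypothetical toy sample: n = 5 points in p = 2 dimensions.
sample = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 2.0]]
mle, unbiased = entropy_estimates(sample)
```

Because the digamma function lies below the logarithm, the correction always raises the estimate above the plug-in value. The gap grows with the dimension p for fixed n, which is consistent with the abstract's remark that improvements are substantial in higher dimensions.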

    Nonparametric Estimation of the Survival Function Based on Censored Data with Additional Observations from the Residual Distribution

    We derive the nonparametric maximum likelihood estimator (NPMLE) of the distribution of the test items using a random, right-censored sample combined with an additional right-censored, residual-lifetime sample in which only lifetimes past a known, fixed time are collected. This framework is suited for samples for which individual test data are combined with left-truncated and randomly censored data from an operating environment. The NPMLE of the survival function using the combined sample is identical to the Kaplan-Meier product-limit estimator only up to the time at which the test items corresponding to the residual sample were known to survive. The limiting distribution for the NPMLE, discussed in detail, leads to confidence bounds for the survival function. For the uncensored case, we study the relative efficiency for the estimator based on the combined sample with respect to the analogous estimator based only on the simple random sample.
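
For orientation, here is a minimal sketch of the classical Kaplan-Meier product-limit estimator that the abstract uses as its reference point. It handles a single right-censored sample only. It is not the combined-sample NPMLE derived in the paper, and the function name and toy data are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct failure time for right-censored data.

    times    -- observed time for each item (failure or censoring time)
    observed -- True if the failure was seen, False if right-censored
    """
    deaths = Counter(t for t, d in zip(times, observed) if d)
    exits = Counter(times)      # every item leaves the risk set at its time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]     # failures and censored items both drop out
    return curve

# Hypothetical toy sample: failures at 2, 3, 5; censoring at 3 and 8.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

Items censored at a failure time are counted as still at risk at that time, which is the usual convention for tied observations.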

    Fisher Information in Weighted Distributions

    Standard inference procedures assume a random sample from a population with density f_μ(x) for estimating the parameter μ. However, there are many applications in which the available data are a biased sample instead. Fisher modeled biased sampling using a weight function w(x) ≥ 0, and constructed a weighted distribution with a density f_μ^w(x) that is proportional to w(x)f_μ(x). In this paper, we assume that f_μ(x) belongs to an exponential family, and study the Fisher information about μ in observations obtained from some commonly arising weighted distributions: (i) the kth order statistic of a random sample of size m, (ii) observations from the stationary distribution of the residual lifetime of a renewal process, and (iii) truncated distributions. We give general conditions under which the weighted distribution has greater Fisher information than the original distribution, and specialize to the normal, gamma, and Weibull distributions. These conditions involve the distributions' hazard rate and the reversed hazard rate functions.
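
The weighted density and the three cases listed in this abstract can be written out as follows. These are the standard textbook forms, not quoted from the paper. The symbols F_μ (distribution function), h_μ (hazard rate), and A (truncation set) are notation assumed here for illustration.

```latex
% Weighted density: normalize w(x) f_mu(x) by its integral.
f_\mu^{w}(x) = \frac{w(x)\, f_\mu(x)}{\mathrm{E}_\mu[w(X)]},
\qquad \mathrm{E}_\mu[w(X)] = \int w(t)\, f_\mu(t)\, dt .

% (i) k-th order statistic of a sample of size m:
w(x) = F_\mu(x)^{\,k-1}\,\bigl[1 - F_\mu(x)\bigr]^{\,m-k}

% (ii) stationary residual lifetime of a renewal process (X \ge 0):
w(x) = \frac{1 - F_\mu(x)}{f_\mu(x)} = \frac{1}{h_\mu(x)},
\qquad f_\mu^{w}(x) = \frac{1 - F_\mu(x)}{\mathrm{E}_\mu[X]}

% (iii) truncation to a set A:
w(x) = \mathbf{1}\{x \in A\}
```

In cases (i) and (ii) the weight itself depends on μ, which is one reason the hazard rate and reversed hazard rate appear in the stated conditions.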

    QSAR Study of Skin Sensitization Using Local Lymph Node Assay Data

    Allergic Contact Dermatitis (ACD) is a common work-related skin disease that often develops as a result of repetitive skin exposures to a sensitizing chemical agent. A variety of experimental tests have been suggested to assess the skin sensitization potential. We applied a method of Quantitative Structure-Activity Relationship (QSAR) to relate measured and calculated physical-chemical properties of chemical compounds to their sensitization potential. Using statistical methods, each of these properties, called molecular descriptors, was tested for its propensity to predict the sensitization potential. A few of the most informative descriptors were subsequently selected to build a model of skin sensitization. In this work sensitization data for the murine Local Lymph Node Assay (LLNA) were used. In principle, LLNA provides a standardized continuous scale suitable for quantitative assessment of skin sensitization. However, at present many LLNA results are still reported on a dichotomous scale, which is consistent with the scale of guinea pig tests, which were widely used in past years. Therefore, in this study only a dichotomous version of the LLNA data was used. For the statistical analysis, we relied on logistic regression. This approach provides a statistical tool for investigating and predicting skin sensitization that is expressed only in categorical terms of activity and nonactivity. Based on the data of compounds used in this study, our results suggest a QSAR model of ACD that is based on the following descriptors: nDB (number of double bonds), C-003 (number of CHR3 molecular subfragments), GATS6M (autocorrelation coefficient) and HATS6m (GETAWAY descriptor), although the relevance of the identified descriptors to the continuous ACD QSAR has yet to be shown. The proposed QSAR model gives a percentage of positively predicted responses of 83% on the training set of compounds, and in cross validation it correctly identifies 79% of responses.

    Null Distribution Of The Likelihood Ratio Statistic For Feed-Forward Neural Networks

    Despite recent publications exploring model complexity with modern regression methods, their dimensionality is rarely quantified in practice and the distributions of related test statistics are not well characterized. Through a simulation study, we describe the null distribution of the likelihood ratio statistic for several different feed-forward neural network models.

    On the estimation of ordered means of two exponential populations

    Asymptotic efficiency, exponential distribution, isotonic regression, maximum likelihood estimation, mean square error,

    Sufficient conditions for stochastic equality of two distributions under some partial orders

    We establish some conditions for stochastic equality of two nonnegative random variables which are ordered with respect to variability ordering or with respect to mean residual life ordering or with respect to second order stochastic ordering.