593 research outputs found

    Building and using semiparametric tolerance regions for parametric multinomial models

    Full text link
    We introduce a semiparametric ``tubular neighborhood'' of a parametric model in the multinomial setting. It consists of all multinomial distributions lying in a distance-based neighborhood of the parametric model of interest. Fitting such a tubular model allows one to use a parametric model while treating it as an approximation to the true distribution. In this paper, the Kullback--Leibler distance is used to build the tubular region. Based on this idea one can define the distance between the true multinomial distribution and the parametric model to be the index of fit. The paper develops a likelihood ratio test procedure for testing the magnitude of the index. A semiparametric bootstrap method is implemented to better approximate the distribution of the LRT statistic. The approximation permits more accurate construction of a lower confidence limit for the model fitting index.Comment: Published in at http://dx.doi.org/10.1214/08-AOS603 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    The topography of multivariate normal mixtures

    Get PDF
    Multivariate normal mixtures provide a flexible method of fitting high-dimensional data. It is shown that their topography, in the sense of their key features as a density, can be analyzed rigorously in lower dimensions by use of a ridgeline manifold that contains all critical points, as well as the ridges of the density. A plot of the elevations on the ridgeline shows the key features of the mixed density. In addition, by use of the ridgeline, we uncover a function that determines the number of modes of the mixed density when there are two components being mixed. A followup analysis then gives a curvature function that can be used to prove a set of modality theorems.Comment: Published at http://dx.doi.org/10.1214/009053605000000417 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Estimating the number of classes

    Full text link
    Estimating the unknown number of classes in a population has numerous important applications. In a Poisson mixture model, the problem is reduced to estimating the odds that a class is undetected in a sample. The discontinuity of the odds prevents the existence of locally unbiased and informative estimators and restricts confidence intervals to be one-sided. Confidence intervals for the number of classes are also necessarily one-sided. A sequence of lower bounds to the odds is developed and used to define pseudo maximum likelihood estimators for the number of classes.Comment: Published at http://dx.doi.org/10.1214/009053606000001280 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Improving cross-validated bandwidth selection using subsampling-extrapolation techniques

    Get PDF
    AbstractCross-validation methodologies have been widely used as a means of selecting tuning parameters in nonparametric statistical problems. In this paper we focus on a new method for improving the reliability of cross-validation. We implement this method in the context of the kernel density estimator, where one needs to select the bandwidth parameter so as to minimize L2 risk. This method is a two-stage subsampling-extrapolation bandwidth selection procedure, which is realized by first evaluating the risk at a fictional sample size m(m≤sample size  n) and then extrapolating the optimal bandwidth from m to n. This two-stage method can dramatically reduce the variability of the conventional unbiased cross-validation bandwidth selector. This simple first-order extrapolation estimator is equivalent to the rescaled “bagging-CV” bandwidth selector in Hall and Robinson (2009) if one sets the bootstrap size equal to the fictional sample size. However, our simplified expression for the risk estimator enables us to compute the aggregated risk without any bootstrapping. Furthermore, we developed a second-order extrapolation technique as an extension designed to improve the approximation of the true optimal bandwidth. To select the optimal choice of the fictional size m given a sample of size n, we propose a nested cross-validation methodology. Based on simulation study, the proposed new methods show promising performance across a wide selection of distributions. In addition, we also investigated the asymptotic properties of the proposed bandwidth selectors

    Quadratic distances on probabilities: A unified foundation

    Full text link
    This work builds a unified framework for the study of quadratic form distance measures as they are used in assessing the goodness of fit of models. Many important procedures have this structure, but the theory for these methods is dispersed and incomplete. Central to the statistical analysis of these distances is the spectral decomposition of the kernel that generates the distance. We show how this determines the limiting distribution of natural goodness-of-fit tests. Additionally, we develop a new notion, the spectral degrees of freedom of the test, based on this decomposition. The degrees of freedom are easy to compute and estimate, and can be used as a guide in the construction of useful procedures in this class.Comment: Published in at http://dx.doi.org/10.1214/009053607000000956 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Detecting West Nile Virus in Owls and Raptors by an Antigen-capture Assay

    Get PDF
    We evaluated a rapid antigen-capture assay (VecTest) for detection of West Nile virus in oropharyngeal and cloacal swabs, collected at necropsy from owls (N = 93) and raptors (N = 27). Sensitivity was 93.5%–95.2% for northern owl species but <42.9% for all other species. Specificity was 100% for owls and 85.7% for raptors

    Partial Purification and Characterization of Two Peptide Hydrolases from Pea Seeds

    Full text link

    Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries

    Get PDF
    BACKGROUND: In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insights to sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed. RESULTS: We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of expressed genes in one cDNA library or co-expressed in two cDNA libraries. The superior performance of the new prediction method over an existing approach is established by a simulation study. Our analysis of four Arabidopsis thaliana EST sets suggests that the number of expressed genes present in four different cDNA libraries of Arabidopsis thaliana varies from 9155 (root) to 12005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%. CONCLUSION: The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing
    • …
    corecore