Building and using semiparametric tolerance regions for parametric multinomial models
We introduce a semiparametric ``tubular neighborhood'' of a parametric model
in the multinomial setting. It consists of all multinomial distributions lying
in a distance-based neighborhood of the parametric model of interest. Fitting
such a tubular model allows one to use a parametric model while treating it as
an approximation to the true distribution. In this paper, the Kullback--Leibler
distance is used to build the tubular region. Based on this idea one can define
the distance between the true multinomial distribution and the parametric model
to be the index of fit. The paper develops a likelihood ratio test procedure
for testing the magnitude of the index. A semiparametric bootstrap method is
implemented to better approximate the distribution of the LRT statistic. The
approximation permits more accurate construction of a lower confidence limit
for the model fitting index.
Comment: Published at http://dx.doi.org/10.1214/08-AOS603 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
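As a rough illustration of the index of fit described in this abstract, the sketch below computes the Kullback-Leibler distance from a three-cell multinomial distribution to a one-parameter model. The Hardy-Weinberg model, the function names, and the direction of the divergence, KL(p || m_theta), are all assumptions chosen for illustration; the abstract does not specify them, and the paper's test and bootstrap procedures are not reproduced here.

```python
import math

def hardy_weinberg(theta):
    """Cell probabilities (AA, Aa, aa) under a hypothetical one-parameter model."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; cells with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def index_of_fit(p):
    """Smallest KL(p || m_theta) over theta, with the minimizing theta.

    For this model the minimizer has a closed form: the allele frequency
    implied by p, i.e. theta* = p_AA + p_Aa / 2.
    """
    theta_star = p[0] + 0.5 * p[1]
    return theta_star, kl_divergence(p, hardy_weinberg(theta_star))

if __name__ == "__main__":
    theta, index = index_of_fit([0.5, 0.0, 0.5])
    print(theta, index)
```

An index of zero means the distribution lies exactly on the parametric model; a tubular neighborhood of radius c would contain every distribution whose index is at most c.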
The topography of multivariate normal mixtures
Multivariate normal mixtures provide a flexible method of fitting
high-dimensional data. It is shown that their topography, in the sense of their
key features as a density, can be analyzed rigorously in lower dimensions by
use of a ridgeline manifold that contains all critical points, as well as the
ridges of the density. A plot of the elevations on the ridgeline shows the key
features of the mixed density. In addition, by use of the ridgeline, we uncover
a function that determines the number of modes of the mixed density when there
are two components being mixed. A followup analysis then gives a curvature
function that can be used to prove a set of modality theorems.
Comment: Published at http://dx.doi.org/10.1214/009053605000000417 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
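The elevation plot mentioned in this abstract can be sketched for the simplest case, a univariate mixture of two normals. The code below is an illustrative sketch under stated assumptions, not the paper's implementation: it restricts to one dimension, where the ridgeline reduces to a precision-weighted average of the two means, and it counts modes by a grid search rather than by the curvature function the abstract refers to. All function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def ridgeline_point(alpha, mu1, s1, mu2, s2):
    """Point on the ridgeline for alpha in [0, 1] (univariate case)."""
    w1 = (1 - alpha) / s1 ** 2
    w2 = alpha / s2 ** 2
    return (w1 * mu1 + w2 * mu2) / (w1 + w2)

def count_modes(pi1, mu1, s1, mu2, s2, grid=2001):
    """Count interior local maxima of the mixture density along the ridgeline."""
    heights = []
    for i in range(grid):
        x = ridgeline_point(i / (grid - 1), mu1, s1, mu2, s2)
        heights.append(pi1 * normal_pdf(x, mu1, s1)
                       + (1 - pi1) * normal_pdf(x, mu2, s2))
    return sum(1 for i in range(1, grid - 1)
               if heights[i - 1] < heights[i] > heights[i + 1])

if __name__ == "__main__":
    print(count_modes(0.5, 0.0, 1.0, 3.0, 1.0))  # well-separated means
    print(count_modes(0.5, 0.0, 1.0, 1.0, 1.0))  # close means
```

The grid search only detects maxima strictly inside the grid, so it assumes distinct component means and a grid fine enough to resolve modes that sit close to either mean.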
Estimating the number of classes
Estimating the unknown number of classes in a population has numerous
important applications. In a Poisson mixture model, the problem is reduced to
estimating the odds that a class is undetected in a sample. The discontinuity
of the odds prevents the existence of locally unbiased and informative
estimators and restricts confidence intervals to be one-sided. Confidence
intervals for the number of classes are also necessarily one-sided. A sequence
of lower bounds to the odds is developed and used to define pseudo maximum
likelihood estimators for the number of classes.
Comment: Published at http://dx.doi.org/10.1214/009053606000001280 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
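To make the link between the odds and the number of classes concrete, the sketch below uses Chao's classical lower bound, which is a swapped-in, well-known estimator and not the sequence of bounds or the pseudo maximum likelihood estimators developed in the paper. It relies on the identity N = D(1 + odds), where D is the number of observed classes. The function name and the toy abundance counts are hypothetical.

```python
from collections import Counter

def chao_lower_bound(abundances):
    """Chao's lower bound for the number of classes (illustrative stand-in).

    abundances: number of sampled individuals per observed class.
    Returns the observed class count D, a lower bound for the odds that a
    class is undetected, and the implied lower bound D * (1 + odds).
    """
    observed = [a for a in abundances if a > 0]
    d = len(observed)
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]  # classes seen exactly once / exactly twice
    if f2 == 0:
        raise ValueError("this simple form needs at least one class seen twice")
    unseen = f1 * f1 / (2 * f2)
    return {"observed": d, "odds": unseen / d, "classes": d + unseen}

if __name__ == "__main__":
    print(chao_lower_bound([1, 1, 1, 1, 2, 2, 3, 5]))
```

Because the result is only a lower bound, it is consistent with the abstract's point that inference on the number of classes is necessarily one-sided.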
Improving cross-validated bandwidth selection using subsampling-extrapolation techniques
Cross-validation methodologies have been widely used as a means of selecting tuning parameters in nonparametric statistical problems. In this paper we focus on a new method for improving the reliability of cross-validation. We implement this method in the context of the kernel density estimator, where one needs to select the bandwidth parameter so as to minimize L2 risk. The method is a two-stage subsampling-extrapolation bandwidth selection procedure, which first evaluates the risk at a fictional sample size m (m ≤ n, where n is the sample size) and then extrapolates the optimal bandwidth from m to n. This two-stage method can dramatically reduce the variability of the conventional unbiased cross-validation bandwidth selector. The simple first-order extrapolation estimator is equivalent to the rescaled “bagging-CV” bandwidth selector in Hall and Robinson (2009) if one sets the bootstrap size equal to the fictional sample size. However, our simplified expression for the risk estimator enables us to compute the aggregated risk without any bootstrapping. Furthermore, we develop a second-order extrapolation technique as an extension designed to improve the approximation of the true optimal bandwidth. To select the fictional size m given a sample of size n, we propose a nested cross-validation methodology. Based on a simulation study, the proposed methods show promising performance across a wide selection of distributions. In addition, we investigate the asymptotic properties of the proposed bandwidth selectors.
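The extrapolation step from m to n can be sketched as follows, assuming the standard asymptotic result that the L2-optimal bandwidth of a kernel density estimator with a second-order kernel shrinks at rate n^(-1/5). The abstract does not state the exact estimator, so the function name, its `rate` parameter, and the example numbers are assumptions for illustration only; the risk evaluation at size m is not shown.

```python
def extrapolate_bandwidth(h_m, m, n, rate=0.2):
    """First-order extrapolation of a bandwidth from sample size m to n.

    Assumes h_opt(n) is proportional to n ** (-rate); rate = 1/5 is the
    usual value for a second-order kernel (an assumption, not taken from
    the abstract).
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    return h_m * (m / n) ** rate

if __name__ == "__main__":
    # A bandwidth of 0.5 selected at fictional size m = 100,
    # rescaled to the full sample size n = 3200.
    print(extrapolate_bandwidth(0.5, 100, 3200))
```

Selecting the bandwidth at a smaller size m and rescaling it is what allows the procedure to trade a small extrapolation error for lower variability in the selected bandwidth.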
Quadratic distances on probabilities: A unified foundation
This work builds a unified framework for the study of quadratic form distance
measures as they are used in assessing the goodness of fit of models. Many
important procedures have this structure, but the theory for these methods is
dispersed and incomplete. Central to the statistical analysis of these
distances is the spectral decomposition of the kernel that generates the
distance. We show how this determines the limiting distribution of natural
goodness-of-fit tests. Additionally, we develop a new notion, the spectral
degrees of freedom of the test, based on this decomposition. The degrees of
freedom are easy to compute and estimate, and can be used as a guide in the
construction of useful procedures in this class.
Comment: Published at http://dx.doi.org/10.1214/009053607000000956 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
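For discrete distributions, a quadratic form distance reduces to (p - q)' K (p - q) for a kernel matrix K. The sketch below is a minimal illustration under that assumption, using a diagonal kernel that reproduces Pearson's chi-square distance; the function names are hypothetical, and the spectral degrees of freedom introduced in the paper are not computed here.

```python
def quadratic_distance(p, q, kernel):
    """Quadratic form distance (p - q)' K (p - q) for discrete distributions."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    size = len(diff)
    return sum(diff[i] * kernel[i][j] * diff[j]
               for i in range(size) for j in range(size))

def pearson_kernel(q):
    """Diagonal kernel with entries 1/q_i, which yields Pearson's chi-square distance."""
    size = len(q)
    return [[1.0 / q[i] if i == j else 0.0 for j in range(size)]
            for i in range(size)]

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]   # empirical proportions (toy values)
    q = [0.4, 0.4, 0.2]   # model probabilities (toy values)
    print(quadratic_distance(p, q, pearson_kernel(q)))
```

Swapping in a different kernel matrix changes which departures from the model the distance is sensitive to, which is the role the abstract assigns to the kernel's spectral decomposition.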
14C-Cobalamin Absorption from Endogenously Labeled Chicken Eggs Assessed in Humans Using Accelerator Mass Spectrometry.
Traditionally, the bioavailability of vitamin B-12 (B12) from in vivo labeled foods was determined by labeling the vitamin with radiocobalt (57Co, 58Co or 60Co). This required use of penetrating radioactivity and sometimes used higher doses of B12 than the physiological limit of B12 absorption. The aim of this study was to determine the bioavailability and the amount of B12 absorbed from chicken eggs endogenously labeled with 14C-B12 using accelerator mass spectrometry (AMS). 14C-B12 was injected intramuscularly into hens to produce eggs enriched in vivo with the 14C labeled vitamin. The eggs, which provided 1.4 to 2.6 μg of B12 (~1.1 kBq) per serving, were scrambled, cooked and fed to 10 human volunteers. Baseline and post-ingestion blood, urine and stool samples were collected over a one-week period and assessed for 14C-B12 content using AMS. Bioavailability ranged from 13.2 to 57.7% (mean 30.2 ± 16.4%). Differences among subjects were explained by the dose of B12, with percent bioavailability from 2.6 μg only half that from 1.4 μg. The total amount of B12 absorbed was limited to 0.5-0.8 μg (mean 0.55 ± 0.19 μg B12) and was relatively unaffected by the amount consumed. The use of 14C-B12 offers the only currently available method for quantifying B12 absorption in humans, including food cobalamin absorption. An egg is confirmed as a good source of B12, supplying approximately 20% of the average adult daily requirement (RDA for adults = 2.4 μg/day).
Detecting West Nile Virus in Owls and Raptors by an Antigen-capture Assay
We evaluated a rapid antigen-capture assay (VecTest) for detection of West Nile virus in oropharyngeal and cloacal swabs collected at necropsy from owls (N = 93) and raptors (N = 27). Sensitivity was 93.5%–95.2% for northern owl species but <42.9% for all other species. Specificity was 100% for owls and 85.7% for raptors.
Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries
BACKGROUND: In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insight into sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed. RESULTS: We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of expressed genes in one cDNA library or co-expressed in two cDNA libraries. The superior performance of the new prediction method over an existing approach is established by a simulation study. Our analysis of four Arabidopsis thaliana EST sets suggests that the number of expressed genes present in four different cDNA libraries of Arabidopsis thaliana varies from 9155 (root) to 12005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%. CONCLUSION: The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing.