182 research outputs found

    Finite mixture models of heterogeneous capture probabilities for mark-recapture estimation of closed population size

    Heterogeneity in capture probabilities among animals is a common problem for estimation of animal population size from mark-recapture data. We model animals as belonging to discrete groups in which animals have the same probabilities of first capture and recapture. For removal data, population size, probabilities of first capture, and mixture proportions are estimated by maximum likelihood for a geometric finite mixture model. For mark-recapture data, a binomial finite mixture for recaptures is combined with a geometric finite mixture for first captures to better estimate mixture proportions. This model can be restricted under the assumption of no behavioral response to first capture. On Carothers' (1973) taxi cab data, estimation with a 2-group mark-recapture finite mixture provided a population size estimate, N = 420, that exactly matched the registered number of cabs. On meadow vole data, estimation with a 3-group model showed heterogeneity in behavioral response to first capture. Simulations show that our 2-group mark-recapture finite mixture estimator with restriction for no behavioral response to first capture is more efficient than Burnham and Overton's (1978) jackknife estimator when the smallest probability of first capture is 0.1 and the number of sampling occasions is 10.

    Confidence Interval Estimation of Cumulative Incidence for Clustered Competing Risks

    In a cluster randomized trial studying a primary outcome, patients are sometimes exposed to competing events. These are risks that alter the probability of the primary outcome occurring. Traditional methods of estimating the cumulative incidence for an outcome and its associated confidence interval under competing risks do not account for the effect of clustering. This may cause incorrect estimation of confidence intervals because outcomes among patients from the same center are correlated. This thesis compared six nonparametric methods of confidence interval construction for cumulative incidence under competing risks, four of which account for the clustering effect, via a simulation study. Over the range of examined scenarios, if the clustering effect is mild (i.e. ICC = 0.01), estimators not accounting for clustering never have worse coverage than those that do. However, in cases with a large clustering effect (i.e. ICC = 0.05), using confidence interval estimators accounting for clustering should be considered.

    Statistical Methods for Biodiversity Assessment

    This thesis focuses on statistical methods for estimating the number of species, which is a natural index for measuring biodiversity. Both parametric and nonparametric approaches are investigated for this problem. Species abundance models, including homogeneous and heterogeneous models, are explored for species richness estimation. Two new improvements to the Chao estimator are developed using the Good-Turing coverage formula. Although the homogeneous abundance model is the simplest model, in practice species are collected with different probabilities. This leads to overdispersed data, zero inflation and a heavy tail. The Poisson-Tweedie distribution, a mixed-Poisson family that includes the negative-binomial, Poisson, Poisson inverse Gaussian and Pólya-Aeppli distributions as special cases, is explored for estimating the number of species. The weighted linear regression estimator based on the ratio of successive frequencies is applied to data generated from the Poisson-Tweedie distribution. Sparse data can yield zero frequencies for species seen $i$ times, in which case the weighted linear regression fails. A smoothing technique is therefore considered for improving the performance of the weighted linear regression estimator. Both simulated data and some real data sets are used to study the performance of parametric and nonparametric estimators in this thesis. Finally, the distribution of the number of distinct species found in a sample is hard to compute. Many approximations, including the Poisson, normal, COM-Poisson binomial, Altham's multiplicative-binomial and additive-binomial, and Pólya distributions, are used for approximating the distribution of distinct species. Under various abundance models, Altham's multiplicative-binomial approximation performs well.
Building on other recent work, the maximum likelihood and the maximum pseudo-likelihood estimators are applied with Altham's multiplicative-binomial approximation and compared with other estimators.
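
As a rough illustration of the two building blocks named in this abstract, the sketch below computes the classical Chao1 lower-bound estimator and the Good-Turing sample coverage from a list of per-species counts. It shows only the textbook forms, not the thesis's improved estimators, and the function names (`frequency_counts`, `chao1`, `good_turing_coverage`) are hypothetical names chosen for this example.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map k -> number of species observed exactly k times (k >= 1)."""
    return Counter(a for a in abundances if a > 0)


def chao1(abundances):
    """Classical Chao1 lower bound on species richness.

    Uses S_obs + f1^2 / (2 * f2) when doubletons exist, and the
    bias-corrected form S_obs + f1 * (f1 - 1) / 2 when f2 == 0.
    """
    f = frequency_counts(abundances)
    s_obs = sum(f.values())          # number of species actually seen
    f1, f2 = f.get(1, 0), f.get(2, 0)  # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def good_turing_coverage(abundances):
    """Good-Turing estimate of sample coverage, 1 - f1 / n."""
    f = frequency_counts(abundances)
    n = sum(k * count for k, count in f.items())  # total individuals
    if n == 0:
        raise ValueError("no individuals observed")
    return 1 - f.get(1, 0) / n


if __name__ == "__main__":
    counts = [1, 1, 1, 2, 2, 5, 10]  # made-up counts, one entry per species
    print(chao1(counts), good_turing_coverage(counts))
```

Both quantities depend only on the low-frequency counts: many singletons relative to doubletons push the richness estimate up and the coverage estimate down.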

    Confidence Intervals for Error Rates in Matching Tasks: Critical Review and Recommendations

    Matching algorithms are commonly used to predict matches between items in a collection. For example, in 1:1 face verification, a matching algorithm predicts whether two face images depict the same person. Accurately assessing the uncertainty of the error rates of such algorithms can be challenging when data are dependent and error rates are low, two aspects that have often been overlooked in the literature. In this work, we review methods for constructing confidence intervals for error rates in matching tasks such as 1:1 face verification. We derive and examine the statistical properties of these methods and demonstrate how coverage and interval width vary with sample size, error rates, and degree of data dependence using both synthetic and real-world datasets. Based on our findings, we provide recommendations for best practices for constructing confidence intervals for error rates in matching tasks. Comment: 32 pages, 8 figures
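
To make the low-error-rate difficulty concrete, here is a minimal sketch of the Wilson score interval for a binomial proportion. It is a standard textbook interval, not necessarily one of the methods the paper reviews, and it assumes every comparison is independent, which is the assumption the abstract says is often violated. The function name `wilson_interval` and the example figures are hypothetical.

```python
import math


def wilson_interval(errors, trials, z=1.959963984540054):
    """Wilson score interval for an error rate (default z gives ~95%).

    Assumes `trials` independent comparisons; dependence between
    comparisons (e.g. images shared across pairs) is NOT handled.
    """
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    p = errors / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Made-up example: zero errors observed in 100 comparisons.
    print(wilson_interval(0, 100))
```

With zero observed errors the interval still has a strictly positive upper bound, unlike the naive normal-approximation interval, which collapses to a single point at zero.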

    A note on bias reduction

    Let $\widehat{w}$ be an unbiased estimate of an unknown $w\in R$. Given a function $t(w)$, we show how to choose a function $f_n(w)$ such that for $w^*=\widehat{w} + f_n(w)$, $E\ t\left(w^*\right) =t(w)$. We illustrate this with $t(w)=w^a$ for a given constant $a$. For $a=2$ and $\widehat{w}$ normal, this leads to the convolution equation $c_r=c_r\otimes c_r$.

    Marginal methods and software for clustered data with cluster- and group-size informativeness.

    Clustered data result when observations have some natural organizational association. In such data, cluster size is defined as the number of observations belonging to a cluster. A phenomenon termed informative cluster size (ICS) occurs when observation outcomes vary in a systematic way related to the cluster size. An additional form of informativeness, termed informative within-cluster group size (IWCGS), arises when the distribution of group-defining categorical covariates within clusters similarly carries information related to outcomes. Standard methods for the marginal analysis of clustered data can produce biased estimates and inference when data have informativeness. A reweighting methodology has been developed that is resistant to ICS and IWCGS bias, and this method has been used to establish clustered data analogs of classical hypothesis tests related to ranks and correlation. In this work, we extend the reweighting methodology to develop a versatile collection of marginal hypothesis tests related to proportions, means, and variances in clustered data that are analogous to classical forms. We evaluate the performance of these tests compared to other cluster-appropriate methods through simulation and show that only reweighted tests maintain appropriate size when data have informativeness. We construct reweighted tests of clustered categorical data using several variance estimators, and demonstrate that the method of variance estimation can have a substantial effect on these tests. Additionally, we show that when testing simple hypotheses in data lacking informativeness, reweighted tests can outperform other standard cluster-appropriate methods both in terms of size and power. Combining our novel tests with the existing tests of ranks and correlations, we compile a comprehensive R software package that executes this collection of ICS/IWCGS-appropriate methods through a thoughtful and user-friendly design.