182 research outputs found
Finite mixture models of heterogeneous capture probabilities for mark-recapture estimation of closed population size
Heterogeneity in capture probabilities among animals is a common problem for estimation of animal population size from mark-recapture data. We model animals as belonging to discrete groups in which animals have the same probabilities of first capture and recapture. For removal data, population size, probabilities of first capture, and mixture proportions are estimated by maximum likelihood for a geometric finite mixture model. For mark-recapture data, a binomial finite mixture for recaptures is combined with a geometric finite mixture for first captures to better estimate mixture proportions. This model can be restricted for the assumption of no behavioral response to first capture. On Carothers' (1973) taxi cab data, estimation with a 2-group mark-recapture finite mixture provided a population size estimate, N = 420, that exactly matched the registered number of cabs. On meadow vole data, estimation with a 3-group model showed heterogeneity in behavioral response to first capture. Simulations show that our 2-group mark-recapture finite mixture estimator with restriction for no behavioral response to first capture is more efficient than Burnham and Overton's (1978) jackknife estimator when the smallest probability of first capture is 0.1 and the number of sampling occasions is 10.
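The geometric finite mixture for first captures described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the parameter values are hypothetical, and the code only evaluates the mixture probabilities that a maximum likelihood fit would be built on, assuming each group g has mixture proportion w_g and a constant first-capture probability p_g.

```python
def first_capture_probs(props, p_first, n_occasions):
    """P(first capture on occasion j), j = 1..T, under a geometric finite mixture.

    props   -- mixture proportions w_g (assumed to sum to 1)
    p_first -- per-group probabilities of first capture p_g
    """
    return [
        sum(w * (1 - p) ** (j - 1) * p for w, p in zip(props, p_first))
        for j in range(1, n_occasions + 1)
    ]


def prob_never_captured(props, p_first, n_occasions):
    """P(an animal is missed on all T occasions) under the same mixture."""
    return sum(w * (1 - p) ** n_occasions for w, p in zip(props, p_first))


# Hypothetical 2-group example: half the animals are hard to catch (p = 0.1).
props, p_first, T = [0.5, 0.5], [0.1, 0.5], 10
probs = first_capture_probs(props, p_first, T)
missed = prob_never_captured(props, p_first, T)
print(sum(probs) + missed)  # the two pieces partition all outcomes
```

The probability of never being captured is what links the number of animals seen to the unknown population size, which is why heterogeneity in p_g matters so much for the estimate.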
Confidence Interval Estimation of Cumulative Incidence for Clustered Competing Risks
In a cluster randomized trial studying a primary outcome, patients are sometimes exposed to competing events. These are risks that alter the probability of the primary outcome occurring. Traditional methods of estimating the cumulative incidence for an outcome and its associated confidence interval under competing risks do not account for the effect of clustering. This may cause incorrect estimation of confidence intervals because outcomes among patients from the same center are correlated. This thesis compared six nonparametric methods of confidence interval construction for cumulative incidence under competing risks, four of which account for the clustering effect, via a simulation study. Over the range of examined scenarios, if the clustering effect is mild (i.e., ICC = 0.01), estimators not accounting for clustering never have worse coverage than those that do. However, in cases with a large clustering effect (i.e., ICC = 0.05), using confidence interval estimators that account for clustering should be considered.
Statistical Methods for Biodiversity Assessment
This thesis focuses on statistical methods for estimating the number of species, which is a natural index for measuring biodiversity. Both parametric and nonparametric approaches are investigated for this problem. Species abundance models, including homogeneous and heterogeneous models, are explored for species richness estimation. Two new improvements to the Chao estimator are developed using the Good-Turing coverage formula.
Although the homogeneous abundance model is the simplest, species are collected with different probabilities in practice. This leads to overdispersed data, zero inflation and a heavy tail. The Poisson-Tweedie distribution, a mixed-Poisson distribution that includes many special cases such as the negative-binomial, Poisson, Poisson inverse Gaussian and Pólya-Aeppli distributions, is explored for estimating the number of species. The weighted linear regression estimator based on the ratio of successive frequencies is applied to data generated from the Poisson-Tweedie distribution. Sparse data can be a problem: they may yield zero frequencies for species seen a given number of times, in which case the weighted linear regression fails. A smoothing technique is therefore considered to improve the performance of the weighted linear regression estimator. Both simulated data and some real data sets are used to study the performance of parametric and nonparametric estimators in this thesis.
Finally, the distribution of the number of distinct species found in a sample is hard to compute. Many approximations, including the Poisson, normal, COM-Poisson Binomial, Altham's multiplicative and additive-binomial, and Pólya distributions, are used to approximate it. Under various abundance models, Altham's multiplicative-binomial approximation performs well. Building on other recent work, the maximum likelihood and the maximum pseudo-likelihood estimators are applied with Altham's multiplicative-binomial approximation and compared with other estimators.
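For orientation, the classical Chao lower-bound estimator and the Good-Turing sample coverage mentioned in this abstract can be sketched as follows. This shows only the standard textbook forms, not the two improvements developed in the thesis; the function names and the example counts are hypothetical.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map k -> number of species observed exactly k times (k >= 1)."""
    return Counter(a for a in abundances if a > 0)


def chao1(abundances):
    """Classical Chao lower bound on species richness.

    Uses S_obs + f1^2 / (2 * f2); when no species is seen exactly twice,
    falls back to the bias-corrected form S_obs + f1 * (f1 - 1) / 2.
    """
    f = frequency_counts(abundances)
    s_obs = sum(f.values())
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def good_turing_coverage(abundances):
    """Good-Turing estimate of sample coverage: 1 - f1 / n."""
    n = sum(abundances)
    if n == 0:
        raise ValueError("no individuals observed")
    f1 = frequency_counts(abundances).get(1, 0)
    return 1 - f1 / n


# Hypothetical counts per species; the trailing 0 is an unseen species.
counts = [5, 3, 2, 2, 1, 1, 1, 0]
print(chao1(counts))                 # 7 observed + 3^2 / (2 * 2) = 9.25
print(good_turing_coverage(counts))  # 1 - 3/15
```

Both quantities depend only on the counts of rare species (f1 and f2), which is also why sparse data with zero frequencies cause trouble for ratio-based estimators.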
Confidence Intervals for Error Rates in Matching Tasks: Critical Review and Recommendations
Matching algorithms are commonly used to predict matches between items in a
collection. For example, in 1:1 face verification, a matching algorithm
predicts whether two face images depict the same person. Accurately assessing
the uncertainty of the error rates of such algorithms can be challenging when
data are dependent and error rates are low, two aspects that have often been
overlooked in the literature. In this work, we review methods for constructing
confidence intervals for error rates in matching tasks such as 1:1 face
verification. We derive and examine the statistical properties of these methods
and demonstrate how coverage and interval width vary with sample size, error
rates, and degree of data dependence using both synthetic and real-world
datasets. Based on our findings, we provide recommendations for best practices
for constructing confidence intervals for error rates in matching tasks. Comment: 32 pages, 8 figures
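As a point of reference for the low-error-rate setting, here is a minimal sketch of the Wilson score interval for a binomial proportion. It is a naive baseline chosen for illustration, not a method attributed to the paper: it assumes independent comparisons, which is exactly the assumption the abstract says often fails in matching tasks. The function name and the example counts are hypothetical.

```python
import math


def wilson_interval(errors, n, z=1.959963984540054):
    """Wilson score interval for an error rate, assuming n independent trials.

    errors -- number of observed errors
    n      -- number of comparisons
    z      -- standard normal quantile (default gives a nominal 95% interval)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical case: zero errors in 1000 comparisons still gives a
# non-degenerate upper bound, unlike the plain normal (Wald) interval.
low, high = wilson_interval(0, 1000)
print(low, high)
```

With dependent data, for instance when the same face image appears in many pairs, the effective sample size is smaller than n and an interval like this one would be too narrow.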
A note on bias reduction
Let be an unbiased estimate of an unknown . Given a function , we show how to choose a function such that for , . We illustrate this with for a given constant . For and normal, this leads to the convolution equation
Marginal methods and software for clustered data with cluster- and group-size informativeness.
Clustered data result when observations have some natural organizational association. In such data, cluster size is defined as the number of observations belonging to a cluster. A phenomenon termed informative cluster size (ICS) occurs when observation outcomes vary in a systematic way related to the cluster size. An additional form of informativeness, termed informative within-cluster group size (IWCGS), arises when the distribution of group-defining categorical covariates within clusters similarly carries information related to outcomes. Standard methods for the marginal analysis of clustered data can produce biased estimates and inference when data have informativeness. A reweighting methodology has been developed that is resistant to ICS and IWCGS bias, and this method has been used to establish clustered data analogs of classical hypothesis tests related to ranks and correlation. In this work, we extend the reweighting methodology to develop a versatile collection of marginal hypothesis tests related to proportions, means, and variances in clustered data that are analogous to classical forms. We evaluate the performance of these tests compared to other cluster-appropriate methods through simulation and show that only reweighted tests maintain appropriate size when data have informativeness. We construct reweighted tests of clustered categorical data using several variance estimators, and demonstrate that the method of variance estimation can have a substantial effect on these tests. Additionally, we show that when testing simple hypotheses in data lacking informativeness, reweighted tests can outperform other standard cluster-appropriate methods in terms of both size and power. Combining our novel tests with the existing tests of ranks and correlations, we compile a comprehensive R software package that executes this collection of ICS/IWCGS-appropriate methods through a thoughtful and user-friendly design.