9,385 research outputs found

    Growth Estimators and Confidence Intervals for the Mean of Negative Binomial Random Variables with Unknown Dispersion

    The Negative Binomial distribution becomes highly skewed under extreme dispersion. Even at moderately large sample sizes, the sample mean exhibits a heavy right tail. The standard Normal approximation often does not provide adequate inferences about the data's mean in this setting. In previous work, we have examined alternative methods of generating confidence intervals for the expected value. These methods were based upon Gamma and chi-square approximations or tail probability bounds such as Bernstein's Inequality. We now propose growth estimators of the Negative Binomial mean. Under high dispersion, zero values are likely to be overrepresented in the data. A growth estimator constructs a Normal-style confidence interval by effectively removing a small, predetermined number of zeros from the data. We propose growth estimators based upon multiplicative adjustments of the sample mean and direct removal of zeros from the sample. These methods do not require estimating the nuisance dispersion parameter. We will demonstrate that the growth estimators' confidence intervals provide improved coverage over a wide range of parameter values and asymptotically converge to the sample mean. Interestingly, the proposed methods succeed despite adding both bias and variance to the Normal approximation.
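
    The coverage problem this abstract describes can be illustrated with a small simulation. The sketch below is not the authors' growth estimator, whose construction the abstract does not specify; it only checks the baseline Normal-approximation interval. The function names (`draw_negbin`, `normal_interval`, `coverage`) and the mean/dispersion parameterization (variance `mu + mu**2 / k`) are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def draw_negbin(mu, k, rng):
    """Draw one Negative Binomial variate with mean mu and dispersion k.

    Assumed parameterization: variance = mu + mu**2 / k, so small k means
    high dispersion. The draw uses the Gamma-Poisson mixture representation.
    """
    lam = rng.gammavariate(k, mu / k)  # Gamma with shape k, scale mu / k
    # Knuth's multiplication method for Poisson(lam); adequate for moderate lam.
    limit, count, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1


def normal_interval(sample, level=0.95):
    """Standard Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(mu, k, n, reps, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw_negbin(mu, k, rng) for _ in range(n)]
        lo, hi = normal_interval(sample)
        hits += lo <= mu <= hi
    return hits / reps
```

    Calling `coverage(1.0, 0.05, 50, 2000)` estimates how often a nominal 95% interval contains the true mean under high dispersion; comparing the result with 0.95 shows the size of the shortfall that alternative intervals aim to close.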

    Estimation of the basic reproductive number and mean serial interval of a novel pathogen in a small, well-observed discrete population

    BACKGROUND: Accurately assessing the transmissibility and serial interval of a novel human pathogen is a public health priority so that the timing and required strength of interventions may be determined. Recent theoretical work has focused on making best use of data from the initial exponential phase of growth of incidence in large populations. METHODS: We measured generational transmissibility by the basic reproductive number R0 and the serial interval by its mean Tg. First, we constructed a simulation algorithm for case data arising from a small population of known size with R0 and Tg also known. We then developed an inferential model for the likelihood of these case data as a function of R0 and Tg. The model was designed to capture (a) any signal of the serial interval distribution in the initial stochastic phase, (b) the growth rate of the exponential phase, and (c) the unique combination of R0 and Tg that generates a specific shape of peak incidence when the susceptible portion of a small population is depleted. FINDINGS: Extensive repeat simulation and parameter estimation revealed no bias in univariate estimates of either R0 or Tg. We were also able to simultaneously estimate both R0 and Tg. However, accurate final estimates could be obtained only much later in the outbreak. In particular, estimates of Tg were considerably less accurate in the bivariate case until the peak of incidence had passed. CONCLUSIONS: The basic reproductive number and mean serial interval can be estimated simultaneously in real time during an outbreak of an emerging pathogen. Repeated application of these methods to small scale outbreaks at the start of an epidemic would permit accurate estimates of key parameters.
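
    The simulation step in METHODS can be sketched with a chain-binomial (Reed-Frost) model. This is a swapped-in textbook technique, not the authors' algorithm: it assumes every generation lasts exactly Tg days, whereas the abstract refers to a serial interval distribution. The function name `simulate_outbreak` and its parameters are hypothetical.

```python
import math
import random


def simulate_outbreak(pop_size, r0, tg_days, initial_cases=1, seed=0):
    """Simulate case counts per generation in a closed population.

    Assumptions (not from the source): Reed-Frost transmission with
    per-pair infection probability 1 - exp(-r0 / pop_size), so that one
    case in a fully susceptible population infects roughly r0 others,
    and a fixed generation time of tg_days.
    Returns a list of (day, new_cases) pairs.
    """
    rng = random.Random(seed)
    susceptible = pop_size - initial_cases
    infectious = initial_cases
    escape_one = math.exp(-r0 / pop_size)  # susceptible escapes one infective
    history = [(0.0, initial_cases)]
    generation = 0
    while infectious > 0 and susceptible > 0:
        generation += 1
        # Probability that a susceptible is infected by at least one infective.
        p_infect = 1.0 - escape_one ** infectious
        new_cases = sum(rng.random() < p_infect for _ in range(susceptible))
        susceptible -= new_cases
        infectious = new_cases
        history.append((generation * tg_days, new_cases))
    return history
```

    For example, `simulate_outbreak(200, 2.0, 3.0)` returns one epidemic curve for a population of 200. Because the susceptible pool is finite, the curve peaks and declines, which is the feature the abstract's point (c) relies on to separate R0 from Tg.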

    Multiple Imputation Using Gaussian Copulas

    Missing observations are pervasive throughout empirical research, especially in the social sciences. Despite multiple approaches to dealing adequately with missing data, many scholars still fail to address this vital issue. In this paper, we present a simple-to-use method for generating multiple imputations using a Gaussian copula. The Gaussian copula for multiple imputation (Hoff, 2007) allows scholars to attain estimation results that have good coverage and small bias. The use of copulas to model the dependence among variables will enable researchers to construct valid joint distributions of the data, even without knowledge of the actual underlying marginal distributions. Multiple imputations are then generated by drawing observations from the resulting posterior joint distribution and replacing the missing values. Using simulated and observational data from published social science research, we compare imputation via Gaussian copulas with two other widely used imputation methods: MICE and Amelia II. Our results suggest that the Gaussian copula approach has a slightly smaller bias, higher coverage rates, and narrower confidence intervals compared to the other methods. This is especially true when the variables with missing data are not normally distributed. These results, combined with theoretical guarantees and ease of use, suggest that the approach examined provides an attractive alternative for applied researchers undertaking multiple imputations.
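
    The copula idea can be sketched for two variables: map each margin to normal scores through its ranks, estimate the correlation of the scores, draw the missing score from the conditional Normal distribution, and map it back through the empirical quantiles of the observed values. This is a simplified plug-in illustration and not the Bayesian procedure of Hoff (2007), which samples the correlation from its posterior; here the correlation is a point estimate. The names `normal_scores`, `pearson` and `impute_once` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_scores(values):
    """Map values to normal scores via ranks: z = Phi^{-1}(rank / (n + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def impute_once(x, y, rng):
    """Return a copy of y with each None replaced by one imputed draw.

    Assumes x is fully observed and y marks missing entries with None.
    """
    observed = [i for i, v in enumerate(y) if v is not None]
    zx = normal_scores(x)
    zy_observed = normal_scores([y[i] for i in observed])
    rho = pearson([zx[i] for i in observed], zy_observed)
    cond_sd = math.sqrt(max(0.0, 1.0 - rho * rho))
    y_sorted = sorted(y[i] for i in observed)
    completed = list(y)
    for i, value in enumerate(y):
        if value is None:
            z = rng.gauss(rho * zx[i], cond_sd)  # conditional Normal draw
            u = STD_NORMAL.cdf(z)
            # Back-transform through the empirical quantiles of observed y.
            index = min(int(u * len(y_sorted)), len(y_sorted) - 1)
            completed[i] = y_sorted[index]
    return completed
```

    Calling `impute_once` several times with differently seeded generators yields multiple completed data sets. Because the back-transform uses only observed values, imputations stay within the observed range of `y` whatever its marginal shape, which is the property the abstract highlights for non-normal variables.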

    TESTING SIGNIFICANCE OF MULTI-DESTINATION AND MULTI-PURPOSE TRIP EFFECTS IN A TRAVEL COST METHOD DEMAND MODEL FOR WHALE WATCHING TRIPS

    Inclusion of multi-destination and multi-purpose visitors has an appreciable influence on a standard count data travel cost model derived estimate of willingness to pay, but the differences are not statistically significant. We adapt a more general travel cost model (TCM) of Parsons and Wilson (1997) that allows for inclusion of multi-destination visitors as incidental demand to allow estimation of an unbiased measure of single and multi-destination willingness to pay for whale viewing using a single pooled equation. The primary purpose trip values from the standard TCM and simple generalized TCM model are identical at $43 per person per day, and neither is significantly different from the $50 per day value from a generalized model that distinguishes between joint and incidental trips. The general models avoid underestimation of total recreation site benefits that would result from omitting the consumer surplus of multi-destination visitors. Resource/Energy Economics and Policy

    Unweighted regression models perform better than weighted regression techniques for respondent-driven sampling data: results from a simulation study

    Background: It is unclear whether weighted or unweighted regression is preferred in the analysis of data derived from respondent-driven sampling. Our objective was to evaluate the validity of various regression models, with and without weights and with various controls for clustering, in the estimation of the risk of group membership from data collected using respondent-driven sampling (RDS). Methods: Twelve networked populations, with varying levels of homophily and prevalence, based on a known distribution of a continuous predictor were simulated using 1000 RDS samples from each population. Weighted and unweighted binomial and Poisson generalized linear models, with and without various clustering controls and standard error adjustments, were modelled for each sample and evaluated with respect to validity, bias and coverage rate. Population prevalence was also estimated. Results: In the regression analysis, the unweighted log-link (Poisson) models maintained the nominal type-I error rate across all populations. Bias was substantial and type-I error rates unacceptably high for weighted binomial regression. Coverage rates for the estimation of prevalence were highest using RDS-weighted logistic regression, except at low prevalence (10%) where unweighted models are recommended. Conclusions: Caution is warranted when undertaking regression analysis of RDS data. Even when reported degree is accurate, low reported degree can unduly influence regression estimates. Unweighted Poisson regression is therefore recommended. York University Librarie
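
    The remark that low reported degree can unduly influence estimates can be seen in the weights themselves. The sketch below assumes the common RDS-II (Volz-Heckathorn) scheme, in which each respondent is weighted by the inverse of reported degree; the abstract does not say which RDS weights were used, and the function names are hypothetical.

```python
def rds2_weights(degrees):
    """Inverse-degree (RDS-II style) weights, normalised to sum to 1.

    Assumes every reported degree is a positive number.
    """
    inverse = [1.0 / d for d in degrees]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_prevalence(outcomes, degrees):
    """Weighted mean of binary outcomes using inverse-degree weights."""
    weights = rds2_weights(degrees)
    return sum(w * y for w, y in zip(weights, outcomes))


# Illustrative (made-up) sample: one respondent reports degree 1,
# the other four report degree 20; only the first has the outcome.
example_degrees = [1, 20, 20, 20, 20]
example_outcomes = [1, 0, 0, 0, 0]
example_weights = rds2_weights(example_degrees)
```

    In this made-up sample the unweighted prevalence is 0.2, while the single low-degree respondent carries more than 80% of the total weight and pulls the weighted estimate up to about 0.83. The same concentration of weight carries over to weighted regression fits.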