Growth Estimators and Confidence Intervals for the Mean of Negative Binomial Random Variables with Unknown Dispersion
The Negative Binomial distribution becomes highly skewed under extreme dispersion. Even at moderately large sample sizes, the sample mean exhibits a heavy right tail. The standard Normal approximation often does not provide adequate inferences about the data's mean in this setting. In previous work, we have examined alternative methods of generating confidence intervals for the expected value. These methods were based upon Gamma and Chi Square approximations or tail probability bounds such as Bernstein's Inequality. We now propose growth estimators of the Negative Binomial mean. Under high dispersion, zero values are likely to be overrepresented in the data. A growth estimator constructs a Normal-style confidence interval by effectively removing a small, pre-determined number of zeros from the data. We propose growth estimators based upon multiplicative adjustments of the sample mean and direct removal of zeros from the sample. These methods do not require estimating the nuisance dispersion parameter. We will demonstrate that the growth estimators' confidence intervals provide improved coverage over a wide range of parameter values and asymptotically converge to the sample mean. Interestingly, the proposed methods succeed despite adding both bias and variance to the Normal approximation.
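As a rough illustration of the two constructions described above, the sketch below removes up to a pre-determined number of zeros from the sample, or applies a multiplicative adjustment to the sample mean, before forming a Normal-style interval. The function names, the choice of k, and the exact form of the adjustment are assumptions made for illustration; they are not necessarily the authors' growth estimators.

```python
import numpy as np
from scipy import stats

def growth_ci_remove_zeros(x, k=1, level=0.95):
    """Normal-style CI for the mean after directly removing up to k zeros (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    zero_idx = np.flatnonzero(x == 0)
    y = np.delete(x, zero_idx[:min(k, zero_idx.size)])   # drop at most k zeros
    m, s, n = y.mean(), y.std(ddof=1), y.size
    half = stats.norm.ppf(0.5 + level / 2) * s / np.sqrt(n)
    return m - half, m + half

def growth_ci_multiplicative(x, k=1, level=0.95):
    """Normal-style CI around a multiplicatively adjusted sample mean (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x.mean() * n / (n - k)                            # inflate the mean as if k zeros were removed
    half = stats.norm.ppf(0.5 + level / 2) * x.std(ddof=1) / np.sqrt(n)
    return m - half, m + half
```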
Time-Dependent Performance Comparison of Stochastic Optimization Algorithms
This paper proposes a statistical methodology for comparing the performance of stochastic optimization algorithms that iteratively generate candidate optima. The fundamental data structure of the results of these algorithms is a time series. Algorithmic differences may be assessed through a procedure of statistical sampling and multiple hypothesis testing of time series data. Shilane et al. propose a general framework for performance comparison of stochastic optimization algorithms that result in a single candidate optimum. This project seeks to extend this framework to assess performance in time series data structures. The proposed methodology analyzes empirical data to determine the generation intervals in which algorithmic performance differences exist and may be used to guide the selection and design of optimization procedures for the task at hand. Such comparisons may be drawn for general performance metrics of any iterative stochastic optimization algorithm under any (typically unknown) data generating distribution. Additionally, this paper proposes a data reduction procedure to estimate performance differences in a more computationally feasible manner. In doing so, we provide a statistical framework to assess the performance of stochastic optimization algorithms and to design improved procedures for the task at hand.
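A minimal sketch of the kind of per-generation comparison described above is given below, assuming two algorithms have each been run repeatedly and their performance recorded at every generation. The centered-bootstrap test and the Bonferroni correction are illustrative choices, not necessarily the resampling and multiplicity procedures used in the paper.

```python
import numpy as np

def compare_over_generations(runs_a, runs_b, n_boot=2000, alpha=0.05, seed=0):
    """Per-generation bootstrap tests of the mean performance difference between two
    algorithms, with a Bonferroni correction across generations (illustrative sketch).

    runs_a, runs_b: arrays of shape (n_runs, n_generations) holding a performance
    metric (e.g., the best objective value found so far) for each independent run."""
    rng = np.random.default_rng(seed)
    runs_a, runs_b = np.asarray(runs_a, float), np.asarray(runs_b, float)
    n_gen = runs_a.shape[1]
    observed = runs_a.mean(axis=0) - runs_b.mean(axis=0)
    p_values = np.empty(n_gen)
    for g in range(n_gen):
        a, b = runs_a[:, g], runs_b[:, g]
        pooled = np.concatenate([a, b])
        pooled = pooled - pooled.mean()                   # center to impose the null of no difference
        diffs = np.empty(n_boot)
        for i in range(n_boot):
            ra = rng.choice(pooled, size=a.size, replace=True)
            rb = rng.choice(pooled, size=b.size, replace=True)
            diffs[i] = ra.mean() - rb.mean()
        p_values[g] = (np.abs(diffs) >= abs(observed[g])).mean()
    significant = p_values < alpha / n_gen                # Bonferroni across generations
    return p_values, significant
```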
The Confounding Influence of Older Age in Statistical Models of Telehealth Utilization
Older age is a potentially confounding variable in models of telehealth utilization. We compared unified and stratified logistic regression models using data from the 2021 National Health Interview Survey. A total of 27,626 patients were identified, of whom 38.9% had utilized telehealth. Unified and stratified modeling showed a number of important differences in their quantitative estimates, especially for gender, Hispanic ethnicity, heart disease, COPD, food allergies, high cholesterol, weak or failing kidneys, liver conditions, difficulty with self-care, the use of mobility equipment, health problems that limit the ability to work, problems paying bills, and filling a recent prescription. Telehealth utilization odds ratios differ meaningfully between younger and older patients in stratified modeling. Traditional statistical adjustments in logistic regression may not sufficiently account for the confounding influence of older age in models of telehealth utilization. Stratified modeling by age may be more effective in obtaining clinical inferences.
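The sketch below illustrates the unified-versus-stratified comparison using a statsmodels workflow. The file name, column names, covariate set, and the age cutoff defining the strata are illustrative assumptions, not the NHIS 2021 variables or the paper's specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis data set with columns telehealth (0/1), age, female (0/1), copd (0/1), ...
df = pd.read_csv("nhis_2021_analysis.csv")

# Unified model: one logistic regression with age included as an adjustment covariate.
unified = smf.logit("telehealth ~ age + female + copd", data=df).fit()

# Stratified models: separate fits for younger and older patients (the cutoff is an assumption).
younger = smf.logit("telehealth ~ female + copd", data=df[df["age"] < 65]).fit()
older = smf.logit("telehealth ~ female + copd", data=df[df["age"] >= 65]).fit()

# Odds ratios: exponentiated coefficients may differ meaningfully across strata.
print(np.exp(unified.params))
print(np.exp(younger.params))
print(np.exp(older.params))
```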
A General Framework for Statistical Performance Comparison of Evolutionary Computation Algorithms
This paper proposes a statistical methodology for comparing the performance of evolutionary computation algorithms. A two-fold sampling scheme for collecting performance data is introduced, and these data are analyzed using bootstrap-based multiple hypothesis testing procedures. The proposed method is sufficiently flexible to allow the researcher to choose how performance is measured, does not rely upon distributional assumptions, and can be extended to analyze many other randomized numeric optimization routines. As a result, this approach offers a convenient, flexible, and reliable technique for comparing algorithms in a wide variety of applications.
Medication adherence and visit-to-visit variability of systolic blood pressure in African Americans with chronic kidney disease in the AASK trial
Lower adherence to antihypertensive medications may increase visit-to-visit variability of blood pressure (VVV of BP), a risk factor for cardiovascular events and death. We used data from the African American Study of Kidney Disease and Hypertension (AASK) trial to examine whether lower medication adherence is associated with higher systolic VVV of BP in African Americans with hypertensive chronic kidney disease (CKD). Determinants of VVV of BP were also explored. AASK participants (n=988) were categorized by self-report or pill count as having perfect (100%), moderately high (75–99%), moderately low (50–74%) or low (<50%) proportion of study visits with high medication adherence over a 1-year follow-up period. We used multinomial logistic regression to examine determinants of medication adherence, and multivariable-adjusted linear regression to examine the association between medication adherence and systolic VVV of BP, defined as the coefficient of variation or the average real variability (ARV). Participants with lower self-reported adherence were generally younger and had a higher prevalence of comorbid conditions. Compared with perfect adherence, moderately high, moderately low and low adherence was associated with 0.65% (±0.31%), 0.99% (±0.31%) and 1.29% (±0.32%) higher systolic VVV of BP (defined as the coefficient of variation) in fully adjusted models. Results were qualitatively similar when using ARV or when using pill counts as the measure of adherence. Lower medication adherence is associated with higher systolic VVV of BP in African Americans with hypertensive CKD; efforts to improve medication adherence in this population may reduce systolic VVV of BP.
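The two variability metrics named in the abstract are straightforward to compute from one participant's visit-level systolic BP readings. The sketch below uses the usual definitions of the coefficient of variation and the average real variability, with hypothetical example values.

```python
import numpy as np

def vvv_metrics(sbp):
    """Visit-to-visit variability of systolic BP: coefficient of variation (%) and
    average real variability (ARV, the mean absolute successive difference)."""
    sbp = np.asarray(sbp, dtype=float)
    cv = 100.0 * sbp.std(ddof=1) / sbp.mean()   # coefficient of variation, in percent
    arv = np.mean(np.abs(np.diff(sbp)))         # average real variability
    return cv, arv

# Example: one participant's systolic BP across study visits (hypothetical values).
print(vvv_metrics([142, 138, 151, 145, 139, 148]))
```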
Loss-Based Estimation with Evolutionary Algorithms and Cross-Validation
Many statistical inference methods rely upon selection procedures to estimate a parameter of the joint distribution of explanatory and outcome data, such as the regression function. Within the general framework for loss-based estimation of Dudoit and van der Laan, this project proposes an evolutionary algorithm (EA) as a procedure for risk optimization. We also analyze the size of the parameter space for polynomial regression under interaction constraints along with constraints on either the polynomial or variable degree.
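As a rough illustration of how the size of such a parameter space can be counted, the sketch below enumerates candidate monomial terms under a cap on either the total (polynomial) degree or the per-variable degree. This brute-force count is an illustrative reading of the constraints named above, not the paper's exact enumeration.

```python
from itertools import product

def count_monomials(d, max_total_degree=None, max_variable_degree=None):
    """Count monomials x1^e1 * ... * xd^ed subject to an optional cap on the total
    degree and/or on each variable's degree (brute-force sketch for small d)."""
    per_var_cap = max_variable_degree if max_variable_degree is not None else max_total_degree
    count = 0
    for exps in product(range(per_var_cap + 1), repeat=d):
        if max_total_degree is not None and sum(exps) > max_total_degree:
            continue
        count += 1
    return count

# 3 variables with total degree at most 2: C(3+2, 2) = 10 candidate terms.
print(count_monomials(3, max_total_degree=2))      # 10
# 3 variables with each variable's degree at most 2: 3^3 = 27 candidate terms.
print(count_monomials(3, max_variable_degree=2))   # 27
```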
A general framework for statistical performance comparison of evolutionary computation algorithms
This paper proposes a statistical methodology for comparing the performance of evolutionary computation algorithms. A two-fold sampling scheme for collecting performance data is introduced, and this data is assessed using a multiple hypothesis testing framework relying on a bootstrap resampling procedure. The proposed method offers a convenient, flexible, and reliable approach to comparing algorithms in a wide variety of applications. KEY WORDS: Evolutionary computation, statistics, performance.
Confidence Intervals for Negative Binomial Random Variables of High Dispersion
This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, it is a common practice to rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits a slow convergence in distribution to the Normal as a function of the sample size. As a result, standard techniques (such as the Normal approximation and bootstrap) will construct confidence intervals for the mean that are typically too narrow and significantly undercover in the case of high dispersion. To address this problem, we propose methods based upon Bernstein’s inequality along with the Gamma and Chi Square distributions as alternatives to the standard methods when the sample size is small and the dispersion is high. A confidence interval based upon Bernstein’s inequality relies upon less stringent assumptions than those required of parametric models. Moreover, we prove a limit theorem demonstrating that the sample mean of Negative Binomials converges in distribution to a Gamma random variable under suitable hypotheses, and we use this observation to construct approximate confidence intervals. Furthermore, we investigate the applicability of the Chi Square distribution as a special case of the Gamma model. We then undertake a variety of simulation experiments to compare the proposed methods to standard techniques in terms of empirical coverage and provide concrete recommendations for the settings in which particular intervals are preferred. We also apply the proposed methods to examples arising in the serial analysis of gene expression and traffic flow in a communications network to illustrate both the strengths and weaknesses of these procedures along with those of standard techniques.
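A minimal sketch contrasting the standard Normal-approximation interval with a Gamma-style interval is shown below. The Gamma interval here is built by moment-matching the sample mean's estimated mean and variance; this construction, along with the simulated Negative Binomial parameters, is an assumption for illustration and not necessarily the interval derived in the paper.

```python
import numpy as np
from scipy import stats

def normal_ci(x, level=0.95):
    """Standard Normal-approximation CI for the mean."""
    x = np.asarray(x, dtype=float)
    m, se = x.mean(), x.std(ddof=1) / np.sqrt(x.size)
    z = stats.norm.ppf(0.5 + level / 2)
    return m - z * se, m + z * se

def gamma_moment_ci(x, level=0.95):
    """Gamma-style CI that moment-matches the sample mean (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    var_mean = x.var(ddof=1) / x.size            # estimated variance of the sample mean
    shape, scale = m**2 / var_mean, var_mean / m  # Gamma with matching mean and variance
    alpha = 1 - level
    return (stats.gamma.ppf(alpha / 2, shape, scale=scale),
            stats.gamma.ppf(1 - alpha / 2, shape, scale=scale))

# Example with small n and high dispersion (simulated Negative Binomial data).
rng = np.random.default_rng(0)
x = rng.negative_binomial(n=0.1, p=0.01, size=50)
print(normal_ci(x))
print(gamma_moment_ci(x))
```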
sj-docx-1-jtt-10.1177_1357633X231202284 - Supplemental material for Declining trends in telehealth utilization in the ongoing COVID-19 pandemic
Supplemental material, sj-docx-1-jtt-10.1177_1357633X231202284 for Declining trends in telehealth utilization in the ongoing COVID-19 pandemic by David Shilane and Ting’an Heidi Lu in Journal of Telemedicine and Telecare.