23 research outputs found

Growth Estimators and Confidence Intervals for the Mean of Negative Binomial Random Variables with Unknown Dispersion

Author: Bean Derek
Shilane David
Publication date: 04/03/2012

The Negative Binomial distribution becomes highly skewed under extreme dispersion. Even at moderately large sample sizes, the sample mean exhibits a heavy right tail. The standard Normal approximation often does not provide adequate inferences about the data's mean in this setting. In previous work, we have examined alternative methods of generating confidence intervals for the expected value. These methods were based upon Gamma and Chi Square approximations or tail probability bounds such as Bernstein's Inequality. We now propose growth estimators of the Negative Binomial mean. Under high dispersion, zero values are likely to be overrepresented in the data. A growth estimator constructs a Normal-style confidence interval by effectively removing a small, predetermined number of zeros from the data. We propose growth estimators based upon multiplicative adjustments of the sample mean and direct removal of zeros from the sample. These methods do not require estimating the nuisance dispersion parameter. We will demonstrate that the growth estimators' confidence intervals provide improved coverage over a wide range of parameter values and asymptotically converge to the sample mean. Interestingly, the proposed methods succeed despite adding both bias and variance to the Normal approximation.

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

Time-Dependent Performance Comparison of Stochastic Optimization Algorithms

Author: Martikainen Jarno
Ovaska Seppo
Shilane David
Publication venue: Collection of Biostatistics Research Archive
Publication date: 27/08/2007

This paper proposes a statistical methodology for comparing the performance of stochastic optimization algorithms that iteratively generate candidate optima. The fundamental data structure of the results of these algorithms is a time series. Algorithmic differences may be assessed through a procedure of statistical sampling and multiple hypothesis testing of time series data. Shilane et al. propose a general framework for performance comparison of stochastic optimization algorithms that result in a single candidate optimum. This project seeks to extend this framework to assess performance in time series data structures. The proposed methodology analyzes empirical data to determine the generation intervals in which algorithmic performance differences exist and may be used to guide the selection and design of optimization procedures for the task at hand. Such comparisons may be drawn for general performance metrics of any iterative stochastic optimization algorithm under any (typically unknown) data generating distribution. Additionally, this paper proposes a data reduction procedure to estimate performance differences in a more computationally feasible manner. In doing so, we provide a statistical framework to assess the performance of stochastic optimization algorithms and to design improved procedures for the task at hand.

Collection Of Biostatistics Research Archive

The Confounding Influence of Older Age in Statistical Models of Telehealth Utilization

Author: Lu Heidi Ting’an
Shilane David
Zheng Zhenyi
Publication venue: University Library System, University of Pittsburgh
Publication date: 12/12/2023

Older age is a potentially confounding variable in models of telehealth utilization. We compared unified and stratified logistic regression models using data from the 2021 National Health Interview Survey. A total of 27,626 patients were identified, of whom 38.9% had utilized telehealth. Unified and stratified modeling showed a number of important differences in their quantitative estimates, especially for gender, Hispanic ethnicity, heart disease, COPD, food allergies, high cholesterol, weak or failing kidneys, liver conditions, difficulty with self-care, the use of mobility equipment, health problems that limit the ability to work, problems paying bills, and filling a recent prescription. Telehealth utilization odds ratios differ meaningfully between younger and older patients in stratified modeling. Traditional statistical adjustments in logistic regression may not sufficiently account for the confounding influence of older age in models of telehealth utilization. Stratified modeling by age may be more effective in obtaining clinical inferences.

International Journal of Telerehabilitation

A General Framework for Statistical Performance Comparison of Evolutionary Computation Algorithms

Author: Dudoit Sandrine
Martikainen Jarno
Ovaska Seppo
Shilane David
Publication venue: Collection of Biostatistics Research Archive
Publication date: 16/03/2006

This paper proposes a statistical methodology for comparing the performance of evolutionary computation algorithms. A two-fold sampling scheme for collecting performance data is introduced, and these data are analyzed using bootstrap-based multiple hypothesis testing procedures. The proposed method is sufficiently flexible to allow the researcher to choose how performance is measured, does not rely upon distributional assumptions, and can be extended to analyze many other randomized numeric optimization routines. As a result, this approach offers a convenient, flexible, and reliable technique for comparing algorithms in a wide variety of applications.
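As a rough illustration of the bootstrap idea mentioned in this abstract, the sketch below computes a percentile bootstrap interval for the difference in mean performance between two algorithms. It is not the authors' procedure: the function name `bootstrap_diff_ci`, its parameters, and the choice of a percentile interval are assumptions made for illustration, and the multiple-testing adjustment described in the paper is not included.

```python
import random
from statistics import mean


def bootstrap_diff_ci(a, b, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    a, b: performance values from repeated runs of two algorithms
    (hypothetical inputs; any numeric performance metric would do).
    """
    rng = random.Random(seed)
    # Resample each group with replacement and record the difference in means.
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(reps)
    )
    # Read the interval endpoints off the sorted bootstrap differences.
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = min(reps - 1, int((1 + level) / 2 * reps))
    return diffs[lo_idx], diffs[hi_idx]
```

An interval that excludes zero would suggest a performance difference for that single comparison; comparing many algorithms or metrics at once would still call for a multiplicity correction.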

Collection Of Biostatistics Research Archive

Medication adherence and visit-to-visit variability of systolic blood pressure in African Americans with chronic kidney disease in the AASK trial

Author: Chang Tara I.
Hong Karen
Kronish Ian M.
Muntner Paul
Shilane David
Publication venue: Columbia University Libraries/Information Services
Publication date: 02/04/2015

Lower adherence to antihypertensive medications may increase visit-to-visit variability of blood pressure (VVV of BP), a risk factor for cardiovascular events and death. We used data from the African American Study of Kidney Disease and Hypertension (AASK) trial to examine whether lower medication adherence is associated with higher systolic VVV of BP in African Americans with hypertensive chronic kidney disease (CKD). Determinants of VVV of BP were also explored. AASK participants (n=988) were categorized by self-report or pill count as having perfect (100%), moderately high (75–99%), moderately low (50–74%) or low (<50%) proportion of study visits with high medication adherence over a 1-year follow-up period. We used multinomial logistic regression to examine determinants of medication adherence, and multivariable-adjusted linear regression to examine the association between medication adherence and systolic VVV of BP, defined as the coefficient of variation or the average real variability (ARV). Participants with lower self-reported adherence were generally younger and had a higher prevalence of comorbid conditions. Compared with perfect adherence, moderately high, moderately low and low adherence was associated with 0.65% (±0.31%), 0.99% (±0.31%) and 1.29% (±0.32%) higher systolic VVV of BP (defined as the coefficient of variation) in fully adjusted models. Results were qualitatively similar when using ARV or when using pill counts as the measure of adherence. Lower medication adherence is associated with higher systolic VVV of BP in African Americans with hypertensive CKD; efforts to improve medication adherence in this population may reduce systolic VVV of BP.
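The two variability measures named in this abstract can be sketched as follows. This is a minimal illustration using the usual textbook definitions: the coefficient of variation as the standard deviation divided by the mean (as a percentage), and ARV as the mean absolute difference between consecutive readings. The function names and the use of the sample (n-1) standard deviation are assumptions, not details confirmed by the abstract.

```python
from statistics import mean, stdev


def coefficient_of_variation(readings):
    """Sample SD of the readings, expressed as a percentage of their mean."""
    return 100 * stdev(readings) / mean(readings)


def average_real_variability(readings):
    """Mean absolute difference between consecutive visits, in reading units."""
    steps = [abs(later - earlier) for earlier, later in zip(readings, readings[1:])]
    return mean(steps)
```

Unlike the coefficient of variation, ARV depends on the order of the visits, so a steadily drifting series and an oscillating one with the same spread give different values.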

Crossref

Columbia University Academic Commons

PubMed Central

Loss-Based Estimation with Evolutionary Algorithms and Cross-Validation

Author: Dudoit Sandrine
Liang Richard H.
Shilane David
Publication venue: Collection of Biostatistics Research Archive
Publication date: 11/11/2007

Many statistical inference methods rely upon selection procedures to estimate a parameter of the joint distribution of explanatory and outcome data, such as the regression function. Within the general framework for loss-based estimation of Dudoit and van der Laan, this project proposes an evolutionary algorithm (EA) as a procedure for risk optimization. We also analyze the size of the parameter space for polynomial regression under an interaction constraint along with constraints on either the polynomial or variable degree.

Collection Of Biostatistics Research Archive

A general framework for statistical performance comparison of evolutionary computation algorithms

Author: David Shilane
Jarno Martikainen
S. Dudoit
S. J. Ovaska
Publication venue: ACTA Press

This paper proposes a statistical methodology for comparing the performance of evolutionary computation algorithms. A two-fold sampling scheme for collecting performance data is introduced, and this data is assessed using a multiple hypothesis testing framework relying on a bootstrap resampling procedure. The proposed method offers a convenient, flexible, and reliable approach to comparing algorithms in a wide variety of applications. KEY WORDS Evolutionary computation, statistics, performanc

CiteSeerX

Confidence Intervals for Negative Binomial Random Variables of High Dispersion

Author: Alan E. Hubbard
David Shilane
Steven N. Evans
Publication date: 25/08/2008

This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, it is a common practice to rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits a slow convergence in distribution to the Normal as a function of the sample size. As a result, standard techniques (such as the Normal approximation and bootstrap) will construct confidence intervals for the mean that are typically too narrow and significantly undercover in the case of high dispersion. To address this problem, we propose methods based upon Bernstein’s inequality along with the Gamma and Chi Square distributions as alternatives to the standard methods when the sample size is small and the dispersion is high. A confidence interval based upon Bernstein’s inequality relies upon less stringent assumptions than those required of parametric models. Moreover, we prove a limit theorem demonstrating that the sample mean of Negative Binomials converges in distribution to a Gamma random variable under suitable hypotheses, and we use this observation to construct approximate confidence intervals. Furthermore, we investigate the applicability of the Chi Square distribution as a special case of the Gamma model. We then undertake a variety of simulation experiments to compare the proposed methods to standard techniques in terms of empirical coverage and provide concrete recommendations for the settings in which particular intervals are preferred. We also apply the proposed methods to examples arising in the serial analysis of gene expression and traffic flow in a communications network to illustrate both the strengths and weaknesses of these procedures along with those of standard techniques.
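The coverage question raised in this abstract can be explored with a small simulation. The sketch below covers only the standard Normal-approximation (Wald) interval, not the Bernstein, Gamma, or Chi Square intervals the paper proposes. The helper names (`normal_ci`, `rnegbin`, `coverage`), the mean/size parameterization (variance = mu + mu²/size), and the Gamma-Poisson sampler are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def normal_ci(sample, level=0.95):
    """Wald (Normal-approximation) confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(sample)
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width


def rnegbin(rng, mu, size):
    """One Negative Binomial draw via a Gamma-Poisson mixture.

    Mean mu, variance mu + mu**2 / size; a small `size` means high dispersion.
    """
    lam = rng.gammavariate(size, mu / size)
    # Knuth's multiplication method for a Poisson(lam) draw;
    # adequate for the modest rates used in a small experiment.
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def coverage(mu, size, n, reps, seed=0):
    """Fraction of simulated samples whose Wald interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rnegbin(rng, mu, size) for _ in range(n)]
        lower, upper = normal_ci(sample)
        hits += lower <= mu <= upper
    return hits / reps
```

Calling, for example, `coverage(1.0, 0.1, 50, 2000)` and comparing the result with the nominal 0.95 gives a quick empirical check of the undercoverage the abstract describes.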

CiteSeerX

Collection Of Biostatistics Research Archive

sj-docx-1-jtt-10.1177_1357633X231202284 - Supplemental material for Declining trends in telehealth utilization in the ongoing COVID-19 pandemic

Author: David Shilane
Ting’an Heidi Lu
Publication date: 28/09/2023

Supplemental material, sj-docx-1-jtt-10.1177_1357633X231202284 for Declining trends in telehealth utilization in the ongoing COVID-19 pandemic by David Shilane and Ting’an Heidi Lu in Journal of Telemedicine and Telecare.

FigShare