126 research outputs found

    Generating correlated discrete ordinal data using R and SAS IML.

    Correlated ordinal data are common in many areas of research. Such data may arise from longitudinal studies in biological, medical, or clinical fields. Their prominent characteristic is that within-subject observations are correlated, whilst between-subject observations are independent. Many methods have been proposed to analyze correlated ordinal data. One way to evaluate the performance of a proposed model, or its performance on small or moderate-sized data sets, is through simulation studies. It is thus important to provide a tool for generating correlated ordinal data to be used in simulation studies. In this paper, we describe a macro program for generating correlated ordinal data in the R language and SAS IML.
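
The abstract does not say which generation algorithm the macro uses, so the following is only an illustrative sketch of one common approach, not the authors' method: draw exchangeable latent normal variables per subject and cut them at quantiles chosen to match the target marginal category probabilities. The function name, arguments, and parameter values are all hypothetical, and the correlation is specified on the latent scale, so the correlation of the resulting ordinal scores will generally be somewhat smaller.

```python
import math
import random
from bisect import bisect_left
from statistics import NormalDist


def simulate_correlated_ordinal(n_subjects, n_times, probs, latent_rho, seed=0):
    """Return n_subjects rows of n_times ordinal scores in 1..len(probs).

    probs      -- target marginal category probabilities (positive, sum to 1)
    latent_rho -- exchangeable correlation of the latent normals (0 <= rho < 1)
    """
    if not math.isclose(sum(probs), 1.0) or min(probs) <= 0:
        raise ValueError("probs must be positive and sum to 1")
    rng = random.Random(seed)

    # Cutpoints on the standard normal scale from the cumulative probabilities.
    cutpoints, cumulative = [], 0.0
    for p in probs[:-1]:
        cumulative += p
        cutpoints.append(NormalDist().inv_cdf(cumulative))

    shared_weight = math.sqrt(latent_rho)
    own_weight = math.sqrt(1.0 - latent_rho)
    data = []
    for _ in range(n_subjects):
        shared = rng.gauss(0.0, 1.0)  # subject-level component, induces correlation
        row = []
        for _ in range(n_times):
            latent = shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)
            # Category = 1 + number of cutpoints strictly below the latent value.
            row.append(bisect_left(cutpoints, latent) + 1)
        data.append(row)
    return data


# Hypothetical settings: 500 subjects, 4 visits, 3 ordered categories.
ordinal_data = simulate_correlated_ordinal(500, 4, [0.2, 0.5, 0.3], 0.5, seed=1)
```

Observations from different subjects share no random component and are therefore independent, while observations within a subject are correlated through the shared term.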

    Generating Correlated and/or Overdispersed Count Data: A SAS Implementation

    Analysis of longitudinal count data has long been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negative binomial (NEGBIN) model, which accounts for overdispersion by use of gamma random effects. Inherently though, longitudinal count data commonly exhibit both correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs, Verbeke, Demétrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data, with a normally distributed random effect at the subject or cluster level and/or a gamma-distributed random effect at the observation level. The GLMM and NEGBIN models are special cases. Data can be simulated from (1) the general CM, with random effects, or (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure, and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived, and from these correlated Poisson data are generated. Sample calls of the SAS macro are presented, as well as output.
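
The hierarchical structure described in this abstract (Poisson parent, normal random effect per subject, gamma random effect per observation) can be sketched as below. This is a minimal illustration that simulates directly from assumed hierarchical parameters; it does not reproduce the paper's SAS macro or its derivation of hierarchical parameters from a marginal specification. The function names, the random-intercept-only linear predictor, the mean-one gamma parameterisation, and all numeric values are assumptions made for the example.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_combined_counts(n_subjects, n_times, intercept, slope,
                             re_sd, gamma_shape, seed=0):
    """Simulate Y_ij ~ Poisson(theta_ij * exp(intercept + slope * t_j + b_i)).

    b_i      ~ Normal(0, re_sd^2)             subject level, induces correlation
    theta_ij ~ Gamma(shape, scale = 1/shape)  observation level, mean 1, overdispersion
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, re_sd)
        row = []
        for t in range(n_times):
            theta = rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
            row.append(poisson_draw(rng, theta * math.exp(intercept + slope * t + b)))
        data.append(row)
    return data


# Hypothetical settings: 200 subjects observed at 4 time points.
counts = simulate_combined_counts(200, 4, intercept=0.5, slope=0.1,
                                  re_sd=0.5, gamma_shape=2.0)
```

In this sketch, fixing theta at 1 (the limit of a very large `gamma_shape`) leaves the Poisson-normal GLMM, and setting `re_sd` to 0 leaves a negative binomial model, mirroring the special cases named in the abstract.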

    Regression Models for Binary Dependent Variables Using Stata, SAS, R, LIMDEP, and SPSS

    A categorical variable here refers to a variable that is binary, ordinal, or nominal. Event count data are discrete (categorical) but are often treated as continuous variables. When a dependent variable is categorical, the ordinary least squares (OLS) method can no longer produce the best linear unbiased estimator (BLUE); that is, OLS is biased and inefficient. Consequently, researchers have developed various regression models for categorical dependent variables. The nonlinearity of categorical dependent variable models makes it difficult to fit the models and interpret their results.

    Probit models: Regression parameter estimation using the ML principle despite misspecification of the correlation structure

    In this paper it is shown that using the maximum likelihood (ML) principle for the estimation of multivariate probit models leads to consistent and normally distributed pseudo maximum likelihood regression parameter estimators (PML estimators) even if the 'true' correlation structure of the responses is misspecified. As a consequence, e.g., the PML estimator of the random effects probit model may be used to estimate the regression parameters of a model with any 'true' correlation structure. This result is independent of the kind of covariates included in the model. The results of a Monte Carlo experiment show that the PML estimator of the independent binary probit model is inefficient relative to the PML estimator of the random effects binary panel probit model and two alternative estimators using the 'generalized estimating equations' approach proposed by Liang and Zeger (1986), if the 'true' correlations are high. If the 'true' correlations are low, the differences between the estimators are small in finite samples and for the models used. Generally, the PML estimator of the random effects probit panel model is very efficient relative to the GEE estimators. Therefore, if correlations between binary responses cannot be ruled out and the 'true' structure of association is unknown or infeasible to estimate, the PML estimator of the random effects probit model is recommended.

    Exact Approaches for Bias Detection and Avoidance with Small, Sparse, or Correlated Categorical Data

    Every day, traditional statistical methodologies are used worldwide to study a variety of topics and provide insight into countless subjects. Each technique is based on a distinct set of assumptions to ensure valid results. Additionally, many statistical approaches rely on large-sample behavior and may collapse or degenerate in the presence of small, sparse, or correlated data. This dissertation details several advancements to detect these conditions, avoid their consequences, and analyze data in a different way to yield trustworthy results. One of the most commonly used modeling techniques for outcomes with only two possible categorical values (e.g., live/die, pass/fail, better/worse, etc.) is logistic regression. While some potential complications with this approach are widely known, many investigators are unaware that their particular data do not meet the foundational assumptions, since these are not easy to verify. We have developed a routine for determining whether a researcher should be concerned about potential bias in logistic regression results, so that they can take steps to mitigate the bias or use a different procedure altogether to model the data. Correlated data may arise from common situations such as multi-site medical studies, research on family units, or investigations of student achievement within classrooms. In these circumstances, the associations between cluster members must be included in any statistical analysis testing the hypothesis of a connection between two variables in order for the results to be valid. Previously, investigators had to choose between a method intended for small or sparse data that assumes independence between observations and a method that allows for correlation between observations but requires large samples to be reliable. We present a new method that allows small, clustered samples to be assessed for a relationship between a two-level predictor (e.g., treatment/control) and a categorical outcome (e.g., low/medium/high).

    Statistical approaches for handling longitudinal and cross sectional discrete data with missing values focusing on multiple imputation and probability weighting.

    Doctor of Philosophy in Science. University of KwaZulu-Natal, Pietermaritzburg, 2018. Abstract available in PDF file.

    Vol. 16, No. 2 (Full Issue)


    Population-averaged models for diagnostic accuracy studies and meta-analysis

    Modern medical decision making often involves one or more diagnostic tools (such as laboratory tests and/or radiographic images) that must be evaluated for their discriminatory ability to detect presence (or absence) of current health state. The first paper of this dissertation extends regression model diagnostics to the Receiver Operating Characteristic (ROC) curve generalized linear model (ROC-GLM) in the setting of individual-level data from a single study through application of generalized estimating equations (GEE) within a correlated binary data framework (Alonzo and Pepe, 2002). Motivated by the need for model diagnostics for the ROC-GLM model (Krzanowski and Hand, 2009), GEE cluster-deletion diagnostics (Preisser and Qaqish, 1996) are applied in an example data set to identify cases that have undue influence on the model parameters describing the ROC curve. In addition, deletion diagnostics are applied in an earlier stage in the estimation of the ROC-GLM, when a linear model is chosen to represent the relationship between the test measurement and covariates in the control subjects. The second paper presents a new model for diagnostic test accuracy meta-analysis. The common analysis framework for the meta-analysis of diagnostic studies is the generalized linear mixed model, in particular, the bivariate logistic-normal random effects model. Considering that such cluster-specific models are most appropriately used if the model for a given cluster (i.e. study) is of interest, a population-average (PA) model may be appropriate in diagnostic test meta-analysis settings where mean estimates of sensitivity and specificity are desired. A PA model for correlated binomial outcomes is estimated with GEE in the meta-analysis of two data sets. It is compared to an indirect method of estimation of PA parameters based on transformations of bivariate random effects model parameters. 
The third paper presents an analysis guide for a new SAS macro, PAMETA (Population-averaged meta-analysis), for fitting population-averaged (PA) diagnostic accuracy models with GEE as described in the second paper. The impact of covariates, influential clusters, and observations is investigated in the analysis of two example data sets. Doctor of Public Health

    Sensitivity analysis approaches for incomplete longitudinal data in a multi-centre clinical trial

    The first major contribution of the thesis is the development of a sensitivity analysis strategy for dealing with incomplete longitudinal data. The second important contribution is the setting up of a simulation experiment to evaluate the performance of some of the sensitivity analysis approaches. The third contribution is that the thesis offers recommendations on which sensitivity analysis strategy to use and in what circumstances. It is recommended that, when drawing statistical inferences in the presence of missing data, methods of analysis based on plausible scientific assumptions should be used. One major issue is that such assumptions cannot be verified using the data at hand. Sensitivity analysis should therefore be performed to investigate the robustness of statistical inferences to plausible alternative assumptions about the missing data. The thesis applied various sensitivity analysis strategies to incomplete longitudinal CD4 count data in order to investigate the effect of tuberculosis pericarditis (TBP) treatment on CD4 count changes over time. The thesis achieved the first contribution by formulating a primary analysis (which assumes that the data are missing at random) and then conducting sensitivity analyses to assess whether statistical inferences under the primary analysis model are sensitive to models that assume that the data are not missing at random. The second contribution was achieved via a simulation experiment that involved formulating hypotheses on how sensitivity analysis strategies would perform under varying rates of missing values and under model mis-specification. The third contribution was achieved based on our experience from the development and application of the sensitivity analysis strategies as well as the simulation experiment. 
Using the CD4 count data, we observed that statistical inferences under the primary analysis formulation are robust to the sensitivity analysis formulations, suggesting that the mechanism that generated the missing CD4 count measurements is likely to be missing at random. The results also revealed that TBP does not interact with the HIV/AIDS treatment and that TBP treatment had no significant effect on CD4 count changes over time. We observed in our simulation results that the sensitivity analysis strategies produced unbiased statistical inferences except when a strategy is inappropriately applied in a given trial setting or when a strategy is mis-specified. Although the methods considered were applied to data in the IMPI trial setting, they can also be applied to clinical trials with similar settings. A sensitivity analysis strategy may give biased results not only because it has been mis-specified, but also because it has been applied in a wrongly defined trial setting. We therefore strongly encourage analysts to carefully study these sensitivity analysis frameworks together with a clear and precise definition of the trial objective in order to decide which sensitivity analysis strategy to use.

    Powerful modifications of Williams' test on trend

    [no abstract]