    Mixtures of Receiver Operating Characteristic Curves

    Rationale and Objectives: ROC curves are ubiquitous in the analysis of imaging metrics as markers of both diagnosis and prognosis. While empirical estimation of ROC curves remains the most popular method, there are several reasons to consider smooth estimates based on a parametric model. Materials and Methods: A mixture model is considered for the distribution of the marker in the diseased population, motivated by the biological observation that there is more heterogeneity in the diseased population than in the normal one. It is shown that this model results in an analytically tractable ROC curve which is itself a mixture of ROC curves. Results: The use of the CK-BB isoenzyme in the diagnosis of severe head trauma is used as an example. ROC curves are fit using the direct binormal method, ROCKIT, and the Box-Cox transformation, as well as the proposed mixture model. The mixture model generates an ROC curve that is much closer to the empirical one than the other methods considered. Conclusions: Mixtures of ROC curves can be helpful in fitting smooth ROC curves to datasets where the diseased population has higher variability than can be explained by a single distribution.
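
    To make the construction concrete: if the healthy marker is N(0, 1) and the diseased marker follows a two-component normal mixture, the ROC curve is the corresponding mixture of binormal ROC curves. A minimal R sketch under those assumed distributions (illustrative parameters, not the fitted CK-BB model):

        # Mixture-of-binormal ROC (assumed model for illustration):
        # healthy ~ N(0, 1); diseased ~ lambda*N(mu1, s1^2) + (1 - lambda)*N(mu2, s2^2).
        # At false-positive rate t the threshold is qnorm(1 - t), so the ROC is a
        # mixture of binormal curves with the same mixing weights.
        mix_roc <- function(t, lambda, mu, sigma) {
          c_thr <- qnorm(1 - t)                      # threshold giving FPR = t
          lambda * pnorm((mu[1] - c_thr) / sigma[1]) +
            (1 - lambda) * pnorm((mu[2] - c_thr) / sigma[2])
        }
        t <- seq(0.001, 0.999, length.out = 200)
        plot(t, mix_roc(t, lambda = 0.6, mu = c(1, 3), sigma = c(1, 2)), type = "l",
             xlab = "False positive rate", ylab = "True positive rate")
        abline(0, 1, lty = 2)                        # chance line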

    Visualizing Longitudinal Data with Dropouts

    A triangle plot is proposed for displaying longitudinal data with dropouts. The triangle plot is a data visualization tool that can also serve as a graphical check for informativeness of the dropout process. It shares similarities with the lasagna plot, but its explicit use of dropout time as an axis gives it an advantage over the more commonly used graphical strategies for longitudinal data. The triangle plot can also be interpreted as a trellis plot, which gives rise to several extensions such as the triangle histogram and the triangle boxplot. R code is available to streamline the use of the triangle plot in practice.
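
    The construction is not spelled out in the abstract; a plausible minimal version puts measurement time on one axis and dropout time on the other, so that observed cells form a triangle. A hedged base-R sketch (not the authors' released code):

        # Sketch of a triangle-style display: rows are dropout-time strata, columns
        # are measurement times; cells after dropout are missing, so the observed
        # region forms a triangle and cell shading shows the stratum mean response.
        set.seed(1)
        n <- 200; tmax <- 8; times <- 1:tmax
        dropout <- sample(times, n, replace = TRUE)        # last visit attended
        y <- matrix(rep(times, each = n), n, tmax) + rnorm(n * tmax)
        y[outer(dropout, times, "<")] <- NA                # missing after dropout
        cell_mean <- t(sapply(times, function(d)
          colMeans(y[dropout == d, , drop = FALSE], na.rm = TRUE)))
        image(x = times, y = times, z = t(cell_mean),
              xlab = "Measurement time", ylab = "Dropout time",
              main = "Triangle plot (sketch)")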

    Computing the Total Sample Size When Group Sizes Are Not Fixed

    This article is concerned with computing the total sample size required for a two-sample comparison when the sizes of the two groups to be compared cannot be fixed in advance. This situation is frequently encountered when group membership depends on a variable that is observable only after the subject is enrolled in the study, such as a genetic or biological marker. The most common way of circumventing this problem is to assume a fixed value for the prevalence of the condition that determines group membership and compute the required sample size conditionally. In this article, this practice is formalized by placing a prior distribution on the prevalence, which results in an analytically tractable formula for the unconditional sample size. In particular, a sample size inflation factor, a number by which the conditional sample size can be multiplied, is presented. An example is given from the planning of a clinical trial investigating the prognostic role of molecular markers in gastrointestinal stromal cancer.
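
    No formula is given in the abstract; as a hedged illustration, with the usual two-sample normal-approximation formula the conditional sample size scales as 1/p + 1/(1 - p) in the prevalence p, so averaging that factor over a Beta prior yields an unconditional sample size and an inflation factor relative to plugging in the prior mean. A Monte Carlo sketch in R under those assumptions:

        # Hedged sketch: average the conditional sample size over a Beta prior on
        # the prevalence; the paper derives a closed form, approximated here by
        # Monte Carlo. All design values are illustrative.
        alpha <- 0.05; power <- 0.80; delta <- 0.5; sigma <- 1
        z <- qnorm(1 - alpha / 2) + qnorm(power)
        n_cond <- function(p) (z * sigma / delta)^2 * (1 / p + 1 / (1 - p))
        a <- 8; b <- 32                     # Beta(8, 32) prior: mean prevalence 0.20
        p_draws <- rbeta(1e5, a, b)
        n_uncond <- mean(n_cond(p_draws))   # unconditional total sample size
        c(plug_in = n_cond(a / (a + b)), unconditional = n_uncond,
          inflation = n_uncond / n_cond(a / (a + b)))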

    Optimal Cutpoint Estimation with Censored Data

    We consider the problem of selecting an optimal cutpoint for a continuous marker when the outcome of interest is subject to right censoring. Maximal chi-square methods and methods based on receiver operating characteristic (ROC) curves are commonly used when the outcome is binary. In this article we show that selecting the cutpoint that maximizes the concordance, a metric similar to the area under an ROC curve, is equivalent to maximizing the Youden index, a popular criterion when the ROC curve is used to choose a threshold. We use this as a basis for proposing maximal concordance as a metric to use with censored endpoints. Through simulations we evaluate the performance of two concordance estimates and three chi-square statistics under various assumptions. Maximizing the partial likelihood ratio test statistic has the best performance in our simulations.
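
    As a hedged sketch of the abstract's best performer, one can scan candidate cutpoints and keep the one maximizing the Cox partial likelihood ratio statistic for the dichotomized marker (survival package; the simulated data and grid are illustrative):

        # Pick the cutpoint maximizing the partial likelihood ratio statistic.
        library(survival)
        set.seed(2)
        n <- 300
        x <- rnorm(n)                                 # continuous marker
        time <- rexp(n, rate = exp(0.8 * (x > 0.5)))  # hazard jumps at true cutpoint 0.5
        status <- rbinom(n, 1, 0.7)                   # crude random censoring indicator
        grid <- quantile(x, probs = seq(0.1, 0.9, by = 0.02))
        lr_stat <- sapply(grid, function(cut) {
          fit <- coxph(Surv(time, status) ~ I(x > cut))
          2 * diff(fit$loglik)                        # partial likelihood ratio statistic
        })
        grid[which.max(lr_stat)]                      # estimated optimal cutpoint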

    Building a Nomogram for Survey-Weighted Cox Models Using R

    Nomograms have become a very useful tool for clinicians as they provide individualized predictions based on the characteristics of the patient. For complex-design survey data with a survival outcome, Binder (1992) proposed methods for fitting survey-weighted Cox models, but to the best of our knowledge there is no available software to build a nomogram based on such models. This paper introduces R software to accomplish this goal and illustrates its use on a gastric cancer dataset. Validation and calibration routines are also included.
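
    The paper's routines are not reproduced in the abstract; as a hedged sketch of the model-fitting step, a survey-weighted Cox model can be fit with svycoxph from the survey package (the simulated design and variables below are illustrative), after which the paper's software would turn the fit into a nomogram:

        # Fit a survey-weighted Cox model; the nomogram step itself requires the
        # routines the paper introduces. Data and design are simulated stand-ins.
        library(survey)
        library(survival)
        set.seed(8)
        n <- 400
        dat <- data.frame(
          psu = rep(1:40, each = 10), stratum = rep(1:4, each = 100),
          sampwt = runif(n, 0.5, 2), age = rnorm(n, 60, 10),
          stage = factor(sample(1:3, n, replace = TRUE)),
          os_months = rexp(n, 0.02), death = rbinom(n, 1, 0.6))
        design <- svydesign(ids = ~psu, strata = ~stratum, weights = ~sampwt,
                            data = dat, nest = TRUE)
        fit <- svycoxph(Surv(os_months, death) ~ age + stage, design = design)
        summary(fit)   # survey-weighted hazard ratios feed the nomogram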

    Lehmann Family of ROC Curves

    Receiver operating characteristic (ROC) curves are useful in evaluating the ability of a continuous marker to discriminate between the two states of a binary outcome such as diseased/not diseased. The most popular parametric model for an ROC curve is the binormal model, which assumes that the marker is normally distributed conditional on the outcome. Here we present an alternative to the binormal model based on the Lehmann family, also known as the proportional hazards specification. The resulting ROC curve and its functionals (such as the area under the curve) have simple analytic forms. We derive closed-form expressions for the asymptotic variances of the estimators of various quantities of interest. This family easily accommodates comparison of multiple markers, covariate adjustments, and clustered data through a regression formulation. Evaluation of the underlying assumptions, model fitting, and model selection can all be performed using any off-the-shelf proportional hazards statistical software package.
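
    Under the proportional hazards specification the diseased and healthy survivor functions satisfy S_D = S_H^theta, giving ROC(t) = t^theta and AUC = 1/(1 + theta). A hedged R sketch that estimates theta by treating the marker as the "time" variable in a Cox model with disease status as the covariate (simulated data):

        # Lehmann ROC sketch: theta = exp(beta) from a Cox fit on the marker,
        # then ROC(t) = t^theta and AUC = 1/(1 + theta). Data are simulated.
        library(survival)
        set.seed(3)
        marker <- c(rexp(100, 1), rexp(100, 0.4))          # diseased tend higher
        disease <- rep(0:1, each = 100)
        fit <- coxph(Surv(marker, rep(1, 200)) ~ disease)  # all "events"; no censoring
        theta <- exp(coef(fit))
        t <- seq(0, 1, length.out = 101)
        plot(t, t^theta, type = "l", xlab = "FPR", ylab = "TPR")  # Lehmann ROC
        abline(0, 1, lty = 2)
        1 / (1 + theta)                                    # closed-form AUC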

    Semiparametric Bayesian Modeling of Multivariate Average Bioequivalence

    Bioequivalence trials are usually conducted to compare two or more formulations of a drug. Simultaneous assessment of bioequivalence on multiple endpoints is called multivariate bioequivalence. Although some tests for multivariate bioequivalence have been suggested, current practice usually involves univariate bioequivalence assessments that ignore the correlations between the endpoints, such as AUC and Cmax. In this paper we develop a semiparametric Bayesian test for bioequivalence under multiple endpoints. Specifically, we show how the correlation between the endpoints can be incorporated in the analysis and how this correlation affects the inference. The resulting estimates and posterior probabilities "borrow strength" from one another, where the amount and direction of the strength borrowed are determined by the prior correlations. The method developed is illustrated using a real data set.
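
    The semiparametric model is beyond a short sketch; as a hedged and much simpler stand-in, the joint posterior probability that both log-scale ratios fall within the conventional (log 0.8, log 1.25) limits can be approximated from a bivariate normal posterior, which at least shows how the endpoint correlation enters the joint inference:

        # Simplified stand-in (NOT the paper's semiparametric model): bivariate
        # normal approximation to the posterior of the mean log-ratios for AUC
        # and Cmax; the correlation drives the joint bioequivalence probability.
        library(MASS)
        set.seed(4)
        Sigma <- matrix(c(0.15^2, 0.6 * 0.15 * 0.20,
                          0.6 * 0.15 * 0.20, 0.20^2), 2)    # correlated endpoints
        d <- mvrnorm(24, mu = c(0.05, 0.02), Sigma = Sigma) # subject log-ratios
        post_mean <- colMeans(d)
        post_cov <- cov(d) / nrow(d)        # normal approximation to the posterior
        draws <- mvrnorm(1e5, post_mean, post_cov)
        lims <- log(c(0.8, 1.25))
        in_lims <- draws > lims[1] & draws < lims[2]
        mean(in_lims[, 1] & in_lims[, 2])   # joint posterior probability of BE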

    Bland-Altman Plots for Evaluating Agreement Between Solid Tumor Measurements

    Rationale and Objectives. Solid tumor measurements are regularly used in clinical trials of anticancer therapeutic agents and in clinical practice managing patients' care. Consequently, studies evaluating the reproducibility of solid tumor measurements are important, as lack of reproducibility may directly affect patient management. The authors propose utilizing a modified Bland-Altman plot with a difference metric that lends itself naturally to this situation and facilitates interpretation. Materials and Methods. The modification to the Bland-Altman plot involves replacing the difference plotted on the vertical axis with the relative percent change (RC) between the two measurements. This quantity is the same one used in assessing tumor response to therapeutic agents and is very familiar to radiologists and clinicians working with cancer patients. The distribution of the RC is explored and revised equations for the limits of agreement (LoA) are presented. These methods are applied to positron emission tomography (PET) data studying two radiotracers. Results. The RC can be calculated separately for each lesion measured or at the patient level by summing over lesions within patient. In both cases, the distribution of the RC is highly skewed and is approximated by a shifted negative lognormal distribution. The standard equations for the 95% LoA assume the differences are approximately normally distributed and are not appropriate for the RC. Conclusions. The modified Bland-Altman plot with correctly calculated LoA can aid in evaluating agreement between solid tumor measurements.
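
    The revised LoA equations are not given in the abstract; a hedged sketch of the plot itself, with the RC taken as the percent change from the first measurement (as in tumor response criteria) and nonparametric quantile limits standing in for the paper's lognormal-based equations:

        # Modified Bland-Altman plot: vertical axis is RC = 100*(m2 - m1)/m1.
        # Empirical 2.5%/97.5% quantiles stand in for the revised LoA equations.
        set.seed(5)
        m1 <- rlnorm(150, meanlog = 1, sdlog = 0.5)         # first measurement
        m2 <- m1 * rlnorm(150, meanlog = 0, sdlog = 0.15)   # remeasurement with error
        rc <- 100 * (m2 - m1) / m1
        avg <- (m1 + m2) / 2
        plot(avg, rc, xlab = "Mean of the two measurements",
             ylab = "Relative percent change (RC)")
        abline(h = quantile(rc, c(0.025, 0.975)), lty = 2)  # nonparametric LoA
        abline(h = median(rc))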

    A Hybrid Bayesian Laplacian Approach for Generalized Linear Mixed Models

    The analytical intractability of generalized linear mixed models (GLMMs) has generated a lot of research in the past two decades. Applied statisticians routinely face the frustrating prospect of widely disparate results produced by the methods that are currently implemented in commercially available software. This article is motivated by this frustration and develops guidance as well as new methods that are computationally efficient and statistically reliable. Two main classes of approximations have been developed: likelihood-based methods and Bayesian methods. Likelihood-based methods such as the penalized quasi-likelihood approach of Breslow and Clayton (1993) have been shown to produce biased estimates, especially for binary clustered data with small cluster sizes. More recent methods such as the adaptive Gaussian quadrature approach perform well but can be overwhelmed by problems with large numbers of random effects, and efficient algorithms to better handle these situations have not yet been integrated into standard statistical packages. Similarly, Bayesian methods, though they have good frequentist properties when the model is correct, are known to be computationally intensive and also require specialized code, limiting their use in practice. In this article we build on our previous method (Capanu and Begg 2010) and propose a hybrid approach that provides a bridge between the likelihood-based and Bayesian approaches by employing Bayesian estimation for the variance components followed by Laplacian estimation for the regression coefficients, with the goal of obtaining good statistical properties with relatively good computing speed using widely available software. The hybrid approach is shown to perform well against the other competitors considered. Another important finding of this research is the surprisingly good performance of the Laplacian approximation in the difficult case of binary clustered data with small cluster sizes. We apply the methods to a real study of head and neck squamous cell carcinoma and illustrate their properties using simulations based on a widely analyzed salamander mating dataset and on another important dataset involving the Guatemalan Child Health survey.
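
    The Bayesian variance-component step is not shown here; as a hedged illustration of the Laplacian piece the abstract highlights, a binary GLMM with many small clusters can be fit via the Laplace approximation in lme4 (simulated data; not the authors' code):

        # Laplacian approximation step only (lme4); the hybrid method would first
        # estimate the variance components in a Bayesian step and hold them fixed.
        library(lme4)
        set.seed(6)
        n_clus <- 100; clus_size <- 4                  # many small clusters
        cluster <- rep(seq_len(n_clus), each = clus_size)
        u <- rnorm(n_clus, sd = 1)                     # cluster random intercepts
        x <- rnorm(n_clus * clus_size)
        y <- rbinom(n_clus * clus_size, 1, plogis(-0.5 + x + u[cluster]))
        fit <- glmer(y ~ x + (1 | cluster), family = binomial, nAGQ = 1)  # Laplace
        summary(fit)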

    Optimized Variable Selection Via Repeated Data Splitting

    We introduce a new variable selection procedure that repeatedly splits the data into two sets, one for estimation and one for validation, to obtain an empirically optimized threshold which is then used to screen for variables to include in the final model. Simulation results show that the proposed variable selection technique enjoys superior performance compared to candidate methods: it is among those with the lowest inclusion of noisy predictors, has the highest power to detect the correct model, and is unaffected by correlations among the predictors. We illustrate the methods by applying them to a cohort of patients undergoing hepatectomy at our institution.
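
    The screening statistic and loss are not specified in the abstract; a hedged sketch of the repeated-splitting idea, using univariate p-values to screen and validation-set prediction error to optimize the inclusion threshold (all choices illustrative):

        # Repeated data splitting (illustrative choices): screen predictors by
        # univariate p-value; choose the threshold minimizing validation MSE,
        # averaged over many random 50/50 splits.
        set.seed(7)
        n <- 200; p <- 20
        X <- matrix(rnorm(n * p), n, p)
        y <- drop(X[, 1:3] %*% c(1, 1, 1)) + rnorm(n)    # 3 true predictors
        thresholds <- c(0.001, 0.005, 0.01, 0.05, 0.1, 0.2)
        val_mse <- replicate(50, {
          tr <- sample(n, n / 2)
          pvals <- apply(X[tr, ], 2, function(xj)
            summary(lm(y[tr] ~ xj))$coefficients[2, 4])
          sapply(thresholds, function(th) {
            keep <- which(pvals < th)
            if (length(keep) == 0) return(mean((y[-tr] - mean(y[tr]))^2))
            fit <- lm(y[tr] ~ X[tr, keep, drop = FALSE])
            pred <- cbind(1, X[-tr, keep, drop = FALSE]) %*% coef(fit)
            mean((y[-tr] - pred)^2)
          })
        })
        thresholds[which.min(rowMeans(val_mse))]         # optimized threshold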