98 research outputs found

Model Checking for ROC Regression Analysis

Author: Cai Tianxi
Zheng Yingye
Publication venue: Collection of Biostatistics Research Archive
Publication date: 06/12/2005

The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may produce invalid and misleading results. To date, no practical model-checking techniques suitable for validating existing ROC regression models are available. In this paper, we develop procedures based on cumulative residuals to graphically and numerically assess the goodness of fit of some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual process and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the Cystic Fibrosis registry.
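For readers new to the object that ROC regression models, a minimal sketch of the empirical ROC curve and its area (AUC) for a continuous marker may help. It assumes that larger marker values indicate disease and that a subject is called test-positive when the marker is at or above a threshold; the function names are hypothetical. It illustrates only the basic ROC quantities, not the cumulative-residual model checks developed in the paper.

```python
import numpy as np

def empirical_roc(cases, controls):
    """Empirical ROC points (FPR, TPR); a subject is test-positive when marker >= c."""
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Thresholds from +inf (nobody positive) down to the smallest observed value (everybody positive)
    observed = np.unique(np.concatenate((cases, controls)))[::-1]
    thresholds = np.concatenate(([np.inf], observed))
    fpr = np.array([np.mean(controls >= c) for c in thresholds])
    tpr = np.array([np.mean(cases >= c) for c in thresholds])
    return fpr, tpr

def empirical_auc(cases, controls):
    """Mann-Whitney form of the AUC: P(case > control) + 0.5 * P(tie)."""
    m1 = np.asarray(cases, dtype=float)[:, None]
    m0 = np.asarray(controls, dtype=float)[None, :]
    return float(np.mean((m1 > m0) + 0.5 * (m1 == m0)))

# Toy usage with made-up marker values
fpr, tpr = empirical_roc([3, 4, 5], [1, 2, 3])
print(empirical_auc([3, 4, 5], [1, 2, 3]))  # 8.5 / 9, about 0.944
```

Counting ties as one half makes the Mann-Whitney AUC equal to the area under the empirical ROC points when they are joined by straight lines.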

Collection Of Biostatistics Research Archive

Semiparametric Estimation of Time-Dependent ROC Curves for Longitudinal Marker Data

Author: Heagerty Patrick
Zheng Yingye
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/12/2003

One approach to evaluating the strength of association between a longitudinal marker process and a key clinical event time is through predictive regression methods such as a time-dependent covariate hazard model. For example, a time-varying covariate Cox model specifies the instantaneous risk of the event as a function of the time-varying marker and additional covariates. In this manuscript we explore a second, complementary approach which characterizes the distribution of the marker as a function of both the measurement time and the ultimate event time. Our goal is to flexibly extend the standard diagnostic accuracy concepts of sensitivity and specificity to explicitly recognize both the timing of the marker measurement and the timing of disease. The accuracy of a longitudinal marker can be fully characterized using time-dependent receiver operating characteristic (ROC) curves. We detail a semiparametric estimation method for time-dependent ROC curves that adopts a regression quantile approach for longitudinal data introduced by Heagerty and Pepe (1999). We extend the work of Heagerty and Pepe (1999) by developing asymptotic distribution theory for the ROC estimators where the distributional shape for the marker is allowed to depend on covariates. To illustrate our method, we analyze pulmonary function measurements among cystic fibrosis subjects to assemble a case-control study and estimate ROC curves that assess how well the pulmonary function measurement can distinguish subjects who progress to death from subjects who remain alive. Comparing the results from our semiparametric analysis to a fully parametric method discussed by Etzioni and Pepe (1999) suggests that the ability to relax distributional assumptions may be important in practice.
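To make the role of the marker distribution concrete, here is a hedged sketch of an ROC curve under a simplified location-scale model, Y = mu + sd * eps, where eps has an unspecified distribution F0 estimated empirically from standardized residuals. If controls have location and scale (mu0, sd0) and cases (mu1, sd1), and larger values indicate disease, the threshold with false-positive rate p is c = mu0 + sd0 * F0^{-1}(1 - p), which gives ROC(p) = 1 - F0((c - mu1) / sd1). The inputs mu0, sd0, mu1, sd1 and std_resid would come from a fitted model and are hypothetical here. Unlike the paper's estimator, this sketch ignores the longitudinal structure and does not let the distributional shape depend on covariates; a marker where lower values indicate disease can simply be negated first.

```python
import numpy as np

def location_scale_roc(p, mu0, sd0, mu1, sd1, std_resid):
    """ROC(p) when the marker is Y = mu + sd * eps and eps ~ F0 is left unspecified.

    (mu0, sd0): location and scale for controls; (mu1, sd1): for cases.
    std_resid: standardized residuals, used as an empirical estimate of F0.
    Larger marker values are assumed to indicate disease."""
    r = np.sort(np.asarray(std_resid, dtype=float))
    p = np.atleast_1d(np.asarray(p, dtype=float))
    # Threshold with false-positive rate (approximately) p among controls:
    # c = mu0 + sd0 * F0^{-1}(1 - p), using the empirical quantile of the residuals
    c = mu0 + sd0 * np.quantile(r, 1.0 - p)
    # True-positive rate among cases: 1 - F0((c - mu1) / sd1), with F0 as the empirical CDF
    z = (c - mu1) / sd1
    return 1.0 - np.searchsorted(r, z, side="right") / r.size
```

Because F0 enters only through its empirical CDF and quantiles, no parametric form (such as a normal distribution) is imposed on the marker.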

Collection Of Biostatistics Research Archive

Calibrating Observed Differential Gene Expression for the Multiplicity of Genes on the Array

Author: Pepe Margaret S.
Zheng Yingye
Publication venue: Collection of Biostatistics Research Archive
Publication date: 29/01/2004

In a gene expression array study, the expression levels of thousands of genes are monitored simultaneously across various biological conditions on a small set of subjects. One goal of such studies is to explore a large pool of genes in order to select a subset of genes that appear to be differentially expressed for further investigation. Of particular interest here is how to select the top k genes once genes are ranked based on their evidence for differential expression in two tissue types. We consider statistical methods that provide a more rigorous and intuitively appealing selection process for k. We propose to choose genes based on adjusted p-values (AP values). The AP values are calculated with a resampling-based algorithm under the assumption that no genes are truly differentially expressed, and they account for the multiplicity and dependence encountered in microarray data. Using both simulated data and real microarray data, we assess and compare the performance of our new method with existing methods. The intuitive basis for the AP values and the fact that our procedure has operating characteristics at least as good as existing procedures make it attractive for practical application.
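The idea of resampling under the global null while preserving gene-gene dependence can be sketched as follows. This is an assumption about one plausible implementation, not the paper's exact AP-value definition: each observed statistic at rank k is compared with the permutation distribution of the statistic at the same rank k. The test statistic (absolute difference in group means) and the function name are hypothetical choices.

```python
import numpy as np

def rankwise_permutation_pvalues(expr, labels, n_perm=1000, seed=0):
    """Hypothetical AP-style calibration of ranked gene statistics.

    expr: (n_genes, n_samples) expression matrix; labels: 0/1 tissue type per sample.
    Returns the gene order (strongest first) and one p-value per rank."""
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)

    def stats(lab):
        # Absolute difference in group means, computed for every gene at once
        return np.abs(expr[:, lab == 1].mean(axis=1) - expr[:, lab == 0].mean(axis=1))

    obs = stats(labels)
    order = np.argsort(-obs)          # genes ranked by evidence, strongest first
    obs_sorted = obs[order]
    exceed = np.zeros(obs.size)
    for _ in range(n_perm):
        # Permuting whole sample labels keeps the correlation between genes intact
        perm_sorted = np.sort(stats(rng.permutation(labels)))[::-1]
        exceed += perm_sorted >= obs_sorted   # compare rank k with rank k
    pvals = (exceed + 1) / (n_perm + 1)
    return order, pvals
```

Because every permutation recomputes all gene statistics together, the null distribution of each ranked statistic reflects both the number of genes on the array and their dependence.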

Collection Of Biostatistics Research Archive

Survival Model Predictive Accuracy and ROC Curves

Author: Heagerty Patrick
Zheng Yingye
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/12/2003

The predictive accuracy of a survival model can be summarized using extensions of the proportion of variation explained by the model, or R^2, commonly used for continuous response models, or using extensions of sensitivity and specificity, which are commonly used for binary response models. In this manuscript we propose new time-dependent accuracy summaries based on time-specific versions of sensitivity and specificity calculated over risk sets. We connect the accuracy summaries to a previously proposed global concordance measure, which is a variant of Kendall's tau. In addition, we show how standard Cox regression output can be used to obtain estimates of time-dependent sensitivity and specificity, and time-dependent receiver operating characteristic (ROC) curves. Semi-parametric estimation methods appropriate for both proportional hazards and non-proportional hazards data are introduced, evaluated in simulations, and illustrated using two familiar survival data sets.
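The risk-set idea behind these time-specific summaries can be illustrated with a purely empirical sketch: at an observed event time t, cases are subjects whose event occurs at t and controls are subjects still at risk after t, so sensitivity and specificity are computed within the risk set. This is a nonparametric illustration with hypothetical function names, not the Cox-based semiparametric estimators proposed in the paper, and it assumes each call has at least one case and one control.

```python
import numpy as np

def _risk_set_groups(time, event, t):
    """Cases: observed event exactly at t. Controls: still at risk after t."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    cases = (time == t) & event
    controls = time > t
    return cases, controls

def incident_dynamic_accuracy(time, event, marker, t, c):
    """Sensitivity among cases at t and specificity among controls beyond t
    (test-positive means marker > c)."""
    marker = np.asarray(marker, dtype=float)
    cases, controls = _risk_set_groups(time, event, t)
    sens = float(np.mean(marker[cases] > c))
    spec = float(np.mean(marker[controls] <= c))
    return sens, spec

def incident_dynamic_auc(time, event, marker, t):
    """AUC(t): chance that a case at t has a higher marker than a control at risk
    beyond t, counting ties as one half."""
    marker = np.asarray(marker, dtype=float)
    cases, controls = _risk_set_groups(time, event, t)
    m1 = marker[cases][:, None]
    m0 = marker[controls][None, :]
    return float(np.mean((m1 > m0) + 0.5 * (m1 == m0)))
```

Subjects censored after t remain in the control group, which is how the risk-set definition uses censored follow-up. Averaging AUC(t) over event times with suitable weights gives a global concordance summary; the weights are not shown here.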

Collection Of Biostatistics Research Archive

Partly Conditional Survival Models for Longitudinal Data

Author: Heagerty Patrick
Zheng Yingye
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/12/2003

It is common in longitudinal studies to collect information on the time until a key clinical event, such as death, and to measure markers of patient health at multiple follow-up times. One approach to the joint analysis of survival and repeated measures data adopts a time-varying covariate regression model for the event time hazard. Using this standard approach, the instantaneous risk of death at time t is specified as a possibly semi-parametric function of covariate information that has accrued through time t. In this manuscript we decouple the time scale for modeling the hazard from the time scale for accrual of available longitudinal covariate information. Specifically, we propose a class of models that condition on the covariate information through time s and then specify the conditional hazard for times t where t > s. Our approach parallels the "partly conditional" models proposed by Pepe and Couper (1997) for pure repeated measures applications. Estimation is based on estimating equations applied to clusters of data formed through the creation of derived survival times that measure the time from measurement of covariates to the end of follow-up. Patient follow-up may be terminated either by the occurrence of the event or by censoring. The proposed methods allow a flexible characterization of the association between a longitudinal covariate process and a survival time, and facilitate the direct prediction of survival probabilities in the time-varying covariate setting.
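The derived data structure described above can be sketched as follows: each measurement time s of a subject yields one record whose derived survival time is the end of follow-up minus s, with the subject's event-or-censoring indicator, and records from the same subject form a cluster. The names Visit, DerivedRecord and derive_survival_times, and the use of dictionaries keyed by subject, are hypothetical; the estimating equations used to fit the partly conditional model are not shown.

```python
from dataclasses import dataclass

@dataclass
class Visit:
    subject: int
    s: float        # time at which the covariate was measured
    marker: float   # covariate value recorded at time s

@dataclass
class DerivedRecord:
    subject: int         # cluster identifier
    s: float
    marker: float
    derived_time: float  # time from measurement s to end of follow-up
    event: bool          # True if follow-up ended with the event, False if censored

def derive_survival_times(visits, followup_end, event):
    """One record per (subject, measurement time); records sharing a subject form a cluster."""
    records = []
    for v in visits:
        end = followup_end[v.subject]
        if v.s >= end:
            continue  # only measurements taken before follow-up ended are usable
        records.append(DerivedRecord(v.subject, v.s, v.marker, end - v.s, event[v.subject]))
    return records

# Toy usage with made-up follow-up data
visits = [Visit(1, 0.0, 5.0), Visit(1, 2.0, 4.0), Visit(2, 0.0, 3.0)]
recs = derive_survival_times(visits, {1: 10.0, 2: 4.0}, {1: True, 2: False})
```

Subject 1 contributes two records (derived times 10.0 and 8.0), both ending in the event, while subject 2 contributes one censored record; a model fitted to these records must account for the within-subject clustering.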

Collection Of Biostatistics Research Archive

On combining triads and unrelated subjects data in candidate gene studies: an application to data on testicular cancer.

Author: Hsu Li
Schwartz Stephen M
Starr Jacqueline R
Zheng Yingye
Publication venue: S. Karger AG
Publication date: 01/01/2009

Combining data collected from different sources is a cost-effective and time-efficient approach for enhancing statistical efficiency when estimating weak-to-modest genetic effects or gene-gene or gene-environment interactions. However, combining data across studies becomes complicated when data are collected under different study designs, such as family-based and unrelated individual-based designs (e.g., a population-based case-control design). In this paper, we describe a general method that permits the joint estimation of the effects of genes, environmental factors, and gene-gene/gene-environment interactions on disease risk under a hybrid design that includes cases, parents of cases, and unrelated individuals. We provide both asymptotic theory and statistical inference. Extensive simulation experiments demonstrate that the proposed estimation and inferential methods perform well in realistic settings. We illustrate the method with an application to a study of testicular cancer.

authors@Fred Hutch

The Sensitivity and Specificity of Markers for Event Times

Author: Cai Tianxi
Lumley Thomas
Pepe Margaret S.
Swords Jenny Nancy
Zheng Yingye
Publication venue: Collection of Biostatistics Research Archive
Publication date: 05/04/2005

Collection Of Biostatistics Research Archive