
    Deletion/Substitution/Addition Algorithm for Partitioning the Covariate Space in Prediction

    We propose a new method for predicting censored (and non-censored) clinical outcomes from a highly-complex covariate space. Previously we suggested a unified strategy for predictor construction, selection, and performance assessment. Here we introduce a new algorithm which generates a piecewise constant estimation sieve of candidate predictors based on an intensive and comprehensive search over the entire covariate space. This algorithm allows us to elucidate interactions and correlation patterns in addition to main effects

    Cross-Validating and Bagging Partitioning Algorithms with Variable Importance

    We present a cross-validated bagging scheme in the context of partitioning algorithms. To explore the benefits of the various bagging schemes, we compare via simulations the predictive ability of a single Classification and Regression Tree (CART) with several previously suggested bagging schemes and with our proposed approach. Additionally, a variable importance measure is explained and illustrated

    Regression Trees and Ensembles for Cumulative Incidence Functions

    The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past decade. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and related ensemble methods, have begun comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods employ augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and lead to methods that are easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) is used to illustrate these new methods

    Comparative Genomic Hybridization Array Analysis

    At the present time, there is increasing evidence that cancer may be regulated by the number of copies of genes in tumor cells. Through microarray technology it is now possible to measure the number of copies of thousands of genes and gene segments in samples of chromosomal DNA. Microarray comparative genomic hybridization (array CGH) provides the opportunity to both measure DNA sequence copy number gains and losses and map these aberrations to the genomic sequence. Gains can signify the over-expression of oncogenes, genes which stimulate cell growth and have become hyperactive, while losses can signify under-expression of tumor suppressor genes, genes whose activity stops the formation of tumors. In order to better understand the progression of cancer and the differences between cancer and non-cancer tissue it is of great importance to fully understand what is happening at the chromosomal level. In the hopes of finding a genetic signature for subtypes of cancer, it is our intention to explore statistical approaches to array CGH data. The Waldman Lab at UCSF-CCC graciously allowed us to access data from their renal cancer study. This project was designed to determine whether microarray information on copy number of genes could be used to discriminate among four subtypes of renal cancer

    Factor analysis for survival time prediction with informative censoring and diverse covariates

    Fulfilling the promise of precision medicine requires accurately and precisely classifying disease states. For cancer, this includes prediction of survival time from a surfeit of covariates. Such data presents an opportunity for improved prediction, but also a challenge due to high dimensionality. Furthermore, disease populations can be heterogeneous. Integrative modeling is sensible, as the underlying hypothesis is that joint analysis of multiple covariates provides greater explanatory power than separate analyses. We propose an integrative latent variable model that combines factor analysis for various data types and an exponential proportional hazards (EPH) model for continuous survival time with informative censoring. The factor and EPH models are connected through low‐dimensional latent variables that can be interpreted and visualized to identify subpopulations. We use this model to predict survival time. We demonstrate this model's utility in simulation and on four Cancer Genome Atlas datasets: diffuse lower‐grade glioma, glioblastoma multiforme, lung adenocarcinoma, and lung squamous cell carcinoma. These datasets have small sample sizes, high‐dimensional diverse covariates, and high censorship rates. We compare the predictions from our model to three alternative models. Our model outperforms in simulation and is competitive on real datasets. Furthermore, the low‐dimensional visualization for diffuse lower‐grade glioma displays known subpopulations

    Survival Point Estimate Prediction in Matched and Non-Matched Case-Control Subsample Designed Studies

    Providing information about the risk of disease and clinical factors that may increase or decrease a patient's risk of disease is standard medical practice. Although case-control studies can provide evidence of strong associations between diseases and risk factors, clinicians need to be able to communicate to patients the age-specific risks of disease over a defined time interval for a set of risk factors. An estimate of absolute risk cannot be determined from case-control studies because cases are generally chosen from a population whose size is not known (necessary for calculation of absolute risk) and where duration of follow-up is not known (necessary for calculation of incidence). This problem can sometimes be overcome by using a nested case-control design. We have collected data on a National Cancer Institute funded population-based cohort study. This study contains a matched set of cases and controls within the cohort. This design is more cost-efficient than a full cohort study since expensive predictor variables (genomic measures, sex hormone levels, mammographic breast density) are measured on all of the cases, but on only a sample of the cohort who did not develop the outcome of interest (the controls). In addition, this design avoids the potential biases of conventional case-control studies that draw cases and controls from different populations. Importantly, the presence or absence of the outcome of interest has been established for the entire cohort within the same time period. The specifics of the sampling in our study do not adhere to the assumptions for absolute risk estimation methods previously developed in the literature. Here we introduce a novel method which provides locally efficient estimators to predict the absolute risk of a cohort from measures only taken on the matched case-control participants. The proposed method is evaluated using simulation studies and survival data from women with ductal carcinoma in situ, a non-invasive form of breast cancer. A generalization of the proposed method is related to other similar sampling designs such as nested case-control, case-cohort, and two-stage case-control

    Characterization of Metabolic, Diffusion, and Perfusion Properties in GBM: Contrast-Enhancing versus Non-Enhancing Tumor.

    Background: Although the contrast-enhancing (CE) lesion on T1-weighted MR images is widely used as a surrogate for glioblastoma (GBM), there are also non-enhancing regions of infiltrative tumor within the T2-weighted lesion, which elude radiologic detection. Because non-enhancing GBM (Enh-) challenges clinical patient management as latent disease, this study sought to characterize ex vivo metabolic profiles from Enh- and CE GBM (Enh+) samples, alongside histological and in vivo MR parameters, to assist in defining criteria for estimating total tumor burden. Methods: Fifty-six patients with newly diagnosed GBM received a multi-parametric pre-surgical MR examination. Targets for obtaining image-guided tissue samples were defined based on in vivo parameters that were suspicious for tumor. The actual location from which tissue samples were obtained was recorded, and half of each sample was analyzed for histopathology while the other half was scanned using HR-MAS spectroscopy. Results: The Enh+ and Enh- tumor samples demonstrated comparable mitotic activity, but also significant heterogeneity in microvascular morphology. Ex vivo spectroscopic parameters indicated similar levels of total choline and N-acetylaspartate between these contrast-based radiographic subtypes of GBM, and characteristic differences in the levels of myo-inositol, creatine/phosphocreatine, and phosphoethanolamine. Analysis of in vivo parameters at the sample locations was consistent with histological and ex vivo metabolic data. Conclusions: The similarity between ex vivo levels of choline and NAA, and between in vivo levels of choline, NAA and nADC in Enh+ and Enh- tumor, indicates that these parameters can be used in defining non-invasive metrics of total tumor burden for patients with GBM