118 research outputs found

    Building a Nomogram for Survey-Weighted Cox Models Using R

    Nomograms have become a very useful tool among clinicians as they provide individualized predictions based on the characteristics of the patient. For complex design survey data with a survival outcome, Binder (1992) proposed methods for fitting survey-weighted Cox models, but to the best of our knowledge there is no available software to build a nomogram based on such models. This paper introduces R software to accomplish this goal and illustrates its use on a gastric cancer dataset. Validation and calibration routines are also included.

    Optimized Variable Selection Via Repeated Data Splitting

    We introduce a new variable selection procedure that repeatedly splits the data into two sets, one for estimation and one for validation, to obtain an empirically optimized threshold, which is then used to screen for variables to include in the final model. Simulation results show that the proposed variable selection technique enjoys superior performance compared to candidate methods, being amongst those with the lowest inclusion of noisy predictors while having the highest power to detect the correct model and being unaffected by correlations among the predictors. We illustrate the methods by applying them to a cohort of patients undergoing hepatectomy at our institution.

    A Hybrid Bayesian Laplacian Approach for Generalized Linear Mixed Models

    The analytical intractability of generalized linear mixed models (GLMMs) has generated a lot of research in the past two decades. Applied statisticians routinely face the frustrating prospect of widely disparate results produced by the methods that are currently implemented in commercially available software. This article is motivated by this frustration and develops guidance as well as new methods that are computationally efficient and statistically reliable. Two main classes of approximations have been developed: likelihood-based methods and Bayesian methods. Likelihood-based methods such as the penalized quasi-likelihood approach of Breslow and Clayton (1993) have been shown to produce biased estimates, especially for binary clustered data with small cluster sizes. More recent methods such as the adaptive Gaussian quadrature approach perform well but can be overwhelmed by problems with large numbers of random effects, and efficient algorithms to better handle these situations have not yet been integrated in standard statistical packages. Similarly, Bayesian methods, though they have good frequentist properties when the model is correct, are known to be computationally intensive and also require specialized code, limiting their use in practice. In this article we build on our previous method (Capanu and Begg 2010) and propose a hybrid approach that provides a bridge between the likelihood-based and Bayesian approaches by employing Bayesian estimation for the variance components followed by Laplacian estimation for the regression coefficients, with the goal of obtaining good statistical properties, with relatively good computing speed, and using widely available software. The hybrid approach is shown to perform well against the other competitors considered. Another important finding of this research is the surprisingly good performance of the Laplacian approximation in the difficult case of binary clustered data with small cluster sizes. We apply the methods to a real study of head and neck squamous cell carcinoma and illustrate their properties using simulations based on a widely analyzed salamander mating dataset and on another important dataset involving the Guatemalan Child Health survey.

    Identification of Rare Causal Variants in Sequence-Based Studies: Methods and Applications to VPS13B, a Gene Involved in Cohen Syndrome and Autism

    Pinpointing the small number of causal variants among the abundant naturally occurring genetic variation is a difficult challenge, but a crucial one for understanding precise molecular mechanisms of disease and for follow-up functional studies. We propose and investigate two complementary statistical approaches for identification of rare causal variants in sequencing studies: a backward elimination procedure based on groupwise association tests, and a hierarchical approach that can integrate sequencing data with diverse functional and evolutionary conservation annotations for individual variants. Using simulations, we show that incorporation of multiple bioinformatic predictors of deleteriousness, such as PolyPhen-2, SIFT and GERP++ scores, can improve the power to discover truly causal variants. As proof of principle, we apply the proposed methods to VPS13B, a gene mutated in the rare neurodevelopmental disorder called Cohen syndrome, and recently reported with recessive variants in autism. We identify a small set of promising candidates for causal variants, including two loss-of-function variants and a rare, homozygous probably-damaging variant that could contribute to autism risk.

    Building a Nomogram for Survey-Weighted Cox Models Using R

    Nomograms have become very useful tools among clinicians as they provide individualized predictions based on the characteristics of the patient. For complex design survey data with a survival outcome, Binder (1992) proposed methods for fitting survey-weighted Cox models, but to the best of our knowledge there is no available software to build a nomogram based on such models. This paper introduces an R package, SvyNom, to accomplish this goal and illustrates its use on a gastric cancer dataset. Validation and calibration routines are also included.

    Pancreatic adenocarcinoma: insights into patterns of recurrence and disease behavior

    Background: Pancreatic ductal adenocarcinoma (PDAC) is one of the most aggressive cancers, with high metastatic potential. Clinical observations suggest that there is disease heterogeneity among patients with different sites of distant metastases, yielding distinct clinical outcomes. Herein, we investigate the impact of clinical and pathological parameters on recurrence patterns and compare survival outcomes for patients with a first site of recurrence in the liver versus the lung from PDAC following original curative surgical resection. Methods: Using the Memorial Sloan Kettering Cancer Center ICD billing codes and tumor registry database over a 10-year period (January 2004–December 2014), we identified PDAC patients who underwent resection and subsequently presented with either liver or lung recurrence. Time from relapse to death (TRD) was calculated from the date of recurrence to the date of death. TRD was estimated using the Kaplan-Meier method and compared by recurrence site using the log-rank test. Results: The median overall follow-up was 37.3 months among survivors in the entire cohort. Median TRD in this cohort was 10.7 months (95% CI: 8.9–14.6 months). Patients with a first site of lung recurrence had a more favorable outcome than patients who recurred with liver metastasis as the first site of recurrence (median TRD of 15 versus 9 months, respectively, P = 0.02). Moderate-to-poor or poor differentiation was associated more often with liver than lung recurrence (40% vs 21%, respectively, P = 0.047). A trend toward increased lymph node metastasis in the lung recurrence cohort was observed. Conclusion: PDAC patients who recur with a first site of lung metastasis have an improved clinical outcome compared to patients with a first site of liver recurrence. Our data suggest there may be epidemiologic and pathologic determinants related to patterns of recurrence in PDAC.