    Reusable, extensible, and modifiable R scripts and Kepler workflows for comprehensive single set ChIP-seq analysis

    BACKGROUND: There has been an enormous expansion in the use of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technologies. Analysis of large-scale ChIP-seq datasets involves a complex series of steps and the production of several specialized graphical outputs. A number of systems have emphasized custom development of ChIP-seq pipelines. These systems are primarily based either on custom programming of a single, complex pipeline or on supplying libraries of modules, and they do not produce the full range of outputs commonly generated for ChIP-seq datasets. It is desirable to have more comprehensive pipelines, in particular ones that address common metadata tasks, such as pathway analysis, and ones that produce standard complex graphical outputs. It is advantageous if these are highly modular systems, available both as turnkey pipelines and as individual modules, that are easily comprehensible, modifiable and extensible, allowing rapid alteration in response to new analysis developments in this growing area. Furthermore, it is advantageous if these pipelines allow data provenance tracking. RESULTS: We present a set of 20 ChIP-seq analysis software modules implemented in the Kepler workflow system; most (18/20) were also implemented as standalone, fully functional R scripts. The set consists of four full turnkey pipelines and 16 component modules. The turnkey pipelines in Kepler allow data provenance tracking. Implementation emphasized use of common R packages and widely used external tools (e.g., MACS for peak finding), along with custom programming. This software provides comprehensive solutions and easily repurposed code blocks for ChIP-seq analysis and pipeline creation. Tasks include mapping raw reads, peak finding via MACS, summary statistics, peak location statistics, summary plots centered on the transcription start site (TSS), gene ontology, pathway analysis, and de novo motif finding, among others.
    CONCLUSIONS: These pipelines range from those performing a single task to those performing full analyses of ChIP-seq data. The pipelines are supplied both as Kepler workflows, which allow data provenance tracking, and, in most cases, as standalone R scripts. These pipelines are designed for ease of modification and repurposing.

    YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts

    Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems, owing to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB) and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aims to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow will also allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.
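To illustrate the idea of recovering a workflow view from comment annotations, here is a minimal, hypothetical sketch. It is not the YesWorkflow implementation: it assumes only four tags (`@begin`, `@end`, `@in`, `@out`) written in `#` comments, and the names `Block`, `extract_blocks`, and `dataflow_edges` are illustrative inventions. It nests annotated blocks, records their declared inputs and outputs, and links sibling blocks whose output name matches another block's input name.

```python
import re
from dataclasses import dataclass, field

# Matches a '#' comment carrying one of the assumed tags, e.g. "# @begin load".
TAG_RE = re.compile(r"^\s*#\s*@(begin|end|in|out)\s+(\S+)")


@dataclass
class Block:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def extract_blocks(script_text):
    """Build a tree of annotated blocks from a script's comments (simplified sketch)."""
    root = Block("<script>")
    stack = [root]
    for line in script_text.splitlines():
        match = TAG_RE.match(line)
        if not match:
            continue  # ordinary code and untagged comments are ignored
        tag, value = match.groups()
        current = stack[-1]
        if tag == "begin":
            block = Block(value)
            current.children.append(block)
            stack.append(block)
        elif tag == "end":
            if len(stack) == 1 or current.name != value:
                raise ValueError(f"unmatched @end {value}")
            stack.pop()
        elif tag == "in":
            current.inputs.append(value)
        else:  # tag == "out"
            current.outputs.append(value)
    if len(stack) != 1:
        raise ValueError(f"unclosed @begin {stack[-1].name}")
    return root


def dataflow_edges(block):
    """Yield (producer, data, consumer) triples between child blocks sharing a data name."""
    for producer in block.children:
        for consumer in block.children:
            if producer is consumer:
                continue
            for data in producer.outputs:
                if data in consumer.inputs:
                    yield (producer.name, data, consumer.name)


# Hypothetical annotated script; the code lines themselves are never executed.
EXAMPLE = """
# @begin pipeline
# @in raw.csv
# @out report.txt
# @begin load
# @in raw.csv
# @out table
data = open('raw.csv').read()
# @end load
# @begin summarize
# @in table
# @out report.txt
# @end summarize
# @end pipeline
"""

if __name__ == "__main__":
    pipeline = extract_blocks(EXAMPLE).children[0]
    print(list(dataflow_edges(pipeline)))  # [('load', 'table', 'summarize')]
```

Matching inputs to outputs purely by name is the simplest possible rule and is chosen here only for brevity; a real tool would need richer handling of scoping, parameters, and file patterns.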

    Low PCA3 expression is a marker of poor differentiation in localized prostate tumors: exploratory analysis from 12,076 patients.

    BACKGROUND: Prostate cancer antigen 3 (PCA3) is a clinically validated prostate cancer diagnostic biomarker. However, the limitations of its diagnostic role in initial biopsy, and its prognostic role, are not well established. Here, we elucidate the limitations of tissue PCA3 in predicting high-grade tumors in initial biopsy. RESULTS: PCA3 has a bimodal distribution in both biopsy and radical prostatectomy (RP) tissues, and low PCA3 expression was significantly associated with high-grade disease (p<0.001). PCA3 performed poorly in predicting high-grade disease (GS≥8) in initial biopsy, with 55% sensitivity and high false-negative rates; 42% of high Gleason (≥8) samples had low PCA3. In RP, low PCA3 is associated with adverse pathological features, clinical recurrence and a greater probability of metastatic progression (p<0.001). MATERIALS AND METHODS: A total of 1,694 expression profiles from biopsy and 10,382 from RP patients with high-risk tumors were obtained from the Decipher Genomic Resource Information Database (GRID™) prostate cancer database. The primary clinical endpoint was distant metastasis-free survival for RP and high Gleason grade for biopsy. Logistic regression analyses and Cox proportional hazards models were used to evaluate the association of PCA3 with clinical variables and risk of metastasis. CONCLUSIONS: There is a high prevalence of high-grade tumors with low PCA3 expression in the biopsy setting. Therefore, urologists should be warned that using PCA3 as a stand-alone test may lead to a high rate of under-diagnosis of high-grade disease in the initial biopsy setting.

    Ability of a Genomic Classifier to Predict Metastasis and Prostate Cancer-specific Mortality after Radiation or Surgery based on Needle Biopsy Specimens

    Decipher is a validated genomic classifier developed to determine the biological potential for metastasis after radical prostatectomy (RP). This study evaluated the ability of biopsy Decipher to predict metastasis and prostate cancer-specific mortality (PCSM) in primarily intermediate- to high-risk patients treated with RP or radiation therapy (RT). The cohort comprised 235 patients from seven tertiary referral centers, treated with either RP (n=105) or RT ± androgen deprivation therapy (n=130), with genomic expression profiles available from diagnostic biopsy specimens. The highest-grade core was sampled, and Decipher was calculated based on a locked random forest model. Metastasis and PCSM were the primary and secondary outcomes of the study, respectively. Cox analysis and the c-index were used to evaluate the performance of Decipher. With a median follow-up of 6 yr among censored patients, 34 patients developed metastases and 11 died of prostate cancer. On multivariable analysis, biopsy Decipher remained a significant predictor of metastasis (hazard ratio: 1.37 per 10% increase in score, 95% confidence interval [CI]: 1.06–1.78, p=0.018) after adjusting for clinical variables. For predicting metastasis 5 yr post-biopsy, the Cancer of the Prostate Risk Assessment score had a c-index of 0.60 (95% CI: 0.50–0.69), while Cancer of the Prostate Risk Assessment plus biopsy Decipher had a c-index of 0.71 (95% CI: 0.60–0.82). National Comprehensive Cancer Network risk group had a c-index of 0.66 (95% CI: 0.53–0.77), while National Comprehensive Cancer Network plus biopsy Decipher had a c-index of 0.74 (95% CI: 0.66–0.82). Biopsy Decipher was a significant predictor of PCSM (hazard ratio: 1.57 per 10% increase in score, 95% CI: 1.03–2.48, p=0.037), with 5-yr PCSM rates of 0%, 0%, and 9.4% for Decipher low, intermediate, and high, respectively.
    Biopsy Decipher predicted metastasis and PCSM from diagnostic biopsy specimens in a cohort of primarily intermediate- and high-risk men, regardless of whether first-line treatment was RT or RP.

    Development and Validation of a Novel Integrated Clinical-Genomic Risk Group Classification for Localized Prostate Cancer.

    Purpose: It is clinically challenging to integrate genomic-classifier results that report a numeric risk of recurrence into treatment recommendations for localized prostate cancer, which are founded on the framework of risk groups. We aimed to develop a novel clinical-genomic risk grouping system that can readily be incorporated into treatment guidelines for localized prostate cancer. Materials and Methods: Two multicenter cohorts (n = 991) were used for training and validation of the clinical-genomic risk groups, and two additional cohorts (n = 5,937) were used for reclassification analyses. Competing risks analysis was used to estimate the risk of distant metastasis. Time-dependent c-indices were computed to compare clinicopathologic risk models with the clinical-genomic risk groups. Results: With a median follow-up of 8 years for patients in the training cohort, 10-year distant metastasis rates for National Comprehensive Cancer Network (NCCN) low, favorable-intermediate, unfavorable-intermediate, and high risk were 7.3%, 9.2%, 38.0%, and 39.5%, respectively. In contrast, the three-tier clinical-genomic risk groups had 10-year distant metastasis rates of 3.5%, 29.4%, and 54.6% for low, intermediate, and high risk, respectively, which were consistent in the validation cohort (0%, 25.9%, and 55.2%, respectively). The c-index for the clinical-genomic risk grouping system (0.84; 95% CI, 0.61 to 0.93) was improved over those for NCCN (0.73; 95% CI, 0.60 to 0.86) and Cancer of the Prostate Risk Assessment (0.74; 95% CI, 0.65 to 0.84). Thirty percent of patients classified by NCCN low/intermediate/high would be reclassified by the new three-tier system, and 67% of patients would be reclassified from the NCCN six-tier system (very-low to very-high risk) by the new six-tier system.
    Conclusion: A commercially available genomic classifier in combination with standard clinicopathologic variables can generate a simple-to-use clinical-genomic risk grouping that more accurately identifies patients at low, intermediate, and high risk for metastasis and can be easily incorporated into current guidelines to better risk-stratify patients.