10 research outputs found

    Stable Iterative Variable Selection

    Motivation: The emergence of datasets with tens of thousands of features, such as high-throughput omics biomedical data, highlights the importance of reducing the feature space into a distilled subset that can truly capture the signal for research and industry by aiding in finding more effective biomarkers for the question at hand. A good feature set also facilitates building robust predictive models with improved interpretability and convergence of the applied method due to the smaller feature space. Results: Here, we present a robust feature selection method named Stable Iterative Variable Selection (SIVS) and assess its performance over both omics and clinical data types. As a performance assessment metric, we compared the number and goodness of the selected features using SIVS to those selected by Least Absolute Shrinkage and Selection Operator regression. The results suggested that the feature space selected by SIVS was, on average, 41% smaller, without having a negative effect on the model performance. A similar result was observed for comparison with Boruta and caret RFE. Availability and implementation: The method is implemented as an R package under GNU General Public License v3.0 and is accessible via Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/package=sivs. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    Easy-to-use tool for evaluating the elevated acute kidney injury risk against reduced cardiovascular disease risk during intensive blood pressure control

    Objective: The Systolic Blood Pressure Intervention Trial (SPRINT) reported that lowering SBP to below 120 mmHg (intensive treatment) reduced cardiovascular morbidity and mortality among adults with hypertension but increased the incidence of adverse events, particularly acute kidney injury (AKI). The goal of this study was to develop an accurate risk estimation tool for comparing the risk of cardiovascular events and adverse kidney-related outcomes between standard and intensive antihypertensive treatment strategies. Methods: By applying Lasso regression on the baseline characteristics and health outcomes of 8760 participants with complete baseline information in the SPRINT trial, we developed predictive models for primary cardiovascular disease (CVD) outcome and incidence of AKI. Both models were validated against an independent test set of the SPRINT trial (one third of data not used for model building) and externally against the cardiovascular and renal outcomes available in Action to Control Cardiovascular Risk in Diabetes Blood Pressure trial, consisting of 4733 participants with type 2 diabetes mellitus. Results: Lasso regression identified a subset of variables that accurately predicted the primary CVD outcome and the incidence of AKI (areas under receiver-operating characteristic curves 0.70 and 0.77, respectively). Based on the validated risk models, a risk assessment tool was developed and made available as an easy-to-use online tool. Conclusion: By predicting the risks of CVD and AKI at baseline, the developed tool can be used to weigh the benefits of intensive versus standard blood pressure control and to identify those who are likely to benefit most from intensive treatment.

    A predictive model of overall survival in patients with metastatic castration-resistant prostate cancer

    Metastatic castration-resistant prostate cancer (mCRPC) is one of the most common cancers with a poor prognosis. To improve prognostic models of mCRPC, the Dialogue for Reverse Engineering Assessments and Methods (DREAM) Consortium organized a crowdsourced competition known as the Prostate Cancer DREAM Challenge. In the competition, data from four phase III clinical trials were utilized. A total of 1600 patients’ clinical information across three of the trials was used to generate prognostic models, whereas one of the datasets (313 patients) was held out for blinded validation. As a performance baseline, a model presented in a recent study (the so-called Halabi model) was used to assess improvements of the new models. This paper presents the model developed by the team TYTDreamChallenge to predict survival risk scores for mCRPC patients at 12, 18, 24 and 30 months after trial enrollment based on clinical features of each patient, as well as an improvement of the model developed after the challenge. The TYTDreamChallenge model performed similarly to the gold-standard Halabi model, whereas the post-challenge model showed markedly improved performance. Accordingly, a main observation in this challenge was that the definition of the clinical features used plays a major role, and replacing our original larger set of features with a small subset for training increased the performance in terms of integrated area under the ROC curve from 0.748 to 0.779.

    Differential ATAC-seq and ChIP-seq peak detection using ROTS

    Changes in cellular chromatin states fine-tune transcriptional output and ultimately lead to phenotypic changes. Here we propose a novel application of our reproducibility-optimized test statistics (ROTS) to detect differential chromatin states (ATAC-seq) or differential chromatin modification states (ChIP-seq) between conditions. We compare the performance of ROTS to existing and widely used methods for ATAC-seq and ChIP-seq data using both synthetic and real datasets. Our results show that ROTS outperformed other commonly used methods when analyzing ATAC-seq data. ROTS also displayed the most accurate detection of small differences when modeling with synthetic data. We observed that two-step methods that require the use of a separate peak caller often more accurately called enrichment borders, whereas one-step methods without a separate peak calling step were more versatile in calling sub-peaks. The top ranked differential regions detected by the methods had marked correlation with transcriptional differences of the closest genes. Overall, our study provides evidence that ROTS is a useful addition to the available differential peak detection methods to study chromatin and performs especially well when applied to study differential chromatin states in ATAC-seq data.

    Data-Independent Acquisition Mass Spectrometry in Metaproteomics of Gut Microbiota—Implementation and Computational Analysis

    Metagenomic approaches focus on taxonomy or gene annotation but lack power in defining functionality of gut microbiota. Therefore, metaproteomics approaches have been introduced to overcome this limitation. However, the common metaproteomics approach uses data-dependent acquisition mass spectrometry, which is known to have limited reproducibility when analyzing samples with complex microbial composition. In this work, we provide a proof-of-concept for data-independent acquisition (DIA) metaproteomics. To this end, we analyze metaproteomes using DIA mass spectrometry and introduce an open-source data analysis software package, diatools, which enables accurate and consistent quantification of DIA metaproteomics data. We demonstrate the feasibility of our approach in gut microbiota metaproteomics using laboratory-assembled microbial mixtures as well as human fecal samples.

    Histone H3K4me3 breadth in hypoxia reveals endometrial core functions and stress adaptation linked to endometriosis

    Trimethylation of histone H3 at lysine 4 (H3K4me3) is a marker of active promoters. Broad H3K4me3 promoter domains have been associated with cell type identity, but H3K4me3 dynamics upon cellular stress have not been well characterized. We assessed this by exposing endometrial stromal cells to hypoxia, which is a major cellular stress condition. We observed that hypoxia modifies the existing H3K4me3 marks and that promoter H3K4me3 breadth rather than height correlates with transcription. Broad H3K4me3 domains mark genes for endometrial core functions and are maintained or selectively extended upon hypoxia. Hypoxic extension of H3K4me3 breadth associates with stress adaptation genes relevant for the survival of endometrial cells, including transcription factor KLF4, for which we found increased protein expression in the stroma of endometriosis lesions. These results substantiate the view of broad H3K4me3 as a marker of cell identity genes and reveal participation of H3K4me3 extension in cellular stress adaptation.

    A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection

    The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveals little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.

    Prediction of overall survival for patients with metastatic castration-resistant prostate cancer : development of a prognostic model through a crowdsourced challenge with open clinical trial data

    Background Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial (ENTHUSE M1) in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p Interpretation Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. Peer reviewed

    Source code for "A predictive model of overall survival in patients with metastatic castration-resistant prostate cancer"

    No full text
    These files are the source code that can be used in replicating our analysis, which has been explained in the paper "A predictive model of overall survival in patients with metastatic castration-resistant prostate cancer". Minor modifications of the source code might be required to produce the different models in the paper. The files presented here are the latest version at the time of publishing the paper. To get the most recent version of these files, navigate to: https://bitbucket.org/mehrad_mahmoudian/dream-prostate-cancer-challenge-q.1a/ For further support, please submit any issues to the "Issues" section of the Bitbucket repository.