
    Feature Selection with the Boruta Package

    This article describes the R package Boruta, which implements a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes features that a statistical test shows to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. A short description of the algorithm and examples of its application are presented.
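    The shadow-feature idea described above can be sketched briefly. This is a minimal Python illustration, not the package's R implementation: the function names are hypothetical, and a simple absolute-correlation score stands in for the Random Forest importances the real algorithm uses.

    ```python
    import numpy as np

    def shadow_importances(X, y, importance_fn, rng):
        # Build "shadow" probes by permuting each real feature independently,
        # then score real and shadow features together.
        X_shadow = np.apply_along_axis(rng.permutation, 0, X)
        imp = importance_fn(np.hstack([X, X_shadow]), y)
        n = X.shape[1]
        return imp[:n], imp[n:]

    def boruta_sketch(X, y, importance_fn, n_iter=20, seed=0):
        # A feature scores a "hit" whenever it beats the best random probe.
        # The real package decides via a binomial test on the hit counts;
        # a fixed majority threshold stands in here.
        rng = np.random.default_rng(seed)
        hits = np.zeros(X.shape[1], dtype=int)
        for _ in range(n_iter):
            real, shadow = shadow_importances(X, y, importance_fn, rng)
            hits += (real > shadow.max()).astype(int)
        return hits > n_iter // 2

    def abs_corr(X, y):
        # Toy importance: absolute correlation with the target
        # (a stand-in for Random Forest importance).
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        return np.abs(Xc.T @ yc) / denom
    ```

    On toy data where only the first feature drives the response, the sketch keeps that feature and rejects the noise columns, mirroring the all-relevant selection the package performs with Random Forest importances.
    
    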

    Generalized Strong Curvature Singularities and Cosmic Censorship

    A new definition of a strong curvature singularity is proposed. This definition is motivated by the definitions given by Tipler and Krolak, but is significantly different and more general. All causal geodesics terminating at these new singularities, which we call generalized strong curvature singularities, are classified into three possible types; the classification is based on certain relations between the curvature strength of the singularities and the causal structure in their neighborhood. A cosmic censorship theorem is formulated and proved which shows that only one class of generalized strong curvature singularities, corresponding to a single type of geodesics in our classification, can be naked. Implications of this result for the cosmic censorship hypothesis are indicated. (11 pages, no figures; to appear in Mod. Phys. Lett.)

    The need for standardisation in life science research - an approach to excellence and trust

    Today, academic researchers benefit from the changes driven by digital technologies and the enormous growth of knowledge and data, from globalisation, the enlargement of the scientific community, and the linkage between different scientific communities and society. To fully benefit from this development, however, information needs to be shared openly and transparently. Digitalisation plays a major role here because it permeates all areas of business, science and society and is one of the key drivers of innovation and international cooperation. To address the resulting opportunities, the EU, through its European strategy for Open Science (OS), promotes the development and use of collaborative ways to produce and share knowledge and data as early as possible in the research process, while also appropriately securing results. It is now widely recognised that making research results more accessible to all societal actors contributes to more effective and efficient science; it also serves as a boost for innovation in the public and private sectors. However, for research data to be findable, accessible, interoperable and reusable, the use of standards is essential. At the metadata level, considerable efforts in standardisation have already been made (e.g. Data Management Plans and the FAIR Principles), whereas for raw data these fundamental efforts are still fragmented and in some cases completely missing. The CHARME consortium, funded by the European Cooperation in Science and Technology (COST) programme, has identified needs and gaps in the field of standardisation in the life sciences and discussed potential hurdles for the implementation of standards in current practice. Here, the authors suggest four measures in response to current challenges to ensure a high quality of life science research data and their re-usability for research and innovation.

    Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data

    Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer, but also engage a community of international data scientists to study this disease.

    Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere: 476 patients treated with docetaxel and prednisone from the ASCENT2 trial; 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial; 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial; and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly as training data for predicting the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but the outcome variables (overall survival and event status) were hidden from challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone.

    Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p …).

    Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. Peer reviewed.
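    The model family behind both the reference model and the winning ePCR ensemble, a penalised Cox proportional-hazards fit, can be sketched briefly. This is a minimal numpy illustration under assumptions, not the challenge code: the function names are hypothetical, the penalty is plain ridge, and a central-difference numerical gradient keeps the sketch short where a real fit would use the analytic gradient or a survival library.

    ```python
    import numpy as np

    def cox_ridge_nll(beta, X, time, event, lam):
        # Negative log partial likelihood of a Cox model plus an L2 penalty.
        eta = X @ beta
        nll = 0.0
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]  # subjects still under observation at this event
            nll -= eta[i] - np.log(np.exp(eta[at_risk]).sum())
        return nll + lam * (beta ** 2).sum()

    def fit_cox_ridge(X, time, event, lam=0.1, lr=0.005, n_steps=400):
        # Plain gradient descent with a central-difference gradient (sketch only).
        beta = np.zeros(X.shape[1])
        eps = 1e-5
        for _ in range(n_steps):
            grad = np.zeros_like(beta)
            for k in range(beta.size):
                up, down = beta.copy(), beta.copy()
                up[k] += eps
                down[k] -= eps
                grad[k] = (cox_ridge_nll(up, X, time, event, lam)
                           - cox_ridge_nll(down, X, time, event, lam)) / (2 * eps)
            beta -= lr * grad
        return beta
    ```

    On simulated data with one truly prognostic covariate, the fitted coefficient recovers the effect's sign and reduces the penalised likelihood; the ePCR approach described in the abstract would train many such penalised models and combine their risk predictions.
    
    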