605 research outputs found

    Divide-and-Rule: Self-Supervised Learning for Survival Analysis in Colorectal Cancer

    With the long-term rapid increase in the incidence of colorectal cancer (CRC), there is an urgent clinical need to improve risk stratification. The conventional pathology report is usually limited to only a few histopathological features. However, most of the tumor microenvironments used to describe patterns of aggressive tumor behavior are ignored. In this work, we aim to learn histopathological patterns within cancerous tissue regions that can be used to improve prognostic stratification for colorectal cancer. To do so, we propose a self-supervised learning method that jointly learns a representation of tissue regions as well as a metric of the clustering to obtain their underlying patterns. These histopathological patterns are then used to represent the interaction between complex tissues and predict clinical outcomes directly. We furthermore show that the proposed approach can benefit from linear predictors to avoid overfitting in patient outcome predictions. To this end, we introduce a new well-characterized clinicopathological dataset, including a retrospective collective of 374 patients, with their survival time and treatment information. Histomorphological clusters obtained by our method are evaluated by training survival models. The experimental results demonstrate statistically significant patient stratification, and our approach outperformed the state-of-the-art deep clustering methods.

    Improving SIEM for critical SCADA water infrastructures using machine learning

    Network Control Systems (NAC) have been used in many industrial processes. They aim to reduce the human factor burden and efficiently handle the complex process and communication of those systems. Supervisory control and data acquisition (SCADA) systems are used in industrial, infrastructure and facility processes (e.g. manufacturing, fabrication, oil and water pipelines, building ventilation, etc.). Like other Internet of Things (IoT) implementations, SCADA systems are vulnerable to cyber-attacks; therefore, robust anomaly detection is a major requirement. However, building an accurate anomaly detection system is not an easy task, because it is difficult to differentiate between cyber-attacks and internal system failures (e.g. hardware failures). In this paper, we present a model that detects anomaly events in a water system controlled by SCADA. Six Machine Learning techniques have been used in building and evaluating the model. The model classifies different anomaly events including hardware failures (e.g. sensor failures), sabotage and cyber-attacks (e.g. DoS and Spoofing). Unlike other detection systems, our proposed work helps accelerate the mitigation process by notifying the operator with additional information when an anomaly occurs. This additional information includes the probability and confidence level of event(s) occurring. The model is trained and tested using a real-world dataset.

    Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches

    Background: Missing data is classified as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). Knowing the mechanism is useful in identifying the most appropriate analysis. The first aim was to compare different methods for identifying this missing data mechanism to determine if they gave consistent conclusions. The second aim was to investigate whether the reminder-response data can be utilised to help identify the missing data mechanism. Methods: Five clinical trial datasets that employed a reminder system at follow-up were used. Some quality of life questionnaires were initially missing, but later recovered through reminders. Four methods of determining the missing data mechanism were applied. Two response data scenarios were considered: firstly, immediate data only; secondly, all observed responses (including reminder-response). Results: In three of five trials the hypothesis tests found evidence against the MCAR assumption. Logistic regression suggested MAR, but was able to use the reminder-collected data to highlight potential MNAR data in two trials. Conclusion: The four methods were consistent in determining the missingness mechanism. One hypothesis test was preferred as it is applicable with intermittent missingness. Some inconsistencies between the two data scenarios were found. Ignoring the reminder data could potentially give a distorted view of the missingness mechanism. Utilising reminder data allowed the possibility of MNAR to be considered. The Chief Scientist Office of the Scottish Government Health Directorate. Research Training Fellowship (CZF/1/31

    Factors affecting post-fire crown regeneration in cork oak (Quercus suber L.) trees

    Cork oak (Quercus suber) forests are acknowledged for their biodiversity and economic (mainly cork production) values. Wildfires are one of the main threats contributing to cork oak decline in the Mediterranean Basin, and one major question that managers face after fire in cork oak stands is whether the burned trees should be coppiced or not. This decision can be based on the degree of expected crown regeneration assessed immediately after fire. In this study we carried out a post-fire assessment of the degree of crown recovery in 858 trees being exploited for cork production in southern Portugal, 1.5 years after a wildfire. Using logistic regression, we modelled good or poor crown recovery probability as a function of tree and stand variables. The main variables influencing the likelihood of good or poor crown regeneration were bark thickness, charring height, aspect and tree diameter. We also developed management models, including simpler but easier to measure variables, which had a lower predictive power but can be used to help managers to identify, immediately after fire, trees that will likely show good crown regeneration, and trees that will likely die or show poor regeneration (and thus, potential candidates for trunk coppicin

    Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh

    Background: The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of the traditional binary logistic regression (BLR) model, using data from the Bangladesh Demographic and Health Survey 2004. Methods: Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥ -2.0). Since nutrition status is ordinal, an OLR model, the proportional odds model (POM), can be developed instead of two separate BLR models to find predictors of both malnutrition and severe malnutrition, provided the proportional odds assumption is satisfied. The assumption is satisfied with low p-value (0.144) due to violation of the assumption for one covariate. So a partial proportional odds model (PPOM) and two BLR models have also been developed to check the applicability of the OLR model. A graphical test has also been adopted for checking the proportional odds assumption. Results: All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI & diarrhoea were the significant predictors of child malnutrition; however, results of the PPOM were more precise than those of the other models. Conclusion: These findings clearly justify that OLR models (POM and PPOM) are appropriate to find predictors of malnutrition instead of BLR models.
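
    The three-group coding of the weight-for-age Z-score described in the Methods can be sketched as a small function. This is only an illustration: the function name `nutrition_status` is hypothetical, and the sketch assumes Z-scores are recorded to two decimals, so that "-3.0 to -2.01" is the same as "at least -3.0 and below -2.0".

```python
def nutrition_status(z_score: float) -> str:
    """Map a weight-for-age Z-score to the three ordered groups in the abstract.

    Assumption: Z-scores are rounded to two decimals, so the band
    "-3.0 to -2.01" is treated as -3.0 <= z < -2.0.
    """
    if z_score < -3.0:
        return "severely undernourished"
    if z_score < -2.0:
        return "moderately undernourished"
    return "nourished"


# Example: code a few Z-scores into the ordinal outcome.
for z in (-3.5, -3.0, -2.01, -2.0, 0.4):
    print(z, "->", nutrition_status(z))
```

    Keeping the outcome as one ordered variable with three levels is what allows a single proportional odds model to stand in for two separate binary models.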

    A prognostic tool to identify adolescents at high risk of becoming daily smokers

    Background: The American Academy of Pediatrics advocates that pediatricians should be involved in tobacco counseling and has developed guidelines for counseling. We present a prognostic tool for use by health care practitioners in both clinical and non-clinical settings, to identify adolescents at risk of becoming daily smokers. Methods: Data were drawn from the Nicotine Dependence in Teens (NDIT) Study, a prospective investigation of 1293 adolescents, initially aged 12-13 years, recruited in 10 secondary schools in Montreal, Canada in 1999. Questionnaires were administered every three months for five years. The prognostic tool was developed using estimated coefficients from multivariable logistic models. Model overfitting was corrected using bootstrap cross-validation. Goodness-of-fit and predictive ability of the models were assessed by R², the c-statistic, and the Hosmer-Lemeshow test. Results: The 1-year and 2-year probability of initiating daily smoking was a joint function of seven individual characteristics: age; ever smoked; ever felt like you needed a cigarette; parent(s) smoke; sibling(s) smoke; friend(s) smoke; and ever drank alcohol. The models were characterized by reasonably good fit and predictive ability. They were transformed into user-friendly tables such that the risk of daily smoking can be easily computed by summing points for responses to each item. The prognostic tool is also available on-line at http://episerve.chumontreal.qc.ca/calculation_risk/daily-risk/daily_smokingadd.php. Conclusions: The prognostic tool to identify youth at high risk of daily smoking may eventually be an important component of a comprehensive tobacco control system.
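
    As a rough sketch of how a multivariable logistic model turns the seven items into a predicted probability: the intercept and coefficients below are hypothetical placeholders chosen for illustration, not the published NDIT estimates, and the item names and the function `daily_smoking_risk` are likewise assumptions. The abstract does not give the actual values or the points tables.

```python
import math

# HYPOTHETICAL values for illustration only; NOT the published NDIT estimates.
HYPOTHETICAL_INTERCEPT = -4.0
HYPOTHETICAL_COEFS = {
    "age_years_over_12": 0.2,      # numeric item (assumed coding)
    "ever_smoked": 1.0,            # the remaining items are coded 0/1
    "ever_needed_cigarette": 0.8,
    "parent_smokes": 0.4,
    "sibling_smokes": 0.5,
    "friend_smokes": 0.7,
    "ever_drank_alcohol": 0.6,
}


def daily_smoking_risk(responses: dict) -> float:
    """Return the logistic-model probability for one set of responses.

    Items missing from `responses` are treated as 0.
    """
    linear_predictor = HYPOTHETICAL_INTERCEPT + sum(
        coef * float(responses.get(item, 0))
        for item, coef in HYPOTHETICAL_COEFS.items()
    )
    # Inverse logit maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Example: an adolescent who has tried smoking and has friends who smoke.
print(daily_smoking_risk({"ever_smoked": 1, "friend_smokes": 1}))
```

    Because the linear predictor is a sum over items, coefficients of this kind can in principle be rescaled into per-item points that are added up by hand, which is the idea behind the summing-points tables the abstract mentions.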