80 research outputs found

    Feature selection and validated predictive performance in the domain of Legionella pneumophila: a comparative study

    Background: Genetic comparisons of clinical and environmental Legionella strains form an essential part of outbreak investigations. DNA microarrays often comprise many DNA markers (features). Feature selection and the development of prediction models are particularly challenging in this domain with many variables and comparatively few subjects or data points. We aimed to compare modeling strategies to develop prediction models for classifying infections as clinical or environmental. Methods: We applied a bootstrap strategy for preselecting important features to a database containing 222 Legionella pneumophila strains with 448 continuous markers and a dichotomous outcome (clinical or environmental). Feature selection was done with 50 bootstrap samples resulting in a top 10 of most important features for each of four modeling techniques: classification and regression trees (CART), random forests (RF), support vector machines (SVM) and least absolute shrinkage and selection operator (LASSO). Validation was done in a second bootstrap resampling loop (200x) for evaluation of discriminatory model performance according to the AUC. Results: The top 5 selected features differed considerably between the various modeling techniques, with only one common feature ("LePn.007B8"). The mean validated AUC-values of the SVM model and the CART model were 0.859 and 0.873 respectively. The LASSO and the RF model showed higher validated AUC-values (0.925 and 0.975 respectively). Conclusions: In the domain of Legionella pneumophila, which comprises many potential features for classifying infections as clinical or environmental, the RF and LASSO techniques provide good prediction models. The identification of potentially biologically relevant features is highly dependent on the technique used, and should hence be interpreted with caution
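The bootstrap preselection step described in the Methods above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the importance measure here is a simple single-feature AUC distance from 0.5, swapped in for the CART, RF, SVM and LASSO importances used in the study, and the function names (`auc`, `bootstrap_top_features`, `make_toy_data`) and the synthetic data are hypothetical.

```python
import random
from collections import Counter


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def feature_importance(X, y, j):
    """Stand-in importance (assumption): how far feature j's own AUC is from 0.5."""
    return abs(auc([row[j] for row in X], y) - 0.5)


def bootstrap_top_features(X, y, n_boot=50, top_k=10, seed=0):
    """Count how often each feature lands in the top_k across bootstrap samples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip degenerate samples with a single class
            continue
        Xb = [X[i] for i in idx]
        ranked = sorted(range(p),
                        key=lambda j: feature_importance(Xb, yb, j),
                        reverse=True)
        counts.update(ranked[:top_k])
    # Features selected most often across bootstrap samples come first.
    return [j for j, _ in counts.most_common(top_k)]


def make_toy_data(n=60, p=20, informative=3, seed=0):
    """Synthetic data (hypothetical): only one feature is shifted by the outcome."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    for i in range(n):
        X[i][informative] += 3.0 * y[i]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(bootstrap_top_features(X, y, n_boot=50, top_k=3))
```

Counting selections across bootstrap samples, rather than ranking once on the full data, is what makes the instability of feature rankings visible. A second, separate resampling loop would still be needed to obtain validated AUC values as in the abstract.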

    Prediction of COVID-19 Infections for Municipalities in the Netherlands: Algorithm Development and Interpretation

    BACKGROUND: COVID-19 was first identified in December 2019 in the city of Wuhan, China. The virus quickly spread and was declared a pandemic on March 11, 2020. After infection, symptoms such as fever, a (dry) cough, nasal congestion, and fatigue can develop. In some cases, the virus causes severe complications such as pneumonia and dyspnea and could result in death. The virus also spread rapidly in the Netherlands, a small and densely populated country with an aging population. Health care in the Netherlands is of a high standard, but there were nevertheless problems with hospital capacity, such as the number of available beds and staff. There were also regions and municipalities that were hit harder than others. In the Netherlands, there are important data sources available for daily COVID-19 numbers and information about municipalities. OBJECTIVE: We aimed to predict the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands, using a data set with the properties of 355 municipalities in the Netherlands and advanced modeling techniques. METHODS: We collected relevant static data per municipality from data sources that were available in the Dutch public domain and merged these data with the dynamic daily number of infections from January 1, 2020, to May 9, 2021, resulting in a data set with 355 municipalities in the Netherlands and variables grouped into 20 topics. The modeling techniques random forest and multiple fractional polynomials were used to construct a prediction model for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands. RESULTS: The final prediction model had an R² of 0.63. Important properties for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality in the Netherlands were exposure to particulate matter with diameters <10 μm (PM10) in the air, the percentage of Labour party voters, and the number of children in a household. CONCLUSIONS: Data about municipality properties in relation to the cumulative number of confirmed infections in a municipality in the Netherlands can give insight into the most important properties of a municipality for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality. This insight can provide policy makers with tools to cope with COVID-19 and may also be of value in the event of a future pandemic, so that municipalities are better prepared

    Prediction of Medical Outcomes with Modern Modelling Techniques

    The aim of this research is to investigate under which circumstances and conditions relatively modern modelling techniques, such as support vector machines, neural networks and random forests, could offer advantages in medical-scientific research and in medical practice compared with more traditional modelling techniques, such as linear regression, logistic regression and Cox regression

    Prediction of survival with alternative modeling techniques using pseudo values

    Background: The use of alternative modeling techniques for predicting patient survival is complicated by the fact that some alternative techniques cannot readily deal with censoring, which is essential for analyzing survival data. In the current study, we aimed to demonstrate that pseudo values enable statistically appropriate analyses of survival outcomes when used in seven alternative modeling techniques. Methods: In this case study, we analyzed survival of 1282 Dutch patients with newly diagnosed Head and Neck Squamous Cell Carcinoma (HNSCC) with conventional Kaplan-Meier and Cox regression analysis. We subsequently calculated pseudo values to reflect the individual survival patterns. We used these pseudo values to compare recursive partitioning (RPART), neural nets (NNET), logistic regression (LR), general linear models (GLM) and three variants of support vector machines (SVM) with respect to dichotomous 60-month survival, and continuous pseudo values at 60 months or estimated survival time. We used the area under the ROC curve (AUC) and the root of the mean squared error (RMSE) to compare the performance of these models using bootstrap validation. Results: Of a total of 1282 patients, 986 patients died during a median follow-up of 66 months (60-month survival: 52% [95% CI: 50%-55%]). The L
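As a rough illustration of the pseudo values mentioned in the Methods above, the sketch below computes jackknife pseudo-observations of the Kaplan-Meier survival probability at a fixed time point, using the standard leave-one-out form n·S(t) − (n−1)·S₋ᵢ(t). It is a minimal sketch, assuming this is the kind of pseudo value meant; the function names and the toy data are hypothetical and not taken from the study.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for death, 0 for censoring."""
    s = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == u)
        s *= 1.0 - deaths / at_risk
    return s


def pseudo_values(times, events, t):
    """Jackknife pseudo-observations for S(t): n*S(t) - (n-1)*S_without_i(t)."""
    n = len(times)
    full = km_survival(times, events, t)
    values = []
    for i in range(n):
        # Re-estimate the survival probability with subject i left out.
        rest_times = times[:i] + times[i + 1:]
        rest_events = events[:i] + events[i + 1:]
        loo = km_survival(rest_times, rest_events, t)
        values.append(n * full - (n - 1) * loo)
    return values


if __name__ == "__main__":
    # Toy follow-up times in months (hypothetical); 0 marks a censored subject.
    times = [2, 5, 7, 9, 12, 15]
    events = [1, 0, 1, 1, 0, 1]
    print(pseudo_values(times, events, t=8))
```

Each subject gets one number per time point, whether or not that subject was censored, so the resulting values can be passed as an ordinary outcome to techniques that have no built-in handling of censoring. Without any censoring, the pseudo value reduces to the plain indicator of surviving beyond t.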

    Single- and multiple viral respiratory infections in children: Disease and management cannot be related to a specific pathogen

    Background: The number of viral pathogens associated with pediatric acute respiratory tract infection (ARI) has grown since the introduction of reverse transcription real-time polymerase chain reaction (RT-PCR) assays. Multiple viruses are detected during a single ARI episode in approximately a quarter of all cases. The clinical relevance of these multiple detections is unclear, as is the role of the individual virus. We therefore investigated the correlation between clinical data and RT-PCR results in children with single- and multiple viral ARI. Methods: Data from children with ARI were prospectively collected during two winter seasons. RT-PCR testing for 15 viruses was performed in 560 ARI episodes. In the patients with a single-viral etiology, clinical data, laboratory findings, patient management- and outcome data were compared between the different viruses. With this information, we compared data from children whose RT-PCR results were negative with data from children with single- and multiple viral positive results. Results: The viral detection rate was 457/560 (81.6%) of which 331/560 (59.1%) were single infections and 126/560 (22.5%) were multiple infections. In single viral infections, some statistically significant differences in demographics, clinical findings, disease severity and outcome were found between children with different viral etiologies. However, no clinically recognizable pattern was established to be virus-specific. In a multivariate analysis, the only variables that were correlated with longer hospital stay were the use of oxygen and nebulizer therapy, irrespective of the viral pathogen. Children with RT-PCR positive test results had significantly higher disease severity, fever, length of hospital stay, days of extra oxygen supply, and days of antibiotic treatment than children with a negative RT-PCR test result. For children with single- versus children with multiple positive RT-PCR test results, these differences were not significant. Conclusions: Disease (severity), management and outcome in pediatric ARI are not associated with a specific virus. Single- and multiple viral ARI do not significantly differ with regard to clinical outcome and patient management. For general pediatrics, RT-PCR assays should be restricted to pathogens for which therapy is available or otherwise may have clinical consequences. Further research with an extended panel of RT-PCR assays and a larger number of inclusions is necessary to further validate our findings

    Dementia and delirium, the outcomes in elderly hip fracture patients

    Background: Delirium in hip fractured patients is a frequent complication. Dementia is an important risk factor for delirium and is common in frail elderly. This study aimed to extend the previous knowledge on risk factors for delirium and the consequences. Special attention was given to patients with dementia and delirium. Methods: This is a retrospective cohort study performed in the Amphia Hospital, Breda, the Netherlands. A full electronic patient file system (Hyperspace Version IU4: Epic, Inc., Verona, WI, USA) was used to assess data between January 2014 and September 2015. All patients presented were aged ≥70 years with a hip fracture, who underwent surgery with osteosynthesis or arthroplasty. Patients were excluded in case of a pathological or a periprosthetic hip fracture, multiple traumatic injuries, and high-energy trauma. Patient and surgical characteristics were documented. Postoperative outcomes were noted. Delirium was screened using Delirium Observation Screening Scale and dementia was assessed from medical notes. Results: Of a total of 566 included patients, 75% were females. The median age was 84 years (interquartile range: 9). Delirium was observed in 35%. Significant risk factors for delirium were a high American Society of Anesthesiology score, delirium in medical history, functional dependency, preoperative institutionalization, low hemoglobin level, and high amount of blood transfusion. Delirium was correlated with a longer hospital stay (P=0.001), increased association with comp

    Prediction of intracranial findings on CT-scans by alternative modelling techniques

    Background: Prediction rules for intracranial traumatic findings in patients with minor head injury are designed to reduce the use of computed tomography (CT) without missing patients at risk for complications. This study investigates whether alternative modelling techniques might improve the applicability and simplicity of such prediction rules. Methods: We included 3181 patients with minor head injury who had received CT scans be