570 research outputs found

    Spatial Bayesian Modeling Applied to the Surveys of Xylella fastidiosa in Alicante (Spain) and Apulia (Italy)

    The plant-pathogenic bacterium Xylella fastidiosa was first reported in Europe in 2013, in the province of Lecce, Italy, where extensive areas were affected by the olive quick decline syndrome, caused by the subsp. pauca. In Alicante, Spain, almond leaf scorch, caused by X. fastidiosa subsp. multiplex, was detected in 2017. The effects of climatic and spatial factors on the geographic distribution of X. fastidiosa in these two infested regions in Europe were studied. The presence/absence data of X. fastidiosa in the official surveys were analyzed using Bayesian hierarchical models through the integrated nested Laplace approximation (INLA) methodology. Climatic covariates were obtained from the WorldClim v.2 database. A categorical variable was also included according to Purcell’s minimum winter temperature thresholds for the risk of occurrence of Pierce’s disease of grapevine, caused by X. fastidiosa subsp. fastidiosa. In Alicante, data were presented aggregated on a 1 km grid (lattice data), where the spatial effect was included in the model through a conditional autoregressive structure. In Lecce, data were observed at continuous locations occurring within a defined spatial domain (geostatistical data). Therefore, the spatial effect was included via the stochastic partial differential equation approach. In Alicante, the pathogen was detected in all four of Purcell’s categories, illustrating the environmental plasticity of the subsp. multiplex. Here, none of the climatic covariates were retained in the selected model. Only two of Purcell’s categories were represented in Lecce. The mean diurnal range (bio2) and the mean temperature of the wettest quarter (bio8) were retained in the selected model, with a negative relationship with the presence of the pathogen. However, this may be due to the heterogeneous sampling distribution having a confounding effect with the climatic covariates. 
In both regions, the spatial structure, but not the climatic covariates, had a strong influence on the models. Therefore, pathogen distribution was largely defined by the spatial relationship between geographic locations.

    Muons in air showers at the Pierre Auger Observatory: Mean number in highly inclined events

    We present the first hybrid measurement of the average muon number in air showers at ultrahigh energies, initiated by cosmic rays with zenith angles between 62° and 80°. The measurement is based on 174 hybrid events recorded simultaneously with the surface detector array and the fluorescence detector of the Pierre Auger Observatory. The muon number for each shower is derived by scaling a simulated reference profile of the lateral muon density distribution at the ground until it fits the data. A 10^19 eV shower with a zenith angle of 67°, which arrives at the surface detector array at an altitude of 1450 m above sea level, contains on average (2.68±0.04±0.48(sys))×10^7 muons with energies larger than 0.3 GeV. The logarithmic gain d ln N_μ/d ln E of muons with increasing energy between 4×10^18 eV and 5×10^19 eV is measured to be (1.029±0.024±0.030(sys)).
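    The logarithmic gain quoted above can be read as the exponent of a power law, N_μ ∝ E^β. The following is a minimal sketch of that reading, not the collaboration's analysis: it assumes the gain is constant over the quoted energy range, uses only the central values from the abstract, ignores all statistical and systematic uncertainties, and applies to the 67° reference shower only. The function and constant names are hypothetical.

```python
# Central values quoted in the abstract (uncertainties deliberately ignored).
REFERENCE_ENERGY_EV = 1e19       # reference shower energy
REFERENCE_MUON_NUMBER = 2.68e7   # mean muon number at 67° zenith, E_mu > 0.3 GeV
LOG_GAIN = 1.029                 # d ln N_mu / d ln E

# Energy range over which the gain was measured.
MIN_ENERGY_EV = 4e18
MAX_ENERGY_EV = 5e19


def scaled_muon_number(energy_ev: float) -> float:
    """Scale the reference muon number with energy, assuming a pure power law.

    Assumption: N_mu(E) = N_ref * (E / E_ref) ** LOG_GAIN inside the measured range.
    """
    if not (MIN_ENERGY_EV <= energy_ev <= MAX_ENERGY_EV):
        raise ValueError("energy outside the range in which the gain was measured")
    return REFERENCE_MUON_NUMBER * (energy_ev / REFERENCE_ENERGY_EV) ** LOG_GAIN


# Doubling the energy multiplies the muon number by 2 ** 1.029, slightly more than 2.
ratio = scaled_muon_number(2e19) / scaled_muon_number(1e19)
```

    A gain slightly above 1 means the muon number under this assumption grows marginally faster than linearly with energy.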

    Cost-sensitive ordinal classification methods to predict SARS-CoV-2 pneumonia severity

    Objective: To study the suitability of cost-sensitive ordinal artificial intelligence-machine learning (AI-ML) strategies in the prognosis of SARS-CoV-2 pneumonia severity. Materials & methods: Observational, retrospective, longitudinal, cohort study in 4 hospitals in Spain. Information regarding demographic and clinical status was supplemented by socioeconomic data and air pollution exposures. We proposed AI-ML algorithms for ordinal classification via ordinal decomposition and for cost-sensitive learning via resampling techniques. For performance-based model selection, we defined a custom score including per-class sensitivities and asymmetric misprognosis costs. 260 distinct AI-ML models were evaluated via 10 repetitions of 5×5 nested cross-validation with hyperparameter tuning. Model selection was followed by the calibration of predicted probabilities. Final overall performance was compared against five well-established clinical severity scores and against a ‘standard’ (non-cost sensitive, non-ordinal) AI-ML baseline. In our best model, we also evaluated its explainability with respect to each of the input variables. Results: The study enrolled n=1548 patients: 712 experienced low, 238 medium, and 598 high clinical severity. d=131 variables were collected, becoming d′=148 features after categorical encoding. Model selection resulted in our best-performing AI-ML pipeline having: a) no imputation of missing data, b) no feature selection (i.e. using the full set of d′ features), c) ‘Ordered Partitions’ ordinal decomposition, d) cost-based reimbalance, and e) a Histogram-based Gradient Boosting classifier. This best model (calibrated) obtained a median accuracy of 68.1% [67.3%, 68.8%] (95% confidence interval), a balanced accuracy of 57.0% [55.6%, 57.9%], and an overall area under the curve (AUC) of 0.802 [0.795, 0.808]. In our dataset, it outperformed all five clinical severity scores and the ‘standard’ AI-ML baseline. 
Discussion & conclusion: We conducted an exhaustive exploration of AI-ML methods designed for both ordinal and cost-sensitive classification, motivated by a real-world application domain (clinical severity prognosis) in which these topics arise naturally. Our model with the best classification performance successfully exploited the ordering information of the ground truth classes, coping with imbalance and asymmetric costs. However, these ordinal and cost-sensitive aspects are seldom explored in the literature.
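    The abstract names an ‘Ordered Partitions’ ordinal decomposition without describing it. The sketch below assumes it follows the common threshold scheme in which a K-class ordinal problem is split into K−1 binary problems of the form "is the severity greater than level k?", whose outputs are then recombined into class probabilities. This is an illustration under that assumption, not the authors' pipeline; the function names are hypothetical, and the binary probabilities in the usage lines are made-up inputs.

```python
from typing import List, Sequence

SEVERITY_LEVELS = ["low", "medium", "high"]  # ordered classes named in the abstract


def binary_targets(labels: Sequence[int], n_classes: int) -> List[List[int]]:
    """Build one binary target vector per threshold k: 1 if y > k, else 0."""
    return [[1 if y > k else 0 for y in labels] for k in range(n_classes - 1)]


def class_probabilities(p_greater: Sequence[float]) -> List[float]:
    """Recombine estimates of P(y > k), k = 0..K-2, into K class probabilities.

    Uses P(y = k) = P(y > k-1) - P(y > k), with P(y > -1) = 1 and P(y > K-1) = 0.
    Negative differences (possible when the binary models are trained independently
    and disagree) are clipped to 0, then the result is renormalised.
    """
    cumulative = [1.0] + list(p_greater) + [0.0]
    probs = [max(cumulative[k] - cumulative[k + 1], 0.0)
             for k in range(len(cumulative) - 1)]
    total = sum(probs)  # always >= 1 before normalisation, so never zero
    return [p / total for p in probs]


# Labels 0/1/2 stand for low/medium/high severity.
targets = binary_targets([0, 1, 2, 1], n_classes=len(SEVERITY_LEVELS))

# Made-up binary outputs for one patient: P(y > low) = 0.7, P(y > medium) = 0.2.
probs = class_probabilities([0.7, 0.2])
predicted = SEVERITY_LEVELS[probs.index(max(probs))]
```

    In a full pipeline each binary target vector would be fitted by its own classifier; the cost-sensitive resampling mentioned in the abstract would act on the training data before that step.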

    Long non-coding RNA SNHG8 drives stress granule formation in tauopathies

    Tauopathies are a heterogeneous group of neurodegenerative disorders characterized by tau aggregation in the brain. In a subset of tauopathies, rare mutations in the MAPT gene, which encodes the tau protein, are sufficient to cause disease; however, the events downstream of MAPT mutations are poorly understood. Here, we investigate the role of long non-coding RNAs (lncRNAs), transcripts >200 nucleotides with low/no coding potential that regulate transcription and translation, and their role in tauopathy. Using stem cell derived neurons from patients carrying a MAPT p.P301L, IVS10 + 16, or p.R406W mutation and CRISPR-corrected isogenic controls, we identified transcriptomic changes that occur as a function of the MAPT mutant allele. We identified 15 lncRNAs that were commonly differentially expressed across the three MAPT mutations. The commonly differentially expressed lncRNAs interact with RNA-binding proteins that regulate stress granule formation. Among these lncRNAs, SNHG8 was significantly reduced in a mouse model of tauopathy and in FTLD-tau, progressive supranuclear palsy, and Alzheimer's disease brains. We show that SNHG8 interacts with tau and stress granule-associated RNA-binding protein TIA1. Overexpression of mutant tau in vitro is sufficient to reduce SNHG8 expression and induce stress granule formation. Rescuing SNHG8 expression leads to reduced stress granule formation and reduced TIA1 levels in immortalized cells and in MAPT mutant neurons, suggesting that dysregulation of this non-coding RNA is a causal factor driving stress granule formation via TIA1 in tauopathies.

    Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques

    With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration regarding which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n=1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 712, 238, and 598 experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing ⩾60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d=148 features after variable encoding. We addressed this ordinal classification problem both as an ML classification task and as a regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (⩾0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods. The subsets of features selected by each technique showed modest Jaccard similarities to one another. However, they consistently pointed to the importance of certain explanatory variables. 
Namely: patient’s C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR) and, to a certain extent, neutrophil and lymphocyte counts separately, lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood. A remarkable agreement has been found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives.
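    The Jaccard similarity used above to compare selected feature subsets is the size of the intersection of two sets divided by the size of their union. The following minimal sketch shows the computation; the three subsets are hypothetical examples built from variable names mentioned in the abstract, not the subsets actually selected in the study, and the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(subsets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of named feature subsets."""
    return {
        (first, second): jaccard(subsets[first], subsets[second])
        for first, second in combinations(sorted(subsets), 2)
    }


# Hypothetical subsets, one per family of FS technique (illustration only).
selected = {
    "filter": {"CRP", "PSI", "RR", "SpO2"},
    "wrapper": {"CRP", "PSI", "NLR", "LDH"},
    "embedded": {"CRP", "PSI", "RR", "PCT"},
}

similarities = pairwise_jaccard(selected)

# Features chosen by every technique, despite modest pairwise similarities.
consensus = set.intersection(*selected.values())
```

    In this toy input the pairwise similarities stay well below 1 while the intersection is still non-empty, which mirrors the pattern the abstract describes: modest overlap between subsets, but a stable core of variables.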