64 research outputs found

    A Validated Clinical Risk Prediction Model for Lung Cancer in Smokers of All Ages and Exposure Types: A HUNT Study

    Lung cancer causes >1·6 million deaths annually, with early diagnosis being paramount to effective treatment. Here we present a validated risk assessment model for lung cancer screening. The prospective HUNT2 population study in Norway examined 65,237 people aged >20 years in 1995–97. After a median of 15·2 years, 583 lung cancer cases had been diagnosed: 552 (94·7%) in ever-smokers and 31 (5·3%) in never-smokers. We performed multivariable analyses of 36 candidate risk predictors, using multiple imputation of missing data and backwards feature selection with Cox regression. The resulting model was validated in an independent Norwegian prospective dataset of 45,341 ever-smokers, in which 675 lung cancers had been diagnosed after a median follow-up of 11·6 years. Our final HUNT Lung Cancer Model included age, pack-years, smoking intensity, years since smoking cessation, body mass index, daily cough, and hours of daily indoor exposure to smoke. External validation showed a concordance index of 0·879 (95% CI 0·866–0·891) and an area under the curve of 0·87 (95% CI 0·85–0·89) within 6 years. Only 22% of ever-smokers would need screening to identify 81·85% of all lung cancers within 6 years. Our model of seven variables is simple, accurate, and useful for screening selection. Keywords: Early diagnosis, Lung cancer prediction, Ever-smokers, All smokers, All ages, Data-driven, Feature selection, External validation
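    The modelling recipe summarized above (a multivariable Cox proportional-hazards model whose discrimination is reported as a concordance index) can be sketched compactly. The snippet below is a minimal, self-contained illustration on synthetic data using the lifelines library; the variable names (pack_years, daily_cough), the 15-year censoring horizon, and all coefficients are stand-ins chosen for illustration, not the authors' code or data.

```python
# Minimal sketch of a Cox proportional-hazards risk model evaluated with a
# concordance index, in the spirit of the HUNT Lung Cancer Model described
# above. Synthetic data and arbitrary effect sizes; not the authors' code.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "age": rng.uniform(20, 80, n),
    "pack_years": rng.exponential(15, n),
    "bmi": rng.normal(26, 4, n),
    "daily_cough": rng.integers(0, 2, n),
})
# Synthetic follow-up: hazard loosely increasing with age, pack-years, cough.
risk = 0.03 * df["age"] + 0.02 * df["pack_years"] + 0.3 * df["daily_cough"]
df["time"] = rng.exponential(np.exp(6 - 0.8 * risk))
df["event"] = (df["time"] < 15).astype(int)   # diagnosed within follow-up
df["time"] = df["time"].clip(upper=15)        # administrative censoring at 15 y

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")

# Harrell's concordance index; a real study would compute this on an
# independent validation cohort rather than the training data.
c_index = concordance_index(df["time"], -cph.predict_partial_hazard(df), df["event"])
print(f"c-index: {c_index:.3f}")
```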

    Automated machine learning optimizes and accelerates predictive modeling from COVID-19 high throughput datasets

    The COVID-19 outbreak has placed intense pressure on healthcare systems, creating an urgent demand for effective diagnostic, prognostic and therapeutic procedures. Here, we employed Automated Machine Learning (AutoML) to analyze three publicly available high-throughput COVID-19 datasets, including proteomic, metabolomic and transcriptomic measurements. Pathway analysis of the selected features was also performed. Analysis of a combined proteomic and metabolomic dataset led to 10 equivalent signatures of two features each, with AUC 0.840 (CI 0.723–0.941) in discriminating severe from non-severe COVID-19 patients. A transcriptomic dataset led to two equivalent signatures of eight features each, with AUC 0.914 (CI 0.865–0.955) in identifying COVID-19 patients among those with a different acute respiratory illness. Another transcriptomic dataset led to two equivalent signatures of nine features each, with AUC 0.967 (CI 0.899–0.996) in distinguishing COVID-19 patients from virus-free individuals. Signature predictive performance remained high upon validation. Multiple new features emerged, and pathway analysis implicated the Viral mRNA Translation, Interferon gamma signaling and Innate Immune System pathways, supporting biological relevance. In conclusion, AutoML analysis led to multiple biosignatures of high predictive performance, with few features and a large choice of alternative predictors. These favorable characteristics make the signatures well suited to the development of cost-effective assays that can contribute to better disease management.
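    As a rough illustration of what an automated model-selection loop does, the sketch below searches over feature-selection sizes and classifier configurations and reports the cross-validated AUC of the best pipeline on synthetic "omics-like" data. It is a toy stand-in built with scikit-learn, not the AutoML platform used in the study; a real run would also correct for the optimism introduced by picking the best configuration (for example with nested cross-validation).

```python
# Toy stand-in for an AutoML workflow: search over feature-selection and
# classifier configurations and report the cross-validated AUC of the best
# pipeline. Illustrative only; not the AutoML platform used in the study.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic "omics" matrix: many features, few informative.
X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                           random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = [
    {"select__k": [2, 5, 10], "clf": [LogisticRegression(max_iter=1000)],
     "clf__C": [0.1, 1.0, 10.0]},
    {"select__k": [2, 5, 10], "clf": [SVC(probability=True)],
     "clf__C": [0.1, 1.0, 10.0]},
]
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print("best cross-validated AUC:", round(search.best_score_, 3))
print("best configuration:", search.best_params_)
```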

    A novel similarity-measure for the analysis of genetic data in complex phenotypes

    BACKGROUND: Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effectively processing these data has not kept pace. In particular, the Machine Learning literature contains relatively few papers focused on the development and application of data mining methods for the analysis of genetic variability, and these papers apply to genetic data procedures that were developed for other kinds of analysis and do not take into account the peculiarities of population genetics. The aim of our study was to define a new similarity measure, specifically conceived for measuring the similarity between the genetic profiles of two groups of subjects (i.e., cases and controls), taking into account that genetic profiles are usually distributed in a population according to the Hardy-Weinberg equilibrium. RESULTS: We set up a new kernel function consisting of a similarity measure between groups of subjects genotyped at numerous genetic loci. This measure weighs different genetic profiles according to the estimates of gene frequencies at Hardy-Weinberg equilibrium in the population. We named this function the "Hardy-Weinberg kernel". The effectiveness of the Hardy-Weinberg kernel was compared to the performance of the well-established linear kernel. We found that the Hardy-Weinberg kernel significantly outperformed the linear kernel in a number of experiments using either simulated or real data. CONCLUSION: The "Hardy-Weinberg kernel" reported here represents one of the first attempts at incorporating genetic knowledge into the definition of a kernel function designed for the analysis of genetic data. We show that the best performance of the "Hardy-Weinberg kernel" is observed when rare genotypes have different frequencies in cases and controls. The ability to capture the effect of rare genotypes on phenotypic traits may be an important and useful feature, as most current statistical tools lose much of their statistical power when rare genotypes are involved in susceptibility to the trait under study.
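    The abstract does not spell out the exact form of the Hardy-Weinberg kernel, so the sketch below only illustrates the underlying idea under an assumed weighting scheme: matching genotypes contribute to the similarity in inverse proportion to their expected Hardy-Weinberg frequency, so shared rare genotypes dominate the score. It compares two genotype vectors rather than two groups of subjects, and every function name and weight choice here is illustrative, not the paper's definition.

```python
# Illustrative sketch of a Hardy-Weinberg-weighted genotype similarity:
# shared genotypes count for more when they are rare under HWE expectations.
# Simplified per-subject variant for illustration only; not the exact kernel
# defined in the paper (which compares groups of subjects).
import numpy as np

def hwe_expected_freqs(p):
    """Expected HWE frequencies of carrying 0, 1 or 2 copies of an allele with frequency p."""
    return np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

def hw_similarity(g1, g2, allele_freqs):
    """Similarity of two genotype vectors coded as 0/1/2 minor-allele counts.

    Matching genotypes are up-weighted by the inverse of their expected
    HWE frequency, so rare shared genotypes dominate the score.
    """
    score = 0.0
    for locus, (a, b) in enumerate(zip(g1, g2)):
        if a == b:
            expected = hwe_expected_freqs(allele_freqs[locus])[a]
            score += 1.0 / max(expected, 1e-6)
    return score

rng = np.random.default_rng(1)
n_loci = 50
allele_freqs = rng.uniform(0.05, 0.5, n_loci)
# Two subjects: genotypes drawn locus-by-locus from HWE proportions.
subjects = [
    np.array([rng.choice(3, p=hwe_expected_freqs(p)) for p in allele_freqs])
    for _ in range(2)
]
print("HW-weighted similarity:", round(hw_similarity(*subjects, allele_freqs), 2))
```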

    Improving Lung Cancer Screening Selection:The HUNT Lung Cancer Risk Model for Ever-Smokers Versus the NELSON and 2021 United States Preventive Services Task Force Criteria in the Cohort of Norway: A Population-Based Prospective Study

    Background: Improving the method for selecting participants for lung cancer (LC) screening is an urgent need. Here, we compared the performance of the Helseundersøkelsen i Nord-Trøndelag (HUNT) Lung Cancer Model (HUNT LCM) versus the Dutch-Belgian lung cancer screening trial (Nederlands-Leuvens Longkanker Screenings Onderzoek, NELSON) and 2021 United States Preventive Services Task Force (USPSTF) criteria regarding LC risk prediction and efficiency. Methods: We used linked data from 10 Norwegian prospective population-based cohorts, Cohort of Norway. The study included 44,831 ever-smokers, of whom 686 (1.5%) developed LC; the median follow-up time was 11.6 years (0.01–20.8 years). Results: Within 6 years, 222 (0.5%) individuals developed LC. The NELSON and 2021 USPSTF criteria predicted 37.4% and 59.5% of the LC cases, respectively. When selecting the same number of individuals as the NELSON and 2021 USPSTF criteria, the HUNT LCM increased the LC prediction rate by 41.0% and 12.1%, respectively. The HUNT LCM significantly increased sensitivity (p < 0.001 and p = 0.028) and reduced the number needed to predict one LC case (29 versus 40, p < 0.001, and 36 versus 40, p = 0.02), respectively. Applying a HUNT LCM 6-year risk score of 0.98% as the cutoff (selecting 14.0% of ever-smokers) predicted 70.7% of all LC cases, increasing the LC prediction rate by 89.2% and 18.9% versus the NELSON and 2021 USPSTF criteria, respectively (both p < 0.001). Conclusions: The HUNT LCM was significantly more efficient than the NELSON and 2021 USPSTF criteria, improving the prediction of LC diagnosis, and may be used as a validated clinical tool for screening selection.
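    The efficiency metrics compared above (proportion of cancers captured at a matched selection size, and the number needed to predict one case) can be computed from a per-person risk score and a binary eligibility flag, as in the hedged sketch below. The data are synthetic, and the 20% eligibility rate and score distribution are arbitrary assumptions, not the study's cohort or criteria.

```python
# Sketch of the comparison logic: given a per-person 6-year risk score and a
# binary eligibility flag from fixed criteria, compare the cancers captured
# when the risk model selects the same number of people, plus the "number
# needed to predict" (NNP) one case. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
n = 44831
risk = rng.beta(1, 60, n)                      # toy 6-year risk scores
cancer = rng.random(n) < risk                  # synthetic outcomes
criteria = rng.random(n) < 0.20                # toy fixed-criteria eligibility

n_selected = criteria.sum()
top_by_risk = np.argsort(risk)[::-1][:n_selected]   # same budget as the criteria
model_mask = np.zeros(n, dtype=bool)
model_mask[top_by_risk] = True

def summary(selected_mask):
    captured = cancer[selected_mask].sum()
    nnp = selected_mask.sum() / max(captured, 1)    # people screened per case found
    return captured / cancer.sum(), nnp

for name, mask in [("fixed criteria", criteria), ("risk model", model_mask)]:
    frac, nnp = summary(mask)
    print(f"{name}: captures {frac:.1%} of cancers, NNP = {nnp:.1f}")
```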

    Src and Memory: A Study of Filial Imprinting and Predispositions in the Domestic Chick.

    Visual imprinting is a learning process whereby young animals come to prefer a visual stimulus after exposure to it (training). The available evidence indicates that the intermediate medial mesopallium (IMM) in the domestic chick forebrain is a site of memory formation during visual imprinting. We have studied the role of Src, an important non-receptor tyrosine kinase, in memory formation. Amounts of total Src (Total-Src) and its two phosphorylated forms, tyrosine-416 (activated, 416P-Src) and tyrosine-527 (inhibited, 527P-Src), were measured 1 and 24 h after training in the IMM and in a control brain region, the posterior pole of nidopallium (PPN). One hour after training, in the left IMM, we observed a positive correlation between the amount of 527P-Src and learning strength that was attributable to learning, as well as a positive correlation between 416P-Src and learning strength that was attributable to a predisposition to learn readily. Twenty-four hours after training, the amount of Total-Src increased with learning strength in both the left and right IMM, and the amount of 527P-Src increased with learning strength only in the left IMM; both correlations were attributable to learning. A further, negative, correlation between learning strength and 416P-Src/Total-Src in the left IMM reflected a predisposition to learn. No learning-related changes were found in the PPN control region. We suggest that there are two pools of Src: one in an active state, reflecting a predisposition to learn, and one in an inhibited state, which increases as a result of learning. These two pools may represent two or more signaling pathways: one downstream of Src, activated by tyrosine-416 phosphorylation, and another upstream of Src, keeping the enzyme in an inactivated state via phosphorylation of tyrosine-527.

    Feature Selection with the R Package MXM: Discovering Statistically-Equivalent Feature Subsets

    The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most currently available feature-selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple, predictive feature subsets whose performances are statistically equivalent. In that respect, SES subsumes and extends previous feature-selection algorithms, such as the max-min parents and children algorithm. SES is implemented in a function of the same name included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data-analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of using the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature-selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real-world data.
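    As a conceptual, Python-only stand-in for the equivalence idea (the reference implementation is the SES function in the R package MXM), the sketch below selects one feature subset and then flags single-feature substitutions whose cross-validated AUC is not significantly different from the reference. The real SES decides equivalence with conditional-independence tests during selection itself, so this illustrates the notion of statistically equivalent signatures, not the MXM algorithm.

```python
# Simplified illustration of "statistically equivalent signatures": pick a
# reference feature subset, then look for single-feature swaps whose
# cross-validated AUC is not significantly different from the reference.
# Conceptual stand-in only; the real SES uses conditional-independence tests.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           n_redundant=5, random_state=0)

reference = SelectKBest(f_classif, k=5).fit(X, y).get_support(indices=True)

def cv_auc(cols):
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, cols], y,
                           scoring="roc_auc", cv=10)

base_scores = cv_auc(reference)
equivalent = []
for swap_in in set(range(X.shape[1])) - set(reference):
    candidate = np.append(reference[1:], swap_in)     # replace the first feature
    scores = cv_auc(candidate)
    # paired t-test across CV folds: "equivalent" if no significant difference
    if ttest_rel(base_scores, scores).pvalue > 0.05:
        equivalent.append(swap_in)

print("reference subset:", reference.tolist())
print("features that can replace feature", reference[0], ":", sorted(equivalent))
```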

    IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems

    Discovering non-linear dynamical models from data is at the core of science. Recent progress hinges upon sparse regression of observables using extensive libraries of candidate functions. However, it remains challenging to model hidden, non-observable control variables governing switching between different dynamical regimes. Here we develop a data-efficient, derivative-free method, IHCV, for the Identification of Hidden Control Variables. First, the performance and robustness of IHCV against noise are evaluated by benchmarking it on well-known bifurcation models (saddle-node, transcritical, pitchfork, Hopf). Next, we demonstrate that IHCV discovers hidden driver variables in the Lorenz, van der Pol, Hodgkin-Huxley, and FitzHugh-Nagumo models. Finally, IHCV generalizes to the case when only partial observations are given, as demonstrated using the toggle switch model, the genetic repressilator oscillator, and a Waddington landscape model. Our proof of principle illustrates that utilizing normal forms could facilitate the data-efficient and scalable discovery of hidden variables controlling transitions between different dynamical regimes and non-linear models.
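    A minimal way to see how a normal form exposes a hidden control variable is sketched below: the saddle-node normal form dx/dt = r(t) + x^2 is simulated with a slowly drifting r(t), and r(t) is then recovered from the trajectory by inverting the normal form with a finite-difference derivative. IHCV itself is derivative-free and far more general; this is only a toy illustration of the idea, with all parameter values chosen arbitrarily.

```python
# Simplified sketch of recovering a hidden, slowly drifting control variable
# from a trajectory of the saddle-node normal form  dx/dt = r(t) + x^2.
# IHCV itself is derivative-free; this finite-difference version only
# illustrates that the normal form exposes r(t) once x(t) is observed.
import numpy as np
from scipy.integrate import solve_ivp

r = lambda t: -1.0 + 0.02 * t           # hidden control variable, drifting upward

def saddle_node(t, x):
    return [r(t) + x[0] ** 2]

t_eval = np.linspace(0, 40, 2000)
sol = solve_ivp(saddle_node, (0, 40), [-1.2], t_eval=t_eval, rtol=1e-8)
x = sol.y[0]

# Estimate dx/dt numerically, then invert the normal form: r_hat = dx/dt - x^2.
dxdt = np.gradient(x, t_eval)
r_hat = dxdt - x ** 2

err = np.max(np.abs(r_hat[50:-50] - r(t_eval[50:-50])))
print(f"max recovery error (interior points): {err:.3e}")
```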

    Non-parametric combination analysis of multiple data types enables detection of novel regulatory mechanisms in T cells of multiple sclerosis patients

    Multiple Sclerosis (MS) is an autoimmune disease of the central nervous system with prominent neurodegenerative components. The triggering and progression of MS are associated with transcriptional and epigenetic alterations in several tissues, including peripheral blood. The combined influence of transcriptional and epigenetic changes associated with MS has not been assessed in the same individuals. Here we generated paired transcriptomic (RNA-seq) and DNA methylation (Illumina 450 K array) profiles of CD4+ and CD8+ T cells (CD4, CD8), using clinically accessible blood from healthy donors and MS patients in the initial relapsing-remitting and subsequent secondary-progressive stages. By integrating the output of a differential expression test with a permutation-based non-parametric combination methodology, we identified 149 differentially expressed (DE) genes in both CD4 and CD8 cells collected from MS patients. Moreover, by leveraging the methylation-dependent regulation of gene expression, we identified the gene SH3YL1, which displayed significantly correlated expression and methylation changes in MS patients. Importantly, silencing of SH3YL1 in primary human CD4 cells demonstrated its influence on T cell activation. Collectively, our strategy based on paired sampling of several cell types provides a novel approach to increase sensitivity for identifying shared mechanisms altered in CD4 and CD8 cells of relevance in MS in small-sized clinical materials.
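    The core of a permutation-based non-parametric combination (NPC) can be shown in a few lines: partial test statistics are computed for each data type, case/control labels are permuted jointly so the dependence between data types is preserved, partial p-values are taken from the joint permutation distribution, and a combining function (Fisher's, here) yields a single global p-value. The sketch below uses synthetic paired expression/methylation values for one gene; the effect sizes and choice of statistics are arbitrary illustrations, not the study's pipeline.

```python
# Minimal sketch of permutation-based non-parametric combination (NPC) for
# one gene measured with two paired data types. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_controls, n_perm = 30, 30, 2000
labels = np.array([1] * n_cases + [0] * n_controls)

# Paired measurements: expression shifted in cases, methylation correlated
# with expression (a shared, weak signal).
expression = rng.normal(0, 1, labels.size) + 0.5 * labels
methylation = -0.4 * expression + rng.normal(0, 1, labels.size)

def partial_stats(perm_labels):
    """Absolute case-control mean differences for each data type."""
    case, ctrl = perm_labels == 1, perm_labels == 0
    return np.array([abs(expression[case].mean() - expression[ctrl].mean()),
                     abs(methylation[case].mean() - methylation[ctrl].mean())])

observed = partial_stats(labels)
perm = np.array([partial_stats(rng.permutation(labels)) for _ in range(n_perm)])
all_stats = np.vstack([observed, perm])               # row 0 = observed statistics

# Partial p-values: rank of each statistic within the joint permutation set.
partial_p = (all_stats[None, :, :] >= all_stats[:, None, :]).sum(axis=1) / all_stats.shape[0]
combined = (-2 * np.log(partial_p)).sum(axis=1)       # Fisher combining function
global_p = (combined[1:] >= combined[0]).mean()       # NPC global p-value

print("partial p-values (expression, methylation):", partial_p[0].round(4))
print(f"NPC combined p-value: {global_p:.4f}")
```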