18 research outputs found

    PLS-based and regularization-based methods for the selection of relevant variables in non-targeted metabolomics data

    Non-targeted metabolomics is a part of systems biology and aims to determine many metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are multivariate and high-dimensional because of the sensitivity of mass spectrometry-based detection methods and the complexity of biological matrices. Proper selection of the variables that contribute to group classification is a crucial step, especially in metabolomics studies focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study). Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without and with multiple testing correction, as well as the least absolute shrinkage and selection operator (LASSO), were tested and compared. For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on VIP criteria using Pareto and UV scaling, respectively. In the case of the PH study, 217 and 320 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. In the RH study, the OPLS-DA model built with multiple testing correction selected 4 and 19 variables as statistically significant using Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. Additionally, the concept and fundamentals of LASSO, together with a bootstrap procedure evaluating the reproducibility of results, were demonstrated. In the RH and PH studies, LASSO selected 14 and 4 variables, respectively, with reproducibility between 99.3% and 100%. Despite the popularity of PLS-DA and OPLS-DA methods in metabolomics, it should be highlighted that they do not control type I or type II error, but only arbitrarily establish a cut-off value for PLS-DA loadings. Such a multivariate model shows high goodness-of-fit to the data, but the risk of overfitting increases substantially. Therefore, the LASSO method was for the first time applied to the statistical analysis of datasets obtained in untargeted metabolomics studies. The advantage of LASSO lies in its ability to model different types of omics data and to account for multicollinearity and p >> n problems.
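The idea of LASSO selection with a bootstrap reproducibility check can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a least-squares LASSO fitted by coordinate descent on centered data without an intercept (a classification study would more likely use a penalized logistic model), and the synthetic data, the penalty `lam=0.3`, and the function names are all hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=50):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes variable j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def bootstrap_selection_frequency(X, y, lam, n_boot=50, seed=1):
    """Fraction of bootstrap resamples in which each variable gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Hypothetical synthetic data: only variables 0 and 3 carry signal.
rng = random.Random(0)
n, p = 40, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.5) for row in X]

freq = bootstrap_selection_frequency(X, y, lam=0.3)
for j, f in enumerate(freq):
    print(f"variable {j}: selected in {100 * f:.0f}% of bootstrap fits")
```

A variable that is selected in nearly every resample is a reproducible choice in the sense used above; in practice the penalty would be tuned (for example by cross-validation) rather than fixed by hand.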

    Bayesian multilevel model of micro RNA levels in ovarian-cancer and healthy subjects.

    In transcriptomics, micro RNAs (miRNAs) have gained much interest, especially as potential disease indicators. However, despite their great promise for clinical application, many inconsistent results have been published. Our aim was to compare miRNA expression levels in ovarian cancer patients and healthy subjects using a Bayesian multilevel model and to assess their potential usefulness in diagnosis. We analyzed case-control observational data on expression profiling of 49 preselected miRNA-based ovarian cancer indicators in 119 controls and 59 patients. A Bayesian multilevel model was used to characterize the effect of disease on miRNA levels, controlling for differences in age and body weight. Differences in miRNA levels between patients and controls, expressed on the scale of the data variability, were discussed in the context of their potential usefulness in diagnosis. Additionally, the cross-validated area under the ROC curve (AUC) was used to assess the expected out-of-sample discrimination index of different sets of miRNAs. The proposed model allowed us to describe the set of miRNA levels in patients and controls. Three highly correlated miRNAs (miR-101-3p, miR-142-5p, miR-148a-3p) ranked the highest, with almost identical effect sizes that range from 0.45 to 1.0. For those miRNAs the credible interval for AUC ranged from 0.63 to 0.67, indicating their limited discrimination potential. Little benefit from adding information from other miRNAs was observed. There were several miRNAs in the dataset (miR-604, hsa-miR-221-5p) for which inferences were uncertain. For those miRNAs more experimental effort is needed to fully assess their effect in the context of new hit discovery and their usefulness as disease indicators. The proposed multilevel Bayesian model can be used to characterize a panel of miRNA levels and to assess the difference in expression levels between healthy and cancer individuals.
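To make the AUC values above easier to read, the sketch below computes a plain (not cross-validated, not Bayesian) AUC as the probability that a randomly chosen patient scores higher than a randomly chosen control. The scores and the function name are hypothetical and are not taken from the study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical marker scores, for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]

auc_value = auc(cases, controls)  # 11 of the 12 pairs are ordered correctly
print(round(auc_value, 3))
```

On this scale 0.5 corresponds to chance and 1.0 to perfect separation, so an AUC between 0.63 and 0.67 means that roughly one pair in three is still ranked the wrong way round.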

    Metabolomic Signature of Early Vascular Aging (EVA) in Hypertension

    Arterial stiffening is a hallmark of early vascular aging (EVA) syndrome and an independent predictor of cardiovascular morbidity and mortality. In this case-control study we sought to identify plasma metabolites associated with EVA syndrome in the setting of hypertension. An untargeted metabolomic approach was used to identify plasma metabolites in age-, BMI-, and sex-matched groups of EVA (n = 79) and non-EVA (n = 73) individuals with hypertension. After raw data processing and filtration, 497 putative compounds were characterized, out of which 4 were identified as lysophosphatidylcholines (LPCs) [LPC (18:2), LPC (16:0), LPC (18:0), and LPC (18:1)]. A main finding of this study is that the identified LPCs were independently associated with EVA status. Although LPCs have been shown previously to be positively associated with inflammation and atherosclerosis, we observed that hypertensive individuals characterized by 4 down-regulated LPCs had 3.8 times higher risk of EVA compared to those with higher LPC levels (OR = 3.8, 95% CI 1.7–8.5, P < 0.001). Our results provide new insights into a metabolomic phenotype of vascular aging and warrant further investigation of the negative association of LPCs with EVA status. This study suggests that LPCs are potential candidates to be considered for further evaluation and validation as predictors of EVA in patients with hypertension.
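An odds ratio with a 95% confidence interval, as reported above, can be illustrated with an unadjusted 2x2-table calculation using the Wald interval on the log scale. This is only a sketch: the study describes an independent association, so its estimate presumably comes from an adjusted model, and the counts below are hypothetical rather than the study's data.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: "exposed" = low LPC levels, "case" = EVA.
or_, lower, upper = odds_ratio_wald(a=30, b=10, c=20, d=40)
print(f"OR = {or_:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

An interval that excludes 1, as the reported 1.7–8.5 does, indicates an association at the 5% level; its width reflects how few individuals fall into each cell of the table.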