
    Metabolic network discovery through reverse engineering of metabolome data

    Reverse engineering of high-throughput omics data to infer underlying biological networks is one of the challenges in systems biology. However, applications in the field of metabolomics are rather limited. We have focused on a systematic analysis of metabolic network inference from in silico metabolome data based on statistical similarity measures. Three different data types based on biological/environmental variability around steady state were analyzed to compare the relative information content of the data types for inferring the network. Comparing the inference power of different similarity scores indicated the clear superiority of conditioning- or pruning-based scores, as they are able to eliminate indirect interactions. We also show that a mathematical measure based on the Fisher information matrix gives clues about how well different data types represent the underlying metabolic network topology. Results on several datasets of increasing complexity consistently show that metabolic variations observed at steady state, the simplest experimental analysis, are already informative enough to reveal the connectivity of the underlying metabolic network with a low false-positive rate when proper similarity-score approaches are employed. For experimental situations this implies that a single organism under slightly varying conditions may already generate more than enough information to correctly infer networks. Detailed examination of the interaction strengths of the underlying metabolic networks demonstrates that the edges that cannot be captured by similarity scores mainly belong to metabolites connected with weak interaction strength.
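The advantage of conditioning-based scores described above can be illustrated with a small simulation (an illustration, not the paper's data or code): in a linear chain A → B → C, a marginal Pearson correlation links A and C even though their interaction is indirect, while the partial correlation obtained from the precision matrix prunes that edge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical linear chain A -> B -> C with variability around steady state.
a = rng.normal(size=n)
b = 0.8 * a + 0.3 * rng.normal(size=n)
c = 0.8 * b + 0.3 * rng.normal(size=n)
data = np.column_stack([a, b, c])

# Marginal (Pearson) similarity links A and C even though their
# interaction is indirect, i.e. mediated entirely by B.
pearson = np.corrcoef(data, rowvar=False)

# Conditioning-based score: partial correlation derived from the
# precision (inverse covariance) matrix eliminates the indirect edge.
prec = np.linalg.inv(np.cov(data, rowvar=False))
d = np.sqrt(np.diag(prec))
partial = -prec / np.outer(d, d)
np.fill_diagonal(partial, 1.0)

print(f"Pearson corr(A, C) = {pearson[0, 2]:.2f}")  # large: false-positive edge
print(f"Partial corr(A, C) = {partial[0, 2]:.2f}")  # near zero: edge pruned
```

The direct edges A–B and B–C survive both scores; only the indirect A–C association is eliminated by conditioning, which is what keeps the false-positive rate low.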

    Metabolomics variable selection and classification in the presence of observations below the detection limit using an extension of ERp

    A compressed folder (XERp Software.zip) containing the MATLAB scripts to perform XERp, as well as an example application. (ZIP 11 kb)

    Divide et impera: How disentangling common and distinctive variability in multiset data analysis can aid industrial process troubleshooting and understanding

    The possibility of addressing the problem of process troubleshooting and understanding by modelling common and distinctive sources of variation (factors or components) underlying two sets of measurements was explored in a real-world industrial case study. The strategy used includes a novel approach to systematically detect the number of common and distinctive components. An extension of this strategy to the analysis of a larger number of data blocks, which allows the comparison of data from multiple processing units, is also discussed.
    Funding: Spanish Ministry of Economy and Competitiveness, Grant/Award Number: DPI2017-82896-C2-1-R.
    Vitale, R.; Noord, O. E. D.; Westerhuis, J. A.; Smilde, A. K.; Ferrer, A. (2021). Divide et impera: How disentangling common and distinctive variability in multiset data analysis can aid industrial process troubleshooting and understanding. Journal of Chemometrics, 35(2):1-12. https://doi.org/10.1002/cem.3266
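A minimal sketch of the common-versus-distinctive idea on simulated data (not the paper's method or its component-number detection approach): two blocks measured on the same samples share one common score vector and each carries its own distinctive source; an SVD of the column-wise concatenated blocks recovers the common variation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two simulated measurement blocks on the same samples: t_common drives
# variation in both blocks; each block also has a distinctive source.
t_common = rng.normal(size=(n, 1))
t_dist1 = rng.normal(size=(n, 1))
t_dist2 = rng.normal(size=(n, 1))
X1 = t_common @ rng.normal(size=(1, 10)) + 0.5 * t_dist1 @ rng.normal(size=(1, 10))
X2 = t_common @ rng.normal(size=(1, 8)) + 0.5 * t_dist2 @ rng.normal(size=(1, 8))

# Illustrative common-component estimate: the leading left singular
# vector of the concatenated blocks captures variation shared by both.
U, s, Vt = np.linalg.svd(np.column_stack([X1, X2]), full_matrices=False)
t_hat = U[:, 0]

# The recovered score correlates strongly with the true common source
# (up to sign) and weakly with either distinctive source.
r_common = abs(np.corrcoef(t_hat, t_common[:, 0])[0, 1])
r_dist = abs(np.corrcoef(t_hat, t_dist1[:, 0])[0, 1])
print(f"|corr| with common source:      {r_common:.2f}")
print(f"|corr| with distinctive source: {r_dist:.2f}")
```

Deflating the common part from each block and analysing the residuals separately would then expose the distinctive components, which is the troubleshooting-relevant split the abstract refers to.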

    Lipidomic Response to Coffee Consumption

    Coffee is widely consumed and contains many bioactive compounds, any of which may impact pathways related to disease development. Our objective was to identify individual lipid changes in response to coffee drinking. We profiled the lipidome of fasting serum samples collected from a previously reported single-blind, three-stage clinical trial. Forty-seven habitual coffee consumers refrained from drinking coffee for 1 month, consumed 4 cups of coffee/day in the second month and 8 cups/day in the third month. Samples collected after each coffee stage were subjected to quantitative lipidomic profiling using ion-mobility spectrometry–mass spectrometry. A total of 853 lipid species mapping to 14 lipid classes were included for univariate analysis. Three lysophosphatidylcholine (LPC) species, namely LPC (20:4), LPC (22:1) and LPC (22:2), significantly decreased after coffee intake (p < 0.05); 58 of these decreased after coffee intake. In conclusion, coffee intake leads to lower levels of specific LPC species with potential impacts on glycerophospholipid metabolism more generally.
    Peer reviewed
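Univariate testing of 853 lipids implies a heavy multiple-testing burden. A minimal sketch of one standard workflow for such a paired design (simulated data, a large-sample paired z-test and Benjamini–Hochberg FDR control; not the study's actual pipeline, numbers, or results):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_lipids = 47, 853

# Simulated paired design: each lipid measured in the same subjects after
# a "no coffee" and an "8 cups/day" stage. The first 10 species are given
# a true decrease after coffee intake; the rest are null.
off = rng.normal(size=(n_subjects, n_lipids))
on = off + rng.normal(scale=0.5, size=(n_subjects, n_lipids))
on[:, :10] -= 1.0

# Per-lipid paired z-test on within-subject differences
# (large-sample normal approximation to the paired t-test).
diff = on - off
z = diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / math.sqrt(n_subjects))
p = np.array([2 * (1 - 0.5 * (1 + math.erf(abs(v) / math.sqrt(2)))) for v in z])

# Benjamini-Hochberg step-up: control the FDR across all 853 tests.
order = np.argsort(p)
ranks = np.arange(1, n_lipids + 1)
q = np.minimum.accumulate((p[order] * n_lipids / ranks)[::-1])[::-1]
significant = np.zeros(n_lipids, dtype=bool)
significant[order] = q < 0.05
print(f"lipids passing FDR < 0.05: {significant.sum()}")
```

With 843 null lipids, an uncorrected 0.05 threshold would be expected to flag roughly 40 false positives, which is why FDR control is essential at this scale.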

    Variable selection and validation in multivariate modelling

    Motivation: Validation of variable selection and predictive performance is crucial in the construction of robust multivariate models that generalize well, minimize overfitting and facilitate interpretation of results. Inappropriate variable selection instead leads to selection bias, thereby increasing the risk of model overfitting and false positive discoveries. Although several algorithms exist to identify a minimal set of most informative variables (i.e. the minimal-optimal problem), few can select all variables related to the research question (i.e. the all-relevant problem). Robust algorithms combining identification of both minimal-optimal and all-relevant variables with proper cross-validation are urgently needed. Results: We developed the MUVR algorithm to improve predictive performance and minimize overfitting and false positives in multivariate analysis. In the MUVR algorithm, minimal variable selection is achieved by performing recursive variable elimination in a repeated double cross-validation (rdCV) procedure. The algorithm supports partial least squares and random forest modelling, and simultaneously identifies minimal-optimal and all-relevant variable sets for regression, classification and multilevel analyses. Using three authentic omics datasets, MUVR yielded parsimonious models with minimal overfitting and improved model performance compared with state-of-the-art rdCV. Moreover, MUVR showed advantages over other variable selection algorithms, i.e. Boruta and VSURF, including a simultaneous variable selection and validation scheme and wider applicability.
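A toy sketch of the double cross-validation idea behind rdCV: the outer loop estimates performance on samples never used for variable selection, while the inner loop shrinks the variable set along a fixed ranking. Here plain least squares with correlation-based ranking stands in for PLS/random-forest importance, and the data are simulated; this is not MUVR's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p_vars = 120, 50

# Simulated regression data: only the first 5 of 50 variables carry signal.
X = rng.normal(size=(n, p_vars))
y = X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=n)

def rank_vars(Xtr, ytr):
    """Rank variables by |correlation| with y (a simple stand-in for
    PLS weights or random-forest importance), strongest first."""
    r = np.array([abs(np.corrcoef(Xtr[:, j], ytr)[0, 1]) for j in range(Xtr.shape[1])])
    return np.argsort(r)[::-1]

def mse(Xtr, ytr, Xte, yte, keep):
    """Least-squares fit on the kept variables, mean squared error on test."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr[:, keep]])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    pred = np.column_stack([np.ones(len(Xte)), Xte[:, keep]]) @ coef
    return np.mean((pred - yte) ** 2)

sizes = [50, 25, 12, 6, 5, 4, 3, 2]  # shrinking candidate variable sets
chosen_sizes = []
for outer in np.array_split(rng.permutation(n), 5):        # outer CV loop
    train = np.setdiff1d(np.arange(n), outer)
    inner_err = np.zeros(len(sizes))
    for val in np.array_split(rng.permutation(train), 4):  # inner CV loop
        fit = np.setdiff1d(train, val)
        order = rank_vars(X[fit], y[fit])
        for i, k in enumerate(sizes):                      # eliminate weakest vars
            inner_err[i] += mse(X[fit], y[fit], X[val], y[val], order[:k])
    chosen_sizes.append(sizes[int(np.argmin(inner_err))])

top5 = sorted(rank_vars(X, y)[:5])
print("variables kept per outer fold:", chosen_sizes)
print("top-ranked variables:", top5)
```

The inner loop consistently settles near the five informative variables: selecting on held-out inner folds, rather than on the data used for fitting, is what prevents the selection bias the abstract warns about.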

    Heterofusion: Fusing genomics data of different measurement scales

    In systems biology, it is becoming increasingly common to measure biochemical entities at different levels of the same biological system. Hence, data fusion problems are abundant in the life sciences. With the availability of a multitude of measuring techniques, one of the central problems is the heterogeneity of the data. In this paper, we discuss a specific form of heterogeneity, namely, that of measurements obtained at different measurement scales, such as binary, ordinal, interval, and ratio-scaled variables. Three generic fusion approaches are presented, of which two are new to the systems biology community. The methods are presented, put in context, and illustrated with a real-life genomics example.
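A minimal illustration of the measurement-scale problem on toy data (not one of the paper's three fusion approaches): before any joint analysis, binary, ordinal, and ratio-scaled variables must be brought onto a comparable scale so that no single scale dominates. The transformations below are one pragmatic, assumed choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100

# Toy multi-scale measurements on the same samples: a ratio-scaled
# concentration, an ordinal severity grade (0/1/2), and a binary marker.
conc = rng.lognormal(size=n)        # ratio scale, strictly positive
grade = rng.integers(0, 3, size=n)  # ordinal scale
marker = rng.integers(0, 2, size=n) # binary scale

def standardize(x):
    """Center and scale to unit variance."""
    return (x - x.mean()) / x.std(ddof=1)

# Bring every variable onto a comparable interval-like scale:
fused = np.column_stack([
    standardize(np.log(conc)),          # log, then z-score the ratio variable
    standardize(grade.astype(float)),   # treat ordinal ranks as numeric scores
    standardize(marker.astype(float)),  # centered/scaled dummy coding
])

# Every column now has mean 0 and unit variance, so a joint analysis
# (e.g. an SVD of the fused matrix) weights the scales comparably.
print(np.allclose(fused.mean(axis=0), 0), np.allclose(fused.var(axis=0, ddof=1), 1))
```

More principled alternatives, such as optimal scaling of the ordinal variable or likelihood-based models per measurement scale, are the kind of approaches the paper's fusion methods address.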