
    Metabolomics variable selection and classification in the presence of observations below the detection limit using an extension of ERp

    A compressed folder (XERp Software.zip) containing the Matlab scripts to perform XERp, as well as an example application (ZIP, 11 KB).

    Centering, scaling, and transformations: improving the biological information content of metabolomics data

    BACKGROUND: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, a metabolomics data set can contain 5000-fold differences in concentration between metabolites, yet these differences are not proportional to the biological relevance of the metabolites. Data analysis methods, however, cannot make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set, thus improving its biological interpretability. RESULTS: Different data pretreatment methods, i.e. centering, autoscaling, Pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the ranking of the biologically most important metabolites. Furthermore, the stability of the ranking, the influence of technical errors on data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis. CONCLUSION: Different pretreatment methods emphasize different aspects of the data, and each pretreatment method has its own merits and drawbacks. The choice of a pretreatment method depends on the biological question to be answered, the properties of the data set and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. That is, range scaling and autoscaling were able to remove the dependence of the metabolite ranking on the average concentration and the magnitude of the fold changes, and showed biologically sensible results after principal component analysis (PCA). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects which metabolites are identified as the most important.
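The pretreatment methods listed above all operate column-wise, i.e. per metabolite across samples. The sketch below implements them as they are commonly defined in the metabolomics literature (rows = samples, columns = metabolites); the function name `pretreat` and the method keys are hypothetical, and this is an illustration rather than the authors' code.

```python
import numpy as np

def _center(X):
    # Subtract each metabolite's (column's) mean across samples.
    return X - X.mean(axis=0)

def pretreat(X, method):
    """Column-wise pretreatment of X (rows = samples, columns = metabolites)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)  # sample standard deviation per metabolite
    Xc = X - mean
    if method == "centering":
        return Xc
    if method == "autoscaling":   # unit variance
        return Xc / sd
    if method == "pareto":        # divide by the square root of the SD
        return Xc / np.sqrt(sd)
    if method == "range":         # divide by the biological range
        return Xc / (X.max(axis=0) - X.min(axis=0))
    if method == "vast":          # autoscaling weighted by mean/SD
        return Xc / sd * (mean / sd)
    if method == "log":           # requires strictly positive values
        return _center(np.log10(X))
    if method == "power":         # square-root transform, then center
        return _center(np.sqrt(X))
    raise ValueError(f"unknown method: {method}")

# Two metabolites whose concentrations differ by roughly three orders of magnitude.
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0], [4.0, 4000.0]])
print(pretreat(X, "autoscaling").std(axis=0, ddof=1))
```

After autoscaling both columns have unit variance, so the abundant metabolite no longer dominates a subsequent PCA. Note that the log transformation cannot handle zeros or censored values (such as those below the detection limit) without additional treatment.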

    Identifying inhibitory compounds in lignocellulosic biomass hydrolysates using an exometabolomics approach

    BACKGROUND: During the pretreatment of lignocellulosic biomass, inhibitors are formed that reduce the fermentation performance of yeast. An exometabolomics approach was applied to systematically identify inhibitors in lignocellulosic biomass hydrolysates. RESULTS: We studied the composition and fermentability of 24 different biomass hydrolysates. To create diversity, the 24 hydrolysates were prepared from six different biomass types, namely sugar cane bagasse, corn stover, wheat straw, barley straw, willow wood chips and oak sawdust, with four different pretreatment methods, i.e. dilute acid, mild alkaline, alkaline/peracetic acid and concentrated acid. The composition of the hydrolysates and of fermentation samples generated with them was analyzed with two GC-MS methods. Either ethyl acetate extraction or ethyl chloroformate derivatization was applied before GC-MS to prevent sugars from overloading the chromatograms and obscuring the detection of less abundant compounds. Using the multivariate data analysis models PLS-2CV and nPLS-2CV, potential inhibitors were identified by establishing relationships between the fermentability and the composition of the hydrolysates. The identified compounds were tested for their effects on the growth of the model yeast Saccharomyces cerevisiae CEN.PK 113-7D, confirming that the majority of them were indeed inhibitors. CONCLUSION: Inhibitory compounds in lignocellulosic biomass hydrolysates were successfully identified using a non-targeted, systematic approach: metabolomics. The identified inhibitors include both known ones, such as furfural, HMF and vanillin, and novel ones, namely sorbic acid and phenylacetaldehyde.
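The abstract does not spell out the PLS-2CV procedure. As a hedged illustration of the underlying idea only (regressing a fermentability response on hydrolysate composition and flagging compounds with negative regression coefficients as candidate inhibitors), here is a plain PLS1 regression via NIPALS in NumPy. It omits the double cross-validation that "2CV" presumably refers to, and the function name `pls1` and the synthetic data are hypothetical.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 regression (NIPALS) on column-centered data; returns coefficients b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    E = X - X.mean(axis=0)          # X residuals, deflated each iteration
    f = y - y.mean()                # y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)      # weight vector
        t = E @ w                   # score vector
        tt = t @ t
        p = E.T @ t / tt            # X loading
        c = f @ t / tt              # y loading
        E = E - np.outer(t, p)
        f = f - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients in the original (centered) variable space.
    return W @ np.linalg.solve(P.T @ W, q)

# Hypothetical data: 24 hydrolysates x 5 compounds; compound 1 lowers fermentability.
rng = np.random.default_rng(1)
X = rng.normal(size=(24, 5))
y = -2.0 * X[:, 1] + 0.5 * X[:, 3] + 0.01 * rng.normal(size=24)
b = pls1(X, y, n_components=5)
print("candidate inhibitor (most negative coefficient):", int(np.argmin(b)))
```

In a real analysis the number of components would be chosen by (double) cross-validation rather than fixed, and candidates would then be confirmed experimentally, as was done here with growth tests.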

    Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies

    Partial Least Squares-Discriminant Analysis (PLS-DA) is a PLS regression method with a special binary ‘dummy’ y-variable; it is commonly used for classification and biomarker selection in metabolomics studies. Several statistical approaches are currently used to validate the outcomes of PLS-DA analyses, e.g. double cross-validation procedures or permutation testing. However, there is considerable inconsistency in the optimization and performance assessment of PLS-DA models, owing to the many different diagnostic statistics currently employed in metabolomics data analyses. In this paper, the properties of four diagnostic statistics of PLS-DA are discussed: the number of misclassifications (NMC), the Area Under the Receiver Operating Characteristic curve (AUROC), Q2 and Discriminant Q2 (DQ2). All four diagnostic statistics are used in the optimization and performance assessment of PLS-DA models of three metabolomics data sets of different sizes, obtained with two different types of analytical platforms and with different levels of known differences between two groups (control and case). The statistical significance of the resulting PLS-DA models was evaluated with permutation testing. PLS-DA models obtained with NMC and AUROC are more powerful in detecting very small differences between groups than models obtained with Q2 and DQ2. The reproducibility of the PLS-DA model outcomes, the model complexity and the permutation test distributions are also investigated to explain this phenomenon. In contrast to NMC and AUROC, DQ2 and Q2 prefer PLS-DA models with lower complexity and require a higher number of permutation tests and submodels to accurately estimate the statistical significance of model performance. NMC and AUROC appear to be more efficient and more reliable diagnostic statistics and should be recommended for two-group discrimination in metabolomics studies.
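The four diagnostic statistics can be computed from the cross-validated PLS-DA predictions ŷ and the 0/1 dummy labels y. The sketch below assumes a class-assignment threshold of 0.5 for NMC, computes AUROC with the Mann-Whitney pair-counting identity, and follows the usual DQ2 convention that predictions "beyond" the class label (above 1 for class 1, below 0 for class 0) are not penalized. The function names are hypothetical and the paper's exact implementation may differ.

```python
import numpy as np

def nmc(y, yhat, threshold=0.5):
    """Number of misclassifications for 0/1 labels (assumed threshold 0.5)."""
    y = np.asarray(y)
    yhat = np.asarray(yhat, dtype=float)
    return int(np.sum((yhat > threshold).astype(int) != y))

def auroc(y, yhat):
    """AUROC via Mann-Whitney: share of (case, control) pairs ranked correctly; ties count 1/2."""
    y = np.asarray(y)
    yhat = np.asarray(yhat, dtype=float)
    diff = yhat[y == 1][:, None] - yhat[y == 0][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def q2(y, yhat):
    """Q2 = 1 - PRESS / TSS."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    press = np.sum((y - yhat) ** 2)
    return float(1 - press / np.sum((y - y.mean()) ** 2))

def dq2(y, yhat):
    """Discriminant Q2: like Q2, but overshooting the class label is not penalized."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    resid = y - yhat
    resid[(y == 1) & (yhat > 1)] = 0.0
    resid[(y == 0) & (yhat < 0)] = 0.0
    return float(1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))

y = [0, 0, 1, 1]
yhat = [-0.2, 0.6, 0.4, 1.3]
print(nmc(y, yhat), auroc(y, yhat), q2(y, yhat), dq2(y, yhat))
```

In this toy example NMC counts two errors and AUROC is 0.75, while DQ2 (0.28) exceeds Q2 (0.15) because the overshooting predictions −0.2 and 1.3 are not penalized.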

    Harmonization of quality metrics and power calculation in multi-omic studies

    Multi-omic studies combine measurements at different molecular levels to build comprehensive models of cellular systems. The success of a multi-omic data analysis strategy depends largely on the adoption of adequate experimental designs and on the quality of the measurements provided by the different omic platforms. However, the field lacks a comparative description of performance parameters across omic technologies and a formulation for experimental design in multi-omic data scenarios. Here, we propose a set of harmonized Figures of Merit (FoM) as quality descriptors applicable to different omic data types. Using this information, we formulate the MultiPower method to estimate and assess the optimal sample size in a multi-omics experiment. MultiPower supports different experimental settings, data types and sample sizes, and includes graphical tools for experimental design decision-making. MultiPower is complemented with MultiML, an algorithm to estimate sample size for machine learning classification problems based on multi-omic data. Multi-omics studies are popular but lack rigorous criteria for experimental design. We define Figures of Merit across omics to comparatively describe their performance, and present new algorithms for sample size calculation in multi-omics experiments aiming either at feature selection or at sample classification.
    Analytical BioScience
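The abstract does not give MultiPower's formulas, so the following is not MultiPower. It is a generic, hedged sketch of the kind of calculation involved: a normal-approximation sample size per group for a two-sided two-sample comparison, with a Bonferroni-corrected significance level across features, evaluated separately per omic. Because every sample is usually measured on all platforms, the sketch takes the largest per-omic requirement. The effect sizes, feature counts and function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8, n_features=1):
    """Per-group n for a two-sided two-sample test (normal approximation).

    effect_size: assumed standardized mean difference (Cohen's d).
    n_features: number of features tested; alpha is Bonferroni-corrected.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    a = alpha / n_features
    z_a = NormalDist().inv_cdf(1 - a / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / effect_size ** 2)

def multiomic_n(omics, alpha=0.05, power=0.8):
    """omics: dict of omic name -> (assumed effect size, number of features).

    Returns the n that satisfies every omic, plus the per-omic values.
    """
    per_omic = {name: n_per_group(d, alpha, power, m) for name, (d, m) in omics.items()}
    return max(per_omic.values()), per_omic

# Hypothetical design: one feature per omic, different assumed effect sizes.
print(multiomic_n({"metabolomics": (0.5, 1), "proteomics": (1.0, 1)}))
```

With d = 0.5, α = 0.05 and 80% power this gives 63 samples per group; an exact t-test calculation gives a slightly larger number, and a method based on platform-specific figures of merit would refine the assumed effect sizes and noise levels.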

    Detecting Regulatory Mechanisms in Endocrine Time Series Measurements

    The regulatory mechanisms underlying pulsatile hormone secretion are complex, especially as secretion is partly controlled by other hormones and by the combined action of multiple agents. Regulatory relations between hormones are not directly observable but may be deduced from time series measurements of plasma hormone concentrations. Variation in plasma hormone levels results from secretion and clearance from the circulation. A strategy is proposed to extract inhibition, activation, thresholds and circadian synchronicity from concentration data, using particular association methods. Time-delayed associations between hormone concentrations and/or extracted secretion pulse profiles reveal information on regulatory mechanisms. The above-mentioned regulatory mechanisms are illustrated with simulated data. Additionally, data from a lean cohort of healthy control subjects are used to illustrate activation (ACTH and cortisol) and circadian synchronicity (ACTH and TSH) in real data. Both the simulated and the real data consist of 145 equidistant samples per individual, covering a 24-h time span at 10-minute intervals. The results from the simulated and the real data are concordant.
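One simple time-delayed association measure is the Pearson correlation between one series at time t and another at time t + lag. The abstract does not say which association methods the authors used, so this lagged cross-correlation is an assumed stand-in. The sketch uses the stated sampling scheme (145 samples, 10-minute intervals) with synthetic series; the function name and the 30-minute delay are hypothetical.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag samples.

    A peak at lag k suggests that changes in x precede similar changes in y by k samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("series must have equal length")
    out = {}
    for lag in range(max_lag + 1):
        a, b = x[: len(x) - lag], y[lag:]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

# Synthetic example: 145 samples at 10-min intervals (24 h); y follows x by 3 samples (30 min).
rng = np.random.default_rng(0)
x = rng.normal(size=145)                      # e.g. a hypothetical ACTH-like series
y = np.roll(x, 3) + 0.1 * rng.normal(size=145)  # e.g. a hypothetical cortisol-like series
r = lagged_correlation(x, y, max_lag=6)
best = max(r, key=r.get)
print(f"peak at lag {best} samples = {best * 10} min")
```

Real hormone series are autocorrelated and pulsatile, so in practice one would typically associate extracted pulse profiles or detrended series rather than raw concentrations, as the strategy above suggests.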

    Prevalence of intradialytic hypotension, clinical symptoms and nursing interventions - a three-months, prospective study of 3818 haemodialysis sessions

    Background: Intradialytic hypotension (IDH) is considered one of the most frequent complications of haemodialysis, with an estimated prevalence of 20-50%, but studies investigating its exact prevalence are scarce. A complicating factor is that several definitions of IDH are in use. The goal of this study was to assess the prevalence of IDH, primarily with reference to the European Best Practice Guideline (EBPG) on haemodynamic instability: a decrease in systolic blood pressure (SBP) ≥ 20 mmHg or in mean arterial pressure (MAP) ≥ 10 mmHg, associated with a clinical event and the need for a nursing intervention. Methods: Over 3 months, we prospectively collected haemodynamic data, clinical events and nursing interventions of 3818 haemodialysis sessions from 124 prevalent patients who were dialyzed with a constant ultrafiltration rate and dialysate conductivity. Patients were considered to have frequent IDH if it occurred in > 20% of dialysis sessions. Results: Decreases in SBP ≥ 20 mmHg or MAP ≥ 10 mmHg occurred in 77.7%, clinical symptoms in 21.4%, and nursing interventions in 8.5% of dialysis sessions. Dialysis hypotension according to the full EBPG definition occurred in only 6.7% of dialysis sessions. Eight percent of patients had frequent IDH. Conclusions: The prevalence of IDH according to the EBPG definition is low. The dominant determinant of the EBPG definition was nursing intervention, since this was the component with the lowest prevalence. IDH seems to be less common than indicated in the literature, but a proper comparison with previous studies is complicated by the lack of a uniform definition.
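The EBPG definition above combines three criteria per session, and the per-patient "frequent IDH" label applies a 20% threshold to those sessions. The sketch below encodes the definition as summarized in the abstract. How the blood-pressure decrease is measured (for example relative to the pre-dialysis value) is an assumption left to the caller, and the `Session` fields and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Session:
    sbp_drop: float             # decrease in systolic blood pressure, mmHg (reference point assumed)
    map_drop: float             # decrease in mean arterial pressure, mmHg
    clinical_event: bool        # a clinical symptom occurred
    nursing_intervention: bool  # a nursing intervention was needed

def is_ebpg_idh(s: Session) -> bool:
    """EBPG IDH: (SBP drop >= 20 or MAP drop >= 10) AND clinical event AND nursing intervention."""
    bp_criterion = s.sbp_drop >= 20 or s.map_drop >= 10
    return bp_criterion and s.clinical_event and s.nursing_intervention

def has_frequent_idh(sessions, threshold=0.20) -> bool:
    """A patient has frequent IDH if EBPG-defined IDH occurs in more than 20% of sessions."""
    if not sessions:
        return False
    rate = sum(is_ebpg_idh(s) for s in sessions) / len(sessions)
    return rate > threshold

sessions = [Session(25, 5, True, True), Session(30, 12, True, False),
            Session(5, 4, False, False), Session(8, 6, False, False)]
print([is_ebpg_idh(s) for s in sessions], has_frequent_idh(sessions))
```

The second session meets the blood-pressure and symptom criteria but not the nursing-intervention criterion, which illustrates why the full EBPG definition yields a much lower prevalence than the blood-pressure criterion alone.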