
    To aggregate or not to aggregate high-dimensional classifiers

    Background: High-throughput functional genomics technologies generate large amounts of data, with hundreds or thousands of measurements per sample. The number of samples is usually much smaller, on the order of tens or hundreds. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data. Results: Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, was selected as an example of a base learner. The multiple versions of PCDA models from repeated double cross-validation were aggregated, and the final classification was performed by majority voting. The performance of this approach was evaluated on simulated, genomics, proteomics and metabolomics data sets. Conclusions: The aggregated PCDA learner can improve prediction performance, provide more stable results, and give insight into the variability of the models. The disadvantages and limitations of aggregating are also discussed.
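    The aggregation step described above can be sketched as follows. This is a minimal illustration. It assumes that each repeated double cross-validation run has already produced one PCDA model and that every model has predicted a class label for the same set of samples. Fitting PCDA and running the double cross-validation are not shown, and the function name `majority_vote` is hypothetical. Besides the winning label, the sketch returns the fraction of models that agree, which is one simple way to look at the variability of the aggregated models.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Tuple[Hashable, float]]:
    """Aggregate class labels from several models by majority voting.

    predictions[m][i] is the label that model m assigns to sample i.
    Returns, per sample, the winning label and the fraction of models
    that voted for it.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same samples")
    result = []
    for i in range(n_samples):
        # Count how many models assign each label to sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        # most_common keeps first-encountered order for equal counts,
        # so ties go to the label seen first.
        label, count = votes.most_common(1)[0]
        result.append((label, count / n_models))
    return result
```

Ties are broken by first occurrence here. Using an odd number of models avoids ties altogether in two-class problems.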

    Repeatability and reproducibility of lipoprotein particle profile measurements in plasma samples by ultracentrifugation

    Background: Characterization of lipoprotein particle profiles (LPPs), including main classes and subclasses, by means of ultracentrifugation (UC) is in high demand given its clinical potential. However, rapid methods are required to replace the very labor-intensive UC method. One solution is to calibrate rapid nuclear magnetic resonance (NMR)-based prediction models, but the reliability of the UC method that provides the response values required for NMR calibration has been largely overlooked. Methods: This study provides a comprehensive assessment of the repeatability and reproducibility of various UC-based lipid measurements (cholesterol, triglycerides [TGs], free cholesterol, phospholipids, apolipoprotein [apo]A1 and apoB) in different main classes and subclasses of 25 duplicated fresh plasma samples and of 42 quality control (QC) frozen pooled plasma samples from healthy individuals. Results: Cholesterol, apoA1 and apoB measurements were highly repeatable in all classes (intraclass correlation coefficient [ICC]: 92.93%-99.54%). Free cholesterol and phospholipid concentrations in main classes and subclasses, and TG concentrations in high-density lipoproteins (HDL), HDL subclasses and low-density lipoprotein (LDL) subclasses, showed worse repeatability (ICC: 19.21%-99.08%), attributable to low concentrations, variability introduced during UC, and assay limitations. In frozen QC samples, the reproducibility of cholesterol, apoA1 and apoB concentrations was better than that of free cholesterol, phospholipid and TG concentrations. Conclusions: This study shows that LPP measurements near or below the limit of detection (LOD) in some subclasses, as well as the use of frozen samples, result in worse repeatability and reproducibility. Furthermore, we show that the analytical assays coupled to UC for free cholesterol and phospholipids show different repeatability and reproducibility. All of this needs to be taken into account when calibrating future NMR-based models.
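    The ICC values above express how much of the total variation comes from differences between samples rather than from differences between duplicate measurements of the same sample. The abstract does not state which ICC form was used. The sketch below assumes the one-way random-effects ICC(1,1) for n samples, each measured k times (k = 2 for duplicates), and the function name `icc_oneway` is hypothetical.

```python
from typing import List


def icc_oneway(data: List[List[float]]) -> float:
    """One-way random-effects ICC(1,1) (assumed form).

    data[i][j] is the j-th repeated measurement of sample i; every sample
    must have the same number k >= 2 of repeats.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(data)
    k = len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 samples with the same number (>= 2) of repeats")
    sample_means = [sum(row) / k for row in data]
    grand_mean = sum(sample_means) / n
    # Between-sample mean square: spread of the sample means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in sample_means) / (n - 1)
    # Within-sample mean square: spread of the repeats around their own sample mean.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, sample_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Multiplying the result by 100 gives a percentage, as in the ICC ranges reported above. Perfectly matching duplicates give an ICC of 1.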

    Centering, scaling, and transformations: improving the biological information content of metabolomics data

    BACKGROUND: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration between metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of these metabolites. However, data analysis methods are not able to make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set, thus improving its biological interpretability. RESULTS: Different data pretreatment methods, i.e. centering, autoscaling, Pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the ranking of the metabolites that are most important from a biological point of view. Furthermore, the stability of the ranking, the influence of technical errors on data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis. CONCLUSION: Different pretreatment methods emphasize different aspects of the data, and each pretreatment method has its own merits and drawbacks. The choice of a pretreatment method depends on the biological question to be answered, the properties of the data set and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. That is, range scaling and autoscaling were able to remove the dependence of the metabolite ranking on the average concentration and the magnitude of the fold changes, and showed biologically sensible results after PCA (principal component analysis). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects the metabolites that are identified as the most important.
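    The pretreatment methods listed above can be sketched per metabolite, that is, per column of the samples x metabolites matrix. The formulas are the commonly used definitions and are an assumption here, not a transcription of the paper:
    - Autoscaling divides the centered values by the standard deviation.
    - Pareto scaling divides them by the square root of the standard deviation.
    - Range scaling divides them by max minus min.
    - Vast scaling multiplies autoscaled values by mean/sd.
    - The log and power (square-root) transformations are each followed by centering.

```python
import math
from statistics import mean, stdev

# Each function pretreats ONE metabolite (one column of the data matrix).
# The formulas follow commonly used definitions; they are assumptions here.


def centering(x):
    m = mean(x)
    return [v - m for v in x]


def autoscaling(x):
    # Unit variance: divide the centered values by the standard deviation.
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]


def pareto_scaling(x):
    # Divide by the square root of the standard deviation.
    m, s = mean(x), stdev(x)
    return [(v - m) / math.sqrt(s) for v in x]


def range_scaling(x):
    # Divide by the biological range (max - min).
    m, r = mean(x), max(x) - min(x)
    return [(v - m) / r for v in x]


def vast_scaling(x):
    # Autoscaling multiplied by the inverse coefficient of variation (mean / sd).
    m, s = mean(x), stdev(x)
    return [(v - m) / s * (m / s) for v in x]


def log_transformation(x):
    # log10 requires strictly positive values; centering follows the transform.
    return centering([math.log10(v) for v in x])


def power_transformation(x):
    # Square-root transform followed by centering.
    return centering([math.sqrt(v) for v in x])
```

Autoscaling divides by each metabolite's own standard deviation. As a result, two metabolites whose concentrations differ 5000-fold but vary proportionally end up with identical pretreated columns: `[1, 2, 3]` and `[5000, 10000, 15000]` both become `[-1.0, 0.0, 1.0]`.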

    (Tissue) P Systems with Vesicles of Multisets

    We consider tissue P systems working on vesicles of multisets with the very simple operations of insertion, deletion, and substitution of single objects. With the whole multiset being enclosed in a vesicle, sending it to a target cell can be indicated in those simple rules working on the multiset. As derivation modes we consider the sequential mode, where exactly one rule is applied in a derivation step, and the set maximal mode, where in each derivation step a non-extendable set of rules is applied. With the set maximal mode, computational completeness can already be obtained with tissue P systems having a tree structure, whereas tissue P systems, even with an arbitrary communication structure, are not computationally complete when working in the sequential mode. Adding polarizations (-1, 0, 1 are sufficient) makes it possible to obtain computational completeness even for tissue P systems working in the sequential mode.
    Comment: In Proceedings AFL 2017, arXiv:1708.0622
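    A single derivation step in the sequential mode described above can be sketched as follows. The sketch uses a hypothetical encoding in which a vesicle is a `Counter` of objects, and each rule of a cell inserts, deletes, or substitutes one object and then names the cell the whole vesicle is sent to. The names `Rule`, `apply_rule`, and `sequential_successors` are illustrative, not taken from the paper. Because P systems are nondeterministic, the step returns every configuration reachable by applying exactly one applicable rule. The set maximal mode and polarizations are not modelled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical encoding of one vesicle rule in a given cell."""
    op: str                      # "ins", "del" or "sub"
    obj: str                     # object inserted, deleted, or substituted
    target: int                  # cell the whole vesicle is sent to afterwards
    new: Optional[str] = None    # replacement object, only used by "sub"


def applicable(rule: Rule, vesicle: Counter) -> bool:
    # Insertion is always possible; deletion and substitution need the object.
    return rule.op == "ins" or vesicle[rule.obj] > 0


def apply_rule(rule: Rule, vesicle: Counter) -> Tuple[Counter, int]:
    v = Counter(vesicle)  # copy, so the input configuration is unchanged
    if rule.op == "ins":
        v[rule.obj] += 1
    elif rule.op == "del":
        v[rule.obj] -= 1
    elif rule.op == "sub":
        v[rule.obj] -= 1
        v[rule.new] += 1
    else:
        raise ValueError(f"unknown operation {rule.op!r}")
    return +v, rule.target  # unary + drops objects whose count fell to zero


def sequential_successors(cell: int, vesicle: Counter,
                          rules: Dict[int, List[Rule]]) -> List[Tuple[Counter, int]]:
    """All (vesicle, cell) pairs reachable in one sequential-mode step,
    i.e. by applying exactly one applicable rule of the current cell."""
    return [apply_rule(r, vesicle) for r in rules.get(cell, []) if applicable(r, vesicle)]
```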

    Detecting Regulatory Mechanisms in Endocrine Time Series Measurements

    The regulatory mechanisms underlying pulsatile secretion are complex, especially as secretion is partly controlled by other hormones and by the combined action of multiple agents. Regulatory relations between hormones are not directly observable but may be deduced from time series measurements of plasma hormone concentrations. Variations in plasma hormone levels result from secretion and from clearance from the circulation. A strategy is proposed to extract inhibition, activation, thresholds and circadian synchronicity from concentration data, using particular association methods. Time-delayed associations between hormone concentrations and/or extracted secretion pulse profiles reveal information on the regulatory mechanisms. These regulatory mechanisms are illustrated with simulated data. Additionally, data from a lean cohort of healthy control subjects are used to illustrate activation (ACTH and cortisol) and circadian synchronicity (ACTH and TSH) in real data. Both the simulated and the real data consist of 145 equidistant samples per individual, covering a 24-hour time span at 10-minute intervals. The results of the simulation and the real data are in concordance.
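    The idea of a time-delayed association can be sketched with a lagged Pearson correlation between two equally long, equidistant series. The abstract does not name its "particular association methods", so Pearson correlation is a swapped-in assumption here. Pulse extraction and threshold detection are not shown, and the function names are hypothetical. With 10-minute sampling, a best lag of k samples corresponds to a delay of 10k minutes. This sketch only looks for the strongest positive association.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag]; lag > 0 means y follows x.

    x and y must have the same length (e.g. 145 samples over 24 h).
    """
    n = len(x)
    if lag >= 0:
        return pearson(x[: n - lag], y[lag:])
    return pearson(x[-lag:], y[: n + lag])


def best_lag(x, y, max_lag):
    """Lag (in samples) within [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: lagged_correlation(x, y, k))
```

For example, if `y` repeats `x` three samples later, `best_lag(x, y, 5)` returns 3. At 10-minute intervals that is a 30-minute delay.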

    Multivariate paired data analysis: multilevel PLSDA versus OPLSDA

    Metabolomics data obtained from (human) nutritional intervention studies can have a rather complex structure that depends on the underlying experimental design. In this paper we discuss the complex structure in data caused by a cross-over design. In such a design, each subject in the study population acts as his or her own control, which makes the data paired. For a single univariate response, a paired t-test or repeated measures ANOVA can be used to test the differences between the paired observations. The same principle holds for multivariate data. In the current paper we compare a method that exploits the paired data structure in cross-over multivariate data (multilevel PLSDA) with a method that is often used by default but that ignores the paired structure (OPLSDA). Both methods have been evaluated in a small simulated example as well as in a real data set from a cross-over nutritional metabolomics study. It is shown that exploiting the paired data structure underlying the cross-over design considerably improves the power and the interpretability of the multivariate solution. Furthermore, the multilevel approach provides complementary information about (I) the diversity and abundance of the treatment effects within the different (subsets of) subjects across the study population, and (II) the intrinsic differences between these study subjects.
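    The paired structure can be exploited by splitting the data into two parts:
    - The between-subject part is each subject's mean profile.
    - The within-subject part is each sample's deviation from its subject's mean profile.

    The sketch below assumes that a multilevel PLSDA then models the within-subject part against the treatment labels. That PLSDA step is not shown, and `split_multilevel` is a hypothetical helper. In a two-period cross-over, each within-subject row equals plus or minus half the difference between the subject's two treatment profiles. This is the multivariate counterpart of the differences used in a paired t-test.

```python
from collections import defaultdict


def split_multilevel(rows, subjects):
    """Split a samples x variables matrix into between- and within-subject parts.

    rows[i] is the profile of sample i and subjects[i] identifies its subject.
    Returns (between, within), both shaped like rows: between[i] is the mean
    profile of the subject of sample i, and within[i] = rows[i] - between[i].
    """
    members = defaultdict(list)
    for i, subject in enumerate(subjects):
        members[subject].append(i)
    n_vars = len(rows[0])
    between = [None] * len(rows)
    within = [None] * len(rows)
    for idx in members.values():
        # Mean profile of this subject over all of its samples.
        subject_mean = [sum(rows[i][j] for i in idx) / len(idx) for j in range(n_vars)]
        for i in idx:
            between[i] = list(subject_mean)
            within[i] = [rows[i][j] - subject_mean[j] for j in range(n_vars)]
    return between, within
```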

    Individual differences in metabolomics: individualised responses and between-metabolite relationships

    Many metabolomics studies aim to find ‘biomarkers’: sets of molecules that are consistently elevated or decreased upon experimental manipulation. Biological effects, however, often manifest themselves along a continuum of individual differences between the biological replicates in the experiment. Such differences are overlooked or even diminished by methods in standard use for metabolomics, although they may contain a wealth of information about the experiment. Properly understanding individual differences is crucial for generating knowledge in fields like personalised medicine, evolution and ecology. We propose to use simultaneous component analysis with individual differences constraints (SCA-IND), a data analysis method from psychology that focuses on these differences. This method constructs axes along the natural biochemical differences between biological replicates, comparable to principal components. The model may shed light not only on changes in the individual differences between experimental groups, but also on whether these differences correspond to, e.g., responders and non-responders or to distinct chemotypes. Moreover, SCA-IND reveals the individuals that respond most to a manipulation and are best suited for further experimentation. The method is illustrated by the analysis of individual differences in the metabolic response of cabbage plants to herbivory. The model reveals individual differences in the response to shoot herbivory, where two ‘response chemotypes’ may be identified. In the response to root herbivory, the model shows that individual plants differ strongly in response dynamics. Thereby SCA-IND provides a hitherto unavailable view of the chemical diversity of the induced plant response, which greatly increases understanding of the system.

    Between Metabolite Relationships: an essential aspect of metabolic change

    Not only the levels of individual metabolites, but also the relations between the levels of different metabolites may indicate (experimentally induced) changes in a biological system. Component analysis methods in current ‘standard’ use for metabolomics, such as Principal Component Analysis (PCA), do not focus on changes in these relations. We therefore propose the concept of ‘Between Metabolite Relationships’ (BMRs): common changes in the covariance (or correlation) between all metabolites in an organism. Such structural changes may indicate metabolic change brought about by experimental manipulation, but they are lost with standard data analysis methods. BMRs can be analysed with the INdividual Differences SCALing (INDSCAL) method. First the BMR quantification is described, followed by the INDSCAL method. Finally, two studies illustrate the power and applicability of BMRs in metabolomics. The first study concerns the induced response of cabbage plants to herbivory, of which BMRs are a considerable part. In the second study, a human nutritional intervention study of green tea extract, standard data analysis tools did not reveal any metabolic change, although the BMRs were considerably affected. The presented results show that BMRs can be easily implemented in a wide variety of metabolomic studies. They provide a new source of information to describe biological systems in a way that fits well into the next generation of systems biology questions, dealing with personalized responses.
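    The idea that relations can change while levels do not can be sketched by computing one metabolite-metabolite correlation matrix per group and taking their difference. The exact BMR quantification is not given in the abstract, so this per-group correlation matrix is an assumption. The INDSCAL analysis of such matrices is not shown, and the function names are hypothetical. In the example data the two groups have identical mean levels, yet the correlation between the two metabolites flips from +1 to -1.

```python
import math


def correlation_matrix(samples):
    """samples[i][j] is the level of metabolite j in sample i."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (denominator n - 1).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)] for a in range(p)]
    # Normalise covariances to correlations.
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(p)] for a in range(p)]


def bmr_change(control, treated):
    """Element-wise change in the metabolite-metabolite correlations."""
    r_c, r_t = correlation_matrix(control), correlation_matrix(treated)
    p = len(r_c)
    return [[r_t[a][b] - r_c[a][b] for b in range(p)] for a in range(p)]


control = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
treated = [[1.0, 6.0], [2.0, 4.0], [3.0, 2.0]]
delta = bmr_change(control, treated)  # off-diagonal change of about -2
```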

    Dynamic elementary mode modelling of non-steady state flux data

    A novel framework is proposed to analyse metabolic fluxes in non-steady state conditions, based on the new concept of the dynamic elementary mode (dynEM): an elementary mode that is partially activated, depending on the time point of the experiment.
    This research work was partially supported by the Spanish Ministry of Economy and Competitiveness under the project DPI2014-55276-C5-1R.
    Folch-Fortuny, A.; Teusink, B.; Hoefsloot, HC.; Smilde, AK.; Ferrer, A. (2018). Dynamic elementary mode modelling of non-steady state flux data. BMC Systems Biology. 12:1-15. https://doi.org/10.1186/s12918-018-0589-3