147 research outputs found

    A structured overview of simultaneous component based data integration

    Background: Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules) with the different pieces stemming, for example, from different measurement techniques. This implies that more and more data appear that consist of two or more data arrays that have a shared mode. An integrative analysis of such coupled data should be based on a simultaneous analysis of all data arrays. In this respect, the family of simultaneous component methods (e.g., SUM-PCA, unrestricted PCovR, MFA, STATIS, and SCA-P) is a natural choice. Yet, different simultaneous component methods may lead to quite different results. Results: We offer a structured overview of simultaneous component methods that frames them in a principal components setting such that both the common core of the methods and the specific elements with regard to which they differ are highlighted. An overview of principles is given that may guide the data analyst in choosing an appropriate simultaneous component method. Several theoretical and practical issues are illustrated with an empirical example on metabolomics data for Escherichia coli as obtained with different analytical chemical measurement methods. Conclusion: Of the aspects in which the simultaneous component methods differ, pre-processing and weighting are consequential. Especially, the type of weighting of the different matrices is essential for simultaneous component analysis. These types are shown to be linked to different specifications of the idea of a fair integration of the different coupled arrays.
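    As a rough illustration of the idea in this abstract (coupled data blocks that share a mode, analysed jointly after pre-processing and weighting), the following is a minimal sketch of a SUM-PCA-style analysis. It is not the authors' code: the function name `simultaneous_components`, the choice of column-centring, and the Frobenius-norm block weighting are assumptions made for illustration, and the overview discusses several alternative weighting schemes.

```python
import numpy as np


def simultaneous_components(blocks, n_components=2, weight_blocks=True):
    """Sketch of a SUM-PCA-style simultaneous component analysis.

    blocks: list of 2-D arrays that share the row mode (same samples in rows).
    weight_blocks: if True, scale each centred block to unit Frobenius norm
        (one possible notion of "fair" integration; an assumption here).
    Returns (scores, loadings), where loadings is a list with one array per block.
    """
    processed = []
    for block in blocks:
        x = np.asarray(block, dtype=float)
        x = x - x.mean(axis=0)              # pre-processing: column-centre
        if weight_blocks:
            norm = np.linalg.norm(x)        # Frobenius norm of the block
            if norm > 0:
                x = x / norm                # each block gets equal total variation
        processed.append(x)

    # Concatenate along the variable mode; rows (samples) are shared.
    concatenated = np.hstack(processed)

    # Ordinary PCA of the concatenated matrix via the SVD.
    u, s, vt = np.linalg.svd(concatenated, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]

    # Split the loadings back into one matrix per block.
    widths = [x.shape[1] for x in processed]
    split_points = np.cumsum(widths)[:-1]
    loadings = np.split(vt[:n_components].T, split_points, axis=0)
    return scores, loadings
```

    With block weighting switched on, rescaling one block by a constant (for example, a platform that reports in different units) leaves the fitted model unchanged, which is one way to read the abstract's remark that weighting is consequential.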

    Individual differences in metabolomics: individualised responses and between-metabolite relationships

    Many metabolomics studies aim to find ‘biomarkers’: sets of molecules that are consistently elevated or decreased upon experimental manipulation. Biological effects, however, often manifest themselves along a continuum of individual differences between the biological replicates in the experiment. Such differences are overlooked or even diminished by methods in standard use for metabolomics, although they may contain a wealth of information on the experiment. Properly understanding individual differences is crucial for generating knowledge in fields like personalised medicine, evolution and ecology. We propose to use simultaneous component analysis with individual differences constraints (SCA-IND), a data analysis method from psychology that focuses on these differences. This method constructs axes along the natural biochemical differences between biological replicates, comparable to principal components. The model may shed light on changes in the individual differences between experimental groups, but also on whether these differences correspond to, e.g., responders and non-responders or to distinct chemotypes. Moreover, SCA-IND reveals the individuals that respond most to a manipulation and are best suited for further experimentation. The method is illustrated by the analysis of individual differences in the metabolic response of cabbage plants to herbivory. The model reveals individual differences in the response to shoot herbivory, where two ‘response chemotypes’ may be identified. In the response to root herbivory the model shows that individual plants differ strongly in response dynamics. Thereby SCA-IND provides a hitherto unavailable view on the chemical diversity of the induced plant response, that greatly increases understanding of the system

    Simplivariate Models: Ideas and First Examples

    One of the new expanding areas in functional genomics is metabolomics: measuring the metabolome of an organism. Data being generated in metabolomics studies are very diverse in nature depending on the design underlying the experiment. Traditionally, variation in measurements is conceptually broken down in systematic variation and noise where the latter contains, e.g. technical variation. There is increasing evidence that this distinction does not hold (or is too simple) for metabolomics data. A more useful distinction is in terms of informative and non-informative variation where informative relates to the problem being studied. In most common methods for analyzing metabolomics (or any other high-dimensional x-omics) data this distinction is ignored thereby severely hampering the results of the analysis. This leads to poorly interpretable models and may even obscure the relevant biological information. We developed a framework from first data analysis principles by explicitly formulating the problem of analyzing metabolomics data in terms of informative and non-informative parts. This framework allows for flexible interactions with the biologists involved in formulating prior knowledge of underlying structures. The basic idea is that the informative parts of the complex metabolomics data are approximated by simple components with a biological meaning, e.g. in terms of metabolic pathways or their regulation. Hence, we termed the framework ‘simplivariate models’ which constitutes a new way of looking at metabolomics data. The framework is given in its full generality and exemplified with two methods, IDR analysis and plaid modeling, that fit into the framework. Using this strategy of ‘divide and conquer’, we show that meaningful simplivariate models can be obtained using a real-life microbial metabolomics data set. For instance, one of the simple components contained all the measured intermediates of the Krebs cycle of E. coli. Moreover, these simplivariate models were able to uncover regulatory mechanisms present in the phenylalanine biosynthesis route of E. coli.

    Between Metabolite Relationships: an essential aspect of metabolic change

    Not only the levels of individual metabolites, but also the relations between the levels of different metabolites may indicate (experimentally induced) changes in a biological system. Component analysis methods in current ‘standard’ use for metabolomics, such as Principal Component Analysis (PCA), do not focus on changes in these relations. We therefore propose the concept of ‘Between Metabolite Relationships’ (BMRs): common changes in the covariance (or correlation) between all metabolites in an organism. Such structural changes may indicate metabolic change brought about by experimental manipulation, but they are lost with standard data analysis methods. These BMRs can be analysed by the INdividual Differences SCALing (INDSCAL) method. First the BMR quantification is described and subsequently the INDSCAL method. Finally, two studies illustrate the power and the applicability of BMRs in metabolomics. The first study is about the induced plant response of cabbage to herbivory, of which BMRs are a considerable part. In the second study, a human nutritional intervention study of green tea extract, standard data analysis tools did not reveal any metabolic change, although the BMRs were considerably affected. The presented results show that BMRs can be easily implemented in a wide variety of metabolomic studies. They provide a new source of information to describe biological systems in a way that fits flawlessly into the next generation of systems biology questions, dealing with personalized responses.
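    To make the notion of between-metabolite relationships concrete, here is a small sketch that computes one metabolite-by-metabolite covariance matrix per experimental group. This is only an assumed reading of the abstract: the helper name `group_covariances` is hypothetical, the paper's actual BMR quantification is not given here, and the INDSCAL step that would analyse the resulting matrices is not shown.

```python
import numpy as np


def group_covariances(data, groups):
    """Return one metabolite covariance matrix per experimental group.

    data: 2-D array, rows are biological replicates, columns are metabolites.
    groups: 1-D sequence of group labels, one per row of `data`.
    (Hypothetical helper; the per-group covariance is an illustrative choice.)
    """
    x = np.asarray(data, dtype=float)
    labels = np.asarray(groups)
    result = {}
    for label in np.unique(labels).tolist():
        members = x[labels == label]
        # rowvar=False: columns (metabolites) are the variables.
        result[label] = np.cov(members, rowvar=False)
    return result


# Two groups with identical mean metabolite levels but opposite relationships.
example = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0],   # control
                    [1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])  # treated
labels = ["control"] * 3 + ["treated"] * 3
covariances = group_covariances(example, labels)
```

    In this toy example both groups have the same mean level for each metabolite, so a comparison of levels alone shows no change, while the covariance between the two metabolites switches sign between the groups.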

    DISCO-SCA and Properly Applied GSVD as Swinging Methods to Find Common and Distinctive Processes

    BACKGROUND: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). RESULTS: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. CONCLUSIONS: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.

    Evaluation of metabolomics profiles of grain from maize hybrids derived from near-isogenic GM positive and negative segregant inbreds demonstrates that observed differences cannot be attributed unequivocally to the GM trait

    Introduction: Past studies on plant metabolomes have highlighted the influence of growing environments and varietal differences in variation of levels of metabolites yet there remains continued interest in evaluating the effect of genetic modification (GM). Objectives: Here we test the hypothesis that metabolomics differences in grain from maize hybrids derived from a series of GM (NK603, herbicide tolerance) inbreds and corresponding negative segregants can arise from residual genetic variation associated with backcrossing and that the effect of insertion of the GM trait is negligible. Methods: Four NK603-positive and negative segregant inbred males were crossed with two different females (testers). The resultant hybrids, as well as conventional comparator hybrids, were then grown at three replicated field sites in Illinois, Minnesota, and Nebraska during the 2013 season. Metabolomics data acquisition using gas chromatography–time of flight-mass spectrometry (GC–TOF-MS) allowed the measurement of 367 unique metabolite features in harvested grain, of which 153 were identified with small molecule standards. Multivariate analyses of these data included multi-block principal component analysis and ANOVA-simultaneous component analysis. Univariate analyses of all 153 identified metabolites were conducted based on significance testing (α = 0.05), effect size evaluation (assessing magnitudes of differences), and variance component analysis. Results: Results demonstrated that the largest effects on metabolomic variation were associated with different growing locations and the female tester. They further demonstrated that differences observed between GM and non-GM comparators, even in stringent tests utilizing near-isogenic positive and negative segregants, can simply reflect minor genomic differences associated with conventional back-crossing practices. Conclusion: The effect of GM on metabolomics variation was determined to be negligible and supports that there is no scientific rationale for prioritizing GM as a source of variation.

    Metabolomics-Based Discovery of Diagnostic Biomarkers for Onchocerciasis

    Onchocerciasis, caused by the filarial parasite Onchocerca volvulus, afflicts millions of people, causing such debilitating symptoms as blindness and acute dermatitis. There are no accurate, sensitive means of diagnosing O. volvulus infection. Clinical diagnostics are desperately needed in order to achieve the goals of controlling and eliminating onchocerciasis and neglected tropical diseases in general. In this study, a metabolomics approach is introduced for the discovery of small molecule biomarkers that can be used to diagnose O. volvulus infection. Blood samples from O. volvulus infected and uninfected individuals from different geographic regions were compared using liquid chromatography separation and mass spectrometry identification. Thousands of chromatographic mass features were statistically compared to discover 14 mass features that were significantly different between infected and uninfected individuals. Multivariate statistical analysis and machine learning algorithms demonstrated how these biomarkers could be used to differentiate between infected and uninfected individuals and indicate that the diagnostic may even be sensitive enough to assess the viability of worms. This study suggests a future potential of these biomarkers for use in a field-based onchocerciasis diagnostic and how such an approach could be expanded for the development of diagnostics for other neglected tropical diseases