46 research outputs found

    Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

    Despite tremendous successes, pitfalls have been observed at every step of the clinical metabolomics workflow, which impede the internal validity of studies. Furthermore, the demands for logistics, instrumentation, and computational resources in metabolic phenotyping studies have far exceeded expectations. In this conceptual review, we cover the barriers to a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule containing five phases, including two additional "pre-pre-" and "post-post-" analytical steps. In addition, we elucidate the potential involvement of machine learning and demonstrate the undeniable need for automated data mining algorithms to improve the quality of future research. Consequently, we propose a comprehensive metabolomics framework, along with an appropriate checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches, with metabolomics as the pillar, is urgently needed. When combined with other social or nutritional factors, complete omics profiles can be gathered for a particular disease. Our discussion reflects the current obstacles and potential solutions toward the progressing trend of utilizing metabolomics in clinical research to create the next-generation healthcare system.

    Mapping Analyte-Signal Relations in LC-MS Based Untargeted Metabolomics

    The goal of untargeted metabolomics is to profile metabolism by measuring as many metabolites as possible. A major advantage of the untargeted approach is the detection of unexpected or unknown metabolites. These metabolites have chemical structures, metabolic pathways, or cellular functions that have not been previously described. Hence, they represent exciting opportunities to advance our understanding of biology. This beneficial approach, however, also adds considerable complexity to the analysis of metabolomics data: an individual signal cannot be readily identified as a unique metabolite. As such, a major challenge faced by the untargeted metabolomic workflow is extracting the analyte content from a dataset. Successful applications of metabolomics bypass this limitation by throwing away the 99% of the dataset that is not statistically altered between sample groups [1]. This widely accepted approach to untargeted metabolomics is functional for a very narrow set of applications, but critically, it fails to provide a comprehensive view of metabolism.
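The "throw away 99%" filter described above can be sketched as a per-feature statistical test between two sample groups, discarding everything that does not differ. The feature names, intensities, and cutoff below are purely illustrative, not from the abstract:

```python
# Minimal sketch: keep only untargeted features whose group difference
# passes a statistical cutoff; everything else is discarded.
from statistics import mean, stdev
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / na + vb / nb)

def filter_features(features, t_cutoff=3.0):
    """Keep features whose |t| exceeds the cutoff (hypothetical threshold)."""
    return {name: welch_t(a, b)
            for name, (a, b) in features.items()
            if abs(welch_t(a, b)) > t_cutoff}

# Simulated peak intensities: (control group, case group) per feature
features = {
    "m/z 180.06": ([10.1, 9.8, 10.3, 10.0], [15.2, 14.9, 15.5, 15.1]),  # altered
    "m/z 146.05": ([8.0, 8.2, 7.9, 8.1], [8.1, 7.9, 8.2, 8.0]),         # unchanged
}
kept = filter_features(features)  # only the altered feature survives
```

In a real workflow the test would run over thousands of features with multiple-testing correction; the point here is only that unchanged signals never reach identification, which is exactly the loss of coverage the abstract criticizes.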

    Development of data processing methods for high resolution mass spectrometry-based metabolomics with an application to human liver transplantation

    Direct infusion (DI) Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry (MS) is becoming a popular measurement platform in metabolomics. This thesis aims to advance the data processing and analysis pipeline of DI FT-ICR based metabolomics and to broaden its applicability to clinical research. To meet the first objective, the issue of missing data, which occur in the final data matrix containing the metabolite relative abundances measured for each sample analysed, is addressed. The nature of these data and their effect on subsequent data analyses are investigated. Eight common and/or easily accessible missing data estimation algorithms are examined, and a three-stage approach is proposed to aid the identification of the optimal one. Finally, a novel survival analysis approach is introduced and assessed as an alternative way of treating missing data prior to univariate analysis. To address the second objective, DI FT-ICR MS based metabolomics is assessed in terms of its applicability to research investigating metabolomic changes occurring in liver grafts throughout human orthotopic liver transplantation (OLT). The feasibility of this approach in a clinical setting is validated, and its potential to provide a wealth of novel metabolic information associated with OLT is demonstrated.
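To make the missing-data problem concrete, here is a minimal sketch of two imputation strategies often compared in metabolomics (the thesis evaluates eight; these two are just illustrative): column-mean imputation, and half-minimum imputation, a common choice when missingness reflects values below the detection limit. The matrix and values are invented:

```python
# Hedged sketch: impute None entries of a samples-by-metabolites matrix.
def impute(matrix, strategy="mean"):
    """Replace None entries column-wise.
    'mean'     -> mean of the observed values in that column.
    'half_min' -> half of the smallest observed value in that column."""
    cols = len(matrix[0])
    out = [row[:] for row in matrix]  # copy; leave the input untouched
    for j in range(cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        fill = (sum(observed) / len(observed) if strategy == "mean"
                else min(observed) / 2.0)
        for row in out:
            if row[j] is None:
                row[j] = fill
    return out

# Toy matrix: 3 samples x 2 metabolites, one missing value per column
data = [[4.0, None], [6.0, 1.0], [None, 3.0]]
mean_imp = impute(data, "mean")          # fills with 5.0 and 2.0
half_min_imp = impute(data, "half_min")  # fills with 2.0 and 0.5
```

The two strategies encode different beliefs about why a value is missing (at random versus left-censored), which is why comparing estimation algorithms, as the thesis does, matters for the downstream statistics.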

    Metabolomics Data Processing and Data Analysis—Current Best Practices

    Metabolomics data analysis strategies are central to transforming raw metabolomics data files into meaningful biochemical interpretations that answer biological questions or generate novel hypotheses. This book contains a variety of papers from a Special Issue around the theme "Best Practices in Metabolomics Data Analysis". Reviews and strategies for the whole metabolomics pipeline are included, and key areas such as metabolite annotation and identification, compound and spectral databases and repositories, and statistical analysis are highlighted in various papers. Altogether, this book contains valuable information for researchers just starting their metabolomics careers as well as for those who are more experienced and looking for additional knowledge and best practices to complement key parts of their metabolomics workflows.

    Lipid metabolite biomarkers in cardiovascular disease: Discovery and biomechanism translation from human studies

    Lipids represent a valuable target for metabolomic studies since altered lipid metabolism is known to drive the pathological changes in cardiovascular disease (CVD). Metabolomic technologies give us the ability to measure thousands of metabolites providing us with a metabolic fingerprint of individual patients. Metabolomic studies in humans have supported previous findings into the pathomechanisms of CVD, namely atherosclerosis, apoptosis, inflammation, oxidative stress, and insulin resistance. The most widely studied classes of lipid metabolite biomarkers in CVD are phospholipids, sphingolipids/ceramides, glycolipids, cholesterol esters, fatty acids, and acylcarnitines. Technological advancements have enabled novel strategies to discover individual biomarkers or panels that may aid in the diagnosis and prognosis of CVD, with sphingolipids/ceramides as the most promising class of biomarkers thus far. In this review, application of metabolomic profiling for biomarker discovery to aid in the diagnosis and prognosis of CVD as well as metabolic abnormalities in CVD will be discussed with particular emphasis on lipid metabolites.


    Lipidomics reveals similar changes in serum phospholipid signatures of overweight and obese pediatric subjects

    Obesity is a public health problem and a risk factor for pathologies such as type 2 diabetes mellitus, cardiovascular diseases, and nonalcoholic fatty liver disease. Given these clinical implications, there is a growing interest in understanding the pathophysiological mechanism of obesity. Changes in lipid metabolism have been associated with obesity and obesity-related complications. However, changes in the lipid profile of obese children have been overlooked. In the present work, we analyzed the serum phospholipidome of overweight and obese children by HILIC-MS/MS and GC-MS. Using this approach, we identified 165 lipid species belonging to the classes PC, PE, PS, PG, PI, LPC, and SM. The phospholipidome of overweight (OW) and obese (OB) children was significantly different from that of normal-weight children (control). The main differences were observed in the PI class, which was less abundant in OW and OB children, while some PS, PE, SM, and PC lipid species were upregulated in obese and overweight children. Although further studies are needed to clarify the associations between phospholipid alterations and metabolic changes, our results highlight the alterations that occur in the serum phospholipid profile in childhood obesity.

    Bayesian networks for omics data analysis

    This thesis focuses on two aspects of high throughput technologies, i.e. data storage and data analysis, in particular in transcriptomics and metabolomics. Both technologies are part of a research field that is generally called ‘omics’ (or ‘-omics’, with a leading hyphen), which refers to genomics, transcriptomics, proteomics, or metabolomics. Although these techniques study different entities (genes, gene expression, proteins, or metabolites), they all have in common that they use high-throughput technologies such as microarrays and mass spectrometry, and thus generate huge amounts of data. Experiments conducted using these technologies allow one to compare different states of a living cell, for example a healthy cell versus a cancer cell or the effect of food on cell condition, and at different levels. The tools needed to apply omics technologies, in particular microarrays, are often manufactured by different vendors and require separate storage and analysis software for the data generated by them. Moreover, experiments conducted using different technologies cannot be analyzed simultaneously to answer a biological question. Chapter 3 presents MADMAX, our software system which supports storage and analysis of data from multiple microarray platforms. It consists of a vendor-independent database which is tightly coupled with vendor-specific analysis tools. Upcoming technologies like metabolomics, proteomics and high-throughput sequencing can easily be incorporated in this system. Once the data are stored in this system, one obviously wants to deduce a biologically relevant meaning from these data, and here statistical and machine learning techniques play a key role. The aim of such analysis is to search for relationships between entities of interest, such as genes, metabolites or proteins. One of the major goals of these techniques is to search for causal relationships rather than mere correlations. 
It is often emphasized in the literature that "correlation is not causation" because people tend to jump to conclusions by making inferences about causal relationships when they actually only see correlations. Statistical techniques are often good at finding these correlations; techniques called linear regression and analysis of variance form the core of applied multivariate statistics. However, these techniques cannot find causal relationships, nor are they able to incorporate prior knowledge of the biological domain. Graphical models, a machine learning technique, do not suffer from these limitations. Graphical models, a combination of graph theory, statistics and information science, are one of the most exciting developments in the field of machine learning applied to biological problems (see chapter 2 for a general introduction). This thesis deals with a special type of graphical models known as probabilistic graphical models, belief networks or Bayesian networks. The advantage of Bayesian networks over classical statistical techniques is that they allow the incorporation of background knowledge from a biological domain, and that analysis of data is intuitive as it is represented in the form of graphs (nodes and edges). Standard statistical techniques are good at describing the data but are not able to find non-linear relations, whereas Bayesian networks allow prediction and the discovery of non-linear relations. Moreover, Bayesian networks allow hierarchical representation of data, which makes them particularly useful for representing biological data, since most biological processes are hierarchical by nature. Once we have such a causal graph, made either by a computer program or constructed manually, we can predict the effects of a certain entity by manipulating the state of other entities, or make backward inferences from effects to causes. 
Of course, if the graph is big, the necessary calculations can be very difficult and CPU-expensive, and in such cases approximate methods are used. Chapter 4 demonstrates the use of Bayesian networks to determine the metabolic state of feeding and fasting mice and the effect of a high fat diet on gene expression. This chapter also shows how selection of genes based on key biological processes generates more informative results than standard statistical tests. In chapter 5 the use of Bayesian networks is shown on the combination of gene expression data and clinical parameters, to determine the effect of smoking on gene expression and which genes are responsible for the DNA damage and the rise in plasma cotinine levels in the blood of a smoking population. This study was conducted at Maastricht University where 22 twin smokers were profiled. Chapter 6 presents the reconstruction of a key metabolic pathway which plays an important role in the ripening of tomatoes, thus showing the versatility of the use of Bayesian networks in metabolomics data analysis. The general trend in research shows a flood of data emerging from sequencing and metabolomics experiments. This means that to perform data mining on these data one requires intelligent techniques that are computationally feasible and able to take the knowledge of experts into account to generate relevant results. Graphical models fit this paradigm well and we expect them to play a key role in mining the data generated from omics experiments.
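The forward prediction and backward (effect-to-cause) inference described above can be shown on a deliberately tiny example. This is an illustrative toy only: a two-node network (HighFatDiet → GeneUp) with invented probabilities, nothing like the learned networks in the thesis:

```python
# Toy Bayesian network with made-up probability tables (illustrative only).
P_diet = {True: 0.5, False: 0.5}              # prior P(HighFatDiet)
P_gene = {True: {True: 0.9, False: 0.2},      # P(GeneUp=True  | Diet)
          False: {True: 0.1, False: 0.8}}     # P(GeneUp=False | Diet)

def joint(diet, gene):
    """P(Diet=diet, GeneUp=gene) by the chain rule of the network."""
    return P_diet[diet] * P_gene[gene][diet]

# Forward inference: probability the gene is up given a high-fat diet
forward = P_gene[True][True]                  # read off the table: 0.9

# Backward inference: P(Diet=True | GeneUp=True) by Bayes' rule
evidence = joint(True, True) + joint(False, True)   # P(GeneUp=True)
backward = joint(True, True) / evidence             # 0.45 / 0.55
```

Exact enumeration like this is exponential in the number of nodes, which is precisely why the large graphs mentioned above require the approximate methods the text refers to.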

    Variable Selection Methods for Classification: Application to Metabolomics Data

    Metabolomics is an emerging field that focuses on the study of small molecules (metabolites) and their chemical processes. Metabolomics data are highly dimensional, with p >> n, where p is the number of variables and n is the sample size. Variable selection is therefore a key step in metabolomics studies. Variable selection methods fall into three categories: filter, wrapper and embedded methods. Common univariate filter methods such as the t-test and ANOVA (analysis of variance) have often been used in the literature to identify important metabolites for a given clinical problem. A challenge in metabolomics research is that metabolite variables tend to be highly correlated. Multivariate approaches that take into account the correlation among variables, such as PCA (principal component analysis), have been applied to reduce the dimensionality of metabolite datasets. The correlation-sharing t-test method (corT) is a filter method that also considers the correlation among variables, but to my knowledge it has only been applied to genomic data. Penalized regression, and in particular the embedded method Lasso, has also been applied for variable selection with the aim of minimising the problem of overfitting that often affects prediction models in this area. In this thesis I present a literature review on variable selection and classification methods applied to metabolomics data. I propose an extended version of the variable selection method corT, which I name the adjusted correlation-sharing t-test (adjcorT). Simulation studies were carried out to compare the performance of several variable selection methods (T, corT, adjcorT and Lasso) using logistic regression for data classification. Simulations assumed a set of 200 variables, of which 2 were discriminators. 
A range of sample sizes (n = 50, 76, 100, 300, 500, 1000, 2000 and 20000) and of correlation values among the discriminant variables (ρ = -0.8, -0.5, -0.2, 0, 0.2, 0.5, 0.8) were considered to explore the effect that sample size and correlation have on the classification accuracy of each method. These methods were also applied to metabolomics datasets, including data from patients with colorectal cancer (aimed at discriminating between non-cancer and colorectal cancer groups, and between healthy control and adenoma groups) as well as kidney disease and infant sepsis datasets. R code was developed to analyse the datasets. Cross-validation, with data split into two sets (80% for training and 20% for validation), was used to compare the performance of the variable selection methods using classification accuracy, sensitivity, specificity and area under the ROC curve. Results from the simulation studies indicate that for small sample sizes (n = 50, 76), T, corT, adjcorT and Lasso often failed to select the two discriminatory variables. For example, for ρ = 0.5 and n = 50, the two discriminatory variables were selected only 3%, 12%, 11% and 0% of the time, respectively. Nevertheless, the detection rates for adjcorT and Lasso improved for strong negative correlations (Table 4.3). These results are consistent with the better classification accuracy observed for adjcorT and Lasso for strong negative correlations (-1.0 < ρ ≤ -0.5; Table 4.4). As the sample size increased towards n = 300, all methods increased their ability to select the two discriminatory variables, with Lasso underperforming for strong positive correlations and corT underperforming for moderate and strong negative correlations. 
These differences can explain the dissimilarities observed across methods in classification accuracy for sample sizes n = 300, 500 and 1000, with Lasso showing poorer performance than T, corT and adjcorT for strong positive correlations, and corT showing poorer performance than T and adjcorT for moderate and strong negative correlations (Tables 4.5 and 4.6). As the sample size increases further, T, adjcorT and Lasso offer a similar level of accuracy, but corT still underperforms for moderate and strong negative correlations and larger sample sizes (Table 4.7). In the clinical applications, corT and adjcorT show a similar level of classification accuracy, possibly due to the positive correlation that exists among most metabolites. For non-cancer versus cancer discrimination, the T method showed the worst classification accuracy, followed by Lasso. Methods corT and adjcorT achieved the best level of discrimination, although this was still low (AUC of 0.60; Table 5.3). For healthy control versus adenoma discrimination, however, methods corT and adjcorT showed the lowest AUC, followed by the T method. Lasso achieved the best level of discrimination, although this remained low (AUC of 0.65; Table 5.8). For the discrimination between bacterial and non-bacterial sepsis cases, Lasso exhibited better performance than the other variable selection methods, with 83.1% classification accuracy (Table 5.13). Lasso also offered the best level of discrimination between healthy controls and kidney disease (AUC = 0.90; Table 5.21), although the four methods showed comparable performance (AUCs of 0.86 and 0.87 were achieved with the T method and with the corT and adjcorT methods respectively). My work based on simulations shows that adjcorT offers a flexible approach for variable selection aimed at clinical classification, especially for datasets involving negative correlations between discriminators for medium and large samples, where adjcorT consistently shows better performance than corT. 
These findings were, however, not reproduced by the analyses on real data. I believe this is possibly due to the lack of negative correlations among metabolites in the datasets considered. Both adjcorT and corT are filter variable selection methods. Given that adjcorT showed better performance than corT for negative correlations and similar performance for positive correlations across all sample sizes investigated, adjcorT is expected to offer advantages over corT as a variable selection method for the analysis of some metabolomics data.
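The evaluation loop described in this abstract (an 80/20 train/validation split followed by classification accuracy on the held-out set) can be sketched in a few lines. This is a hedged illustration with invented data; a nearest-centroid classifier stands in for the logistic regression used in the thesis, and the split function is a hypothetical helper:

```python
# Sketch of an 80/20 holdout evaluation on two simulated, well-separated
# groups (e.g. control vs. disease). All data and names are illustrative.
import random

def split_80_20(samples, labels, seed=0):
    """Shuffle indices reproducibly and split 80% train / 20% validation."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, valid = idx[:cut], idx[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in valid], [labels[i] for i in valid])

def nearest_centroid(train_x, train_y, x):
    """Predict the class whose feature-wise mean is closest to x."""
    dists = {}
    for c in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == c]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dists[c] = sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(dists, key=dists.get)

X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0], [0.2, 0.1],
     [5.0, 5.1], [5.2, 5.0], [5.1, 5.2], [5.0, 5.0], [5.2, 5.1]]
y = [0] * 5 + [1] * 5
tr_x, tr_y, va_x, va_y = split_80_20(X, y)
accuracy = sum(nearest_centroid(tr_x, tr_y, x) == t
               for x, t in zip(va_x, va_y)) / len(va_y)
```

With only two held-out samples, the accuracy estimate is very noisy, which is why the thesis repeats such splits and also reports sensitivity, specificity and AUC rather than a single accuracy figure.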
