6 research outputs found

    Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning (AutoML)

    MOTIVATION: Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for enabling the automatic selection of ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES). RESULTS: We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines with selected ML classifiers, optimized with a grid search approach, applied to two phenotypic CAD profiles. The TPOT-generated ML pipelines outperformed the grid-search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from no-CAD patients is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes. AVAILABILITY AND IMPLEMENTATION: TPOT is freely available via http://epistasislab.github.io/tpot/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
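
    Balanced accuracy, one of the metrics named in the abstract above, is the mean of the per-class recalls, so a classifier cannot score well simply by favoring the majority class. The sketch below illustrates only the metric; it is not the study's evaluation code, and the labels are made-up values, not ANGES data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; every class counts equally, whatever its size."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Hypothetical labels (assumption for illustration): 1 = CAD, 0 = no CAD.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

# Plain accuracy rewards the majority class; balanced accuracy does not.
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain_accuracy:.2f} balanced_accuracy={score:.2f}")
```

    With imbalanced phenotype groups, the two numbers can diverge noticeably, which is presumably why a class-balanced metric is reported alongside the precision-recall curve.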

    Characterizing the roles of changing population size and selection on the evolution of flux control in metabolic pathways

    Background: Understanding the genotype-phenotype map is fundamental to our understanding of genomes. Genes do not function independently, but rather as part of networks or pathways. In the case of metabolic pathways, flux through the pathway is an important next layer of biological organization up from the individual gene or protein. Flux control in metabolic pathways, reflecting the importance of mutation to individual enzyme genes, may be evolutionarily variable due to the role of mutation-selection-drift balance. The evolutionary stability of rate-limiting steps and the patterns of inter-molecular co-evolution were evaluated in a simulated pathway with a system out of equilibrium due to fluctuating selection, population size, or positive directional selection, to contrast with those under stabilizing selection. Results: Depending upon the underlying population genetic regime, fluctuating population size was found to increase the evolutionary stability of rate-limiting steps in some scenarios. This result was linked to patterns of local adaptation of the population. Further, during positive directional selection, as with more complex mutational scenarios, an increase in inter-molecular co-evolution was observed. Conclusions: Differences in patterns of evolution when systems are in and out of equilibrium, including during positive directional selection, may lead to predictable differences in observed patterns for divergent evolutionary scenarios. In particular, this result might be harnessed to detect differences between compensatory processes and directional processes at the pathway level based upon evolutionary observations in individual proteins. Detecting functional shifts in pathways reflects an important milestone in predicting when changes in genotypes result in changes in phenotypes.

    Additional file 1: Table S1. of Characterizing the roles of changing population size and selection on the evolution of flux control in metabolic pathways

    This table shows the ratio of positive fitness change counts, negative fitness change counts, total positive fitness change, total negative fitness change and total fitness change per evolutionary simulation step. Table S2. This table shows the average fitness for the first 1000 generations of each simulation step, the average fitness for the second 1000 generations of each simulation step, and p-values of the Mann-Whitney test comparing fitness values of the first and second halves of the simulation step. Figure S1. The simplified pathway that was simulated is shown. This pathway contains features from glycolysis [26]. A constant concentration of compound A is converted to compound F and the steady-state flux is measured. Figure S2. Schemes of the experiments with an explicit population and a fluctuating population size are shown. The schemes for experiments N1 (green), N2 (blue), N3 (yellow), N4 (red), N5 (purple), and N6 (brown) are shown. Black lines correspond to the control experiments with population sizes 25, 50, 100, 150, and 225. Figure S3. Schemes of the experiments with a calculated fixation probability and with fluctuating population size. A. The schemes for experiments K1 (green), K2 (red), and K3 (blue) are shown. B. The schemes for experiments K3 (blue), K4 (yellow), and K5 (purple) are shown. Black lines correspond to the control experiments with population sizes 100, 1000, and 1,000,000. Figure S4. Schemes of the experiments with an explicit population and fluctuating asymptotic flux: S1 (green) and S2 (yellow). Black lines correspond to the control experiments with a set to 0.5, 1.0, and 1.5, corresponding to flux amplitudes of 325, 650, and 975. (DOCX 9986 kb)

    Additional file 1: Table S1. of Selection on metabolic pathway function in the presence of mutation-selection-drift balance leads to rate-limiting steps that are not evolutionarily stable

    The initial values given to parameters in the system at the start of each evolutionary simulation where equilibrium is approached from above are shown. Table S2. The lengths of each enzyme, given in the number of amino acids, are shown. Table S3. The initial values given to the kcat, kcatr, KM and KMr parameters in the system at the start of the evolutionary simulation when constrained with Haldane’s relationship are shown. Keq and ΔG0 for each reaction are also shown. Figure S1. The fitness value of the median individual is shown, demonstrating that the same point of mutation-selection balance is reached when simulations begin at a lower fitness. Figures S2–S4. The evolution of parameter values for the experiment that started from a lower fitness is shown. Figures S5–S19. The averaged median of parameters after the point of mutation-selection balance is shown. Figure S20. The rate of change in averaged median fitness across each of the simulations is shown for A) mutation only, B) selection on flux alone, C) selection on flux and against total expression cost, D) selection on flux and against a high concentration of a deleterious intermediate, and E) non-biological neutral mutation, selection on flux, and for the first reaction to be rate limiting. Blue denotes a positive rate of change and red denotes a negative rate of change. Figure S21. Average median fitness across each of the simulations is shown for A) mutation only, B) selection on flux alone, C) selection on flux and against total expression cost, D) selection on flux and against a high concentration of a deleterious intermediate, and E) non-biological neutral mutation, selection on flux, and for the first reaction to be rate limiting. Figures S22–S26. Complete linkage clustering of parameter values for each selective scheme is shown, resulting in the data in Fig. 4. (PDF 1185 kb)
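
    For orientation, the Haldane relationship mentioned for Table S3 ties the four kinetic parameters of a reversible reaction to its equilibrium constant. The form below assumes a single-substrate, single-product reversible Michaelis-Menten step; this listing does not state the exact form used in the simulations, so treat it as a textbook sketch written in the entry's own symbols.

```latex
K_{eq} \;=\; \frac{k_{cat}/K_{M}}{k_{catr}/K_{Mr}}
       \;=\; \frac{k_{cat}\,K_{Mr}}{k_{catr}\,K_{M}},
\qquad
\Delta G^{0} \;=\; -RT \ln K_{eq}
```

    Under this constraint only three of the four parameters kcat, kcatr, KM and KMr can vary freely once Keq is fixed.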