The growing availability of genomic, transcriptomic, and metabolomic data has opened the door to the synthesis of multiple levels of information in biological research. As a consequence, there has been a push to analyze biological systems in a comprehensive manner through the integration of their interactions into mathematical models, with the process frequently referred to as “systems biology”. Despite the potential for this approach to greatly improve our knowledge of biological systems, the definition of mathematical relationships between different levels of information introduces diverse sources of error, requiring precise, unbiased quantification as well as robust validation methods. Failure to account for differences in uncertainty across multiple levels of data analysis may cause errors to drown out any useful outcomes of the synthesis. The application of a systems biology approach has been particularly important in metabolic modeling. There has been a concentrated effort to build models directly from genomic data and to incorporate as much of the metabolome as possible in the analysis. Metabolomic data collection has been expanded through the recent use of hydrogen Nuclear Magnetic Resonance (1H-NMR) spectroscopy for cell culture monitoring. However, the combination of uncertainty from model construction and measurement error from NMR (or other methods of metabolomic analysis) complicates data interpretation. This thesis establishes the precision and accuracy of NMR spectroscopy in the context of cell cultivation while developing a methodology for assessing model error in Metabolic Flux Analysis (MFA).
The analysis of cell culture media via NMR has been made possible by the development of specialized software for the “deconvolution” of complex spectra; however, the process is semi-qualitative. A human “profiler” is required to manually fit idealized peaks from a compound library to an observed spectrum, where the quality of fit is often subject to considerable interpretation. Work presented in this thesis establishes baseline accuracy as approximately 2% to 10% of the theoretical mean, with a relative standard deviation of 1.5% to 3%. Higher variabilities were associated primarily with profiling error, while lower variabilities were due in part to tube insertion (and the steps leading up to spectra acquisition). Although a human profiler contributed to overall uncertainty, the net impact did not make the deconvolution process prohibitively imprecise. Analysis was then expanded to consider solutions that are more representative of cell culture supernatant. The combination of metabolites at different concentration levels was efficiently represented by a Plackett-Burman experiment. The orthogonality of this design ensured that every level of metabolite concentration was combined with an equal number of high and low concentrations of all other variable metabolites, providing a worst-case scenario for variance estimation. Analysis of media-like mixtures revealed a median error and a standard deviation of approximately 10%, although estimating low metabolite concentrations resulted in a considerable loss of accuracy and precision in the presence of resonance overlap. Furthermore, an iterative regression process identified a number of cases where an increase in the concentration of one metabolite resulted in increased quantification error of another. More importantly, the analysis established a general methodology for estimating the quantification variability of media-specific metabolite concentrations.
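The balance property of the Plackett-Burman design described above can be illustrated with a minimal sketch. It assumes the standard 12-run, two-level design built from cyclic shifts of the textbook generator row; the number of runs and factors actually used in the thesis is not stated here, and the function names are hypothetical.

```python
from itertools import combinations

# Standard 12-run Plackett-Burman generator row (+1 = high, -1 = low).
# Assumption: a 12-run design is used purely for illustration.
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Return a 12 x 11 design: 11 cyclic shifts of the generator plus one all-low run."""
    n = len(GENERATOR)
    runs = [[GENERATOR[(col - shift) % n] for col in range(n)] for shift in range(n)]
    runs.append([-1] * n)
    return runs


def level_counts(design, a, b):
    """Count how often each (level of factor a, level of factor b) pair occurs."""
    counts = {}
    for run in design:
        key = (run[a], run[b])
        counts[key] = counts.get(key, 0) + 1
    return counts


design = plackett_burman_12()

# Orthogonality check: for every pair of factors, each of the four
# high/low combinations must appear the same number of times (3 of 12 runs).
balanced = all(
    sorted(level_counts(design, a, b).values()) == [3, 3, 3, 3]
    for a, b in combinations(range(len(GENERATOR)), 2)
)
print("all factor pairs balanced:", balanced)
```

Each column here stands for one variable metabolite, so a balanced pair means that the high and low levels of one metabolite are each paired equally often with the high and low levels of every other.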
Subsequent application of NMR analysis to time-course data from cell cultivation revealed correlated deviations from calculated trends. Similar deviations were observed for multiple (chemically) unrelated metabolites, amounting to approximately 1% to 10% of each metabolite’s concentration. The nature of these deviations suggested the cause to be inaccuracies in internal standard addition or quantification, resulting in a skew of all quantified metabolite concentrations within a sample by the same relative amount. Error magnitude was estimated by calculating the median relative deviation from a smoothing fit for all compounds at a given timepoint. A metabolite time-course simulation was developed to determine the frequency and magnitude of such deviations arising from typical measurement error (without added bias from incorrect internal standard addition). Multiple smoothing functions were tested on simulated time-courses, and cubic spline regression was found to minimize the median relative deviation from measurement noise to approximately 2.5%. Based on these results, an iterative smoothing correction method was implemented to identify and correct median deviations greater than 2.5%, with both simulation and correction code released as the “metcourse” package for the R programming language.
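A single pass of the median-deviation correction can be sketched as follows. This is an illustrative Python sketch, not the “metcourse” implementation: it assumes the smoothed trend for each compound is already available (the thesis obtains it by cubic spline regression and repeats the fit iteratively), and the function names and example concentrations are hypothetical.

```python
from statistics import median


def sample_deviations(observed, fitted):
    """Median relative deviation from the trend at each timepoint, taken across compounds."""
    n_times = len(next(iter(observed.values())))
    return [
        median((observed[c][t] - fitted[c][t]) / fitted[c][t] for c in observed)
        for t in range(n_times)
    ]


def correct_samples(observed, fitted, threshold=0.025):
    """Rescale every compound in a sample whose median deviation exceeds the threshold."""
    deviations = sample_deviations(observed, fitted)
    corrected = {c: list(values) for c, values in observed.items()}
    for t, dev in enumerate(deviations):
        if abs(dev) > threshold:
            # A shared relative skew (e.g. from internal standard error)
            # is removed by dividing the whole sample by (1 + deviation).
            for c in corrected:
                corrected[c][t] /= 1.0 + dev
    return corrected, deviations


# Hypothetical smoothed trends and observations (arbitrary units);
# the third sample is skewed upward by 8% for all compounds.
fitted = {
    "glucose": [20.0, 18.0, 16.0, 14.0],
    "lactate": [2.0, 4.0, 6.0, 8.0],
    "glutamine": [4.0, 3.5, 3.0, 2.5],
}
observed = {
    "glucose": [20.2, 17.9, 17.28, 14.0],
    "lactate": [1.98, 4.04, 6.48, 8.08],
    "glutamine": [4.0, 3.5, 3.24, 2.49],
}

corrected, deviations = correct_samples(observed, fitted)
print([round(d, 3) for d in deviations])
print([round(x, 3) for x in corrected["lactate"]])
```

Using the median across compounds, not the mean, keeps a single poorly quantified metabolite from triggering a correction of the whole sample.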
Finally, a t-test validation method was developed to assess the impact of measurement and model error on MFA, with a Chinese hamster ovary (CHO) cell model chosen as a case study. The standard MFA formulation was recast as a generalized least squares (GLS) problem, with each calculated flux subjected to a t-test of significance. NMR data were collected for a CHO cell bioreactor run, and a second data set was simulated directly from the model and perturbed by the observed measurement error. Because the simulated data are free of model error, the frequency of rejected fluxes in that set was attributed to measurement uncertainty alone. Fluxes found non-significant in the observed data but not rejected in the simulated data were attributed to a lack of model fit, i.e., model error. Applying this method to the observed data revealed a considerable level of error that was not identified by traditional χ² validation. Further simulation was carried out to assess the impact of measurement error and model structure, both of which were found to have a dramatic impact on statistical significance and calculation error, an issue that has yet to be addressed in the context of MFA.
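For reference, the textbook GLS estimator and its per-coefficient t-statistic are given below. The mapping onto MFA in the comment is an assumption made for illustration (calculated fluxes as the coefficients, with the stoichiometric matrix partitioned into calculated and measured parts); the exact formulation used in the thesis may differ. The symbols X, y, Ω, n, and p are introduced here and are not taken from the text, with X assumed to be an n × p matrix of full column rank and Ω positive definite.

```latex
% Assumed MFA mapping (illustrative only):
%   S_c v_c = -S_m v_m  =>  X = S_c,  \beta = v_c,  y = -S_m v_m
\begin{align}
y &= X\beta + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \sigma^2 \Omega \\
\hat{\beta} &= \left(X^\top \Omega^{-1} X\right)^{-1} X^\top \Omega^{-1} y \\
\hat{\sigma}^2 &= \frac{\left(y - X\hat{\beta}\right)^\top \Omega^{-1} \left(y - X\hat{\beta}\right)}{n - p} \\
t_i &= \frac{\hat{\beta}_i}{\sqrt{\hat{\sigma}^2 \left[\left(X^\top \Omega^{-1} X\right)^{-1}\right]_{ii}}}
      \;\sim\; t_{n-p} \quad \text{under } H_0 : \beta_i = 0
\end{align}
```

Under this reading, a flux whose |t_i| falls below the critical value of the t-distribution with n − p degrees of freedom cannot be distinguished from zero given the measurement uncertainty.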